hierarchical microarray clustering

Created: 2008-12-05 18:31:53 Last updated: 2008-12-05 20:33:37

To illustrate an application of our caGrid plug-in, we tested it with a microarray hierarchical clustering workflow that involves services hosted at multiple institutions.
Microarrays are a high-throughput technology for measuring the expression of tens of thousands of genes in different tissues or cells. Scientists represent the data from each microarray as a vector (profile) in which each element is the expression level of one gene, and they use clustering analysis to identify similar expression profiles across genes or samples.[10] Hierarchical clustering is especially popular for grouping microarrays into a multilevel hierarchy in which, at each level, arrays in the same cluster are more similar to each other than to arrays in different clusters.
To cluster the data, the user must identify and retrieve the relevant microarrays, preprocess them, and then invoke the hierarchical clustering program. In the past, we might have programmed this sequence of steps in a scripting language such as Perl. Instead, we use Taverna and the caGrid plug-in to identify relevant services, compose them with additional building blocks for data transformation, and orchestrate their execution. Our workflow involves three major steps:
1.    Identify and retrieve the microarray data of interest. We used CQL, the query language used by caGrid Data Services, to specify these data and retrieve them from a caArray data service hosted at Columbia University (http://cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/CaArrayScrub).
2.    Preprocess (for example, normalize) the microarray data before clustering them. We used an instance of a GenePattern analytical service hosted at MIT’s Broad Institute (http://node255.broad.mit.edu:6060/wsrf/services/cagrid/PreprocessDatasetMAGEService), which provides normalization, floor and ceiling thresholding, variation filtering, and other preprocessing functions.
3.    Run hierarchical clustering on the preprocessed data. We invoked the geWorkbench analytical service hosted at Columbia University (http://cagridnode.c2b2.columbia.edu:8080/wsrf/services/cagrid/HierarchicalClusteringMage).
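
The three steps above run as remote caGrid services, so their internals are not visible here. As a rough local illustration of steps 2 and 3 only, the Python sketch below applies floor/ceiling thresholding and a variation filter to a small gene × array matrix, then clusters the arrays agglomeratively. The threshold values, the Pearson-correlation distance, and average linkage are assumptions chosen for illustration; they are not taken from the GenePattern or geWorkbench services, normalization is omitted, and the toy data are made up.

```python
import math
from itertools import combinations

# Illustrative thresholds (assumptions, not the GenePattern service's settings).
FLOOR, CEILING = 20.0, 20000.0
MIN_FOLD, MIN_DELTA = 3.0, 100.0


def preprocess(profiles):
    """Clip every value to [FLOOR, CEILING], then keep only genes whose
    max/min fold change and max-min difference pass the variation filter.
    `profiles` maps gene name -> expression values, one per array."""
    kept = {}
    for gene, values in profiles.items():
        clipped = [min(max(v, FLOOR), CEILING) for v in values]
        hi, lo = max(clipped), min(clipped)
        if hi / lo >= MIN_FOLD and hi - lo >= MIN_DELTA:
            kept[gene] = clipped
    return kept


def array_profiles(kept):
    """Transpose gene rows into one profile vector per array."""
    rows = list(kept.values())
    return [list(column) for column in zip(*rows)]


def pearson_distance(x, y):
    """1 - Pearson correlation; constant profiles are treated as uncorrelated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0
    return 1.0 - sxy / math.sqrt(sxx * syy)


def average_linkage(vectors):
    """Agglomerative clustering with average linkage.
    Returns the merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a tuple of array indices."""
    leaf = {(i, j): pearson_distance(vectors[i], vectors[j])
            for i, j in combinations(range(len(vectors)), 2)}
    clusters = [(i,) for i in range(len(vectors))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Mean distance over all cross-cluster pairs of arrays.
            dist = sum(leaf[min(i, j), max(i, j)] for i in a for j in b) / (len(a) * len(b))
            if best is None or dist < best[2]:
                best = (a, b, dist)
        a, b, _ = best
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(tuple(sorted(a + b)))
        merges.append(best)
    return merges


if __name__ == "__main__":
    # Toy data: 4 genes x 4 arrays (made up for illustration).
    data = {
        "g1": [100, 120, 900, 1000],
        "g2": [900, 1000, 100, 150],
        "g3": [500, 520, 510, 505],  # too little variation; filtered out
        "g4": [50, 60, 400, 420],
    }
    for a, b, d in average_linkage(array_profiles(preprocess(data))):
        print(a, "+", b, f"at distance {d:.4f}")
```

Each merge joins the two closest clusters, so reading the merges in order rebuilds the multilevel hierarchy described above, from the most similar arrays up to a single root cluster.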

The Taverna workflow contains an input processor that stores the CQL expression, an output processor that stores the clustered microarray data (input and output processors are shown in blue), three caGrid processors (green) representing the three caGrid services just listed, and a few “shim” processors, such as XML splitters and beanshell scripts, that handle data transformation between services.
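
The shim processors do no analysis of their own; they reshape one service's XML output into the input the next service expects. According to the Links table, for instance, the preprocessing output passes through an XML splitter (parametersXML1), Extract_elements_from_a_list, and Merge_string_list_to_string before a beanshell script passes it on toward the clustering call. The Python sketch below mimics that extract-and-merge step with the standard library's ElementTree. The response document is hypothetical, and the function names only echo the Taverna processors; the actual processors and beanshell scripts may behave differently.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_elements(xml_text, name):
    """Roughly an XML splitter plus 'Extract elements from a list': return every
    element whose local name is `name`, serialized back to an XML string."""
    root = ET.fromstring(xml_text)
    fragments = []
    for element in root.iter():
        if local_name(element.tag) == name:
            element.tail = None  # tostring() would otherwise include trailing text
            fragments.append(ET.tostring(element, encoding="unicode"))
    return fragments


def merge_strings(fragments, separator=""):
    """Roughly 'Merge string list to string': concatenate the fragments."""
    return separator.join(fragments)


if __name__ == "__main__":
    # Hypothetical service response; the real caGrid schema is not shown here.
    response = "<result><BioAssay>a1</BioAssay><BioAssay>a2</BioAssay></result>"
    print(merge_strings(extract_elements(response, "BioAssay")))
```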

Run

To run this workflow in the Taverna Workbench, copy and paste this link into File > 'Open workflow location...':
http://myexperiment.org/workflows/599/download?version=1

Workflow Components

Inputs (1)

Name	Description
CQL File Name

Processors (15)

Name	Type	Description
Extract_elements_from_a_list	local
Merge_string_list_to_string	local
Beanshell_scripting_host	beanshell
Beanshell_scripting_host1	beanshell
parametersXML1	local
parametersXML	local
HierarchicalClusteringParameterXML1	local
PreprocessDatasetParameterSetXML1	local
preprocessDatasetParameterSetXML	local
hierarchicalClusteringParameterXML	local
parametersXML4	local
parametersXML3	local
execute	arbitrarywsdl
execute2	arbitrarywsdl
performAnalysis	arbitrarywsdl

Beanshells (2)

Name	Description	Inputs	Outputs
Beanshell_scripting_host		input	output
Beanshell_scripting_host1		input	output

Outputs (1)

Name	Description
output

Links (16)

Source	Sink
Beanshell_scripting_host1:output	parametersXML3:bioAssay
Beanshell_scripting_host:output	parametersXML4:bioAssay
CQL File Name	parametersXML:cqlFile
Extract_elements_from_a_list:outputlist	Merge_string_list_to_string:stringlist
HierarchicalClusteringParameterXML1:output	hierarchicalClusteringParameterXML:HierarchicalClusteringParameter
Merge_string_list_to_string:concatenated	Beanshell_scripting_host1:input
PreprocessDatasetParameterSetXML1:output	preprocessDatasetParameterSetXML:PreprocessDatasetParameterSet
execute:parameters	Beanshell_scripting_host:input
hierarchicalClusteringParameterXML:output	parametersXML3:hierarchicalClusteringParameter
parametersXML1:BioAssay	Extract_elements_from_a_list:inputlist
parametersXML3:output	execute2:parameters
parametersXML4:output	performAnalysis:parameters
parametersXML:output	execute:parameters
performAnalysis:parameters	parametersXML1:input
preprocessDatasetParameterSetXML:output	parametersXML4:preprocessDatasetParameterSet
execute2:parameters	output
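
The Links table fixes the data dependencies between processors. Taverna 1 runs a processor once its inputs are available, so any topological order of this graph is a valid sequential execution order, while independent processors (such as the two parameter-building branches) may also run concurrently. The sketch below assumes Python 3.9+ for the standard-library graphlib module; it copies the 16 links, reduces each processor:port endpoint to its processor name, and computes one such order.

```python
from graphlib import TopologicalSorter

# (source, sink) pairs copied from the Links table; "CQL File Name" is the
# workflow input and "output" is the workflow output.
LINKS = [
    ("Beanshell_scripting_host1:output", "parametersXML3:bioAssay"),
    ("Beanshell_scripting_host:output", "parametersXML4:bioAssay"),
    ("CQL File Name", "parametersXML:cqlFile"),
    ("Extract_elements_from_a_list:outputlist", "Merge_string_list_to_string:stringlist"),
    ("HierarchicalClusteringParameterXML1:output",
     "hierarchicalClusteringParameterXML:HierarchicalClusteringParameter"),
    ("Merge_string_list_to_string:concatenated", "Beanshell_scripting_host1:input"),
    ("PreprocessDatasetParameterSetXML1:output",
     "preprocessDatasetParameterSetXML:PreprocessDatasetParameterSet"),
    ("execute:parameters", "Beanshell_scripting_host:input"),
    ("hierarchicalClusteringParameterXML:output", "parametersXML3:hierarchicalClusteringParameter"),
    ("parametersXML1:BioAssay", "Extract_elements_from_a_list:inputlist"),
    ("parametersXML3:output", "execute2:parameters"),
    ("parametersXML4:output", "performAnalysis:parameters"),
    ("parametersXML:output", "execute:parameters"),
    ("performAnalysis:parameters", "parametersXML1:input"),
    ("preprocessDatasetParameterSetXML:output", "parametersXML4:preprocessDatasetParameterSet"),
    ("execute2:parameters", "output"),
]


def processor(endpoint):
    """Strip the ':port' suffix, leaving the processor (or workflow port) name."""
    return endpoint.split(":", 1)[0]


def execution_order(links):
    """Return one dependency-respecting order of all nodes in the link graph."""
    sorter = TopologicalSorter()
    for source, sink in links:
        sorter.add(processor(sink), processor(source))  # sink depends on source
    return list(sorter.static_order())


if __name__ == "__main__":
    for step, name in enumerate(execution_order(LINKS), start=1):
        print(step, name)
```

The order always runs from CQL File Name through execute (the caArray query), performAnalysis (preprocessing), and execute2 (clustering) to output; where the parameter-building processors fall in between depends on graphlib's tie-breaking and is not guaranteed.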

Coordinations (0)

Workflow Type

Taverna 1

Uploader

Wei Tan

License

All versions of this Workflow are licensed under:

Version 1 (of 1)

Credits (1)

(People/Groups)

Wei Tan

Attributions (0)

(Workflows/Files)

None

Tags (8)

Uploader tags

Shared with Groups (0)

None

Featured In Packs (0)

None

Attributed By (0)

(Workflows/Files)

None

Favourited By (0)

No one

Statistics

2526 viewings

2053 downloads

Citations (0)

None

Version History

In chronological order:

hierarchical microarray clustering

Created by Wei Tan on Friday 05 December 2008 18:31:53 (UTC)

Last edited by Wei Tan on Friday 05 December 2008 20:33:39 (UTC)

Reviews (0)

No reviews yet

Comments (0)

No comments yet

Other workflows that use similar services (0)

There are no workflows in myExperiment that use similar services to this Workflow.