This workflow uses one or more services that were deprecated as of 31 December 2012 and may no longer function.

Create_SNP_Set

Created: 2012-08-21 01:50:23

This workflow finds the SNPs in the vicinity of each gene in a given gene set and assembles them into a SNP set. The user chooses the flanking width around each gene within which SNPs are collected. The input is a list of Entrez gene IDs. BioMart services are used to determine the chromosome and position of each gene and to retrieve the Affy gene chip 6k SNP IDs in the flanked region. The final report is stored as a tab-delimited text file that pairs each Affy 6 gene chip SNP ID with the KEGG information for its associated gene.

Run

To run this workflow in the Taverna Workbench, copy and paste this link into File > 'Open workflow location...':
http://myexperiment.org/workflows/3108/download?version=1

Workflow Components

Authors (1)

Titles (1)

Descriptions (1)

Dependencies (0)

Inputs (3)

Name	Description
Entrez_ID	A list of genes, given as Entrez gene ids.
set_width	The flanking width to apply around each gene when determining SNPs.
path_to_output_file	The local path where the output file of the workflow should be written.
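
The way the Entrez_ID input is consumed can be sketched in plain Java. This is a minimal illustration, not the workflow's own code: the class and method names (EntrezInputSketch, splitIds) are hypothetical, and trimming whitespace and skipping blank lines are added assumptions. The workflow itself splits the string on "\n" without further cleanup.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the workflow.
class EntrezInputSketch {

    // Split a newline-separated block of Entrez gene ids into a list.
    static List<String> splitIds(String raw) {
        List<String> ids = new ArrayList<>();
        if (raw == null || raw.isEmpty()) {
            return ids;
        }
        for (String part : raw.split("\n")) {
            String id = part.trim();   // assumption: tolerate stray spaces and "\r"
            if (!id.isEmpty()) {       // assumption: ignore blank lines
                ids.add(id);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // The ids below are placeholders, not real input data.
        System.out.println(splitIds("1234\n5678\n"));
    }
}
```

Each id in the resulting list would then be passed on to the gene lookup and to the KEGG id conversion.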

Processors (12)

Name	Type	Description
Split_string_into_string_list_by_regular_expression	localworker	This service splits the string value into a string list at every occurrence of the specified regular expression. In this workflow the regular expression is a newline ("\n"), supplied by regex_value. Script List split = new ArrayList(); if (!string.equals("")) { String regexString = ","; if (regex != void) { regexString = regex; } String[] result = string.split(regexString); for (int i = 0; i < result.length; i++) { split.add(result[i]); } }
regex_value	stringconstant	Value \n
hsapiens_gene_ensembl	biomart	This is a Biomart service that takes the entrez gene id as an input and returns the chromosome name and start and end position of the gene.
Set_width	beanshell	The flanking region around the gene is calculated in this beanshell. The input is provided by the user. Script import java.util.*; List tmp_end = new ArrayList(); List tmp_start = new ArrayList(); width = Integer.parseInt(in1.get(0)); int value_end=0; int value_start=0; int out_end=0; int out_start=0; for(int i=0; i < end_in.size(); i++) { value_end = Integer.parseInt(end_in.get(i)); value_start = Integer.parseInt(start_in.get(i)); if (value_start < value_end ) { out_end = value_end + width; out_start = value_start - width; } else { out_start = value_start + width; out_end = value_end - width; } tmp_end.add(out_end); tmp_start.add(out_start); } end_out = tmp_end; start_out = tmp_start; chr_out = chr_in;
Convert_to_Kegg_id	beanshell	btit, which returns gene information from the KEGG database, takes a KEGG gene id as input. To convert an Entrez gene id to a KEGG id, "hsa:" is prepended to it. Script out1="hsa:"+in1;
btit	wsdl	The input for this service is a KEGG gene id of the form hsa:1234 (for example). It returns the gene name and definitions of the given entry id. Wsdl http://soap.genome.jp/KEGG.wsdl Wsdl Operation btit
hsapiens_snp	biomart	This is a Biomart service that takes the chromosome name and the start and end positions, and finds all the Affy genechip 6k SNP ids present in the region. The Affy6 ids can easily be changed to any other desired gene chip id, such as Illumina chip ids or another version of the Affymetrix gene chip.
Flatten_List_2	localworker	This service flattens the inputlist by one level. It returns the result of the flattening. Script flatten(inputs, outputs, depth) { for (i = inputs.iterator(); i.hasNext();) { element = i.next(); if (element instanceof Collection && depth > 0) { flatten(element, outputs, depth - 1); } else { outputs.add(element); } } } outputlist = new ArrayList(); flatten(inputlist, outputlist, 1);
Create_Final_Report	beanshell	This aligns the SNP ID with the associated gene id and gene information in a tab-delimited format. Script import java.util.List; import java.util.ArrayList; List tmp = new ArrayList(); for(int i=0; i < in2.size(); i++) { if(!in2.get(i).toString().equals("")) { tmp.add(in2.get(i).toString() + "\t" + in1); } } result = tmp;
Flatten_List	localworker	This service flattens the inputlist by one level. It returns the result of the flattening. Script flatten(inputs, outputs, depth) { for (i = inputs.iterator(); i.hasNext();) { element = i.next(); if (element instanceof Collection && depth > 0) { flatten(element, outputs, depth - 1); } else { outputs.add(element); } } } outputlist = new ArrayList(); flatten(inputlist, outputlist, 1);
Create_and_populate_temporary_file_2	beanshell	This service creates a temporary file in a local tmp directory. Script File f = File.createTempFile("taverna", ".tmp"); BufferedWriter writer = new BufferedWriter(new FileWriter(f)); writer.write(content); writer.close(); filepath = f.getCanonicalPath();
Concatenate_Files_2	localworker	This service examines the files whose paths or URLs are specified in the filelist. The content of those files is concatenated. Script BufferedReader getReader (String fileUrl) throws IOException { InputStreamReader reader; try { reader = new FileReader(fileUrl); } catch (FileNotFoundException e) { // try a real URL instead URL url = new URL(fileUrl); reader = new InputStreamReader (url.openStream()); } return new BufferedReader(reader); } String NEWLINE = System.getProperty("line.separator"); boolean displayResults = false; if (displayresults != void) { displayResults = Boolean.valueOf(displayresults).booleanValue(); } StringBuffer sb = new StringBuffer(2000); if (outputfile == void) { throw new RuntimeException("The 'outputfile' parameter cannot be null"); } if (filelist == null) { throw new RuntimeException("The 'filelist' parameter cannot be null"); } String str = null; Writer writer = new FileWriter(outputfile); for (int i = 0; i < filelist.size(); i++) { BufferedReader reader = getReader(filelist.get(i)); while ((str = reader.readLine()) != null) { writer.write(str); writer.write(NEWLINE); if (displayResults) { sb.append(str); sb.append(NEWLINE); } } reader.close(); } writer.flush(); writer.close(); if (displayResults) { results= sb.toString(); }
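
The arithmetic in the Set_width script can be restated as a small standalone Java method. This is a sketch for illustration only: the class and method names (FlankSketch, flank) are hypothetical, and the coordinates in main are made-up values. It mirrors the script's two branches, widening outward when the start is below the end and in the opposite direction otherwise.

```java
// Hypothetical standalone version of the Set_width arithmetic.
class FlankSketch {

    // Returns {start, end} after widening the region by `width` on both sides.
    static int[] flank(int start, int end, int width) {
        if (start < end) {
            return new int[] { start - width, end + width };
        }
        // start >= end: widen in the opposite direction, as the script does
        return new int[] { start + width, end - width };
    }

    public static void main(String[] args) {
        // Made-up coordinates: a gene at 1000..2000 with a 500 bp flank.
        int[] window = flank(1000, 2000, 500);
        System.out.println(window[0] + "\t" + window[1]);
    }
}
```

Like the original script, this sketch does not clamp the result, so a gene close to the start of a chromosome can yield a start position below 1.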

Beanshells (4)

Name	Description	Inputs	Outputs
Set_width	The flanking region around the gene is calculated in this beanshell. The input is provided by the user.	chr_in end_in start_in in1	chr_out end_out start_out
Convert_to_Kegg_id	btit, which returns gene information from the KEGG database, takes a KEGG gene id as input. To convert an Entrez gene id to a KEGG id, "hsa:" is prepended to it.	in1	out1
Create_Final_Report	This aligns the SNP ID with the associated gene id and gene information in a tab-delimited format.	in1 in2	result
Create_and_populate_temporary_file_2	This service creates a temporary file in a local tmp directory.	content	filepath
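
The Convert_to_Kegg_id and Create_Final_Report steps can be sketched together in plain Java. The names ReportSketch, toKeggId and reportRows are hypothetical, and the sample values in main are placeholders rather than real identifiers. One simplifying assumption: the gene information is passed here as a single string, whereas in the workflow in1 arrives from Flatten_List_2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of two of the workflow's beanshell steps.
class ReportSketch {

    // Same operation as Convert_to_Kegg_id: prefix the Entrez id with "hsa:".
    static String toKeggId(String entrezId) {
        return "hsa:" + entrezId;
    }

    // Same idea as Create_Final_Report: one tab-delimited row per
    // non-empty SNP id, each paired with the gene information.
    static List<String> reportRows(List<String> snpIds, String geneInfo) {
        List<String> rows = new ArrayList<>();
        for (String snp : snpIds) {
            if (!snp.isEmpty()) {
                rows.add(snp + "\t" + geneInfo);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Placeholder values only.
        String geneInfo = toKeggId("1234") + " example gene description";
        for (String row : reportRows(Arrays.asList("snpA", "", "snpB"), geneInfo)) {
            System.out.println(row);
        }
    }
}
```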

Outputs (0)

Datalinks (19)

Source	Sink
Entrez_ID	Split_string_into_string_list_by_regular_expression:string
regex_value:value	Split_string_into_string_list_by_regular_expression:regex
Split_string_into_string_list_by_regular_expression:split	hsapiens_gene_ensembl:hsapiens_gene_ensembl.entrezgene_filter
hsapiens_gene_ensembl:hsapiens_gene_ensembl.chromosome_name	Set_width:chr_in
hsapiens_gene_ensembl:hsapiens_gene_ensembl.end_position	Set_width:end_in
hsapiens_gene_ensembl:hsapiens_gene_ensembl.start_position	Set_width:start_in
set_width	Set_width:in1
Split_string_into_string_list_by_regular_expression:split	Convert_to_Kegg_id:in1
Convert_to_Kegg_id:out1	btit:string
Set_width:chr_out	hsapiens_snp:hsapiens_snp.chr_name_filter
Set_width:end_out	hsapiens_snp:hsapiens_snp.chrom_end_filter
Set_width:start_out	hsapiens_snp:hsapiens_snp.chrom_start_filter
btit:return	Flatten_List_2:inputlist
Flatten_List_2:outputlist	Create_Final_Report:in1
hsapiens_snp:hsapiens_snp.affy6	Create_Final_Report:in2
Create_Final_Report:result	Flatten_List:inputlist
Flatten_List:outputlist	Create_and_populate_temporary_file_2:content
Create_and_populate_temporary_file_2:filepath	Concatenate_Files_2:filelist
path_to_output_file	Concatenate_Files_2:outputfile
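
The last three datalinks route the report through a temporary file and Concatenate_Files_2 before it reaches path_to_output_file. Outside Taverna, the same end result could be approximated with a direct write. The sketch below swaps in java.nio.file.Files for the workflow's temp-file-plus-concatenate approach. The class and method names are hypothetical, and UTF-8 is an assumption (the workflow's FileWriter uses the platform default encoding).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical direct-write alternative to the temp file + concatenate steps.
class ReportWriterSketch {

    // Write one report row per line to the given output path.
    static void writeReport(List<String> rows, Path out) {
        try {
            Files.write(out, rows, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: write to a temporary file, read it back, then clean up.
    static List<String> roundTrip(List<String> rows) {
        try {
            Path tmp = Files.createTempFile("snp_set", ".txt");
            try {
                writeReport(rows, tmp);
                return Files.readAllLines(tmp, StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder rows only.
        System.out.println(roundTrip(Arrays.asList("snpA\tinfo", "snpB\tinfo")));
    }
}
```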

Coordinations (0)

Workflow Type

Taverna 2

Uploader

Harish Dharuri

License

All versions of this Workflow are licensed under:

Version 1 (of 1)

Credits (1)

(People/Groups)

Harish Dharuri

Attributions (0)

(Workflows/Files)

None

Tags (0)

None

Shared with Groups (1)

BioSemantics

Featured In Packs (1)

A workflow approach to mine pathway databases provides novel biological insight into the genetics of metabolomics data

Attributed By (0)

(Workflows/Files)

None

Favourited By (0)

No one

Statistics

945 viewings

730 downloads

Citations (0)

None

Version History

In chronological order:

Create_SNP_Set

Created by Harish Dharuri on Tuesday 21 August 2012 01:50:22 (UTC)
