This workflow uses one or more services that
are deprecated as of 31st December 2012
(about 12 years ago), and may no longer function.
Affected service WSDL:
- http://soap.genome.jp/KEGG.wsdl
Details:
KEGG will be moving from a WSDL/SOAP interface to REST. Details of the new REST services can be found here.
Working examples that use the new REST service can be viewed here, here and here.
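As a rough illustration of what a move to REST could look like for this workflow, the sketch below only builds request URLs and does not contact any service. It assumes that the KEGG REST operations `conv` and `link` at https://rest.kegg.jp can stand in for bconv and get_pathways_by_genes; the class and method names are hypothetical, and no replacement for mark_pathway_by_objects is sketched.

```java
import java.util.List;

// Hypothetical helper: builds KEGG REST request URLs (assumed replacements
// for the deprecated SOAP operations); it performs no network access.
final class KeggRestUrls {
    // Assumed base URL of the KEGG REST service.
    static final String BASE = "https://rest.kegg.jp";

    // URL asking KEGG to convert prefixed outside IDs (e.g. "uniprot:P02745")
    // to KEGG gene IDs; multiple entries are joined with '+'.
    static String convUrl(List<String> prefixedIds) {
        return BASE + "/conv/genes/" + String.join("+", prefixedIds);
    }

    // URL asking KEGG for the pathways linked to the given KEGG gene IDs.
    static String pathwayLinkUrl(List<String> keggGeneIds) {
        return BASE + "/link/pathway/" + String.join("+", keggGeneIds);
    }

    public static void main(String[] args) {
        System.out.println(convUrl(List.of("uniprot:P02745", "uniprot:P00734")));
        System.out.println(pathwayLinkUrl(List.of("hsa:712")));
    }
}
```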
KEGG Pathway Analysis
Created: 2010-03-19 13:46:37
The KEGG pathway analysis part of the workflow takes a list of UniProt accession numbers and converts them to KEGG IDs. Identifiers may be given in any of the following formats, each with the corresponding prefix:
External database    Database prefix
-----------------    ---------------
NCBI GI              ncbi-gi:
NCBI GeneID          ncbi-geneid:
GenBank              genbank:
UniGene              unigene:
UniProt              uniprot:
It performs this using the web service bconv, provided by the KEGG database (Kanehisa et al., 2010), described in the KEGG API available at: http://www.genome.jp/kegg/docs/keggapi_manual.html#label:42.
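The prefixing step can be sketched as follows. This is a minimal illustration, assuming bconv receives one space-separated string of prefixed IDs (consistent with the commented-out `"uniprot:" + uniProtId + " "` line in the parse_blast_results script); the class and method names are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: turns bare accessions into a prefixed,
// space-separated string of the kind assumed to be passed to bconv.
final class BconvInput {
    static String build(String prefix, List<String> accessions) {
        return accessions.stream()
                .map(String::trim)              // tolerate stray whitespace
                .filter(a -> !a.isEmpty())      // skip blank entries
                .map(a -> prefix + a)           // e.g. "uniprot:" + "P02745"
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(build("uniprot:", List.of("P02745", "P00734")));
    }
}
```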
A list of KEGG IDs is produced in a tabular format: the first element of each line is the input ID, the second is the KEGG ID, and the third is a string confirming that the protein exists in both databases. This table is then split into its three kinds of element, using whitespace as the regular expression, and every element from every line is entered into a new list. The next step in the workflow removes the confirmation strings and the input IDs (for example NCBI-GI or UniProt IDs), leaving only the KEGG IDs of the proteins. This is done by using the regular expression:
.{3}:.*
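The split-and-filter step can be reproduced outside Taverna roughly as below. It mirrors the two local-worker scripts (split on `\s`, keep items that fully match `.{3}:.*`); the class name is hypothetical and the sample table is illustrative, not captured service output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone version of the split and filter processors.
final class KeggIdFilter {
    // Split the tabular bconv result on single whitespace characters.
    static List<String> split(String table) {
        if (table.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(table.split("\\s")));
    }

    // Keep only tokens of the form "xxx:..." (three characters, a colon, anything).
    static List<String> keepKeggIds(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            if (token.matches(".{3}:.*")) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative sample only.
        String table = "uniprot:P02745\thsa:712\tequivalent\n"
                     + "uniprot:P00734\thsa:2147\tequivalent";
        System.out.println(keepKeggIds(split(table)));
    }
}
```

Because `String.matches` requires the whole token to match, the colon must be the fourth character: `uniprot:P02745` and `equivalent` are rejected, while `hsa:712` is kept. An ID whose prefix is not exactly three characters long would also be dropped by this pattern.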
The get_pathways_by_genes web service then queries the KEGG database and retrieves the pathways in which the proteins participate. The mark_pathway_by_objects method marks the input proteins from the filtered list in the KEGG pathways found by get_pathways_by_genes and generates a list of URLs as output. These URLs correspond to images of the KEGG pathways, in which the target proteins are marked in orange. The Get_Image_From_URL method is then used to download the images. The final output is a list of images with the target proteins highlighted in orange in their respective KEGG pathways.
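The image download at the end of this chain can be sketched as below. This is not the workflow's own Get_Image_From_URL script but a simplified alternative, assuming Java 9 or later for `InputStream.readAllBytes`; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;

// Hypothetical helper: reads the bytes of an image from a URL.
final class ImageFetcher {
    // Read a stream to the end and close it.
    static byte[] readAll(InputStream in) {
        try (InputStream stream = in) {
            return stream.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Open the URL (for example one returned by mark_pathway_by_objects)
    // and return the raw image bytes.
    static byte[] fetch(String url) {
        try {
            return readAll(URI.create(url).toURL().openStream());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Reading the stream once avoids opening the connection several times just to learn the content length.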
Workflow Components
Authors (2)
Haakon Berven and Monica Bhaskar. |
Baywatch Solutions |
Titles (1)
Descriptions (6)
This workflow receives a list of UniProt accession numbers as input. These are then converted to KEGG IDs using the bconv service provided by KEGG, and the KEGG IDs are entered into a separate list in a tabular format. |
Uses workflows made by: Franck Tanoh. Paul Fischer and Michael Gerlich. |
Dependencies (0)
Inputs (4)
Name |
Description |
minP |
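This input provides the minimum percent identity used when parsing the blast file generated from the search between dissimilar proteins and the drug target database. The parse_blast_results script compares this value against the percent-identity column of each hit, so it is not a p-value.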
|
maxE |
This input provides the maximum e-value for the parsing of the blast file generated from the search between dissimilar proteins and the drug target database.
|
blast_output_file |
This input provides the full Blast result from a blast search comparing dissimilar proteins to the drug target database.
|
protein_id_list |
This input provides a list of the UniProt accession numbers for the dissimilar proteins compared against the drug target database by Blast. It feeds this into bconv, which then translates the accession numbers into KEGG IDs in a tabular format.
|
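How minP and maxE act on the blast file can be sketched as follows. The sketch assumes tab-separated Blast output in which the third column is percent identity and the eleventh is the e-value, which is how the parse_blast_results script reads it; the class name is hypothetical, and unlike that script it requires at least eleven columns before reading the e-value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone version of the threshold filter.
final class BlastHitFilter {
    static List<String> filter(String blastTable, double minPercent, double maxEvalue) {
        List<String> ids = new ArrayList<>();
        for (String row : blastTable.split("\n")) {
            String[] cols = row.split("\t");
            if (cols.length < 11) {
                continue; // need columns 0..10 (e-value is column 10)
            }
            double percent = Double.parseDouble(cols[2].trim());
            double eValue = Double.parseDouble(cols[10].trim());
            if (percent >= minPercent && eValue <= maxEvalue) {
                // Strip the '|' separators around the hit accession.
                ids.add("uniprot:" + cols[1].replace("|", "").trim());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String table =
            "sp|Q2FH34|ACYP_STAA3\t|P00734\t66.67\t9\t3\t0\t47\t55\t441\t449\t1.1\t17.7\n"
          + "sp|Q2FEF3|3MGH_STAA3\t|P00533\t25.00\t72\t46\t2\t40\t103\t862\t933\t0.18\t21.9";
        System.out.println(filter(table, 30.0, 2.0));
    }
}
```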
Processors (11)
Name |
Type |
Description |
bconv_2 |
wsdl |
WSDL: http://soap.genome.jp/KEGG.wsdl; WSDL operation: bconv |
Split_string_into_string_list_by_regular_expression |
localworker |
Script:
    List split = new ArrayList();
    if (!string.equals("")) {
        String regexString = ",";
        if (regex != void) {
            regexString = regex;
        }
        String[] result = string.split(regexString);
        for (int i = 0; i < result.length; i++) {
            split.add(result[i]);
        }
    }
|
regex_value |
stringconstant |
Value: \s |
Filter_List_of_Strings_by_regex |
localworker |
Script:
    filteredlist = new ArrayList();
    StringBuffer sb = new StringBuffer();
    for (Iterator i = stringlist.iterator(); i.hasNext();) {
        String item = (String) i.next();
        if (item.matches(regex)) {
            filteredlist.add(item);
        }
    }
|
regex_value_1 |
stringconstant |
Value: .{3}:.* |
get_pathways_by_genes |
wsdl |
WSDL: http://soap.genome.jp/KEGG.wsdl; WSDL operation: get_pathways_by_genes |
Get_Image_From_URL |
localworker |
Script:
    URL inputURL = new URL(url);
    byte[] contents = new byte[4];
    if (url == null) return;
    if (inputURL.openConnection().getContentLength() == -1) {
        // Content size unknown, must read first...
        byte[] buffer = new byte[1024];
        int bytesRead = 0;
        int totalBytesRead = 0;
        InputStream is = inputURL.openStream();
        while (bytesRead != -1) {
            totalBytesRead += bytesRead;
            bytesRead = is.read(buffer, 0, 1024);
        }
        contents = new byte[totalBytesRead];
    } else {
        contents = new byte[inputURL.openConnection().getContentLength()];
    }
    int bytesRead = 0;
    int totalBytesRead = 0;
    InputStream is = inputURL.openStream();
    while (bytesRead != -1) {
        bytesRead = is.read(contents, totalBytesRead, contents.length - totalBytesRead);
        totalBytesRead += bytesRead;
        if (contents.length==totalBytesRead) break;
    }
    image = contents;
|
mark_pathway_by_objects |
wsdl |
WSDL: http://soap.genome.jp/KEGG.wsdl; WSDL operation: mark_pathway_by_objects |
parse_blast_results |
beanshell |
Script:
    // "uniprot:P02745 "
    // takes a string of \t separated results and \n
    //String minP = "30.0";
    //String maxE = "2.0";
    StringBuffer sb1 = new StringBuffer();
    StringBuffer sb2 = new StringBuffer();
    double minPercent = Double.parseDouble(minP);
    double maxEvalue = Double.parseDouble(maxE);
    int count = 0;
    protein_list = new ArrayList();
    sb2.append("thresholds: minP=" + minP + ", maxE=" + maxE + "\n========================\n");
    String [] rows = blast_results.split("\n");
    for(int i = 0; i < rows.length; ++i) {
        String [] cols = rows[i].split("\t");
        if(cols != null && cols.length > 9) {
            String [] query = cols[0].split("[|]");
            String uniProtId = cols[1].replaceAll("[|]","").trim();
            String percent = cols[2].trim();
            String e_val = cols[10].trim();
            double max1 = 0;
            double max2 = 0;
            if( Double.parseDouble(percent) >= minPercent && Double.parseDouble(e_val) <= maxEvalue ) {
                //sb1.append("uniprot:" + uniProtId + " ");
                protein_list.add("uniprot:" + uniProtId);
                sb2.append(">>> query id=" + query[1]);
                sb2.append(", name=" + query[2]);
                sb2.append(", hit uniProt id=" + uniProtId);
                sb2.append(", % identity=" + percent);
                sb2.append(", e value =" + e_val + "\n");
            }
        }
    }
    // "uniprot:P00734 uniprot:P00737";
    // probably on works if both have same pathway id
    // sb1.toString();
    records = sb2.toString();
|
load_blast_results |
beanshell |
Script:
    //String blast_results = "sp|Q2FH34|ACYP_STAA3 |P00734 66.67 9 3 0 47 55 441 449 1.1 17.7\n" +
    //"sp|Q2FEF3|3MGH_STAA3 |P00533 25.00 72 46 2 40 103 862 933 0.18 21.9";
    import java.io.*;
    StringBuffer sb = new StringBuffer();
    try {
        BufferedReader br = new BufferedReader(new FileReader(path + filename));
        String line = br.readLine();
        while(line != null) {
            sb.append(line + "\n");
            line = br.readLine();
        }
        br.close();
    }
    catch(Exception ex) { System.out.println(ex); }
    blast_results = sb.toString();
|
path |
stringconstant |
Value: D:/DATA/demo/ |
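The load_blast_results step reads the file named by blast_output_file from the fixed directory held in the path constant. A variant that closes the reader automatically and reports failures instead of printing them might look like the sketch below; the class and method names are hypothetical, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical stand-alone version of the file-loading step.
final class BlastFileLoader {
    // Read all lines, appending "\n" after each one as the original script does.
    static String readAll(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            for (String line = br.readLine(); line != null; line = br.readLine()) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Resolve the file name against the directory and load its contents.
    static String load(String directory, String filename) {
        try {
            return readAll(Files.newBufferedReader(
                    Paths.get(directory, filename), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```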
Beanshells (2)
Name |
Description |
Inputs |
Outputs |
parse_blast_results |
|
blast_results
minP
maxE
|
protein_list
records
|
load_blast_results |
|
path
filename
|
blast_results
|
Outputs (7)
Name |
Description |
Filtered |
This output shows the filtered list generated from the tabular format made by bconv. Every line generated by bconv is split into three parts and added to a new list, by using whitespace as a regular expression. The three different types of values are KEGG IDs, UniProt accession numbers and a string confirming the existence of the protein in question in both databases.
|
KEGG_ID |
This output shows a list of KEGG IDs generated by filtering the "Filtered" output value list specifically for KEGG IDs using .{3}:.* as a regular expression.
|
Pathway_ID |
This output retrieves the KEGG pathway IDs as a list for the UniProt accession numbers input in the protein_id_list parameter.
|
image |
This output retrieves a KEGG pathway image pinpointing the location of the proteins input from the protein_id_list input parameter.
|
blast_hits |
This output retrieves the Blast hits from a blast search comparing dissimilar proteins to the drug target database. The blast result has been filtered to remove hits with an e-value higher than the one entered in the input "maxE" or a percent identity lower than the one entered in "minP".
|
protein_list |
This output retrieves a list of the UniProt accession numbers for the dissimilar proteins compared against the drug target database by Blast. The blast result has been filtered to remove hits with an e-value higher than the one entered in the input "maxE" or a percent identity lower than the one entered in "minP".
|
img_url |
This output retrieves the URL for the KEGG pathway image pinpointing the location of the proteins input from the protein_id_list parameter.
|
Datalinks (21)
Source | Sink
protein_id_list | bconv_2:string
bconv_2:return | Split_string_into_string_list_by_regular_expression:string
regex_value:value | Split_string_into_string_list_by_regular_expression:regex
Split_string_into_string_list_by_regular_expression:split | Filter_List_of_Strings_by_regex:stringlist
regex_value_1:value | Filter_List_of_Strings_by_regex:regex
Filter_List_of_Strings_by_regex:filteredlist | get_pathways_by_genes:genes_id_list
mark_pathway_by_objects:return | Get_Image_From_URL:url
Filter_List_of_Strings_by_regex:filteredlist | mark_pathway_by_objects:object_id_list
get_pathways_by_genes:return | mark_pathway_by_objects:pathway_id
load_blast_results:blast_results | parse_blast_results:blast_results
maxE | parse_blast_results:maxE
minP | parse_blast_results:minP
path:value | load_blast_results:path
blast_output_file | load_blast_results:filename
Split_string_into_string_list_by_regular_expression:split | Filtered
Filter_List_of_Strings_by_regex:filteredlist | KEGG_ID
get_pathways_by_genes:return | Pathway_ID
Get_Image_From_URL:image | image
parse_blast_results:records | blast_hits
parse_blast_results:protein_list | protein_list
mark_pathway_by_objects:return | img_url
Version 1
(of 1)