Pathway and Gene to Pubmed

Created: 2011-02-10 16:10:52 | Last updated: 2011-02-18 13:47:08

This workflow takes in a list of gene names and KEGG pathway descriptions, and searches the PubMed database for corresponding articles. The abstracts of any matching articles are then retrieved. These abstracts are used to calculate a cosine vector space score between two corpora (gene and phenotype, or pathway and phenotype). The workflow also counts the number of PubMed articles in which each term occurs and retrieves the total number of articles in PubMed, so that a term enrichment score can be calculated.

The workflow also uses a document of abstracts related to a particular phenotype, which it retrieves from PubMed using the Phenotype_search_term input. Scientific terms are extracted from this text, and each term is weighted by how often it occurs in the phenotype corpus relative to PubMed as a whole. The higher the value, the better the score. This is given as:

X = log((a / b) / (c / d))

where:
a = number of occurrences of the individual term in the phenotype corpus
b = number of abstracts in the entire phenotype corpus
c = number of occurrences of the individual term in the whole of PubMed
d = number of articles in the whole of PubMed

Once this has been created, the pathways obtained from the QTL and microarray pathway analysis workflows are analysed. The abstracts returned by a PubMed search for each pathway are merged into a single document of pathway abstracts. The (unweighted) phenotype terms are then searched for in the pathway corpus, which determines whether each phenotype term occurs with the given pathway. The higher the value, the better the score. Each term is assigned a weight as:

Y = log((e / f) / (c / d))

where:
e = number of occurrences of the individual term in the pathway corpus
f = number of abstracts in the pathway corpus (per pathway)
c = number of occurrences of the individual term in the whole of PubMed
d = number of articles in the whole of PubMed

The weighted terms are then given a link score, which is the sum X + Y. This gives the link between the pathway and the phenotype a score / significance value: the higher the score, the more "appropriate/interesting" the link between the pathway and the phenotype. The terms are also ranked according to the number of pathways which have been given a weight. This is calculated as: W = (X + Y). The higher the value, the better the score.
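The weighting described above can be illustrated with a small calculation. The following is only a sketch, not the code of the nested ranking workflows (which are not listed on this page): the class name, method names and all counts are made up, and a natural logarithm is assumed because the description does not state the base.

```java
final class TermEnrichment {
    /**
     * log((termCount / corpusSize) / (pubmedTermCount / pubmedSize)).
     * Natural log is an assumption; the description does not give the base.
     */
    static double weight(long termCount, long corpusSize,
                         long pubmedTermCount, long pubmedSize) {
        if (termCount <= 0 || corpusSize <= 0 || pubmedTermCount <= 0 || pubmedSize <= 0) {
            throw new IllegalArgumentException("all counts must be positive");
        }
        double corpusRate = (double) termCount / corpusSize;        // a / b  (or e / f)
        double pubmedRate = (double) pubmedTermCount / pubmedSize;  // c / d
        return Math.log(corpusRate / pubmedRate);
    }

    /** Link score between a pathway and the phenotype for one term: X + Y. */
    static double linkScore(double x, double y) {
        return x + y;
    }

    public static void main(String[] args) {
        // Hypothetical counts, for illustration only.
        long c = 20_000, d = 20_000_000;     // term count and article count in PubMed
        double x = weight(40, 1_000, c, d);  // phenotype corpus: a = 40, b = 1000
        double y = weight(15, 500, c, d);    // pathway corpus:   e = 15, f = 500
        System.out.println("X = " + x);
        System.out.println("Y = " + y);
        System.out.println("link score = " + linkScore(x, y));
    }
}
```

A term that is no more frequent in the corpus than in PubMed as a whole gets a weight of 0, and a term that is rarer in the corpus gets a negative weight, so only enriched terms contribute positively to the link score.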

Run

Run this Workflow in the Taverna Workbench...

Option 1:

Copy and paste this link into File > 'Open workflow location...'
http://myexperiment.org/workflows/1846/download?version=2

Workflow Components

Authors (1)

Titles (2)

Descriptions (3)

This workflow takes in a list of gene names and searches the PubMed database for corresponding articles. Any matches to the genes are then retrieved (abstracts only). These abstracts are then returned to the user.

Dependencies (0)

Inputs (3)

Name	Description
gene_names
Phenotype_search_term
pathway_descriptions

Processors (73)

Name	Type	Description
regex	stringconstant	Value \n
remove_nulls	beanshell	Script String[] split = input.split("\n"); Vector nonEmpty = new Vector(); for (int i = 0; i < split.length; i++){ if (!(split[i].equals(""))) { nonEmpty.add(split[i].trim()); } } String[] non_empty = new String[nonEmpty.size()]; for (int i = 0; i < non_empty.length; i ++) { non_empty[i] = nonEmpty.elementAt(i); } String output = ""; for (int i = 0; i < non_empty.length; i++) { output = output + (String) (non_empty[i] + "\n"); }
gene_and_abstract	beanshell	Script String[] split = abstracts.split("\n"); String pathway_name = pathway; Vector nonEmpty = new Vector(); for (int i = 0; i < split.length; i++) { String trimmed = split[i].trim(); nonEmpty.add(trimmed); } String output = ">> " + pathway_name + "\n"; for (int i = 0; i < nonEmpty.size(); i++) { output = output + (String) (nonEmpty.elementAt(i) + "\n"); }
split_search_terms	localworker	Script List split = new ArrayList(); if (!string.equals("")) { String regexString = ","; if (regex != void) { regexString = regex; } String[] result = string.split(regexString); for (int i = 0; i < result.length; i++) { split.add(result[i]); } }
merge_outputs_2	localworker	Script String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString();
Search_PubMed	workflow
clean_text	workflow
regular_expression	stringconstant	Value \n
Remove_duplicate_genes	localworker	Script List strippedlist = new ArrayList(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); if (strippedlist.contains(item) == false) { strippedlist.add(item); } }
split_gene_names	localworker	Script List split = new ArrayList(); if (!string.equals("")) { String regexString = ","; if (regex != void) { regexString = regex; } String[] result = string.split(regexString); for (int i = 0; i < result.length; i++) { split.add(result[i]); } }
remove_Nulls_2	beanshell	Script String[] split = input.split("\n"); Vector nonEmpty = new Vector(); for (int i = 0; i < split.length; i++){ if (!(split[i].equals(""))) { nonEmpty.add(split[i].trim()); } } String[] non_empty = new String[nonEmpty.size()]; for (int i = 0; i < non_empty.length; i ++) { non_empty[i] = nonEmpty.elementAt(i); } String output = ""; for (int i = 0; i < non_empty.length; i++) { output = output + (String) (non_empty[i] + "\n"); }
Merge_string_list_to_string	localworker	Script String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString();
xpath	stringconstant	Value /*[local-name(.)='eSearchResult']/*[local-name(.)='IdList']/*[local-name(.)='Id']
run_eSearch	wsdl	Wsdl http://eutils.ncbi.nlm.nih.gov/entrez/eutils/soap/eutils.wsdl Wsdl Operation run_eSearch
min_date	stringconstant	Value 2000/01/01
extractPMID	localworker	Script import org.dom4j.Document; import org.dom4j.Node; import org.dom4j.io.SAXReader; SAXReader reader = new SAXReader(false); reader.setIncludeInternalDTDDeclarations(false); reader.setIncludeExternalDTDDeclarations(false); Document document = reader.read(new StringReader(xmltext)); List nodelist = document.selectNodes(xpath); // Process the elements in the nodelist ArrayList outputList = new ArrayList(); ArrayList outputXmlList = new ArrayList(); String val = null; String xmlVal = null; for (Iterator iter = nodelist.iterator(); iter.hasNext();) { Node element = (Node) iter.next(); xmlVal = element.asXML(); val = element.getStringValue(); if (val != null && !val.equals("")) { outputList.add(val); outputXmlList.add(xmlVal); } } List nodelist=outputList; List nodelistAsXML=outputXmlList;
remove_Nulls_4	beanshell	Script String[] split = input.split("\n"); Vector nonEmpty = new Vector(); for (int i = 0; i < split.length; i++){ if (!(split[i].equals(""))) { nonEmpty.add(split[i].trim()); } } String[] non_empty = new String[nonEmpty.size()]; for (int i = 0; i < non_empty.length; i ++) { non_empty[i] = nonEmpty.elementAt(i); } String output = ""; for (int i = 0; i < non_empty.length; i++) { output = output + (String) (non_empty[i] + "\n"); }
concat_abstract_ids	beanshell	Script String id = id.trim(); String abstract_text = abstract_text.trim(); String[] abstract_array = abstract_text.split("\n"); String abstract_amended = ""; if(abstract_array.length > 1) { abstract_amended = abstract_array[0] + " "; for(int i = 1; i < abstract_array.length; i++) { abstract_amended = abstract_amended + abstract_array[i] + " "; } } else { abstract_amended = abstract_array[0]; } String date_text = date_text.trim(); String output = ""; output = id + "\t" + date_text + "\t" + abstract_amended;
max_return_phenotype	stringconstant	Value 5000
pubmed_database	stringconstant	Value pubmed
merge_phenotype_abstracts	beanshell	Script String[] split = abstracts.split("\n"); String phenotype_term = phenotype.trim(); Vector nonEmpty = new Vector(); for (int i = 0; i < split.length; i++) { String trimmed = split[i].trim(); // String mytext = split[i].substring(split[i].indexOf(0), split[i].indexOf(" AND ")); nonEmpty.add(trimmed); } String output = ">> " + phenotype_term + "\n"; for (int i = 0; i < nonEmpty.size(); i++) { output = output + (String) (nonEmpty.elementAt(i) + "\n"); }
merge_abstracts	localworker	Script String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString();
max_date	stringconstant	Value 2011/02/18
parametersXML_eSearch	xmlsplitter
merge_abstract_ids	localworker	Script String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString();
merge_dates	localworker	Script String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString();
Fetch_Abstracts	workflow
clean_text_3	workflow
Encode_byte_to_base64	localworker	Script import org.apache.commons.codec.binary.Base64; base64 = new String(Base64.encodeBase64(bytes));
merge_strings_2	localworker	Script String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString();
regex_3	stringconstant	Value \n
xpath_2	stringconstant	Value /*[local-name(.)='generateTerminologyResponse']/*[local-name(.)='return']
extract_abstracts	beanshell	Script String[] split = input.split("\n"); Vector nonEmpty = new Vector(); for (int i = 1; i < split.length; i++) { String trimmed = split[i].trim(); String[] split_2 = trimmed.split("\t"); if(split_2.length == 3) { nonEmpty.add(split_2[2]); } } String output_search = split[0] + "\n"; String output = ""; for (int i = 0; i < nonEmpty.size(); i++) { output = output + (String) (nonEmpty.elementAt(i) + "\n"); }
merge_search_and_terms	beanshell	Script String term_input = terms.trim(); String search_input = search_term.trim(); String output = ""; output = search_input + "\n" + term_input;
extract_Terms_2	localworker	Script import org.dom4j.Document; import org.dom4j.Node; import org.dom4j.io.SAXReader; SAXReader reader = new SAXReader(false); reader.setIncludeInternalDTDDeclarations(false); reader.setIncludeExternalDTDDeclarations(false); Document document = reader.read(new StringReader(xmltext)); List nodelist = document.selectNodes(xpath); // Process the elements in the nodelist ArrayList outputList = new ArrayList(); ArrayList outputXmlList = new ArrayList(); String val = null; String xmlVal = null; for (Iterator iter = nodelist.iterator(); iter.hasNext();) { Node element = (Node) iter.next(); xmlVal = element.asXML(); val = element.getStringValue(); if (val != null && !val.equals("")) { outputList.add(val); outputXmlList.add(xmlVal); } } List nodelist=outputList; List nodelistAsXML=outputXmlList;
merge_strings	localworker	Script String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString();
remove_Nulls_5	beanshell	Script String[] split = input.split("\n"); Vector nonEmpty = new Vector(); for (int i = 0; i < split.length; i++){ if (!(split[i].equals(""))) { nonEmpty.add(split[i].trim()); } } String[] non_empty = new String[nonEmpty.size()]; for (int i = 0; i < non_empty.length; i ++) { non_empty[i] = nonEmpty.elementAt(i); } String output = ""; for (int i = 0; i < non_empty.length; i++) { output = output + (String) (non_empty[i] + "\n"); }
Remove_duplicate_strings_2	localworker	Script List strippedlist = new ArrayList(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); if (strippedlist.contains(item) == false) { strippedlist.add(item); } }
split_by_regex	localworker	Script List split = new ArrayList(); if (!string.equals("")) { String regexString = ","; if (regex != void) { regexString = regex; } String[] result = string.split(regexString); for (int i = 0; i < result.length; i++) { split.add(result[i]); } }
clean_text_4	soaplab	Endpoint http://phoebus.cs.man.ac.uk:1977/axis/services/text_mining.clean_text
pubmed_database_2	stringconstant	Value pubmed
xpath_3	stringconstant	Value /*[local-name(.)='eSearchResult']/*[local-name(.)='Count']
count	stringconstant	Value count
xpath_count	stringconstant	Value /*[local-name(.)='eInfoResult']/*[local-name(.)='DbInfo']/*[local-name(.)='Count']
eSearch_database	stringconstant	Value pubmed
regular_expression_2	stringconstant	Value \n
extract_terms_3	beanshell	Script String[] split = input.split("\n"); Vector nonEmpty = new Vector(); for (int i = 1; i < split.length; i++) { String trimmed = split[i].trim(); // if((trimmed.contains("=")) || (trimmed.contains("-"))) // { // next; // } // else // { // String[] trimmed_array = trimmed.split("\t"); // String term = trimmed_array[0]; nonEmpty.add(trimmed); // } } String output = ""; for (int i = 0; i < nonEmpty.size(); i++) { output = output + (String) (nonEmpty.elementAt(i) + "\n"); }
merge_term_count	beanshell	Script String term_input = term.trim(); String count_input = count.trim(); String output = ""; output = term_input + "\t" + count_input;
split_extracted_terms	localworker	Script List split = new ArrayList(); if (!string.equals("")) { String regexString = ","; if (regex != void) { regexString = regex; } String[] result = string.split(regexString); for (int i = 0; i < result.length; i++) { split.add(result[i]); } }
merge_pubmed_count	localworker	Script String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString();
merge_extracted	localworker	Script String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString();
merge_list	localworker	Script String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString();
extractCount_2	localworker	Script import org.dom4j.Document; import org.dom4j.Node; import org.dom4j.io.SAXReader; SAXReader reader = new SAXReader(false); reader.setIncludeInternalDTDDeclarations(false); reader.setIncludeExternalDTDDeclarations(false); Document document = reader.read(new StringReader(xmltext)); List nodelist = document.selectNodes(xpath); // Process the elements in the nodelist ArrayList outputList = new ArrayList(); ArrayList outputXmlList = new ArrayList(); String val = null; String xmlVal = null; for (Iterator iter = nodelist.iterator(); iter.hasNext();) { Node element = (Node) iter.next(); xmlVal = element.asXML(); val = element.getStringValue(); if (val != null && !val.equals("")) { outputList.add(val); outputXmlList.add(xmlVal); } } List nodelist=outputList; List nodelistAsXML=outputXmlList;
extractCount	localworker	Script import org.dom4j.Document; import org.dom4j.Node; import org.dom4j.io.SAXReader; SAXReader reader = new SAXReader(false); reader.setIncludeInternalDTDDeclarations(false); reader.setIncludeExternalDTDDeclarations(false); Document document = reader.read(new StringReader(xmltext)); List nodelist = document.selectNodes(xpath); // Process the elements in the nodelist ArrayList outputList = new ArrayList(); ArrayList outputXmlList = new ArrayList(); String val = null; String xmlVal = null; for (Iterator iter = nodelist.iterator(); iter.hasNext();) { Node element = (Node) iter.next(); xmlVal = element.asXML(); val = element.getStringValue(); if (val != null && !val.equals("")) { outputList.add(val); outputXmlList.add(xmlVal); } } List nodelist=outputList; List nodelistAsXML=outputXmlList;
parametersXML_1	xmlsplitter
run_eInfo	wsdl	Wsdl http://eutils.ncbi.nlm.nih.gov/soap/v2.0/eutils.wsdl Wsdl Operation run_eInfo
run_eSearch_2	wsdl	Wsdl http://eutils.ncbi.nlm.nih.gov/soap/v2.0/eutils.wsdl Wsdl Operation run_eSearch
run_eSearch_request	xmlsplitter
GENE_RankPhenotypeTerms	workflow
max_return_gene	stringconstant	Value 500
PATHWAY_RankPhenotypeTerms	workflow
split_search_terms_2	localworker	Script List split = new ArrayList(); if (!string.equals("")) { String regexString = ","; if (regex != void) { regexString = regex; } String[] result = string.split(regexString); for (int i = 0; i < result.length; i++) { split.add(result[i]); } }
pathway_and_abstract	beanshell	Script String[] split = abstracts.split("\n"); String pathway_name = pathway; Vector nonEmpty = new Vector(); for (int i = 0; i < split.length; i++) { String trimmed = split[i].trim(); nonEmpty.add(trimmed); } String output = ">> " + pathway_name + "\n"; for (int i = 0; i < nonEmpty.size(); i++) { output = output + (String) (nonEmpty.elementAt(i) + "\n"); }
remove_nulls_3	beanshell	Script String[] split = input.split("\n"); Vector nonEmpty = new Vector(); for (int i = 0; i < split.length; i++){ if (!(split[i].equals(""))) { nonEmpty.add(split[i].trim()); } } String[] non_empty = new String[nonEmpty.size()]; for (int i = 0; i < non_empty.length; i ++) { non_empty[i] = nonEmpty.elementAt(i); } String output = ""; for (int i = 0; i < non_empty.length; i++) { output = output + (String) (non_empty[i] + "\n"); }
merge_outputs_2_2	localworker	Script String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString();
regex_2	stringconstant	Value \n
extract_terms	beanshell	Script String[] split = input.split("\n"); Vector nonEmpty = new Vector(); for (int i = 0; i < split.length; i++) { String mytext = split[i].substring(split[i].indexOf(" "), split[i].lastIndexOf(" - ")); nonEmpty.add(mytext); } String output = ""; for (int i = 0; i < nonEmpty.size(); i++) { output = output + (String) (nonEmpty.elementAt(i) + "\n"); }
add_MeSH_to_string	beanshell	Script String[] split = input.split("\n"); Vector nonEmpty = new Vector(); for (int i = 0; i < split.length; i++) { String trimmed = split[i].trim(); nonEmpty.add(trimmed); } String output = ""; for (int i = 0; i < nonEmpty.size(); i++) { output = output + (String) (nonEmpty.elementAt(i) + " AND \"Metabolic Networks and Pathways\"[MeSH Terms]" + "\n"); }
Search_PubMed_2	workflow
clean_text_2	workflow
generateTerminology	wsdl	Wsdl http://projects.biotec.tu-dresden.de/DOG4DAG_TAVERNA/services/GoPubMedTermGeneration?wsdl Wsdl Operation generateTerminology
generateTerminology_input	xmlsplitter
applicationCode_value	stringconstant	Value 20110209_taverna
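The extractPMID, extractCount and extract_Terms_2 processors all select nodes from a service response with a namespace-agnostic XPath built from local-name() tests. As a rough illustration of that technique, the sketch below does the same with the XML APIs bundled in the JDK rather than the dom4j calls used in the processors. The class name, the sample XML and its namespace URI are invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XPathExtract {
    /** Returns the non-empty text values of all nodes matched by the XPath. */
    static List<String> extract(String xmlText, String xpath) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed so local-name() ignores the namespace
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmlText)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpath, document, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String value = nodes.item(i).getTextContent();
                if (value != null && !value.isEmpty()) {
                    values.add(value); // skip empty elements, as the processors do
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        // Invented response fragment; the namespace URI is a placeholder.
        String xml = "<eSearchResult xmlns=\"urn:example\"><Count>2</Count>"
                + "<IdList><Id>111</Id><Id>222</Id><Id></Id></IdList></eSearchResult>";
        String idPath = "/*[local-name(.)='eSearchResult']"
                + "/*[local-name(.)='IdList']/*[local-name(.)='Id']";
        System.out.println(extract(xml, idPath));
    }
}
```

Matching on local-name() means the expression keeps working whatever namespace prefix or URI the service puts on its response elements.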

Beanshells (22)

Name	Inputs	Outputs
remove_nulls	input	output
gene_and_abstract	pathway abstracts	output
remove_Nulls_2	input	output
remove_Nulls_4	input	output
concat_abstract_ids	id abstract_text date_text	output
merge_phenotype_abstracts	phenotype abstracts	output
extract_abstracts	input	output output_search
merge_search_and_terms	terms search_term	output
remove_Nulls_5	input	output
extract_terms_3	input	output
merge_term_count	term count	output
pathway_and_abstract	pathway abstracts	output
remove_nulls_3	input	output
extract_terms	input	output
add_MeSH_to_string	input	output
format_rankings	ranked_terms	title_term_rankings
remove_Nulls	input	output
stringToBytes	string	bytes
concat_abstract_ids	id abstract_text date_text	output
concat_abstract_ids	id abstract_text date_text	output
stringToBytes	string	bytes
stringToBytes	string	bytes
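Two of the Beanshells above, extract_terms and add_MeSH_to_string, turn each pathway description into a PubMed query: the first keeps the text between the first space and the last " - ", and the second appends a MeSH filter. The sketch below combines the two steps for a single line. The input format shown (a KEGG identifier, the pathway name, then " - " and the organism) is an assumption about what pathway_descriptions contains, and the guard for lines without that shape is an addition that the original script does not have.

```java
final class PathwayQuery {
    // Filter text taken from the add_MeSH_to_string script.
    static final String MESH_FILTER = " AND \"Metabolic Networks and Pathways\"[MeSH Terms]";

    /** Keeps the text between the first space and the last " - ", trimmed. */
    static String extractName(String description) {
        int start = description.indexOf(" ");
        int end = description.lastIndexOf(" - ");
        if (start < 0 || end <= start) {
            // Added guard: the original script would throw on such a line.
            return description.trim();
        }
        return description.substring(start, end).trim();
    }

    /** Builds the PubMed search term for one pathway description. */
    static String toQuery(String description) {
        return extractName(description) + MESH_FILTER;
    }

    public static void main(String[] args) {
        // Assumed input format, for illustration only.
        String line = "path:mmu04010 MAPK signaling pathway - Mus musculus (mouse)";
        System.out.println(toQuery(line));
    }
}
```

Using the last " - " rather than the first keeps pathway names that themselves contain " - " intact, since only the trailing organism suffix is cut off.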

Outputs (10)

Name	Description
gene_abstracts
phenotype_abstracts
phenotype_terms
phenotype_term_counts
pubmed_abstract_number
gene_cosine
gene_term_enrichment_scores
pathway_abstracts
pathway_cosine_vector_scores
pathway_concept_rankings
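The gene_cosine and pathway_cosine_vector_scores outputs are produced by the nested ranking workflows, whose scripts are not listed on this page. As a general illustration of what a cosine score between two corpora measures, the sketch below compares raw term-count vectors. The tokenisation, the lack of any term weighting and all names are assumptions made for the example, not the workflow's method.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CosineSimilarity {
    /** Counts lower-cased word tokens in a text (simplistic tokeniser). */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine of the angle between two term-count vectors; 0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0 || normB == 0) ? 0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Toy corpora, for illustration only.
        Map<String, Integer> phenotype = termCounts("insulin resistance and glucose uptake");
        Map<String, Integer> pathway = termCounts("insulin signalling regulates glucose uptake");
        System.out.println(cosine(phenotype, pathway));
    }
}
```

The score is 1 for corpora with identical term proportions and 0 for corpora that share no terms, regardless of how long either corpus is.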

Datalinks (101)

Source	Sink
gene_and_abstract:output	remove_nulls:input
Search_PubMed:concat_data	gene_and_abstract:abstracts
split_search_terms:split	gene_and_abstract:pathway
regex:value	split_search_terms:regex
remove_Nulls_2:output	split_search_terms:string
clean_text:cleaned_text	merge_outputs_2:stringlist
split_search_terms:split	Search_PubMed:search_term
min_date:value	Search_PubMed:min_date
max_date:value	Search_PubMed:max_date
max_return_gene:value	Search_PubMed:max_return
remove_nulls:output	clean_text:input
split_gene_names:split	Remove_duplicate_genes:stringlist
gene_names	split_gene_names:string
regular_expression:value	split_gene_names:regex
Merge_string_list_to_string:concatenated	remove_Nulls_2:input
Remove_duplicate_genes:strippedlist	Merge_string_list_to_string:stringlist
parametersXML_eSearch:output	run_eSearch:parameters
xpath:value	extractPMID:xpath
run_eSearch:parameters	extractPMID:xml-text
merge_phenotype_abstracts:output	remove_Nulls_4:input
extractPMID:nodelist	concat_abstract_ids:id
merge_abstracts:concatenated	concat_abstract_ids:abstract_text
merge_dates:concatenated	concat_abstract_ids:date_text
merge_abstract_ids:concatenated	merge_phenotype_abstracts:abstracts
Phenotype_search_term	merge_phenotype_abstracts:phenotype
Fetch_Abstracts:abstracts	merge_abstracts:stringlist
max_return_phenotype:value	parametersXML_eSearch:RetMax
pubmed_database:value	parametersXML_eSearch:db
max_date:value	parametersXML_eSearch:maxdate
min_date:value	parametersXML_eSearch:mindate
Phenotype_search_term	parametersXML_eSearch:term
concat_abstract_ids:output	merge_abstract_ids:stringlist
Fetch_Abstracts:pubmed_dates	merge_dates:stringlist
extractPMID:nodelist	Fetch_Abstracts:pubmed_ids
remove_Nulls_4:output	clean_text_3:input
clean_text_3:cleaned_text	Encode_byte_to_base64:bytes
merge_strings:concatenated	merge_strings_2:stringlist
clean_text_4:output	extract_abstracts:input
merge_strings_2:concatenated	merge_search_and_terms:terms
extract_abstracts:output_search	merge_search_and_terms:search_term
xpath_2:value	extract_Terms_2:xpath
generateTerminology:parameters	extract_Terms_2:xml-text
Remove_duplicate_strings_2:strippedlist	merge_strings:stringlist
merge_search_and_terms:output	remove_Nulls_5:input
extract_Terms_2:nodelist	Remove_duplicate_strings_2:stringlist
regex_3:value	split_by_regex:regex
extract_abstracts:output	split_by_regex:string
Encode_byte_to_base64:base64	clean_text_4:file_direct_data
remove_Nulls_5:output	extract_terms_3:input
split_extracted_terms:split	merge_term_count:term
merge_extracted:concatenated	merge_term_count:count
extract_terms_3:output	split_extracted_terms:string
regular_expression_2:value	split_extracted_terms:regex
extractCount:nodelist	merge_pubmed_count:stringlist
extractCount_2:nodelist	merge_extracted:stringlist
merge_term_count:output	merge_list:stringlist
xpath_3:value	extractCount_2:xpath
run_eSearch_2:result	extractCount_2:xml-text
xpath_count:value	extractCount:xpath
run_eInfo:result	extractCount:xml-text
pubmed_database_2:value	parametersXML_1:db
parametersXML_1:output	run_eInfo:request
run_eSearch_request:output	run_eSearch_2:request
eSearch_database:value	run_eSearch_request:db
count:value	run_eSearch_request:rettype
split_extracted_terms:split	run_eSearch_request:term
min_date:value	run_eSearch_request:mindate
max_date:value	run_eSearch_request:maxdate
merge_pubmed_count:concatenated	GENE_RankPhenotypeTerms:pubmed_abstract_number
merge_list:concatenated	GENE_RankPhenotypeTerms:phenotype_term_counts
merge_outputs_2:concatenated	GENE_RankPhenotypeTerms:query_abstracts
clean_text_3:cleaned_text	GENE_RankPhenotypeTerms:phenotype_abstracts
remove_Nulls_5:output	GENE_RankPhenotypeTerms:phenotype_terms
merge_outputs_2_2:concatenated	PATHWAY_RankPhenotypeTerms:query_abstracts
clean_text_3:cleaned_text	PATHWAY_RankPhenotypeTerms:phenotype_abstracts
merge_pubmed_count:concatenated	PATHWAY_RankPhenotypeTerms:pubmed_abstract_number
merge_list:concatenated	PATHWAY_RankPhenotypeTerms:phenotype_term_counts
remove_Nulls_5:output	PATHWAY_RankPhenotypeTerms:phenotype_terms
add_MeSH_to_string:output	split_search_terms_2:string
regex_2:value	split_search_terms_2:regex
Search_PubMed_2:concat_data	pathway_and_abstract:abstracts
split_search_terms_2:split	pathway_and_abstract:pathway
pathway_and_abstract:output	remove_nulls_3:input
clean_text_2:cleaned_text	merge_outputs_2_2:stringlist
pathway_descriptions	extract_terms:input
extract_terms:output	add_MeSH_to_string:input
split_search_terms_2:split	Search_PubMed_2:Pathway_search_term
remove_nulls_3:output	clean_text_2:input
generateTerminology_input:output	generateTerminology:parameters
split_by_regex:split	generateTerminology_input:texts
applicationCode_value:value	generateTerminology_input:applicationCode
merge_outputs_2:concatenated	gene_abstracts
clean_text_3:cleaned_text	phenotype_abstracts
remove_Nulls_5:output	phenotype_terms
merge_list:concatenated	phenotype_term_counts
merge_pubmed_count:concatenated	pubmed_abstract_number
GENE_RankPhenotypeTerms:cosine_vector_scores	gene_cosine
GENE_RankPhenotypeTerms:concept_rankings	gene_term_enrichment_scores
merge_outputs_2_2:concatenated	pathway_abstracts
PATHWAY_RankPhenotypeTerms:cosine_vector_scores	pathway_cosine_vector_scores
PATHWAY_RankPhenotypeTerms:concept_rankings	pathway_concept_rankings
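Several of the links above carry abstracts as tab-separated records: concat_abstract_ids writes one "id, date, abstract" line per article, a ">> search term" header line is put in front of them, and extract_abstracts later reads the third column back out. The sketch below shows that round trip in simplified form. The class and method names are invented, and the whitespace handling is slightly tidier than in the original scripts.

```java
import java.util.ArrayList;
import java.util.List;

final class AbstractRecords {
    /** Flattens a multi-line abstract and joins id, date and text with tabs. */
    static String toRecord(String id, String date, String abstractText) {
        String flattened = abstractText.trim().replaceAll("\\s*\\n\\s*", " ");
        return id.trim() + "\t" + date.trim() + "\t" + flattened;
    }

    /** Reads the abstract column from a corpus whose first line is a header. */
    static List<String> abstractsOf(String corpus) {
        String[] lines = corpus.split("\n");
        List<String> abstracts = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) { // line 0 is the ">> search term" header
            String[] fields = lines[i].trim().split("\t");
            if (fields.length == 3) {            // skip lines that are not full records
                abstracts.add(fields[2]);
            }
        }
        return abstracts;
    }

    public static void main(String[] args) {
        // Made-up identifier, date and text.
        String record = toRecord(" 123 ", "2010/05/01", "First line\nsecond line");
        String corpus = ">> obesity\n" + record + "\nnot a record";
        System.out.println(abstractsOf(corpus));
    }
}
```

Because records are one line each and split on tabs, an abstract that itself contains a tab would yield more than three fields and be dropped by the length check, which is worth keeping in mind when cleaning the text upstream.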

Coordinations (0)

Information Workflow Type

Taverna 2

Information Uploader

Paul Fisher

Information License

All versions of this Workflow are licensed under:

Information Version 2 (latest) (of 2)

Information Credits (1)

(People/Groups)

Paul Fisher

Information Attributions (8)

(Workflows/Files)

Information Tags (36)

Uploader tags

annotation | concept | concept profile | controlled vocabulary | cosine vector space | data integration | data-driven | dresden | efetch | enrichment | entity recognition | esearch | eutils | gene | gene identifier | geneid | genotype | kegg | Kegg Pathways | KeggID | literature | loci | locus | medline | mesh | pathway | pathway-driven | pathways | phenotype | pubmed | qtl | quanitative | text mining | text mining; term extraction; entity recognition | trait | triat

Information Shared with Groups (0)

None

Information Featured In Packs (2)

Information Attributed By (2)

(Workflows/Files)

Information Favourited By (0)

No one

Information Statistics

3091 viewings

1868 downloads

Citations (0)

None

Version History

In chronological order:

Pathway and Gene to Pubmed

Created by Paul Fisher on Thursday 10 February 2011 16:10:52 (UTC)

Last edited by Paul Fisher on Thursday 10 February 2011 16:15:43 (UTC)

Pathway and Gene to Pubmed

Created by Paul Fisher on Friday 18 February 2011 13:45:51 (UTC)

Last edited by Paul Fisher on Friday 18 February 2011 13:47:08 (UTC)

Revision comment:

Updated the description and added dates to term counting

Reviews (0)

No reviews yet

Comments (0)

No comments yet

Other workflows that use similar services (35)

Only the first 2 workflows that use similar services are shown.

Taverna 2

Uploader

Paul Fisher

Gene to Pubmed (4)

Created: 2011-02-08 | Last updated: 2011-02-10

Credits: Paul Fisher

Attributions: Cosine vector space, Extract Scientific Terms, Rank Phenotype Terms, Cosine vector space, Rank Phenotype Terms, Pathway to Pubmed, Extract Scientific Terms

Taverna 2

Uploader

Paul Fisher

PubMed Search (1)

This workflow takes in a search term, which is passed to the eSearch function and searched for in PubMed. The abstracts found are returned to the user.

Created: 2011-02-03 | Last updated: 2011-02-03

Credits: Paul Fisher
