Nucleotide frequency workflow
The ALDH2 gene encodes an enzyme that converts acetaldehyde to acetic acid. A single-nucleotide variant in this gene is known to weaken the enzyme's function. The variant is reported to be common in Asian populations, including Japan, but rare in Europe and the Americas. This workflow uses DDBJ and dbSNP data to examine how the variant is distributed among Asian, European, and American populations.
Run
To run this workflow in the Taverna Workbench, copy and paste this link into File > 'Open workflow location...':
http://myexperiment.org/workflows/1258/download?version=1
Taverna is available from http://taverna.sourceforge.net/
If Taverna has problems downloading the workflow, you may need to include your username and password in the URL so that Taverna can access it: replace http:// in the link above with http://yourusername:yourpassword@
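If the username or password contains characters that are reserved in URLs, such as @ or :, they have to be percent-encoded before they go into the link, otherwise the URL will not be parsed as intended. The Java sketch below shows one way to do the substitution; the class and method names are hypothetical, and it assumes the link starts with http://.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: inserts credentials into an http:// workflow link.
final class CredentialUrl {
    // Percent-encode one userinfo component. URLEncoder produces form encoding,
    // which writes a space as '+', so '+' is turned back into %20 here.
    static String encodePart(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Turns http://host/path into http://user:password@host/path.
    static String withCredentials(String url, String user, String password) {
        String prefix = "http://";
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("expected an http:// URL: " + url);
        }
        return prefix + encodePart(user) + ":" + encodePart(password) + "@"
                + url.substring(prefix.length());
    }

    public static void main(String[] args) {
        // "alice" and "p@ss:word" are placeholder credentials.
        System.out.println(withCredentials(
                "http://myexperiment.org/workflows/1258/download?version=1",
                "alice", "p@ss:word"));
    }
}
```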
Workflow Components
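Inputs
Name | Description |
---|---|
gene_name | Gene symbol to search for in DDBJ and dbSNP (for example, ALDH2). |
function | Optional dbSNP function class used to filter the dbSNP search; leave it empty to search all function classes. |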
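Both inputs feed Make_query_for_searching_dbsnp_by_esearch, which builds a dbSNP esearch term and drops the Function_Class clause when function is empty. A minimal Java sketch of that logic follows. It appends the optional clause instead of substituting @@...@@ placeholders, and unlike the original script it also URL-encodes the email parameter. The class name and the example value "missense" are illustrative assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper mirroring Make_query_for_searching_dbsnp_by_esearch.
final class DbsnpEsearchQuery {
    static final String BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?";

    // Builds the Entrez query term; the Function_Class clause is optional.
    static String term(String gene, String function) {
        StringBuilder t = new StringBuilder();
        t.append(gene).append("[GENE] AND human[ORGN]")
         .append(" AND \"by cluster\"[Validation]")
         .append(" AND \"by frequency\"[Validation]");
        // As in the original script, null, "null" and "" all mean "no filter".
        boolean hasFunction = function != null && !function.isEmpty() && !function.equals("null");
        if (hasFunction) {
            t.append(" AND \"").append(function).append("\"[Function_Class]");
        }
        return t.toString();
    }

    // Full esearch URL; db=snp and retmax=100 match the original script.
    static String url(String gene, String function, String email) {
        return BASE + "db=snp"
                + "&term=" + URLEncoder.encode(term(gene, function), StandardCharsets.UTF_8)
                + "&retmax=100"
                // Encoding the email is an addition; the original appends it as-is.
                + "&email=" + URLEncoder.encode(email, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(term("ALDH2", "missense")); // "missense": illustrative value only
        System.out.println(url("ALDH2", "", "you@example.org"));
    }
}
```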
Processors
Name | Type | Description |
---|---|---|
Make_query_for_searching_DDBJ | beanshell | Script: String url = "http://xml.nig.ac.jp/rest/Invoke?"; String queryPath = "/ENTRY/DDBJ/organism=='Homo sapiens' AND "; queryPath += "(/ENTRY/DDBJ/feature-table/feature{/f_key=='variation' AND /f_quals/qualifier{/q_name=='gene' AND /q_value=='" + gene + "'}})"; queryPath = URLEncoder.encode(queryPath, "UTF-8"); String returnPath = "/ENTRY/DDBJ/primary-accession,/ENTRY/DDBJ/definition"; returnPath = URLEncoder.encode(returnPath, "UTF-8"); String query = "service=ARSA&method=searchByXMLPath&queryPath=" + queryPath + "&returnPath=" + returnPath + "&offset=1&count=100"; url += query; |
Get_DDBJ_list_by_ARSA_searchByXMLPath | localworker | Script: URL inputURL = null; if (base != void) { inputURL = new URL(new URL(base), url); } else { inputURL = new URL(url); } URLConnection con = inputURL.openConnection(); InputStream in = con.getInputStream(); InputStreamReader isr = new InputStreamReader(in); Reader inReader = new BufferedReader(isr); StringBuffer buf = new StringBuffer(); int ch; while ((ch = inReader.read()) > -1) { buf.append((char)ch); } inReader.close(); contents = buf.toString(); |
Split_string_DDBJ_list_by_enter | localworker | Script: List split = new ArrayList(); if (!string.equals("")) { String regexString = ","; if (regex != void) { regexString = regex; } String[] result = string.split(regexString); for (int i = 0; i < result.length; i++) { split.add(result[i]); } } |
Split_string_enter | stringconstant | Value: \n |
Skip_first_two_lines | beanshell | Script: List ddbjList = new ArrayList(); for(int i=0; i |
Split_string_tab | stringconstant | Value: \t |
Split_string_DDBJ_list_lines_by_tab | localworker | Script: identical to Split_string_DDBJ_list_by_enter (splits string on regex). |
Make_query_for_searching_DDBJ_to_get_feature_info | beanshell | Script: List urlList = new ArrayList(); String baseUrl = "http://xml.nig.ac.jp/rest/Invoke?"; for(int i=0; i |
Get_variation_information_by_DDBJ_getFeatureInfo | localworker | Script: identical to Get_DDBJ_list_by_ARSA_searchByXMLPath (fetches url and returns the page text as contents). |
Split_string_variation_information_by_enter | localworker | Script: identical to Split_string_DDBJ_list_by_enter (splits string on regex). |
Summarize_variation_information | beanshell | Script: StringBuffer variation = new StringBuffer(); for(int i=0; i |
Split_string_variation_information_lines_by_tab | localworker | Script: identical to Split_string_DDBJ_list_by_enter (splits string on regex). |
Make_output_for_DDBJ | beanshell | Script: String output = ""; for(int i=0; i |
Make_query_for_searching_dbsnp_by_esearch | beanshell | Script: if( function == null || function.equals("null")) { function = ""; } String baseUrl = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?"; String db ="snp"; String term="@@gene@@[GENE] AND human[ORGN] AND \"by cluster\"[Validation] AND \"by frequency\"[Validation] AND \"@@function@@\"[Function_Class]"; String retmax="100"; String encode = "UTF8"; term = term.replaceAll("@@gene@@", gene); if(function != null && !function.equals("")) { term = term.replaceAll("@@function@@", function); } else { term = term.replaceAll(" AND \"@@function@@\"\\[Function_Class\\]", ""); } String url = baseUrl + "db="+db +"&term=" + URLEncoder.encode(term,encode) +"&retmax=" + retmax +"&email=" + email; |
Get_dbsnp_list_by_ncbieutils_esearch | localworker | Script: identical to Get_DDBJ_list_by_ARSA_searchByXMLPath (fetches url and returns the page text as contents). |
Make_query_for_searching_dbsnp_by_efetch | beanshell | Script: List idList = new ArrayList(); List urlList = new ArrayList(); String baseUrl = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"; String report="FLT"; String db="snp"; for(int i=0; i |
Split_string_dbsnp_list_by_enter | localworker | Script: identical to Split_string_DDBJ_list_by_enter (splits string on regex). |
Get_dbsnp_summary_by_ncbieutils_efetch | localworker | Script: identical to Get_DDBJ_list_by_ARSA_searchByXMLPath (fetches url and returns the page text as contents). |
Extract_not_merged_rsid | beanshell | Script: List notMergedRsidList = new ArrayList(); for(int i = 0; i < dbsnpEfetchResultList.size(); i++) { List oneResultList = (List)dbsnpEfetchResultList.get(i); /* indices 3 and 4 are read below, so at least 5 lines are needed */ if(oneResultList.size() < 5) { continue; } String idLine = (String)oneResultList.get(3); String id = idLine.substring(3); int space = id.indexOf(" "); if(space < 0) { continue; } id = id.substring(0, space); String nextLine = (String)oneResultList.get(4); if(nextLine.startsWith(id)) { notMergedRsidList.add(id); } } |
Make_query_for_retrieving_genotype | beanshell | Script: String baseUrl = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"; String report="GENXML"; String db="snp"; String mode="text"; String url = baseUrl + "report=" + report + "&db=" + db + "&mode=" + mode + "&id=" + rsId.substring(2) + "&email=" + email; |
Get_dbsnp_genotype_frequency_by_ncbieutils_efetch | localworker | Script: identical to Get_DDBJ_list_by_ARSA_searchByXMLPath (fetches url and returns the page text as contents). |
Extraction_of_genotype_and_frequency_of_each_population | beanshell | Script: List output = new ArrayList(); org.xml.sax.helpers.DefaultHandler handler = new org.xml.sax.helpers.DefaultHandler() { public String output = ""; public Stack elements = new Stack(); public HashMap popGroup = new HashMap(); public HashMap popDiversity = new HashMap(); public String ssId = ""; public String popId = ""; public void printNCBI(String id) { StringBuffer rsBuffer = new StringBuffer(); rsBuffer.append("Reference SNP ID:" + id + "\n"); for(Iterator it = popDiversity.keySet().iterator(); it.hasNext();) { boolean hasDiversity = false; String ssId = (String)it.next(); StringBuffer ssBuffer = new StringBuffer("\tSubmitter SNP ID:" + ssId + "\n"); HashMap popIdMap = (HashMap)popDiversity.get(ssId); HashMap genotypeFreq = new HashMap(); for(Iterator it2 = popIdMap.keySet().iterator(); it2.hasNext();) { String popId = (String)it2.next(); HashMap genotypeMap = (HashMap)popIdMap.get(popId); String popLabel = getPopLabel(popId); ssBuffer.append("\t\t" + popLabel + ":"); StringBuffer sb = new StringBuffer(); for(Iterator it3 = genotypeMap.keySet().iterator(); it3.hasNext();) { String genotype = (String)it3.next(); String freq = (String)genotypeMap.get(genotype); ArrayList freqList = new ArrayList(); if(genotypeFreq.containsKey(genotype)) { freqList = (ArrayList)genotypeFreq.get(genotype); if(checkDiversity(freqList, freq)) { hasDiversity = true; } } freqList.add(freq); genotypeFreq.put(genotype, freqList); sb.append(genotype + ":" + freq + ","); } /* delete last character ',' */ sb.deleteCharAt(sb.length() - 1); ssBuffer.append(sb.toString()+"\n"); } if(hasDiversity) { rsBuffer.append(ssBuffer.toString()); } } if(rsBuffer.length() > 20) { output += rsBuffer.toString().trim() + "\n"; } } public boolean checkDiversity(ArrayList freqList, String freq) { float freqFloat = Float.parseFloat(freq); for(int i = 0; i < freqList.size(); i++) { String f = (String)freqList.get(i); float ff = Float.parseFloat(f); if(Math.abs(ff - freqFloat) > 0.3) { return true; } } return false; } public String getPopLabel(String popId) { StringBuffer label = new StringBuffer(); if(popGroup.containsKey(popId)) { HashSet set = (HashSet)popGroup.get(popId); if(set.size() > 3) { return "multiple(" + popId + ")"; } for(Iterator it = set.iterator(); it.hasNext();) { label.append((String)it.next() + ","); } } else { return "No description(" + popId + ")"; } String labelStr = label.toString(); return labelStr.substring(0, labelStr.length() - 1) + "(" + popId + ")"; } public void startElement(String uri, String localName, String qName, org.xml.sax.Attributes attributes) { elements.push(qName); String tree = getTree(); if(tree.equals("/GenoExchange/Individual/SubmitInfo")) { String popId = attributes.getValue("popId"); String subIndGroup = attributes.getValue("subIndGroup"); HashSet groupSet; if(popGroup.containsKey(popId)) { groupSet = (HashSet)popGroup.get(popId); } else { groupSet = new HashSet(); } groupSet.add(subIndGroup); popGroup.put(popId, groupSet); } else if(tree.equals("/GenoExchange/SnpInfo/SsInfo")) { ssId = attributes.getValue("ssId"); } else if(tree.equals("/GenoExchange/SnpInfo/SsInfo/ByPop")) { popId = attributes.getValue("popId"); } else if(tree.equals("/GenoExchange/SnpInfo/SsInfo/ByPop/GTypeFreq")) { String gtype = attributes.getValue("gtype"); String freq = attributes.getValue("freq"); HashMap popIdMap; if(popDiversity.containsKey(ssId)) { popIdMap = (HashMap)popDiversity.get(ssId); } else { popIdMap = new HashMap(); } setPopIdMap(popId, gtype, freq, popIdMap); popDiversity.put(ssId, popIdMap); } } public void setPopIdMap(String popId, String gtype, String freq, HashMap popIdMap) { HashMap genotypeMap; if(popIdMap.containsKey(popId)) { genotypeMap = (HashMap)popIdMap.get(popId); } else { genotypeMap = new HashMap(); } genotypeMap.put(gtype, freq); popIdMap.put(popId, genotypeMap); } public void endElement(String uri, String localName, String qName) { elements.pop(); } public String getTree() { StringBuffer tree = new StringBuffer(); for(int i = 0; i < elements.size(); i++) { String s = (String)elements.get(i); tree.append("/" + s); } return tree.toString(); } }; javax.xml.parsers.SAXParserFactory spfactory = javax.xml.parsers.SAXParserFactory.newInstance(); javax.xml.parsers.SAXParser parser = spfactory.newSAXParser(); for(int i=0; i |
Make_query_for_searching_pubmed | beanshell | Script: String baseUrl = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?"; String dbfrom="snp"; String db="pubmed"; String mode="text"; String url = baseUrl + "dbfrom=" + dbfrom + "&db=" + db + "&mode=" + mode + "&id="+ rsId.substring(2) + "&email=" + email; id = rsId; |
Get_dbsnp_pubmed_cited_by_ncbieutils_elink | localworker | Script: identical to Get_DDBJ_list_by_ARSA_searchByXMLPath (fetches url and returns the page text as contents). |
Summarize_cited_snp | beanshell | Script: List output = new ArrayList(); for(int i = 0; i < rsId.size(); i++) { List l = (List)pubmedIdList.get(i); String rs = (String)rsId.get(i); StringBuffer out = new StringBuffer(); for(int j=0; j |
Merge_output | beanshell | Script: StringBuffer sb = new StringBuffer(); sb.append(ddbj + "\n"); for(int i=0; i |
Extract_pubmed_id | beanshell | Script: boolean target = false; List pubmedIdList = new ArrayList(); BufferedReader br = new BufferedReader(new StringReader(elinkResult)); String l; while((l =br.readLine()) != null) { if(l.indexOf(" |
Split_string_dbsnp_summary_by_enter | localworker | Script: identical to Split_string_DDBJ_list_by_enter (splits string on regex). |
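The DDBJ branch of the processor table chains a page fetch, a split on \n, a split on \t and Skip_first_two_lines. Because the Skip_first_two_lines script is truncated above, the Java sketch below assumes only what its name says: the first two lines of the ARSA response are dropped, and each remaining line is a tab-separated row (presumably the primary accession and definition requested in returnPath). The class name, the sample response and the blank-line guard are additions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of: split on "\n" -> split each line on "\t" -> skip two lines.
final class DdbjListParser {
    static List<List<String>> parse(String contents) {
        List<List<String>> rows = new ArrayList<>();
        if (contents.isEmpty()) {
            return rows; // the split local worker also returns an empty list here
        }
        String[] lines = contents.split("\n");
        // Assumption: the first two lines are headers, so rows start at index 2.
        for (int i = 2; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // added guard: ignore blank lines
            }
            rows.add(Arrays.asList(lines[i].split("\t")));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up response text; not real ARSA output.
        String sample = "hits=1\n\nAB000001\tHomo sapiens example definition\n";
        System.out.println(parse(sample));
    }
}
```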
Beanshells
Name | Description | Inputs | Outputs |
---|---|---|---|
Make_query_for_searching_DDBJ | | gene | url |
Skip_first_two_lines | | list | ddbjList |
Make_query_for_searching_DDBJ_to_get_feature_info | | ddbjList | urlList |
Summarize_variation_information | | list | variationAnnotation |
Make_output_for_DDBJ | | ddbj, variation | output |
Make_query_for_searching_dbsnp_by_esearch | | function, gene | url, param |
Make_query_for_searching_dbsnp_by_efetch | | result | urlList |
Extract_not_merged_rsid | | dbsnpEfetchResultList | notMergedRsidList |
Make_query_for_retrieving_genotype | | rsId | url |
Extraction_of_genotype_and_frequency_of_each_population | | inputList, rsIdList | output |
Make_query_for_searching_pubmed | | rsId | url, id |
Summarize_cited_snp | | pubmedIdList, rsId | output |
Merge_output | | ddbj, genotypeInfo, citedPaper | output |
Extract_pubmed_id | | elinkResult | pubmedIdList |
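Make_query_for_retrieving_genotype and Make_query_for_searching_pubmed both turn an rs identifier into an E-utilities URL by stripping the leading "rs" with rsId.substring(2). The Java sketch below reproduces the parameters exactly as they appear in those two scripts; the class name and the prefix check are hypothetical additions, and rs671 (the well-known ALDH2 variant) is used only as an example id. As in the original, the email value is not URL-encoded.

```java
// Hypothetical helper reproducing the two per-SNP URL builders.
final class SnpUrls {
    static final String EUTILS = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    // "rs671" -> "671"; the check is an addition, the original just calls substring(2).
    static String numericId(String rsId) {
        if (!rsId.startsWith("rs")) {
            throw new IllegalArgumentException("expected an rs identifier: " + rsId);
        }
        return rsId.substring(2);
    }

    // Same parameters as Make_query_for_retrieving_genotype.
    static String genotypeUrl(String rsId, String email) {
        return EUTILS + "efetch.fcgi?report=GENXML&db=snp&mode=text&id="
                + numericId(rsId) + "&email=" + email;
    }

    // Same parameters as Make_query_for_searching_pubmed.
    static String pubmedLinkUrl(String rsId, String email) {
        return EUTILS + "elink.fcgi?dbfrom=snp&db=pubmed&mode=text&id="
                + numericId(rsId) + "&email=" + email;
    }

    public static void main(String[] args) {
        System.out.println(genotypeUrl("rs671", "you@example.org"));
        System.out.println(pubmedLinkUrl("rs671", "you@example.org"));
    }
}
```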
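Outputs
Name | Description |
---|---|
Nucleotide_frequency | Merged report from Merge_output: DDBJ variation entries, per-population genotype frequencies from dbSNP, and PubMed articles citing the SNPs. |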
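In Extraction_of_genotype_and_frequency_of_each_population, a submitter SNP only appears in the genotype part of this output if, for some genotype, the frequencies reported for two populations differ by more than 0.3 (checkDiversity). A simplified Java sketch of that test follows; it uses double instead of the script's float and takes an already-built map in place of parsed GENXML data. The class name and population ids are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the checkDiversity rule for one submitter SNP (ss) record.
final class PopulationDiversity {
    static final double THRESHOLD = 0.3; // same cut-off as the original script

    // byPopulation: popId -> (genotype -> frequency).
    static boolean hasDiversity(Map<String, Map<String, Double>> byPopulation) {
        // Frequencies seen so far for each genotype, across populations.
        Map<String, List<Double>> seen = new LinkedHashMap<>();
        for (Map<String, Double> genotypes : byPopulation.values()) {
            for (Map.Entry<String, Double> e : genotypes.entrySet()) {
                List<Double> earlier = seen.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
                for (double f : earlier) {
                    if (Math.abs(f - e.getValue()) > THRESHOLD) {
                        return true; // two populations differ strongly for this genotype
                    }
                }
                earlier.add(e.getValue());
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> example = new LinkedHashMap<>();
        example.put("popA", Map.of("G/G", 0.9));
        example.put("popB", Map.of("G/G", 0.5));
        System.out.println(hasDiversity(example));
    }
}
```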
Datalinks
Source | Sink |
---|---|
gene_name | Make_query_for_searching_DDBJ:gene |
Make_query_for_searching_DDBJ:url | Get_DDBJ_list_by_ARSA_searchByXMLPath:url |
Get_DDBJ_list_by_ARSA_searchByXMLPath:contents | Split_string_DDBJ_list_by_enter:string |
Split_string_enter:value | Split_string_DDBJ_list_by_enter:regex |
Split_string_DDBJ_list_lines_by_tab:split | Skip_first_two_lines:list |
Split_string_DDBJ_list_by_enter:split | Split_string_DDBJ_list_lines_by_tab:string |
Split_string_tab:value | Split_string_DDBJ_list_lines_by_tab:regex |
Skip_first_two_lines:ddbjList | Make_query_for_searching_DDBJ_to_get_feature_info:ddbjList |
Make_query_for_searching_DDBJ_to_get_feature_info:urlList | Get_variation_information_by_DDBJ_getFeatureInfo:url |
Get_variation_information_by_DDBJ_getFeatureInfo:contents | Split_string_variation_information_by_enter:string |
Split_string_enter:value | Split_string_variation_information_by_enter:regex |
Split_string_variation_information_lines_by_tab:split | Summarize_variation_information:list |
Split_string_variation_information_by_enter:split | Split_string_variation_information_lines_by_tab:string |
Split_string_tab:value | Split_string_variation_information_lines_by_tab:regex |
Skip_first_two_lines:ddbjList | Make_output_for_DDBJ:ddbj |
Summarize_variation_information:variationAnnotation | Make_output_for_DDBJ:variation |
Make_query_for_searching_dbsnp_by_esearch:email | |
function | Make_query_for_searching_dbsnp_by_esearch:function |
gene_name | Make_query_for_searching_dbsnp_by_esearch:gene |
Make_query_for_searching_dbsnp_by_esearch:url | Get_dbsnp_list_by_ncbieutils_esearch:url |
Split_string_dbsnp_list_by_enter:split | Make_query_for_searching_dbsnp_by_efetch:result |
Make_query_for_searching_dbsnp_by_efetch:email | |
Get_dbsnp_list_by_ncbieutils_esearch:contents | Split_string_dbsnp_list_by_enter:string |
Split_string_enter:value | Split_string_dbsnp_list_by_enter:regex |
Make_query_for_searching_dbsnp_by_efetch:urlList | Get_dbsnp_summary_by_ncbieutils_efetch:url |
Split_string_dbsnp_summary_by_enter:split | Extract_not_merged_rsid:dbsnpEfetchResultList |
Extract_not_merged_rsid:notMergedRsidList | Make_query_for_retrieving_genotype:rsId |
Make_query_for_retrieving_genotype:email | |
Make_query_for_retrieving_genotype:url | Get_dbsnp_genotype_frequency_by_ncbieutils_efetch:url |
Get_dbsnp_genotype_frequency_by_ncbieutils_efetch:contents | Extraction_of_genotype_and_frequency_of_each_population:inputList |
Extract_not_merged_rsid:notMergedRsidList | Extraction_of_genotype_and_frequency_of_each_population:rsIdList |
Make_query_for_searching_pubmed:email | |
Extract_not_merged_rsid:notMergedRsidList | Make_query_for_searching_pubmed:rsId |
Make_query_for_searching_pubmed:url | Get_dbsnp_pubmed_cited_by_ncbieutils_elink:url |
Extract_pubmed_id:pubmedIdList | Summarize_cited_snp:pubmedIdList |
Make_query_for_searching_pubmed:id | Summarize_cited_snp:rsId |
Summarize_cited_snp:output | Merge_output:citedPaper |
Make_output_for_DDBJ:output | Merge_output:ddbj |
Extraction_of_genotype_and_frequency_of_each_population:output | Merge_output:genotypeInfo |
Get_dbsnp_pubmed_cited_by_ncbieutils_elink:contents | Extract_pubmed_id:elinkResult |
Split_string_enter:value | Split_string_dbsnp_summary_by_enter:regex |
Get_dbsnp_summary_by_ncbieutils_efetch:contents | Split_string_dbsnp_summary_by_enter:string |
Merge_output:output | Nucleotide_frequency |
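The datalinks form a dependency graph: a processor can only run once the processors feeding its input ports have produced output. As an illustration of how to read the table (not how Taverna itself schedules work), the Java sketch below collapses each Processor:port endpoint to its processor name and derives one valid execution order with Kahn's topological sort. It skips rows with an empty Sink and ignores the coordination constraints; the class name is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical reader for Source | Sink rows.
final class DatalinkOrder {
    // "Processor:port" -> "Processor"; bare names are workflow inputs/outputs.
    static String node(String endpoint) {
        int colon = endpoint.indexOf(':');
        return colon < 0 ? endpoint : endpoint.substring(0, colon);
    }

    // Kahn's algorithm; throws if the links contain a cycle.
    static List<String> order(List<String[]> links) {
        Map<String, Set<String>> next = new LinkedHashMap<>();
        Map<String, Integer> indegree = new LinkedHashMap<>();
        for (String[] link : links) {
            if (link.length < 2 || link[1].isEmpty()) {
                continue; // rows with no sink add no dependency
            }
            String from = node(link[0]);
            String to = node(link[1]);
            next.computeIfAbsent(from, k -> new LinkedHashSet<>());
            next.computeIfAbsent(to, k -> new LinkedHashSet<>());
            indegree.putIfAbsent(from, 0);
            indegree.putIfAbsent(to, 0);
            if (next.get(from).add(to)) { // count each processor pair once
                indegree.merge(to, 1, Integer::sum);
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String n = ready.poll();
            result.add(n);
            for (String m : next.get(n)) {
                if (indegree.merge(m, -1, Integer::sum) == 0) ready.add(m);
            }
        }
        if (result.size() != indegree.size()) {
            throw new IllegalStateException("cycle in datalinks");
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> links = List.of(
                new String[] {"gene_name", "Make_query_for_searching_DDBJ:gene"},
                new String[] {"Make_query_for_searching_DDBJ:url", "Get_DDBJ_list_by_ARSA_searchByXMLPath:url"},
                new String[] {"Get_DDBJ_list_by_ARSA_searchByXMLPath:contents", "Split_string_DDBJ_list_by_enter:string"},
                new String[] {"Split_string_enter:value", "Split_string_DDBJ_list_by_enter:regex"});
        System.out.println(order(links));
    }
}
```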
Coordinations (each Target processor runs only after its Controller has finished)
Controller | Target |
---|---|
Split_string_dbsnp_summary_by_enter | Extract_not_merged_rsid |
Extraction_of_genotype_and_frequency_of_each_population | Merge_output |
Summarize_variation_information | Make_output_for_DDBJ |
Make_output_for_DDBJ | Merge_output |
Get_dbsnp_genotype_frequency_by_ncbieutils_efetch | Extraction_of_genotype_and_frequency_of_each_population |
Skip_first_two_lines | Make_output_for_DDBJ |
Extract_pubmed_id | Summarize_cited_snp |
Summarize_cited_snp | Merge_output |
Version 1 (earliest) (of 2)