BlastandParse2
Created: 2010-03-19 12:20:29
This workflow allows you to configure a BioMart query to fetch the sequences you want from Ensembl. The sequences are retrieved and a BLAST database is created from them (by default, in the directory you ran Taverna from).
Warning: This workflow assumes that blastall and formatdb are installed on the machine and that, by default, both are found or linked in /usr/local/bin. It also assumes that you have write permission to the directory you ran Taverna from. If the default locations are not appropriate for you, edit the beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs".
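The assumptions in the warning can be checked before the workflow is run. The following is a minimal sketch, not part of the workflow itself: the class name ToolCheck and the method missingTools are hypothetical, and /usr/local/bin is simply the default location named in the description.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the workflow: reports which of the
// required BLAST tools cannot be found as executables in a directory.
final class ToolCheck {
    static List<String> missingTools(String binDir, String... tools) {
        List<String> missing = new ArrayList<>();
        for (String tool : tools) {
            File candidate = new File(binDir, tool);
            // A tool counts as present only if it is a regular, executable file.
            if (!candidate.isFile() || !candidate.canExecute()) {
                missing.add(tool);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // /usr/local/bin is the default location named in the description.
        List<String> missing = missingTools("/usr/local/bin", "blastall", "formatdb");
        System.out.println(missing.isEmpty() ? "All tools found" : "Missing: " + missing);
        // The workflow also needs write permission in the directory it is run from.
        File workingDir = new File(System.getProperty("user.dir"));
        System.out.println("Working directory writable: " + workingDir.canWrite());
    }
}
```

If either tool is reported missing, the command locations used by the workflow would need to be changed before running it.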
Shortcomings:
The names of all the files created and used are hard-coded in this workflow. If you run it more than once without editing anything, you will overwrite the files created previously.
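One possible way around the overwriting problem is to derive a fresh name for every run. This is only a sketch under stated assumptions: the class RunNames and the method uniqueName are hypothetical, and the workflow as published does not do this. The database name "BlastDB" appears in both "create_formatdb_cmdArgs" (after -n) and "create_blastall_cmdArgs" (after -d), so both would have to receive the same generated name.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical helper, not part of the workflow: builds a per-run name
// so that repeated runs do not overwrite each other's files.
final class RunNames {
    // Appends a timestamp (to the second) to the base name.
    static String uniqueName(String base, Date when) {
        return base + "_" + new SimpleDateFormat("yyyyMMdd_HHmmss").format(when);
    }

    public static void main(String[] args) {
        // "BlastDB" is the name hard-coded in the workflow's beanshells.
        System.out.println(uniqueName("BlastDB", new Date()));
    }
}
```

Two runs started within the same second would still collide, so this is a mitigation rather than a guarantee.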
The blastall results are parsed to keep the hits whose e-value is at or below the maximum and whose percent identity is at or above the minimum, both supplied by the user. The workflow then outputs a list of UniProt accession numbers that can be searched against KEGG.
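The filtering rule can be sketched in isolation. The example below assumes the tabular output produced by blastall -m 8, in which each row has 12 tab-separated columns, with percent identity at index 2 and the e-value at index 10. The class BlastFilter and its methods are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the threshold rule; not the workflow's own script.
final class BlastFilter {
    // Returns true when a tabular BLAST row satisfies both thresholds.
    static boolean passes(String row, double minPercent, double maxEvalue) {
        String[] cols = row.split("\t");
        if (cols.length < 11) {
            return false; // too few columns to contain an e-value
        }
        double percent = Double.parseDouble(cols[2].trim());
        double evalue = Double.parseDouble(cols[10].trim());
        return percent >= minPercent && evalue <= maxEvalue;
    }

    // Keeps only the rows of a multi-line report that pass the thresholds.
    static List<String> filter(String blastResults, double minPercent, double maxEvalue) {
        List<String> kept = new ArrayList<>();
        for (String row : blastResults.split("\n")) {
            if (passes(row, minPercent, maxEvalue)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up rows for demonstration; identifiers and scores are not real results.
        String strong = "sp|Q00001|A\tsp|Q00002|B\t95.00\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200";
        String weak = "sp|Q00001|A\tsp|Q00003|C\t20.00\t100\t80\t0\t1\t100\t1\t100\t5.0\t20";
        System.out.println(filter(strong + "\n" + weak, 30.0, 2.0).size());
    }
}
```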
Workflow Components
Authors (2)
Baywatch Solutions |
Bela Tiwari |
Titles (2)
BlastandParse2 |
fetchEnsemblSeqsAndBlast |
Descriptions (2)
This workflow allows you to configure a BioMart query to fetch the sequences you want from Ensembl. The sequences are retrieved and a BLAST database is created from them (by default, in the directory you ran Taverna from). Warning: This workflow assumes that blastall and formatdb are installed on the machine and that, by default, both are found or linked in /usr/local/bin. It also assumes that you have write permission to the directory you ran Taverna from. If the default locations are not appropriate for you, edit the beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs". Shortcomings: The names of all the files created and used are hard-coded in this workflow. If you run it more than once without editing anything, you will overwrite the files created previously. The blastall results are parsed to keep the hits whose e-value is at or below the maximum and whose percent identity is at or above the minimum, both supplied by the user. The workflow then outputs a list of UniProt accession numbers that can be searched against KEGG. |
This workflow allows you to configure a BioMart query to fetch the sequences you want from Ensembl. The sequences are retrieved and a BLAST database is created from them (by default, in the directory you ran Taverna from). Warning: This workflow assumes that blastall and formatdb are installed on the machine and that, by default, both are found or linked in /usr/local/bin. It also assumes that you have write permission to the directory you ran Taverna from. If the default locations are not appropriate for you, edit the beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs". Shortcomings: The names of all the files created and used are hard-coded in this workflow. If you run it more than once without editing anything, you will overwrite the files created previously. The workflow does not yet delete any of the files it creates in the working directory. Ideally, a user option would control whether the files are kept or deleted after use. |
Dependencies (0)
Inputs (5)
Name | Description
sequenceFileName | Provide the name, and if not in your working directory, the location of the file of fasta sequence(s) that you wish to use to search the blast database created in this workflow.
database_input | the fasta filepath to be used to format a database to blastall against
evalue | evalue to be used for blastall
minP | minimum percent identity to parse the blast results.
maxE | maximum expectation value to parse the blast results with
Processors (8)
Name | Type | Description
runBlastSearch | localworker | Script:
    if (command == void || command.equals("")) {
        throw new RuntimeException("The 'command' port cannot be null.");
    }
    Process proc = null;
    Runtime rt = Runtime.getRuntime();
    String osName = System.getProperty("os.name");
    String[] cmdArray = null;
    String qry = "";
    if (osName.equals("Windows NT") || osName.equals("Windows XP") || osName.equals("Windows Vista")) {
        cmdArray = new String[] { "cmd.exe", "/c", command };
    } else if (osName.equals("Windows 95")) {
        cmdArray = new String[] { "command.exe", "/c", command };
    } else {
        // TODO: investigate if this will work in Linux and OSX
        cmdArray = new String[] { command };
    }
    qry += command + " ";
    // concatenate the arrays
    if ((args == void) || (args == null)) {
        args = new ArrayList();
    }
    int argSize = cmdArray.length + args.size();
    ArrayList appArray = new ArrayList(argSize);
    for (int i = 0; i < cmdArray.length; i++) {
        appArray.add(cmdArray[i]);
    }
    for (int i = 0; i < args.size(); i++) {
        appArray.add(args.get(i));
        qry += args.get(i);
        qry += " ";
    }
    String[] applist = new String[argSize];
    appArray.toArray(applist);
    proc = rt.exec(applist);
    // Get the input stream and read from it
    InputStream in = proc.getInputStream();
    int c;
    StringBuffer sb = new StringBuffer();
    while ((c = in.read()) != -1) {
        sb.append((char) c);
    }
    in.close();
    result = sb.toString();
    theQuery = qry;
local_create_blastdb | localworker | Script:
    if (command == void || command.equals("")) {
        throw new RuntimeException("The 'command' port cannot be null.");
    }
    Process proc = null;
    Runtime rt = Runtime.getRuntime();
    String osName = System.getProperty("os.name");
    String[] cmdArray = null;
    if (osName.equals("Windows NT") || osName.equals("Windows XP")) {
        cmdArray = new String[] { "cmd.exe", "/c", command };
    } else if (osName.equals("Windows 95")) {
        cmdArray = new String[] { "command.exe", "/c", command };
    } else {
        // TODO: investigate if this will work in Linux and OSX
        cmdArray = new String[] { command };
    }
    // concatenate the arrays
    if ((args == void) || (args == null)) {
        args = new ArrayList();
    }
    int argSize = cmdArray.length + args.size();
    ArrayList appArray = new ArrayList(argSize);
    for (int i = 0; i < cmdArray.length; i++) {
        appArray.add(cmdArray[i]);
    }
    for (int i = 0; i < args.size(); i++) {
        appArray.add(args.get(i));
    }
    String[] applist = new String[argSize];
    appArray.toArray(applist);
    proc = rt.exec(applist);
    // Get the input stream and read from it
    InputStream in = proc.getInputStream();
    int c;
    StringBuffer sb = new StringBuffer();
    while ((c = in.read()) != -1) {
        sb.append((char) c);
    }
    in.close();
    result = sb.toString();
create_blastall_cmdArgs | beanshell | Script:
    List cmdArgsList = new ArrayList();
    cmdArgsList.add("-p");
    cmdArgsList.add("blastp");
    cmdArgsList.add("-d");
    cmdArgsList.add("./BlastDB");
    cmdArgsList.add("-i");
    cmdArgsList.add(sequenceFileName);
    cmdArgsList.add("-e");
    cmdArgsList.add(evalue);
    cmdArgsList.add("-m");
    cmdArgsList.add("8");
create_formatdb_cmdArgs | beanshell | Script:
    List cmdArgsList = new ArrayList();
    cmdArgsList.add("-i");
    cmdArgsList.add(dbsequences);
    cmdArgsList.add("-p");
    cmdArgsList.add("T");
    cmdArgsList.add("-n");
    cmdArgsList.add("BlastDB");
runBlastSearch_command_defaultValue | stringconstant | Value: D:\BLAST\bin\blastall
local_create_blastdb_command_defaultValue | stringconstant | Value: D:\BLAST\bin\formatdb
parse_blast_results | beanshell | Script:
    // "uniprot:P02745 "
    // takes a string of \t-separated columns and \n-separated rows
    //String minP = "30.0";
    //String maxE = "2.0";
    StringBuffer sb1 = new StringBuffer();
    StringBuffer sb2 = new StringBuffer();
    double minPercent = Double.parseDouble(minP);
    double maxEvalue = Double.parseDouble(maxE);
    int count = 0;
    protein_list = new ArrayList();
    sb2.append("thresholds: minP=" + minP + ", maxE=" + maxE + "\n========================\n");
    String [] rows = blast_results.split("\n");
    for(int i = 0; i < rows.length; ++i) {
        String [] cols = rows[i].split("\t");
        // the e-value is read from cols[10], so at least 11 columns are required
        if(cols != null && cols.length > 10) {
            String [] query = cols[0].split("[|]");
            String uniProtId = cols[1].replaceAll("[|]","").trim();
            String percent = cols[2].trim();
            String e_val = cols[10].trim();
            double max1 = 0;
            double max2 = 0;
            if( Double.parseDouble(percent) >= minPercent && Double.parseDouble(e_val) <= maxEvalue ) {
                //sb1.append("uniprot:" + uniProtId + " ");
                protein_list.add("uniprot:" + query[1]);
                sb2.append(">>> query id=" + query[1]);
                sb2.append(", name=" + query[2]);
                sb2.append(", hit uniProt id=" + uniProtId);
                sb2.append(", % identity=" + percent);
                sb2.append(", e value =" + e_val + "\n");
            }
        }
    }
    // remove duplicate accession numbers
    HashSet h = new HashSet(protein_list);
    protein_list.clear();
    protein_list.addAll(h);
    // "uniprot:P00734 uniprot:P00737"; // probably only works if both have same pathway id
    // sb1.toString();
    records = sb2.toString();
Merge_String_List_to_a_String | localworker | Script:
    String seperatorString = "\n";
    StringBuffer sb = new StringBuffer();
    for (Iterator i = stringlist.iterator(); i.hasNext();) {
        String item = (String) i.next();
        sb.append(item);
        if (i.hasNext()) {
            sb.append(seperatorString);
        }
    }
    concatenated = sb.toString();
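The two localworker scripts read only the standard output of the external command and leave the non-Windows case as a TODO. As a point of comparison, here is a minimal sketch of the same step written with java.lang.ProcessBuilder, which takes the command and its arguments as a list on any operating system. It is an alternative illustration, not the code the workflow runs: the class CommandRunner and its run method are hypothetical, and merging the error stream and checking the exit code are design choices made for this sketch.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical alternative to the localworker scripts; not part of the workflow.
final class CommandRunner {
    // Runs the command with its arguments and returns everything it printed.
    static String run(String command, List<String> args) throws IOException, InterruptedException {
        if (command == null || command.isEmpty()) {
            throw new IllegalArgumentException("The 'command' value cannot be null or empty.");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add(command);
        cmd.addAll(args);
        ProcessBuilder builder = new ProcessBuilder(cmd);
        builder.redirectErrorStream(true); // error messages end up in the same output
        Process proc = builder.start();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = proc.getInputStream()) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        }
        int exitCode = proc.waitFor(); // block until the tool has finished
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode + ": " + out);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Usage: the first argument is the command, the rest are its arguments.
        if (args.length > 0) {
            System.out.print(run(args[0], Arrays.asList(args).subList(1, args.length)));
        }
    }
}
```

Waiting for the exit code means a failed formatdb or blastall run surfaces as an error instead of an empty result.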
Beanshells (3)
Name | Description | Inputs | Outputs
create_blastall_cmdArgs | | sequenceFileName, evalue | cmdArgsList
create_formatdb_cmdArgs | | dbsequences | cmdArgsList
parse_blast_results | | blast_results, minP, maxE | protein_list, records
Outputs (4)
Name | Description
Result | resulting blast report in tabular format
queryEntered | the blastall command line that was run (the command followed by its arguments)
proteinList | list of uniprot accession numbers in a format that can be searched against KEGG
Blast_hits | A List of blast results containing the usual information such as query id, hit id, percent identity etc
Datalinks (15)
Source | Sink
create_blastall_cmdArgs:cmdArgsList | runBlastSearch:args
runBlastSearch_command_defaultValue:value | runBlastSearch:command
local_create_blastdb_command_defaultValue:value | local_create_blastdb:command
create_formatdb_cmdArgs:cmdArgsList | local_create_blastdb:args
sequenceFileName | create_blastall_cmdArgs:sequenceFileName
evalue | create_blastall_cmdArgs:evalue
database_input | create_formatdb_cmdArgs:dbsequences
runBlastSearch:result | parse_blast_results:blast_results
maxE | parse_blast_results:maxE
minP | parse_blast_results:minP
parse_blast_results:protein_list | Merge_String_List_to_a_String:stringlist
runBlastSearch:result | Result
runBlastSearch:theQuery | queryEntered
Merge_String_List_to_a_String:concatenated | proteinList
parse_blast_results:records | Blast_hits
Coordinations (1)
Controller | Target
local_create_blastdb | runBlastSearch
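The coordination above means that runBlastSearch may start only once local_create_blastdb has finished, even though no data flows between the two processors. Outside Taverna, the same ordering constraint could be sketched as follows. The class Coordination and the method runAfter are hypothetical, and the two steps are stand-ins that only record their names.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical illustration of a control link; not how Taverna implements it.
final class Coordination {
    // Starts the target only after the controller has completed normally.
    static CompletableFuture<Void> runAfter(Runnable controller, Runnable target) {
        return CompletableFuture.runAsync(controller).thenRun(target);
    }

    public static void main(String[] args) {
        List<String> log = new CopyOnWriteArrayList<>();
        // Stand-ins for the real steps: each one just records that it ran.
        runAfter(() -> log.add("local_create_blastdb"),
                 () -> log.add("runBlastSearch")).join();
        System.out.println(log);
    }
}
```

If the controller step fails, the target is never started, which matches the intent that a search should not run against a database that was not built.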
Version 1 (of 1)