BlastandParse1
Created: 2010-03-19 12:09:15
This workflow allows you to configure a BioMart query to fetch the sequences you want from Ensembl. The sequences are retrieved and a BLAST database is built from them (by default, in the directory you ran Taverna from).
Warning: This workflow assumes that blastall and formatdb are installed on the machine and that, by default, both are found or linked in /usr/local/bin. It also assumes that you have write permission to the directory you ran Taverna from. If the default locations are not appropriate for you, edit the beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs".
Shortcomings:
The names of all the files created and used are hard coded in this workflow. This means that if you run this workflow more than once without editing anything, you will overwrite files you have previously created.
The results are parsed using two user-supplied thresholds, a minimum percent identity (minP) and a maximum e-value (maxE). Only hits with a percent identity at or above minP and an e-value at or below maxE are output.
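The threshold parsing described above can be illustrated with a minimal, standalone Java sketch. It is not the workflow's own beanshell; the class name `BlastHitFilter` and the method `filter` are hypothetical, and the sketch assumes the tabular report produced by blastall with `-m 8`, where percent identity is the third column and the e-value the eleventh.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the workflow.
class BlastHitFilter {

    /** Keeps tabular BLAST rows with percent identity >= minP and e-value <= maxE. */
    static List<String> filter(String blastResults, double minP, double maxE) {
        List<String> kept = new ArrayList<>();
        for (String row : blastResults.split("\n")) {
            String[] cols = row.split("\t");
            if (cols.length < 11) {
                continue; // columns 2 (identity) and 10 (e-value) must both exist
            }
            double identity = Double.parseDouble(cols[2].trim());
            double eValue = Double.parseDouble(cols[10].trim());
            if (identity >= minP && eValue <= maxE) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String row = "sp|Q2FEF3|3MGH_STAA3\tP00533\t25.00\t72\t46\t2\t40\t103\t862\t933\t0.18\t21.9";
        System.out.println(filter(row, 20.0, 1.0)); // identity 25.00 and e-value 0.18 pass
        System.out.println(filter(row, 30.0, 1.0)); // identity below minP, so nothing is kept
    }
}
```

Rows with fewer than eleven columns are skipped rather than treated as errors, which is one possible design choice for reports that contain blank or truncated lines.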
Credit: Bela
Workflow Components
Authors (4)
Baywatch SolutionsBela Tiwari |
Baywatch Solutions |
Bela Tiwari |
Baywatch Solutions sp|Q2FEF3|3MGH_STAA3 |P00533 25.00 72 46 2 40 103 862 933 0.18 21.9 |
Titles (2)
fetchEnsemblSeqsAndBlast |
BlastandParse1 |
Descriptions (1)
This workflow allows you to configure a BioMart query to fetch the sequences you want from Ensembl. The sequences are retrieved and a BLAST database is built from them (by default, in the directory you ran Taverna from). Warning: This workflow assumes that blastall and formatdb are installed on the machine and that, by default, both are found or linked in /usr/local/bin. It also assumes that you have write permission to the directory you ran Taverna from. If the default locations are not appropriate for you, edit the beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs". Shortcomings: The names of all the files created and used are hard coded in this workflow. This means that if you run this workflow more than once without editing anything, you will overwrite files you have previously created. The workflow does not yet delete any of the files it creates in the working directory. Ideally there would be an option that lets the user choose whether the files are kept or deleted after use. |
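The two shortcomings listed in the description (fixed file names and no cleanup) could be addressed along the following lines. This is only a sketch under assumptions: the class `RunDirectory`, its methods, and the `blastrun-` prefix are hypothetical and do not exist in the workflow. The idea is to give each run its own directory and to delete it afterwards only if the user asks for that.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not part of the workflow.
class RunDirectory {

    /** Creates a fresh, uniquely named directory for one run inside workingDir. */
    static Path create(Path workingDir) throws IOException {
        return Files.createTempDirectory(workingDir, "blastrun-");
    }

    /** Deletes the run directory and everything in it (children before parents). */
    static void deleteRecursively(Path dir) throws IOException {
        try (Stream<Path> paths = Files.walk(dir)) {
            paths.sorted(Comparator.reverseOrder())
                 .forEach(p -> p.toFile().delete());
        }
    }
}
```

A caller would create the directory before building the database, point the database name at a path inside it, and call `deleteRecursively` at the end only when a keep-files option is switched off.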
Dependencies (0)
Inputs (5)
Name |
Description |
sequenceFileName |
Provide the name, and if not in your working directory, the location of the file of fasta sequence(s) that you wish to use to search the blast database created in this workflow.
|
database_input |
The fasta format file that will be used to create the blast database
|
evalue |
e-Value for blast results
|
minP |
Minimum percent identity similarity
|
maxE |
The maximum e-Value for parsing results
|
Processors (8)
Name |
Type |
Description |
runBlastSearch |
localworker |
Script:
if (command == void || command.equals("")) {
    throw new RuntimeException("The 'command' port cannot be null.");
}
Process proc = null;
Runtime rt = Runtime.getRuntime();
String osName = System.getProperty("os.name");
String[] cmdArray = null;
String qry = "";
if (osName.equals("Windows NT") || osName.equals("Windows XP") || osName.equals("Windows Vista")) {
    cmdArray = new String[] { "cmd.exe", "/c", command };
} else if (osName.equals("Windows 95")) {
    cmdArray = new String[] { "command.exe", "/c", command };
} else {
    // TODO: investigate if this will work in Linux and OSX
    cmdArray = new String[] { command };
}
qry += command + " ";
// concatenate the arrays
if ((args == void) || (args == null)) {
    args = new ArrayList();
}
int argSize = cmdArray.length + args.size();
ArrayList appArray = new ArrayList(argSize);
for (int i = 0; i < cmdArray.length; i++) {
    appArray.add(cmdArray[i]);
}
for (int i = 0; i < args.size(); i++) {
    appArray.add(args.get(i));
    qry += args.get(i);
    qry += " ";
}
String[] applist = new String[argSize];
appArray.toArray(applist);
proc = rt.exec(applist);
// Get the input stream and read from it
InputStream in = proc.getInputStream();
int c;
StringBuffer sb = new StringBuffer();
while ((c = in.read()) != -1) {
    sb.append((char) c);
}
in.close();
result = sb.toString();
theQuery = qry;
|
local_create_blastdb |
localworker |
Script:
if (command == void || command.equals("")) {
    throw new RuntimeException("The 'command' port cannot be null.");
}
Process proc = null;
Runtime rt = Runtime.getRuntime();
String osName = System.getProperty("os.name");
String[] cmdArray = null;
if (osName.equals("Windows NT") || osName.equals("Windows XP")) {
    cmdArray = new String[] { "cmd.exe", "/c", command };
} else if (osName.equals("Windows 95")) {
    cmdArray = new String[] { "command.exe", "/c", command };
} else {
    // TODO: investigate if this will work in Linux and OSX
    cmdArray = new String[] { command };
}
// concatenate the arrays
if ((args == void) || (args == null)) {
    args = new ArrayList();
}
int argSize = cmdArray.length + args.size();
ArrayList appArray = new ArrayList(argSize);
for (int i = 0; i < cmdArray.length; i++) {
    appArray.add(cmdArray[i]);
}
for (int i = 0; i < args.size(); i++) {
    appArray.add(args.get(i));
}
String[] applist = new String[argSize];
appArray.toArray(applist);
proc = rt.exec(applist);
// Get the input stream and read from it
InputStream in = proc.getInputStream();
int c;
StringBuffer sb = new StringBuffer();
while ((c = in.read()) != -1) {
    sb.append((char) c);
}
in.close();
result = sb.toString();
|
create_blastall_cmdArgs |
beanshell |
Script:
List cmdArgsList = new ArrayList();
cmdArgsList.add("-p");
cmdArgsList.add("blastp");
cmdArgsList.add("-d");
cmdArgsList.add("./BlastDB");
cmdArgsList.add("-i");
cmdArgsList.add(sequenceFileName);
cmdArgsList.add("-e");
cmdArgsList.add(evalue);
cmdArgsList.add("-m");
cmdArgsList.add("8");
|
create_formatdb_cmdArgs |
beanshell |
Script:
List cmdArgsList = new ArrayList();
cmdArgsList.add("-i");
cmdArgsList.add(dbsequences);
cmdArgsList.add("-p");
cmdArgsList.add("T");
cmdArgsList.add("-n");
cmdArgsList.add("BlastDB");
|
runBlastSearch_command_defaultValue |
stringconstant |
Value: D:\BLAST\bin\blastall |
local_create_blastdb_command_defaultValue |
stringconstant |
Value: D:\BLAST\bin\formatdb |
parse_blast_results |
beanshell |
Script:
// "uniprot:P02745 "
// takes a string of \t separated results and \n
//String minP = "30.0";
//String maxE = "2.0";
StringBuffer sb1 = new StringBuffer();
StringBuffer sb2 = new StringBuffer();
double minPercent = Double.parseDouble(minP);
double maxEvalue = Double.parseDouble(maxE);
int count = 0;
protein_list = new ArrayList();
sb2.append("thresholds: minP=" + minP + ", maxE=" + maxE + "\n========================\n");
String [] rows = blast_results.split("\n");
for(int i = 0; i < rows.length; ++i) {
    String [] cols = rows[i].split("\t");
    // cols[10] (the e-value) is read below, so at least 11 columns are required
    if(cols != null && cols.length > 10) {
        String [] query = cols[0].split("[|]");
        String uniProtId = cols[1].replaceAll("[|]","").trim();
        String percent = cols[2].trim();
        String e_val = cols[10].trim();
        double max1 = 0;
        double max2 = 0;
        if( Double.parseDouble(percent) >= minPercent && Double.parseDouble(e_val) <= maxEvalue ) {
            //sb1.append("uniprot:" + uniProtId + " ");
            protein_list.add(query[1]);
            sb2.append(">>> query id=" + query[1]);
            sb2.append(", name=" + query[2]);
            sb2.append(", hit uniProt id=" + uniProtId);
            sb2.append(", % identity=" + percent);
            sb2.append(", e value =" + e_val + "\n");
        }
    }
}
// remove duplicate query ids
HashSet h = new HashSet(protein_list);
protein_list.clear();
protein_list.addAll(h);
// "uniprot:P00734 uniprot:P00737"; // probably only works if both have same pathway id
// sb1.toString();
records = sb2.toString();
|
Merge_String_List_to_a_String |
localworker |
Script:
String seperatorString = "\n";
StringBuffer sb = new StringBuffer();
for (Iterator i = stringlist.iterator(); i.hasNext();) {
    String item = (String) i.next();
    sb.append(item);
    if (i.hasNext()) {
        sb.append(seperatorString);
    }
}
concatenated = sb.toString();
|
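The two localworker scripts (runBlastSearch and local_create_blastdb) build a command array by hand, branch on a fixed list of Windows names, read only standard output, and never check the exit code. As an alternative, here is a minimal sketch using `java.lang.ProcessBuilder`, which accepts the executable and its arguments as one list on every platform. The class `CommandRunner` and its methods are hypothetical names, and `readAllBytes` assumes Java 9 or later; this is not the code that Taverna ships.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the workflow.
class CommandRunner {

    /** Puts the executable first, followed by its arguments, e.g. the cmdArgsList from a beanshell. */
    static List<String> buildCommand(String executable, List<String> args) {
        List<String> command = new ArrayList<>();
        command.add(executable);
        command.addAll(args);
        return command;
    }

    /** Runs the command, returns its standard output, and fails on a non-zero exit code. */
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Send stderr to the parent's stderr so it neither blocks nor pollutes the report.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        Process proc = pb.start();
        String output = new String(proc.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = proc.waitFor();
        if (exitCode != 0) {
            throw new IOException("Command exited with code " + exitCode + ": " + command);
        }
        return output;
    }
}
```

With this shape, the formatdb step could be run first and the blastall step only if the first call returns without throwing, which mirrors the ordering that the workflow expresses through its coordination link.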
Beanshells (3)
Name |
Description |
Inputs |
Outputs |
create_blastall_cmdArgs |
|
sequenceFileName
evalue
|
cmdArgsList
|
create_formatdb_cmdArgs |
|
dbsequences
|
cmdArgsList
|
parse_blast_results |
|
blast_results
minP
maxE
|
protein_list
records
|
Outputs (4)
Name |
Description |
Result |
Resulting blast report in tabular output before parsing
|
queryEntered |
The query that was entered into blastall
|
proteinList |
Protein List containing uniprot accession numbers generated from parse_blast_results
|
Blast_hits |
A List of blast results containing the usual information such as query id, hit id, percent identity etc
|
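The proteinList output is produced by removing duplicates from protein_list and then joining the entries with newlines. The workflow uses a `HashSet` for this, which does not guarantee any particular order. A minimal sketch of the same two steps that keeps first-seen order is shown here; the class `ProteinListMerge` and its method are hypothetical, and the use of `LinkedHashSet` is a swapped-in choice, not what the workflow does.

```java
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of the workflow.
class ProteinListMerge {

    /** Drops duplicate ids (keeping first-seen order) and joins the rest with newlines. */
    static String merge(List<String> ids) {
        return String.join("\n", new LinkedHashSet<>(ids));
    }
}
```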
Datalinks (15)
Source |
Sink |
create_blastall_cmdArgs:cmdArgsList |
runBlastSearch:args |
runBlastSearch_command_defaultValue:value |
runBlastSearch:command |
local_create_blastdb_command_defaultValue:value |
local_create_blastdb:command |
create_formatdb_cmdArgs:cmdArgsList |
local_create_blastdb:args |
sequenceFileName |
create_blastall_cmdArgs:sequenceFileName |
evalue |
create_blastall_cmdArgs:evalue |
database_input |
create_formatdb_cmdArgs:dbsequences |
runBlastSearch:result |
parse_blast_results:blast_results |
maxE |
parse_blast_results:maxE |
minP |
parse_blast_results:minP |
parse_blast_results:protein_list |
Merge_String_List_to_a_String:stringlist |
runBlastSearch:result |
Result |
runBlastSearch:theQuery |
queryEntered |
Merge_String_List_to_a_String:concatenated |
proteinList |
parse_blast_results:records |
Blast_hits |
Coordinations (1)
Controller |
Target |
local_create_blastdb |
runBlastSearch |
Version 1 (of 1)