Extract proteins from xml blast results
Created: 2010-03-19 14:14:44
The workflow uses BLAST similarity results to extract a list of proteins from the target genome that may be known drugs.
Workflow Components
Authors (2)
Titles (3)
|
fetchEnsemblSeqsAndBlast |
Extract proteins from xml blast results |
Descriptions (3)
The workflow uses BLAST similarity results to extract a list of proteins from the target genome that may be known drugs. |
This workflow allows you to configure a BioMart query to fetch the sequences you want from Ensembl. These sequences are retrieved and a BLAST database is created from them (by default, in the directory you ran Taverna from). Warning: This workflow assumes that blastall and formatdb are installed on the machine and that both are found or linked in /usr/local/bin. It also assumes that you have write permission to the directory you ran Taverna from. If the default locations are not appropriate for you, edit the beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs". Shortcomings: The names of all files created and used are hard-coded in this workflow, so if you run it more than once without editing anything, you will overwrite files you created previously. The workflow does not yet delete the files it creates in the working directory. Ideally, a user option would control whether these files are kept or deleted after use. |
|
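The overwrite and cleanup shortcomings described above could be addressed by giving each run its own working directory. The following is a minimal sketch in plain Java, not Taverna or Beanshell API; the class name RunWorkspace and its methods are hypothetical and are not part of either workflow.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (an assumption for illustration, not workflow code).
final class RunWorkspace {
    private RunWorkspace() {}

    /** Creates a fresh, uniquely named directory under parent for one run. */
    static Path create(Path parent, String prefix) {
        try {
            return Files.createTempDirectory(parent, prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes the run directory and everything in it, children before parents. */
    static void delete(Path runDir) {
        try (Stream<Path> walk = Files.walk(runDir)) {
            // Reverse path order lists nested entries before their parent directory.
            List<Path> deepestFirst =
                    walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Output files would then be addressed relative to the run directory, for example runDir.resolve("results.xml"), and a keep-or-delete option would simply decide whether RunWorkspace.delete is called at the end.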
Dependencies (0)
Inputs (4)
Name | Description
blastFile | File path of the BLAST results to import
tfasta | Imported FASTA of target proteins
list_of_proteins | File path used to save the list of proteins of interest
end_file_path | File path used to save the proteins of interest in FASTA format
Processors (6)
Name | Type | Description
Read_Text_File | localworker | Script:
BufferedReader getReader(String fileUrl) throws IOException {
    InputStreamReader reader;
    try {
        reader = new FileReader(fileUrl);
    } catch (FileNotFoundException e) {
        // try a real URL instead
        URL url = new URL(fileUrl);
        reader = new InputStreamReader(url.openStream());
    }
    return new BufferedReader(reader);
}

StringBuffer sb = new StringBuffer(4000);
BufferedReader in = getReader(fileurl);
String str;
String lineEnding = System.getProperty("line.separator");
while ((str = in.readLine()) != null) {
    sb.append(str);
    sb.append(lineEnding);
}
in.close();
filecontents = sb.toString();
Extract_unique_proteins | beanshell | Script:
//import xml format of blast results
String input = xml_result;
ArrayList gi_lines = new ArrayList();
while (input.contains("")){
    //finds start and end of Iteration_query-def tag
    int start_tag1 = input.indexOf("") + 20;
    int end_tag1 = input.indexOf("");
    //get the protein identifier
    String output = input.substring(start_tag1, end_tag1);
    //removes identifier
    input = input.substring(end_tag1 + 22);
    //finds the start of Iteration_hit tag
    int start_tag2 = input.indexOf("");
    //finds next start of Iteration_query-def tag
    start_tag1 = input.indexOf("") + 20;
    if(start_tag1 == 19) {
        start_tag1 = start_tag2 + 1;
    }
    //If there is a hit for the protein, then store protein info
    if(start_tag2
Tfasta_parser | beanshell | Script:
//import target fasta
String input = tfasta_in;
String[] target_array = input.split("\n");
//import gi ids
String input2 = gi_val;
String[] gi_array = input2.split("\n");
String fastas = "";
String value = "";
boolean found = false;
//for each value in the gi_array, compare values for each value in the target_array
for(int i = 0; i < gi_array.length; i++){
    for(int n = 0; n < target_array.length; n++){
        if(target_array[n].contains(">")){
            found = false;
        }
        if(target_array[n].contains(gi_array[i])) {
            value = target_array[n] + "\n";
            found = true;
        }
        if(!target_array[n].contains(">") && found) {
            value = value + target_array[n] + "\n";
        }
    }
    fastas = fastas + value;
}//end for
//export fasta format
cfasta_out = fastas;
Write_Text_File | localworker | Script:
BufferedWriter out = new BufferedWriter(new FileWriter(outputFile));
out.write(filecontents);
out.close();
outputFile = filecontents;
extract_gis | beanshell | Script:
//Extracts the gi ids from the protein info
String input = gi_lines_in;
String[] gi_array = input.split("\n");
String ids = "";
for(int i = 0; i
Write_Text_File_2 | localworker | Script:
BufferedWriter out = new BufferedWriter(new FileWriter(outputFile));
out.write(filecontents);
out.close();
outputFile = filecontents;
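The Extract_unique_proteins step in the table above scans the BLAST XML for query definitions and keeps those that have a hit. The following plain Java sketch illustrates that idea under stated assumptions: it looks for the <Iteration_query-def> and <Hit> elements of NCBI BLAST XML output, and it removes duplicates while keeping input order. The class and method names are hypothetical, and the original script may search for different tags or behave differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; tag choice and de-duplication are assumptions.
final class BlastXmlHits {
    private static final String DEF_OPEN = "<Iteration_query-def>";
    private static final String DEF_CLOSE = "</Iteration_query-def>";
    private static final String HIT_OPEN = "<Hit>";

    private BlastXmlHits() {}

    /** Returns query definitions that are followed by at least one hit. */
    static List<String> queriesWithHits(String xml) {
        Set<String> found = new LinkedHashSet<>();
        int pos = xml.indexOf(DEF_OPEN);
        while (pos >= 0) {
            int defStart = pos + DEF_OPEN.length();
            int defEnd = xml.indexOf(DEF_CLOSE, defStart);
            if (defEnd < 0) {
                break; // malformed input: stop rather than guess
            }
            String queryDef = xml.substring(defStart, defEnd).trim();

            // The current query's results end where the next query definition starts.
            int next = xml.indexOf(DEF_OPEN, defEnd);
            int blockEnd = (next >= 0) ? next : xml.length();

            int hit = xml.indexOf(HIT_OPEN, defEnd);
            if (hit >= 0 && hit < blockEnd) {
                found.add(queryDef);
            }
            pos = next;
        }
        return new ArrayList<>(found);
    }
}
```

Searching with an explicit start offset avoids repeatedly truncating the input string, and using the tag lengths instead of literal offsets such as 20 and 22 keeps the index arithmetic tied to the tag text.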
Beanshells (3)
Name | Description | Inputs | Outputs
Extract_unique_proteins | | xml_result | gi_lines_val
Tfasta_parser | | gi_val, tfasta_in | cfasta_out
extract_gis | | gi_lines_in | gis
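Tfasta_parser takes the identifiers (gi_val) and the target FASTA (tfasta_in) and produces the matching records (cfasta_out). The sketch below shows one way to do that filtering in plain Java. It is not the workflow's script: it makes a single pass over the FASTA text instead of one pass per identifier, and the class and method names are hypothetical.

```java
import java.util.List;

// Hypothetical sketch of FASTA filtering by identifier; not the workflow's own code.
final class FastaFilter {
    private FastaFilter() {}

    /** Returns the FASTA records whose header line contains any of the given ids. */
    static String filter(String fasta, List<String> ids) {
        StringBuilder out = new StringBuilder();
        boolean keep = false;
        for (String line : fasta.split("\\r?\\n")) {
            if (line.startsWith(">")) {
                // A new record starts: decide once whether to keep it.
                keep = false;
                for (String id : ids) {
                    if (!id.isEmpty() && line.contains(id)) {
                        keep = true;
                        break;
                    }
                }
            }
            if (keep && !line.isEmpty()) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }
}
```

Matching is by substring, as in the original script, so an identifier such as gi|1 would also match a header containing gi|12. Passing identifiers with their trailing delimiter (for example gi|1|) is one way to reduce such false matches.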
Datalinks (9)
Source | Sink
blastFile | Read_Text_File:fileurl
Read_Text_File:filecontents | Extract_unique_proteins:xml_result
Extract_unique_proteins:gi_lines_val | Tfasta_parser:gi_val
tfasta | Tfasta_parser:tfasta_in
Tfasta_parser:cfasta_out | Write_Text_File:filecontents
end_file_path | Write_Text_File:outputFile
Extract_unique_proteins:gi_lines_val | extract_gis:gi_lines_in
extract_gis:gis | Write_Text_File_2:filecontents
list_of_proteins | Write_Text_File_2:outputFile
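In the datalinks above, extract_gis receives the protein lines (gi_lines_in) and emits the identifiers (gis) that are written to list_of_proteins. A minimal sketch of such a step in plain Java follows. It assumes each line carries an NCBI-style "gi|<number>|" identifier, which the listing does not confirm, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the "gi|<digits>" line format is an assumption.
final class GiExtractor {
    private static final Pattern GI = Pattern.compile("gi\\|(\\d+)");

    private GiExtractor() {}

    /** Returns the first gi number found on each line; lines without one are skipped. */
    static List<String> extractGis(String giLines) {
        List<String> ids = new ArrayList<>();
        for (String line : giLines.split("\\r?\\n")) {
            Matcher m = GI.matcher(line);
            if (m.find()) {
                ids.add(m.group(1));
            }
        }
        return ids;
    }
}
```

Joining the result with String.join("\n", ids) would give a newline-separated string of the kind the workflow passes between processors.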
Coordinations (1)
Controller | Target
Read_Text_File | Extract_unique_proteins
Version 1 (of 1)