Extract proteins from xml blast results

Created: 2010-03-19 14:14:44

The workflow uses BLAST similarity results to extract, from the target genome, a list of proteins that may be known drugs.

Run

To run this workflow in the Taverna Workbench, copy and paste this link into File > 'Open workflow location...':
http://www.myexperiment.org/workflows/1185/download?version=1

Workflow Components

Authors (2)

Titles (3)

Descriptions (3)

Dependencies (0)

Inputs (4)

Name	Description
blastFile	File path (or URL) of the BLAST XML results to import
tfasta	FASTA text of the target proteins, already imported
list_of_proteins	File path where the list of proteins of interest is saved
end_file_path	File path where the proteins of interest are saved in FASTA format

Processors (6)

Name	Type	Description
Read_Text_File	localworker	Script BufferedReader getReader(String fileUrl) throws IOException { InputStreamReader reader; try { reader = new FileReader(fileUrl); } catch (FileNotFoundException e) { /* try a real URL instead */ URL url = new URL(fileUrl); reader = new InputStreamReader(url.openStream()); } return new BufferedReader(reader); } StringBuffer sb = new StringBuffer(4000); BufferedReader in = getReader(fileurl); String str; String lineEnding = System.getProperty("line.separator"); while ((str = in.readLine()) != null) { sb.append(str); sb.append(lineEnding); } in.close(); filecontents = sb.toString();
Extract_unique_proteins	beanshell	Script /* import xml format of blast results */ String input = xml_result; ArrayList gi_lines = new ArrayList(); while (input.contains("")) { /* finds start and end of Iteration_query-def tag */ int start_tag1 = input.indexOf("") + 20; int end_tag1 = input.indexOf(""); /* get the protein identifier */ String output = input.substring(start_tag1, end_tag1); /* removes identifier */ input = input.substring(end_tag1 + 22); /* finds the start of Iteration_hit tag */ int start_tag2 = input.indexOf(""); /* finds next start of Iteration_query-def tag */ start_tag1 = input.indexOf("") + 20; if (start_tag1 == 19) { start_tag1 = start_tag2 + 1; } /* If there is a hit for the protein, then store protein info */ if(start_tag2
Tfasta_parser	beanshell	Script /* import target fasta */ String input = tfasta_in; String[] target_array = input.split("\n"); /* import gi ids */ String input2 = gi_val; String[] gi_array = input2.split("\n"); String fastas = ""; String value = ""; boolean found = false; /* for each value in the gi_array, compare values for each value in the target_array */ for(int i = 0; i")){ found = false; } if (target_array[n].contains(gi_array[i])) { value = target_array[n] + "\n"; found = true; } if (!target_array[n].contains(">") && found) { value = value + target_array[n] + "\n"; } } fastas = fastas + value; } /* end for */ /* export fasta format */ cfasta_out = fastas;
Write_Text_File	localworker	Script BufferedWriter out = new BufferedWriter(new FileWriter(outputFile)); out.write(filecontents); out.close(); outputFile = filecontents;
extract_gis	beanshell	Script /* Extracts the gi ids from the protein info */ String input = gi_lines_in; String[] gi_array = input.split("\n"); String ids = ""; for(int i = 0; i
Write_Text_File_2	localworker	Script BufferedWriter out = new BufferedWriter(new FileWriter(outputFile)); out.write(filecontents); out.close(); outputFile = filecontents;
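
The Extract_unique_proteins script above is incomplete in this listing: its tag literals and the end of its loop are missing. The following is a minimal Java sketch of the step its comments describe (keep the query definition of every BLAST iteration that has at least one hit), not the author's script. The tag names <Iteration_query-def> and <Hit> are assumptions taken from the NCBI BLAST XML format, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BlastHitQueries {
    // Assumed tag names (NCBI BLAST XML); the literals are missing from the listing above.
    static final String QUERY_OPEN = "<Iteration_query-def>";
    static final String QUERY_CLOSE = "</Iteration_query-def>";
    static final String HIT_OPEN = "<Hit>";

    /** Returns the unique query definitions of all iterations containing at least one hit. */
    static List<String> queriesWithHits(String xml) {
        Set<String> unique = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        int pos = xml.indexOf(QUERY_OPEN);
        while (pos >= 0) {
            int defStart = pos + QUERY_OPEN.length();
            int defEnd = xml.indexOf(QUERY_CLOSE, defStart);
            if (defEnd < 0) {
                break; // malformed input: no closing tag, stop scanning
            }
            String queryDef = xml.substring(defStart, defEnd).trim();

            // The current iteration ends where the next query definition starts.
            int next = xml.indexOf(QUERY_OPEN, defEnd);
            int blockEnd = (next < 0) ? xml.length() : next;

            // A hit belongs to this query only if it appears before the next query.
            int hit = xml.indexOf(HIT_OPEN, defEnd);
            if (hit >= 0 && hit < blockEnd) {
                unique.add(queryDef);
            }
            pos = next;
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        String xml =
            "<Iteration_query-def>gi|111|ref|NP_1| protein A</Iteration_query-def>"
            + "<Iteration_hits><Hit><Hit_num>1</Hit_num></Hit></Iteration_hits>"
            + "<Iteration_query-def>gi|222|ref|NP_2| protein B</Iteration_query-def>"
            + "<Iteration_hits></Iteration_hits>";
        for (String queryDef : queriesWithHits(xml)) {
            System.out.println(queryDef);
        }
    }
}
```

Searching from an offset with indexOf avoids repeatedly copying the remaining input with substring, which the listed script does on every pass. A real XML parser would be more robust than string scanning if the input format varies.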

Beanshells (3)

Name	Inputs	Outputs
Extract_unique_proteins	xml_result	gi_lines_val
Tfasta_parser	gi_val, tfasta_in	cfasta_out
extract_gis	gi_lines_in	gis
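
As a rough illustration of what the Tfasta_parser and extract_gis beanshells do with these ports, here is a Java sketch under stated assumptions. It is not the original code: the FASTA filter makes a single pass over the records instead of the nested loop in the listed script, and the gi pattern "gi|<digits>" is an assumption about the header format, since the extract_gis script is truncated in this listing. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FastaSubset {
    /**
     * Keeps every FASTA record whose header line contains one of the identifiers.
     * Matching is by substring, so identifiers should be specific enough
     * (for example "gi|111|" rather than "gi|11").
     */
    static String filterFasta(String fasta, List<String> ids) {
        StringBuilder out = new StringBuilder();
        boolean keep = false;
        for (String line : fasta.split("\\r?\\n")) {
            if (line.startsWith(">")) {
                // A header starts a new record: decide whether to keep it.
                keep = false;
                for (String id : ids) {
                    if (!id.isEmpty() && line.contains(id)) {
                        keep = true;
                        break;
                    }
                }
            }
            if (keep) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    // Assumed header layout: "gi|<number>|..." as in older NCBI identifiers.
    private static final Pattern GI = Pattern.compile("gi\\|(\\d+)");

    /** Extracts the first gi number found on each line; lines without one are skipped. */
    static List<String> extractGis(String headerLines) {
        List<String> gis = new ArrayList<>();
        for (String line : headerLines.split("\\r?\\n")) {
            Matcher m = GI.matcher(line);
            if (m.find()) {
                gis.add(m.group(1));
            }
        }
        return gis;
    }

    public static void main(String[] args) {
        String fasta = ">gi|111|ref|NP_1| A\nMKV\nLLT\n>gi|222|ref|NP_2| B\nMAA\n";
        List<String> ids = new ArrayList<>();
        ids.add("gi|111|");
        System.out.print(filterFasta(fasta, ids));
        System.out.println(extractGis("gi|111|ref|NP_1| A\ngi|333|ref|NP_3| C"));
    }
}
```

Note that the single-pass filter returns records in the order they appear in the FASTA input, whereas a per-identifier loop returns them in identifier order.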

Outputs (0)

Datalinks (9)

Source	Sink
blastFile	Read_Text_File:fileurl
Read_Text_File:filecontents	Extract_unique_proteins:xml_result
Extract_unique_proteins:gi_lines_val	Tfasta_parser:gi_val
tfasta	Tfasta_parser:tfasta_in
Tfasta_parser:cfasta_out	Write_Text_File:filecontents
end_file_path	Write_Text_File:outputFile
Extract_unique_proteins:gi_lines_val	extract_gis:gi_lines_in
extract_gis:gis	Write_Text_File_2:filecontents
list_of_proteins	Write_Text_File_2:outputFile
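
The datalinks above define a dataflow graph: a processor can run once everything feeding it is available. The Java sketch below, offered only as an illustration and not as how Taverna schedules work internally, reduces each link to its source and sink node names (port names dropped) and derives one valid execution order with Kahn's topological sort. The class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DatalinkOrder {
    // The nine datalinks from the table, reduced to {source node, sink node}.
    static final String[][] LINKS = {
        {"blastFile", "Read_Text_File"},
        {"Read_Text_File", "Extract_unique_proteins"},
        {"Extract_unique_proteins", "Tfasta_parser"},
        {"tfasta", "Tfasta_parser"},
        {"Tfasta_parser", "Write_Text_File"},
        {"end_file_path", "Write_Text_File"},
        {"Extract_unique_proteins", "extract_gis"},
        {"extract_gis", "Write_Text_File_2"},
        {"list_of_proteins", "Write_Text_File_2"}
    };

    /** Returns the nodes in an order where every source precedes its sinks. */
    static List<String> executionOrder(String[][] links) {
        Map<String, List<String>> successors = new LinkedHashMap<>();
        Map<String, Integer> pending = new LinkedHashMap<>(); // unmet incoming links per node
        for (String[] link : links) {
            successors.computeIfAbsent(link[0], k -> new ArrayList<>()).add(link[1]);
            pending.putIfAbsent(link[0], 0);
            pending.merge(link[1], 1, Integer::sum);
        }

        // Start with nodes that have no incoming links (the workflow inputs).
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            order.add(node);
            for (String next : successors.getOrDefault(node, Collections.<String>emptyList())) {
                // A sink becomes ready once all of its incoming links are satisfied.
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalStateException("datalinks contain a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        for (String node : executionOrder(LINKS)) {
            System.out.println(node);
        }
    }
}
```

The two write steps have no dependency on each other, so any order that places each after its own inputs is equally valid.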