Parse unique proteins from Blast file
Created: 2010-03-19 14:07:24
Last updated: 2010-03-19 14:09:22
The workflow parses the blast results to determine the unique proteins found in the target genome that have no similarity to the source genome. Using these unique protein ids and the original target protein fasta file, a fasta file of unique proteins is created.
Workflow Components
Authors (2)
Titles (2)
fetchEnsemblSeqsAndBlast
Extract unique proteins from blast results
Descriptions (4)
Workflow outputs a list of proteins encoded by the target genomes that do not have sequence similarity to those encoded by the source genome
This workflow allows you to configure a BioMart query to fetch sequences you want from Ensembl. These sequences are retrieved and a blast database of them is created (by default, in the directory you ran taverna from).
Warning: This workflow assumes that you have blastall and formatdb installed on the machine, and that by default, these are both found or linked in /usr/local/bin. It also assumes that you have write permission to the directory you have run taverna from. The beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs" are what you need to edit if the default locations are not appropriate for you (an illustrative sketch of such command-argument scripts follows these descriptions).
Shortcomings:
The names of all the files created and used are hard-coded in this workflow. This means that if you run this workflow more than once without editing anything, you will overwrite files you have previously created.
The files created in the working directory are not yet deleted by the workflow. Ideally there would be a user-selectable option that sets whether the files are kept or deleted after use.
The workflow parses the blast results to determine the unique proteins found in the target genome that have no similarity to the source genome. Using these unique protein ids and the original target protein fasta file, a fasta file of unique proteins is created.
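Note on the command-argument beanshells mentioned above: their scripts are not reproduced on this page, so the following is only an illustrative sketch of how a "create_formatdb_cmdArgs" / "create_blastall_cmdArgs" style script is typically written. The paths, file names and flags below are assumptions for illustration, not the workflow's actual code.

// Illustrative sketch only; variable and file names are assumptions
String binDir = "/usr/local/bin";                 // edit if blastall/formatdb live elsewhere
String workDir = System.getProperty("user.dir");  // the directory taverna was run from

// formatdb: build a protein (-p T) blast database from the source fasta
String formatdb_cmd = binDir + "/formatdb -i " + workDir + "/source.fasta -p T";

// blastall: blastp of the target proteins against that database, XML report (-m 7)
String blastall_cmd = binDir + "/blastall -p blastp"
    + " -i " + workDir + "/target.fasta"
    + " -d " + workDir + "/source.fasta"
    + " -m 7 -o " + workDir + "/blast_result.xml";

With the legacy NCBI tools, -p T tells formatdb the input is protein and -m 7 asks blastall for XML output, which is the format expected by this workflow's blastFile input.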
Dependencies (0)
Inputs (3)
Name | Description
blastFile | the blast results in XML format (an abridged example of this format follows this table)
tfasta | fasta file of the target proteins from which the unique sequences are extracted
cfasta_file_path | the file path to which the workflow writes the fasta file of unique proteins
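For reference, the blastFile input is the XML report produced by NCBI blast (for example blastall -m 7). An abridged fragment, showing only the elements the Extract_unique_proteins script inspects, might look like the following; the identifier is made up:

<Iteration>
  <Iteration_query-def>ENSP00000354687 pep:known ...</Iteration_query-def>
  <Iteration_hits></Iteration_hits>
  <Iteration_message>No hits found</Iteration_message>
</Iteration>

Queries that do have hits contain <Hit> elements inside <Iteration_hits> and no <Iteration_message> element, so the presence of "No hits found" is what marks a protein as unique to the target genome.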
Processors (4)
Name | Type | Description
Read_Text_File | localworker
Script:
BufferedReader getReader(String fileUrl) throws IOException {
    InputStreamReader reader;
    try {
        reader = new FileReader(fileUrl);
    }
    catch (FileNotFoundException e) {
        // try a real URL instead
        URL url = new URL(fileUrl);
        reader = new InputStreamReader(url.openStream());
    }
    return new BufferedReader(reader);
}

StringBuffer sb = new StringBuffer(4000);
BufferedReader in = getReader(fileurl);
String str;
String lineEnding = System.getProperty("line.separator");
while ((str = in.readLine()) != null) {
    sb.append(str);
    sb.append(lineEnding);
}
in.close();
filecontents = sb.toString();
Extract_unique_proteins | beanshell
Script (the angle-bracket tag strings are restored from the script's own comments, and the script is cut off below; a sketch of the full approach follows the Processors table):
//import xml blast results
String input = xml_result;
ArrayList gi_lines = new ArrayList();
while (input.contains("<Iteration_query-def>")){
    //finds start and end of Iteration_query-def tag
    int start_tag1 = input.indexOf("<Iteration_query-def>") + 20;
    int end_tag1 = input.indexOf("</Iteration_query-def>");
    //get the protein identifier
    String output = input.substring(start_tag1, end_tag1);
    //removes identifier
    input = input.substring(end_tag1 + 22);
    //finds the start of Iteration_message tag
    int start_tag2 = input.indexOf("<Iteration_message>");
    //finds next start of Iteration_query-def tag
    start_tag1 = input.indexOf("<Iteration_query-def>") + 20;
    if(start_tag1 == 19) {
        start_tag1 = start_tag2 + 1;
    }
    //if there are no hits for that protein, then store the protein
    if(start_tag2
Tfasta_parser | beanshell
Script (the loop headers contained angle brackets and were partly lost when this page was rendered; the loop bounds are restored here from the surrounding comments and braces):
//import the target fasta
String input = tfasta_in;
String[] target_array = input.split("\n");
//import gi ids
String input2 = gi_val;
String[] gi_array = input2.split("\n");
String fastas = "";
String value = "";
boolean found = false;
//for each value in the gi_array, compare it to each value in the target_array
for(int i = 0; i < gi_array.length; i++){
    for(int n = 0; n < target_array.length; n++){
        if(target_array[n].contains(">")){
            found = false;
        }
        if(target_array[n].contains(gi_array[i])) {
            value = target_array[n] + "\n";
            found = true;
        }
        if(!target_array[n].contains(">") && found)
        {
            value = value + target_array[n] + "\n";
        }
    }
    fastas = fastas + value;
}//end for
//export proteins in fasta format
cfasta_out = fastas;
Write_Text_File | localworker
Script:
BufferedWriter out = new BufferedWriter(new FileWriter(outputFile));
out.write(filecontents);
out.close();
outputFile = filecontents;
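The Extract_unique_proteins script above is cut off on this page. As a rough guide only, the overall approach of the two beanshells (scan each blast <Iteration> for a "No hits found" message, collect those query identifiers, then copy the matching records out of the target fasta) can be sketched in plain Java as follows. This is a reconstruction of the idea for illustration, not the workflow's own code.

import java.util.ArrayList;
import java.util.List;

public class UniqueProteinSketch {

    // Collect query identifiers whose blast iteration reports "No hits found".
    static List<String> uniqueIds(String blastXml) {
        List<String> ids = new ArrayList<String>();
        int pos = 0;
        while (true) {
            int iterEnd = blastXml.indexOf("</Iteration>", pos);
            if (iterEnd < 0) break;
            String iteration = blastXml.substring(pos, iterEnd);
            int defStart = iteration.indexOf("<Iteration_query-def>");
            int defEnd = iteration.indexOf("</Iteration_query-def>");
            if (defStart >= 0 && defEnd > defStart
                    && iteration.contains("<Iteration_message>No hits found")) {
                String def = iteration.substring(defStart + "<Iteration_query-def>".length(), defEnd);
                // the first whitespace-delimited token of the query definition is the id
                ids.add(def.trim().split("\\s+")[0]);
            }
            pos = iterEnd + "</Iteration>".length();
        }
        return ids;
    }

    // Copy out of the fasta text every record whose header line mentions one of the ids.
    static String filterFasta(String fasta, List<String> ids) {
        StringBuilder out = new StringBuilder();
        boolean keep = false;
        for (String line : fasta.split("\n")) {
            if (line.startsWith(">")) {          // a header line starts a new record
                keep = false;
                for (String id : ids) {
                    if (line.contains(id)) { keep = true; break; }
                }
            }
            if (keep) out.append(line).append("\n");
        }
        return out.toString();
    }
}

The real beanshells operate on the same inputs (xml_result, tfasta_in and gi_val, listed under Beanshells below) and produce the same kinds of outputs (gi_lines_val, cfasta_out), but do the scanning with the hand-rolled indexOf arithmetic shown above.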
|
Beanshells (2)
Name | Description | Inputs | Outputs
Extract_unique_proteins | | xml_result | gi_lines_val
Tfasta_parser | | gi_val, tfasta_in | cfasta_out
Datalinks (6)
Source | Sink
blastFile | Read_Text_File:fileurl
Read_Text_File:filecontents | Extract_unique_proteins:xml_result
Extract_unique_proteins:gi_lines_val | Tfasta_parser:gi_val
tfasta | Tfasta_parser:tfasta_in
cfasta_file_path | Write_Text_File:outputFile
Tfasta_parser:cfasta_out | Write_Text_File:filecontents
Coordinations (1)
Controller | Target
Read_Text_File | Extract_unique_proteins
Uploader
License
All versions of this Workflow are licensed under:
Version 1 (of 1)
Credits (2)
(People/Groups)
Attributions (0)
(Workflows/Files)
None
Shared with Groups (0)
None
Featured In Packs (0)
None
Attributed By (1)
(Workflows/Files)
Favourited By (0)
No one
Statistics
Other workflows that use similar services (0)
There are no workflows in myExperiment that use similar services to this Workflow.
Comments (0)
No comments yet