Extract unique proteins from blast results

Created: 2011-03-24 19:49:43 Last updated: 2011-04-01 12:26:27

The workflow parses the tab-delimited BLAST results to determine the unique proteins found in the target genome that have no similarity to the source genome. Using these unique protein IDs and the original target protein FASTA file, a FASTA file of unique proteins is created.

This workflow allows you to configure a BioMart query to fetch the sequences you want from Ensembl. These sequences are retrieved and a BLAST database is created from them (by default, in the directory you ran Taverna from).

Warning: This workflow assumes that blastall and formatdb are installed on the machine and that, by default, both are found or linked in /usr/local/bin. It also assumes that you have write permission for the directory you ran Taverna from. If the default locations are not appropriate for you, edit the beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs".

Shortcomings: The names of all the files created and used are hard-coded in this workflow, so running it more than once without editing anything will overwrite files you created previously. Files created in the working directory are not yet deleted by the workflow. Ideally, a user option would control whether the files are kept or deleted after use.

The workflow outputs a list of proteins encoded by the target genome that have no sequence similarity to those encoded by the source genome.
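The description mentions building a FASTA file of the unique proteins from their IDs and the original target FASTA file. A minimal sketch of that filtering step is shown here; it is not taken from the workflow. The class name UniqueFastaSketch and the method filterFasta are hypothetical, and the sketch assumes that a record's ID is the first whitespace-delimited token after ">" on the header line.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class UniqueFastaSketch {

    // Keep only the FASTA records whose ID is in keepIds.
    // Assumption: the ID is the first whitespace-delimited token of the header.
    static String filterFasta(String fastaText, Set<String> keepIds) {
        StringBuilder out = new StringBuilder();
        boolean keep = false;
        for (String line : fastaText.split("\\r?\\n")) {
            if (line.startsWith(">")) {
                // A header line starts a new record; decide whether to keep it.
                String id = line.substring(1).trim().split("\\s+")[0];
                keep = keepIds.contains(id);
            }
            // Header and sequence lines of a kept record are copied through.
            if (keep && !line.isEmpty()) {
                out.append(line).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String fasta = ">p1 first protein\nMKV\nLLA\n>p2\nAAA\n>p3\nCCC\n";
        Set<String> unique = new HashSet<String>(Arrays.asList("p1", "p3"));
        System.out.print(filterFasta(fasta, unique));
    }
}
```

Multi-line sequences are preserved because every line after a kept header is copied until the next header appears.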

Run

Run this Workflow in the Taverna Workbench...

Option 1:

Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/1981/download?version=4

Workflow Components

Authors (3)

Titles (2)

Descriptions (5)

Dependencies (0)

Inputs (2)

Name	Description
blastFile	The URL or file path of the tab-delimited BLAST results. Type the path as a string (not a file location).
tfasta	FASTA file of the target proteins from which the sequences are extracted. Add as a file location.

Processors (3)

Name

Type

Description

Read_Text_File

localworker

Script

BufferedReader getReader (String fileUrl) throws IOException {
	InputStreamReader reader;
	try {
		reader = new FileReader(fileUrl);
	}
	catch (FileNotFoundException e) {
		// try a real URL instead
		URL url = new URL(fileUrl);
		reader = new InputStreamReader (url.openStream());
	}
	return new BufferedReader(reader);
}

StringBuffer sb = new StringBuffer(4000);

BufferedReader in = getReader(fileurl);
String str;
String lineEnding = System.getProperty("line.separator");

while ((str = in.readLine()) != null) {
	sb.append(str);
	sb.append(lineEnding);
}
in.close();
filecontents = sb.toString();
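For readers who want to try the same read logic outside Taverna, here is a standalone sketch of it in plain Java. It is an adaptation, not the local worker itself: the class name ReadTextFileSketch and the method readAll are hypothetical, the reader is closed with try-with-resources, and line endings are normalised to "\n" instead of the platform separator (an assumption made so that downstream code splitting on "\n" behaves the same on every platform).

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ReadTextFileSketch {

    // Try the string as a local path first; if no such file exists, treat it as a URL.
    static BufferedReader getReader(String fileUrl) throws IOException {
        Reader reader;
        try {
            reader = new FileReader(fileUrl);
        } catch (FileNotFoundException e) {
            // Assumption: remote content is UTF-8 encoded.
            reader = new InputStreamReader(new URL(fileUrl).openStream(), StandardCharsets.UTF_8);
        }
        return new BufferedReader(reader);
    }

    // Read every line and join the lines with "\n", whatever the original line ending was.
    static String readAll(String fileUrl) throws IOException {
        StringBuilder sb = new StringBuilder(4000);
        try (BufferedReader in = getReader(fileUrl)) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReadTextFileSketch <path-or-url>");
            return;
        }
        System.out.print(readAll(args[0]));
    }
}
```

A string that is neither an existing file nor a well-formed URL makes the URL constructor throw a MalformedURLException, which is an IOException and is therefore reported to the caller.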

extract_blast_ids

beanshell

Script

//import tab-delimited blast results

//split the input on new lines
String[] input = xml_result.split(System.getProperty("line.separator"));
ArrayList gi_lines = new ArrayList();

//for each line in the BLAST file
for (int i=0; i
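The listing above stops at the loop header, so the body of the script is not available here. The following is a minimal sketch, not the author's code, of how identifiers might be pulled from tab-delimited BLAST results. It assumes the query ID is the first tab-separated column, as in the standard tabular BLAST output, and that lines starting with "#" are comments. Removing duplicates is also an assumption. The class and method names are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ExtractBlastIdsSketch {

    // Return the distinct query IDs (first column), one per line, in first-seen order.
    static String extractQueryIds(String blastText) {
        Set<String> ids = new LinkedHashSet<String>();
        // Split on "\n" or "\r\n" so the result does not depend on the platform.
        for (String line : blastText.split("\\r?\\n")) {
            // Skip blank lines and '#' comment lines.
            if (line.trim().isEmpty() || line.startsWith("#")) {
                continue;
            }
            String[] cols = line.split("\t");
            ids.add(cols[0].trim());
        }
        return String.join("\n", ids);
    }

    public static void main(String[] args) {
        String demo = "# comment line\n"
                + "protA\tsrc1\t98.5\n"
                + "protA\tsrc2\t80.0\n"
                + "protB\tsrc9\t75.2\n";
        System.out.println(extractQueryIds(demo));
    }
}
```

A LinkedHashSet is used because a query with several hits appears on several lines; the set keeps each ID once while preserving the order of the BLAST file.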

find_unique_proteins

beanshell

Script

//import the blast target ids
String[] gi_array = gi_val.split("\n");
//import the target fasta
String [] target_array = tfasta_in.split("\n");

//make necessary variables
List targets = new ArrayList();
List blast_gis = new ArrayList();
String fastas = "";
String value = "";

//iterate through the entries in the target fasta and add all the ids
//to a List without the starting ">"
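for (int i =0; i<target_array.length; i++)
{
	String line = target_array[i];
	//header lines start with ">"
	if (line.startsWith(">"))
	{
		targets.add(line.substring(1,line.length()));
	}
}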

//iterate through the blast target ids and add them to a List
for (int i=0; i
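The listing above is cut off at the second loop, so the comparison step itself is not shown. As a hedged illustration only, the sketch below computes the identifiers that appear in the FASTA headers but not in the BLAST identifier list, which matches the description of the unique_identifiers output. The class name, the method name, and the choice to compare only the first whitespace-delimited token of each header are assumptions, not details confirmed by the workflow.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class FindUniqueProteinsSketch {

    // blastIds: newline-separated identifiers that had a BLAST hit.
    // fastaText: the target FASTA file contents.
    // Returns the FASTA identifiers with no BLAST hit, newline-separated.
    static String findUnique(String blastIds, String fastaText) {
        Set<String> blasted = new HashSet<String>();
        for (String id : blastIds.split("\\r?\\n")) {
            if (!id.trim().isEmpty()) {
                blasted.add(id.trim());
            }
        }

        Set<String> unique = new LinkedHashSet<String>();
        for (String line : fastaText.split("\\r?\\n")) {
            if (line.startsWith(">")) {
                // Assumption: the ID is the header text up to the first whitespace.
                String id = line.substring(1).trim().split("\\s+")[0];
                if (!id.isEmpty() && !blasted.contains(id)) {
                    unique.add(id);
                }
            }
        }
        return String.join("\n", unique);
    }

    public static void main(String[] args) {
        String blastIds = "p1\np3\n";
        String fasta = ">p1 kinase\nMKV\n>p2\nAAA\n>p3\nCCC\n>p4 unknown\nGGG\n";
        System.out.println(findUnique(blastIds, fasta));
    }
}
```

Holding the BLAST identifiers in a HashSet makes each membership check constant time on average, which matters when the target FASTA file holds a whole proteome.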

Beanshells (2)

Name	Description	Inputs	Outputs
extract_blast_ids		xml_result	gi_lines_val
find_unique_proteins		gi_val tfasta_in	cfasta_out

Outputs (2)

Name	Description
unique_identifiers	Unique identifiers that appear in the FASTA file but not in the BLAST file. Identifiers are separated by new lines.
blasted_identifiers	Identifiers of the target genome from the BLAST results. These identifiers are those that are similar to the source proteome.

Datalinks (6)

Source	Sink
blastFile	Read_Text_File:fileurl
Read_Text_File:filecontents	extract_blast_ids:xml_result
extract_blast_ids:gi_lines_val	find_unique_proteins:gi_val
tfasta	find_unique_proteins:tfasta_in
find_unique_proteins:cfasta_out	unique_identifiers
extract_blast_ids:gi_lines_val	blasted_identifiers

Coordinations (1)

Controller	Target
Read_Text_File	extract_blast_ids

Workflow Type

Taverna 2

Uploader

Morgan Taschuk

License

All versions of this Workflow are licensed under:

Version 4 (latest) (of 4)

Credits (2)

(People/Groups)

Attributions (1)

(Workflows/Files)

Parse unique proteins from Blast file

Tags (8)

Uploader tags

Shared with Groups (1)

A Team

Featured In Packs (0)

None

Attributed By (0)

(Workflows/Files)

None

Favourited By (0)

No one

Statistics

2601 viewings

1554 downloads

Citations (0)

None

Version History

In chronological order:

Extract unique proteins from blast results

Created by Morgan Taschuk on Thursday 24 March 2011 19:49:43 (UTC)

Find Unique Proteins from BLAST and FASTA

Created by Morgan Taschuk on Friday 25 March 2011 20:03:43 (UTC)

Last edited by Morgan Taschuk on Tuesday 29 March 2011 16:44:51 (UTC)

Revision comment:

This one actually does what it says on the box!

Extract unique proteins from blast results

Created by Morgan Taschuk on Tuesday 29 March 2011 18:07:05 (UTC)

Revision comment:

Works on query instead of subject!

Splits lines properly!

Extract unique proteins from blast results

Created by Morgan Taschuk on Friday 01 April 2011 12:26:23 (UTC)

Revision comment:

Final version

Reviews (0)

No reviews yet

Comments (0)

No comments yet

Other workflows that use similar services (0)

There are no workflows in myExperiment that use similar services to this Workflow.