Extract proteins using a gi - output as fasta file
Created: 2010-03-19 13:54:54
The workflow uses a GI number to retrieve the GenBank entry in XML format. A BeanShell script then parses out the data needed to build the protein FASTA file.
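The parsing step described above can also be sketched with a standard XML parser instead of string searches. The following is a minimal sketch, not the workflow's own script: the class name `QualifierSketch`, the method `qualifierValues`, and the sample XML fragment (including the accession-like value) are hypothetical and exist only for illustration. It assumes the input contains `GBQualifier` elements with `GBQualifier_name` and `GBQualifier_value` children. A real efetch response also carries a DOCTYPE declaration, which a default parser may try to resolve, so this sketch is only exercised on a fragment without one.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class QualifierSketch {

    /** Returns the values of every GBQualifier whose name equals {@code wanted}, in document order. */
    public static List<String> qualifierValues(String xml, String wanted) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        NodeList qualifiers = doc.getElementsByTagName("GBQualifier");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < qualifiers.getLength(); i++) {
            Element qualifier = (Element) qualifiers.item(i);
            NodeList names = qualifier.getElementsByTagName("GBQualifier_name");
            NodeList vals = qualifier.getElementsByTagName("GBQualifier_value");
            // Qualifiers without a value element are skipped.
            if (names.getLength() > 0 && vals.getLength() > 0
                    && wanted.equals(names.item(0).getTextContent().trim())) {
                values.add(vals.item(0).getTextContent().trim());
            }
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical fragment shaped like part of a GBSeq XML record.
        String xml = "<GBFeature_quals>"
                + "<GBQualifier><GBQualifier_name>protein_id</GBQualifier_name>"
                + "<GBQualifier_value>XX_000001.1</GBQualifier_value></GBQualifier>"
                + "<GBQualifier><GBQualifier_name>translation</GBQualifier_name>"
                + "<GBQualifier_value>MKVLA</GBQualifier_value></GBQualifier>"
                + "</GBFeature_quals>";
        System.out.println(qualifierValues(xml, "protein_id"));
        System.out.println(qualifierValues(xml, "translation"));
    }
}
```

Matching name and value inside the same `GBQualifier` element avoids depending on tag lengths or on the order in which qualifiers appear.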
Workflow Components
Dependencies (0)
Inputs (2)
Name | Description
fasta_file_path | The path where the workflow will save the FASTA file
id | The GI number the workflow uses to retrieve the GenBank XML entry
Processors (3)
Name |
Type |
Description |
Extraction_of_info |
beanshell |
Script:
//Note that this BeanShell script disregards all geneids and their products
//Import the data from the input port as a String
String input = xml_result;
//Initialize variables
ArrayList gis = new ArrayList();
ArrayList ids = new ArrayList();
ArrayList products = new ArrayList();
ArrayList seqns = new ArrayList();
String fastas = "";
String protein_prod = "";
//Loop that continues to execute while there exists a GBQualifier_name tag
while (input.contains("<GBQualifier_name>")){
    //Finds the qualifier name (18 = length of the opening tag, 19 = length of the closing tag)
    int start = input.indexOf("<GBQualifier_name>") + 18;
    int end = input.indexOf("</GBQualifier_name>");
    String output = input.substring(start, end);
    input = input.substring(end + 19);
    //Stores the protein product
    if(output.equals("product")){
        start = input.indexOf("<GBQualifier_value>") + 19;
        end = input.indexOf("</GBQualifier_value>");
        output = input.substring(start, end);
        input = input.substring(end + 20);
        protein_prod = output;
    }
    //Stores the protein id
    if(output.equals("protein_id")){
        start = input.indexOf("<GBQualifier_value>") + 19;
        end = input.indexOf("</GBQualifier_value>");
        output = input.substring(start, end);
        input = input.substring(end + 20);
        ids.add(output);
    }
    //Stores the gi number
    if(output.equals("db_xref")){
        start = input.indexOf("<GBQualifier_value>") + 19;
        end = input.indexOf("</GBQualifier_value>");
        output = input.substring(start, end);
        input = input.substring(end + 20);
        String temp = output.substring(0,3);
        if(temp.equals("GI:")) {
            String temp2 = output.substring(3);
            gis.add(temp2);
            products.add(protein_prod);
        }
    }
    //Stores the AA sequences
    if(output.equals("translation")){
        start = input.indexOf("<GBQualifier_value>") + 19;
        end = input.indexOf("</GBQualifier_value>");
        output = input.substring(start, end);
        input = input.substring(end + 20);
        seqns.add(output);
    }
}
//Converts the protein data into fasta format
for(int n = 0; n < gis.size(); n++) {
    String value = ">gi|" + gis.get(n) + "|ref|" + ids.get(n) + "|" + products.get(n);
    String seq = seqns.get(n).toString();
    //Wraps the sequence at 70 residues per line
    for(int index = 0; index < seq.length(); index += 70) {
        value = value + "\n" + seq.substring(index, Math.min(index + 70, seq.length()));
    }
    fastas = fastas + value + "\n";
}
//Outputs the fasta data to the output port
fastas_out = fastas; |
Write_Text_File |
localworker |
Script:
BufferedWriter out = new BufferedWriter(new FileWriter(outputFile));
out.write(filecontents);
out.close();
outputFile = filecontents;
|
Get_Nucleotide_GBSeq_XML |
localworker |
Script:
if ((id == void) || (id == null) || id.equals("")) {
    throw new RuntimeException("port id must have a non-empty value");
}
URL url = new URL ("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?rettype=gb&db=nucleotide&retmode=xml&id=" + id);
BufferedReader reader = new BufferedReader (new InputStreamReader(url.openStream()));
StringWriter writer = new StringWriter();
char[] buffer = new char[1024];
//Copies the response into the writer until the stream is exhausted
while (true) {
    int r = reader.read(buffer);
    if (r <= 0) {
        break;
    }
    writer.write(buffer, 0, r);
}
reader.close();
outputText = writer.toString();
|
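The FASTA formatting performed at the end of Extraction_of_info can be isolated into a small helper, which makes the header layout and the line wrapping easier to check. This is a sketch under stated assumptions: the class `FastaSketch`, the method `toFasta`, and the `width` parameter are hypothetical names, and the sample values in `main` are made up. The header layout (`>gi|...|ref|...|product`) and the width of 70 come from the script above.

```java
public class FastaSketch {

    /**
     * Builds one FASTA record: a header line followed by the sequence
     * wrapped at {@code width} characters per line, ending with a newline.
     */
    public static String toFasta(String gi, String proteinId, String product,
                                 String sequence, int width) {
        StringBuilder record = new StringBuilder();
        record.append(">gi|").append(gi)
              .append("|ref|").append(proteinId)
              .append("|").append(product);
        // Each pass appends one line of at most `width` characters.
        for (int i = 0; i < sequence.length(); i += width) {
            int end = Math.min(i + width, sequence.length());
            record.append('\n').append(sequence, i, end);
        }
        return record.append('\n').toString();
    }

    public static void main(String[] args) {
        // Made-up values; a width of 3 keeps the wrapping visible in a short example.
        System.out.print(toFasta("123", "XX_000001.1", "hypothetical protein", "ABCDEFG", 3));
    }
}
```

Passing the GI, protein id, product, and sequence together for each record, rather than reading four parallel lists by index, would also guard against the lists drifting out of step when a feature lacks one of the qualifiers.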
Beanshells (1)
Name | Description | Inputs | Outputs
Extraction_of_info | | xml_result | fastas_out
Outputs (1)
Name | Description
file_output | Protein output for the specified GI in FASTA format
Datalinks (5)
Source | Sink
Get_Nucleotide_GBSeq_XML:outputText | Extraction_of_info:xml_result
fasta_file_path | Write_Text_File:outputFile
Extraction_of_info:fastas_out | Write_Text_File:filecontents
id | Get_Nucleotide_GBSeq_XML:id
Write_Text_File:outputFile | file_output
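The five datalinks above describe a straight pipeline, which can be sketched as ordinary function composition. This is an illustrative sketch only, not how Taverna executes the workflow: the class `DatalinkSketch`, the method `run`, and the stub functions in `main` are hypothetical, and the two processing steps are passed in as functions so the wiring can be tried without a network call.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Function;

public class DatalinkSketch {

    /** Chains the three processors in the order given by the datalinks. */
    public static String run(String id, String fastaFilePath,
                             Function<String, String> fetchXml,
                             Function<String, String> extractFasta) throws IOException {
        // id -> Get_Nucleotide_GBSeq_XML:id
        String outputText = fetchXml.apply(id);
        // Get_Nucleotide_GBSeq_XML:outputText -> Extraction_of_info:xml_result
        String fastasOut = extractFasta.apply(outputText);
        // fasta_file_path -> Write_Text_File:outputFile
        // Extraction_of_info:fastas_out -> Write_Text_File:filecontents
        Path outputFile = Paths.get(fastaFilePath);
        Files.write(outputFile, fastasOut.getBytes(StandardCharsets.UTF_8));
        // Write_Text_File:outputFile -> file_output (the listed script sets outputFile to the file contents)
        return fastasOut;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("proteins", ".fasta");
        // Stub functions stand in for the real fetch and extraction steps.
        String fileOutput = run("0", tmp.toString(),
                id -> "<stub xml for " + id + ">",
                xml -> ">stub header\nMKV\n");
        System.out.print(fileOutput);
    }
}
```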
Version 1 (of 1)