fetchEnsemblSeqsAndBlast
Created: 2008-04-18 11:53:19
Last updated: 2008-04-18 11:58:30
This workflow lets you configure a BioMart query to fetch the sequences you want from Ensembl. The retrieved sequences are written to a fasta file and a blast database is created from them (by default, in the directory you ran taverna from). That database is then searched with the sequences in the file you supply as input.
Warning: This workflow assumes that blastall and formatdb are installed on your machine and that both are found or linked in /usr/local/bin. It also assumes that you have write permission for the directory you ran taverna from. If the default locations are not appropriate for you, edit the beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs".
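Before running the workflow, you may want to check these assumptions yourself. The following is a minimal sketch in plain Java, not part of the workflow: the class name BlastPrereqCheck and its method are hypothetical, and it only tests that the two executables exist in the given directory and that the working directory is writable.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class BlastPrereqCheck {
    // Default install directory named in the warning; change it if yours differs.
    static final String DEFAULT_BIN_DIR = "/usr/local/bin";

    /** Returns a list of human-readable problems; an empty list means every check passed. */
    static List<String> findProblems(String binDir, String workingDir) {
        List<String> problems = new ArrayList<>();
        for (String tool : new String[] {"blastall", "formatdb"}) {
            File exe = new File(binDir, tool);
            // isFile() follows symbolic links, so a linked executable also passes.
            if (!exe.isFile() || !exe.canExecute()) {
                problems.add(tool + " not found or not executable in " + binDir);
            }
        }
        File dir = new File(workingDir);
        if (!dir.isDirectory() || !dir.canWrite()) {
            problems.add("no write permission for " + workingDir);
        }
        return problems;
    }

    public static void main(String[] args) {
        // user.dir is the directory the Java process was started from.
        List<String> problems = findProblems(DEFAULT_BIN_DIR, System.getProperty("user.dir"));
        if (problems.isEmpty()) {
            System.out.println("All checks passed.");
        } else {
            for (String p : problems) {
                System.out.println("Problem: " + p);
            }
        }
    }
}
```

If either tool is reported missing, that is the cue to edit the paths in the two beanshells named above.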
Shortcomings:
The names of all the files created and used are hard-coded in this workflow. This means that if you run it more than once without editing anything, you will overwrite files you created previously.
The workflow does not yet delete any of the files it creates in the working directory. Ideally, there would be an option letting the user choose whether the files are kept or deleted after use.
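One way around both shortcomings could look like the sketch below. It is not something the workflow currently does: the class RunFiles, its method names and the keepFiles switch are all hypothetical. It gives each run its own file names by appending a timestamp, and deletes the files afterwards only if the user asks for that.

```java
import java.io.File;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.List;
import java.util.Locale;

final class RunFiles {
    /** Builds a name of the form base_yyyyMMdd-HHmmss.extension so repeated runs do not collide. */
    static String timestampedName(String base, Date when, String extension) {
        String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss", Locale.ROOT).format(when);
        return base + "_" + stamp + "." + extension;
    }

    /** Deletes the given files unless keepFiles is true; returns how many were actually deleted. */
    static int cleanUp(List<File> files, boolean keepFiles) {
        if (keepFiles) {
            return 0;
        }
        int deleted = 0;
        for (File f : files) {
            if (f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

A beanshell could call timestampedName once at the start of a run and pass the result to every processor that reads or writes that file, so that all of them agree on the name.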
Workflow Components
Inputs (1)
Name | Description
sequenceFileName | The name (and, if it is not in your working directory, the location) of the file of fasta sequence(s) that you wish to use to search the blast database created in this workflow.
Processors (6)
Name | Type | Description
local_create_blastdb | local |
runBlastSearch | local | Runs the blastall command (NCBI blast) on your local machine, so you need to have blastall installed. The location provided is /usr/local/bin/blastall; if your executable is not in that location, you will need to edit this. If you are working on a Bio-Linux machine, this should work for you without change.
Note that the blast results are written to your hard disk, in the working directory by default. If you are not happy with this, edit the location given in the create_blastall_cmdArgs beanshell, in the line after the one adding "-o".
create_blastall_cmdArgs | beanshell | Creates a list of strings (plain text), each holding one element of the blastall command line. For the runBlastSearch processor to understand the familiar blastall arguments, each flag must be added as a separate element in the list, right before its argument.
Note that the defaults are probably not what you want, so you need to edit them!
You can also add further arguments by editing the beanshell. For example, to limit the hits reported to just those with e-values below 0.1, add the following text to the bottom of the beanshell:
cmdArgsList.add("-e");
cmdArgsList.add("0.1");
create_formatdb_cmdArgs | beanshell | This beanshell is not run until the Write_Fasta_File processor has written the fasta file.
It creates a list of strings (plain text), each holding one element of the formatdb command line. To set the name of the blast database, change the text in the line after the one containing "-n".
To add any other command line argument, follow the same pattern: add the flag (e.g. "-x") in one line and its argument (e.g. "somethingOrOther") in the next.
fetch_seqs_from_ensembl | biomart | By default, this is set up to collect sequences for Danio rerio genes (ZFISH7). You can configure it to retrieve whatever you want from ensembl.
Write_Fasta_File | local | Writes the fasta sequences retrieved by the fetch_seqs_from_ensembl processor to a file on the hard drive (in the working directory). This is necessary for running formatdb on the command line.
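To make the flag-then-argument rule for create_blastall_cmdArgs concrete, here is a sketch of how such a list might be assembled. It is written as plain Java rather than copied from the beanshell, so treat it as an illustration only. The flags -p, -d, -i, -o and -e are standard blastall options, but the values "blastn", "ensemblSeqsDb" and "blastResults.txt" are placeholders, not the workflow's real defaults. Putting the executable path first is also an assumption, made because the page says the location is edited in this beanshell.

```java
import java.util.ArrayList;
import java.util.List;

final class BlastallArgs {
    /** Builds the argument list: every flag is its own element, directly followed by its value. */
    static List<String> build(String sequenceFileName) {
        List<String> cmdArgsList = new ArrayList<>();
        cmdArgsList.add("/usr/local/bin/blastall"); // assumed position of the executable path
        cmdArgsList.add("-p");
        cmdArgsList.add("blastn");                  // placeholder program name
        cmdArgsList.add("-d");
        cmdArgsList.add("ensemblSeqsDb");           // placeholder database name
        cmdArgsList.add("-i");
        cmdArgsList.add(sequenceFileName);          // the workflow input
        cmdArgsList.add("-o");
        cmdArgsList.add("blastResults.txt");        // placeholder output file
        // Optional extra argument from the description: report only hits with e-value below 0.1.
        cmdArgsList.add("-e");
        cmdArgsList.add("0.1");
        return cmdArgsList;
    }

    public static void main(String[] args) {
        // Print the list as the command line it represents.
        System.out.println(String.join(" ", build("myQuery.fasta")));
    }
}
```

Writing "-e 0.1" as a single element would hand blastall one argument it does not recognise, which is why the flag and its value are kept apart.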
Beanshells (2)
Name | Inputs | Outputs
create_blastall_cmdArgs | sequenceFileName | cmdArgsList
create_formatdb_cmdArgs | | cmdArgsList
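The "add the flag in one line and its argument in another" pattern described for create_formatdb_cmdArgs can be wrapped in a small helper. The sketch below is plain Java and purely illustrative: addOption and the class name are hypothetical, the file and database names passed in are up to you, and "-p F" (nucleotide input) is an assumption about the sequences you fetch. The flags -i, -p and -n themselves are standard formatdb options.

```java
import java.util.ArrayList;
import java.util.List;

final class FormatdbArgs {
    /** Appends a flag and its value as two consecutive list elements. */
    static void addOption(List<String> cmdArgsList, String flag, String value) {
        cmdArgsList.add(flag);
        cmdArgsList.add(value);
    }

    /** Builds a formatdb argument list for the given fasta file and database name. */
    static List<String> build(String fastaFile, String dbName) {
        List<String> cmdArgsList = new ArrayList<>();
        cmdArgsList.add("/usr/local/bin/formatdb"); // assumed position of the executable path
        addOption(cmdArgsList, "-i", fastaFile);    // input fasta file
        addOption(cmdArgsList, "-p", "F");          // F = nucleotide, T = protein (assumed F here)
        addOption(cmdArgsList, "-n", dbName);       // name of the blast database
        return cmdArgsList;
    }
}
```

Whatever name follows "-n" here has to match the database name given after "-d" in the blastall arguments, otherwise the search will not find the database.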
Outputs (1)
Name | Description
ensemblOutputFastaFile |
Links (5)
Source | Sink
create_blastall_cmdArgs:cmdArgsList | runBlastSearch:args
create_formatdb_cmdArgs:cmdArgsList | local_create_blastdb:args
fetch_seqs_from_ensembl:drerio_gene_ensembl | Write_Fasta_File:filecontents
sequenceFileName | create_blastall_cmdArgs:sequenceFileName
fetch_seqs_from_ensembl:drerio_gene_ensembl | ensemblOutputFastaFile
Coordinations (2)
Controller | Target
Write_Fasta_File | create_formatdb_cmdArgs
local_create_blastdb | runBlastSearch
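Taken together, the links between processors and the coordinations fix the order in which the processors can run. The sketch below works that order out with a generic topological sort (Kahn's algorithm). It is only an illustration of the dependencies listed on this page, not a description of how taverna schedules work internally, and the class and method names are hypothetical. Links to and from workflow input and output ports are left out because they do not constrain processor order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class WorkflowOrder {
    // Each pair is {upstream, downstream}: the first three are data links, the last two are coordinations.
    static final String[][] DEPENDENCIES = {
        {"fetch_seqs_from_ensembl", "Write_Fasta_File"},
        {"create_formatdb_cmdArgs", "local_create_blastdb"},
        {"create_blastall_cmdArgs", "runBlastSearch"},
        {"Write_Fasta_File", "create_formatdb_cmdArgs"},
        {"local_create_blastdb", "runBlastSearch"},
    };

    /** Returns one valid execution order, or throws if the dependencies contain a cycle. */
    static List<String> executionOrder(String[][] deps) {
        Map<String, Integer> pending = new LinkedHashMap<>();          // unmet upstream count per processor
        Map<String, List<String>> downstream = new LinkedHashMap<>();  // processors waiting on each one
        for (String[] d : deps) {
            pending.putIfAbsent(d[0], 0);
            pending.merge(d[1], 1, Integer::sum);
            downstream.computeIfAbsent(d[0], k -> new ArrayList<>()).add(d[1]);
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String p = ready.remove();
            order.add(p);
            for (String next : downstream.getOrDefault(p, new ArrayList<>())) {
                // A processor becomes ready once its last upstream dependency has run.
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != pending.size()) {
            throw new IllegalStateException("cycle in dependencies");
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(executionOrder(DEPENDENCIES));
    }
}
```

The two coordinations are what force the database to exist before the search starts: without them, runBlastSearch would depend only on create_blastall_cmdArgs and could run before formatdb has finished.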