This workflow lets you configure a BioMart query to fetch the sequences you want from Ensembl. The sequences are retrieved and a BLAST database is built from them (by default, in the directory you ran Taverna from).
Warning: This workflow assumes that blastall and formatdb are installed on your machine and that, by default, both are found or linked in /usr/local/bin. It also assumes that you have write permission to the directory you ran Taverna from. If the default locations are not appropriate for you, the beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs" are what you need to edit.
Shortcomings:
The names of all the files created and used are hard-coded in this workflow. This means that if you run the workflow more than once without editing anything, you will overwrite files you created previously.
The workflow does not yet delete any of the files it creates in the working directory. Ideally there would be an option letting the user choose whether the files are kept or deleted after use.
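One possible way around the overwriting problem described in the shortcomings is to add a timestamp to each file name. The sketch below is plain Java, not part of the workflow: the class name `UniqueFileNameSketch` and the helper `uniqueName` are hypothetical, and it assumes a Java 8 or later runtime for `java.time`, which the beanshell environment in your Taverna installation may not provide.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of the workflow: builds file names that
// differ from run to run so earlier results are not overwritten.
class UniqueFileNameSketch {
    // Timestamp with one-second resolution, e.g. 20240102_030405
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    // Returns base + "_" + timestamp + extension
    static String uniqueName(String base, String extension, LocalDateTime when) {
        return base + "_" + when.format(STAMP) + extension;
    }

    public static void main(String[] args) {
        LocalDateTime now = LocalDateTime.now();
        // The base names are the workflow's defaults
        System.out.println(uniqueName("ensemblFastaFile", "", now));
        System.out.println(uniqueName("ensemblBlastDB", "", now));
        System.out.println(uniqueName("blast", ".out", now));
    }
}
```

Because the timestamp has one-second resolution, two runs started within the same second would still collide. The same name would also have to be passed consistently to the formatdb "-n" argument and the blastall "-d" argument.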
/usr/local/bin/formatdb
net.sourceforge.taverna.scuflworkers.io.LocalCommand
This runs the blastall command (NCBI blast) on your local machine. This means you need to have blastall installed. The location provided is /usr/local/bin/blastall, so if your executable is not in that location, you will need to edit this. If you are working on a Bio-Linux machine, this should work for you without change.
Note that, by default, the blast results are written to your hard disk in the working directory. If you want them somewhere else, edit the output location in the create_blastall_cmdArgs beanshell, on the line after the one that adds "-o".
/usr/local/bin/blastall
net.sourceforge.taverna.scuflworkers.io.LocalCommand
This beanshell creates a list of strings (plain text), with each element holding one piece of the blastall command line. For the runBlastSearch processor to understand the familiar blastall arguments, each flag must be added as a separate element in the list, immediately before its argument.
Note the defaults are probably not what you want - you need to edit them!
You can also add additional arguments by configuring the beanshell. For example, to indicate that you wish to limit the hits reported to just those with e-values below 0.1, you would add the following text to the bottom of the beanshell:
cmdArgsList.add("-e");
cmdArgsList.add("0.1");
List cmdArgsList = new ArrayList();
cmdArgsList.add("-p");
cmdArgsList.add("blastn");
cmdArgsList.add("-d");
cmdArgsList.add("./ensemblBlastDB");
cmdArgsList.add("-i");
cmdArgsList.add(sequenceFileName);
cmdArgsList.add("-o");
cmdArgsList.add("blast.out");
sequenceFileName
cmdArgsList
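To illustrate how the flag-then-argument list above corresponds to a familiar blastall command line, here is a minimal standalone sketch in plain Java. The class `BlastallArgsSketch`, the helper `addOption`, and the file name `query.fasta` are hypothetical and used only for illustration; the flags and values are the workflow's defaults plus the "-e 0.1" example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the list built by create_blastall_cmdArgs.
class BlastallArgsSketch {
    // Adds a flag and its value as two separate, adjacent list elements
    static void addOption(List<String> args, String flag, String value) {
        args.add(flag);
        args.add(value);
    }

    // Builds the same arguments as the beanshell, plus the e-value limit
    static List<String> buildArgs(String sequenceFileName) {
        List<String> args = new ArrayList<>();
        addOption(args, "-p", "blastn");            // BLAST program
        addOption(args, "-d", "./ensemblBlastDB");  // database to search
        addOption(args, "-i", sequenceFileName);    // query sequence file
        addOption(args, "-o", "blast.out");         // results file
        addOption(args, "-e", "0.1");               // e-value cut-off
        return args;
    }

    public static void main(String[] ignored) {
        List<String> args = buildArgs("query.fasta");
        // Joining the elements with spaces gives the equivalent shell command
        System.out.println("/usr/local/bin/blastall " + String.join(" ", args));
    }
}
```

Run as written, this prints `/usr/local/bin/blastall -p blastn -d ./ensemblBlastDB -i query.fasta -o blast.out -e 0.1`, which is the command line the list is meant to represent.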
This beanshell will not be run until after the fasta file has been written by the Write_Fasta_File processor.
This beanshell creates a list of strings (plain text), with each element holding one piece of the formatdb command line. To set the name of the blast database, change the text in the line after the one containing "-n".
If you want to add any other command line arguments, follow the pattern used in the beanshell: add the flag (e.g. "-x") in one line and its argument (e.g. "somethingOrOther") in the next line.
List cmdArgsList = new ArrayList();
cmdArgsList.add("-i");
cmdArgsList.add("./ensemblFastaFile");
cmdArgsList.add("-p");
cmdArgsList.add("F");
cmdArgsList.add("-n");
cmdArgsList.add("ensemblBlastDB");
cmdArgsList
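The following sketch shows, in plain Java, how an executable path and an argument list like the one above could be combined into a single command. It is an assumption-laden illustration, not a description of how the LocalCommand processor is implemented: the class `FormatdbCommandSketch` and the method `buildCommand` are hypothetical, and the process is deliberately never started, so formatdb does not need to be installed to try it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not how the LocalCommand processor
// is necessarily implemented.
class FormatdbCommandSketch {
    // Puts the executable first, then each flag followed by its argument
    static List<String> buildCommand(String executable, String fastaFile, String dbName) {
        List<String> command = new ArrayList<>();
        command.add(executable);
        command.add("-i");
        command.add(fastaFile);  // input FASTA file
        command.add("-p");
        command.add("F");        // F = nucleotide sequences, T = protein
        command.add("-n");
        command.add(dbName);     // base name of the database to create
        return command;
    }

    public static void main(String[] args) {
        ProcessBuilder pb = new ProcessBuilder(
                buildCommand("/usr/local/bin/formatdb", "./ensemblFastaFile", "ensemblBlastDB"));
        // Only inspect the command here; calling pb.start() would run formatdb
        System.out.println(pb.command());
    }
}
```

Keeping each flag and argument as its own list element means no shell quoting is needed, even if a file name contains spaces.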
By default, this is set up to collect sequences for Danio rerio genes (ZFISH7). You can reconfigure it to retrieve whatever else you need from Ensembl.
This processor writes the fasta sequences retrieved by the fetch_seqs_from_ensembl processor to a file on the hard drive (in the working directory). This is necessary for running formatdb on the command line.
ensemblFastaFile
net.sourceforge.taverna.scuflworkers.io.TextFileWriter
Provide the name of the file of fasta sequence(s) that you wish to use to search the blast database created in this workflow. If the file is not in your working directory, include its location.
Once Write_Fasta_File is Completed, create_formatdb_cmdArgs moves from Scheduled to Running.
Once local_create_blastdb is Completed, runBlastSearch moves from Scheduled to Running.