tblastx non-redundant alignment
This workflow aligns a set of non-redundant proteins using TCoffee and ClustalW2. The starting point is a single genomic coding sequence that represents one member of the gene family in a given species.
For the BioExtract Server implementation, the steps are:
1. Selecting the NCBI tblastx tool and providing the accession number of the known nucleotide sequence record as input.
2. Parsing the output from this tool (a BLAST report along with a set of records representing similar sequences) with a formatting template to produce an initial extract, i.e., a set of matching nucleotide sequences.
3. Saving the resulting data extract.
4. Using the saved data extract as input to Vmatch (see http://www.vmatch.de/) to remove duplicate sequences.
5. Invoking the “fetchTranslation” tool, which takes the current nucleotide sequence extract as input (in GenBank format) and returns the protein translations from the GenBank-annotated coding sequence (CDS) regions (in FASTA format).
6. Selecting the ClustalW tool to create a multiple sequence alignment, with the input specified as coming from the previously executed tool (i.e., the extracted protein sequences), and to define and draw a dendrogram that represents how the sequences are related.
7. Selecting the TCoffee tool to create a multiple sequence alignment of the same extracted protein sequences, and to define and draw a dendrogram that represents how the sequences are related.
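To make steps 4 and 5 more concrete, the following is a minimal Python sketch of what they do to the data. It is not the BioExtract Server, Vmatch, or fetchTranslation implementation. All function names (parse_fasta, remove_exact_duplicates, extract_cds_translations) are hypothetical. The duplicate removal only drops records whose sequences are identical (ignoring case), which is a much simpler criterion than Vmatch supports, and the translation step assumes the protein is stored in a /translation qualifier of a GenBank flat-file record.

```python
import re


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def remove_exact_duplicates(records):
    """Keep the first record for each distinct sequence (case-insensitive).

    Assumption: exact identity stands in for the duplicate criterion;
    the workflow itself delegates this step to Vmatch.
    """
    seen = set()
    unique = []
    for header, seq in records:
        key = seq.upper()
        if key not in seen:
            seen.add(key)
            unique.append((header, seq))
    return unique


def extract_cds_translations(genbank_text):
    """Return protein strings found in /translation qualifiers.

    The qualifier value may wrap over several lines, so whitespace
    inside the quoted value is removed.
    """
    matches = re.findall(r'/translation="([^"]*)"', genbank_text)
    return ["".join(m.split()) for m in matches]
```

For example, calling remove_exact_duplicates(parse_fasta(text)) on a FASTA string yields the non-redundant records in their original order, and extract_cds_translations applied to a GenBank record returns one protein string per annotated CDS that carries a translation.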