Mapping OligoNucleotides to an assembly
Version info
The former version of the workflow assumed that BioMart only reports a transcript when the query (the probe in our case) is entirely encapsulated in an exon of that transcript. However, the BioMart service also returns a transcript when the query overlaps an exon only partially, or not at all, within the stretch of the assembly on which the transcript is defined. As a result, too many oligos were classified as having multiple transcripts or multiple genes.
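The distinction between a probe hit that lies entirely inside an exon and one that overlaps it only partially or not at all can be sketched as follows. This is a minimal illustration in Python, not the workflow's own code (the analysis step of the workflow runs in R through RShell). The function name `classify_overlap`, the tuple layout of `exons`, and the use of inclusive coordinates are assumptions made for the example.

```python
def classify_overlap(hit_start, hit_end, exons):
    """Compare one hit against the exons of one transcript.

    Coordinates are assumed to be inclusive positions on the same strand
    of the assembly. `exons` is a list of (exon_start, exon_end) tuples.
    Returns "contained", "partial", or "none".
    """
    partial = False
    for exon_start, exon_end in exons:
        # The hit lies entirely inside this exon.
        if exon_start <= hit_start and hit_end <= exon_end:
            return "contained"
        # The hit and this exon share at least one position.
        if hit_start <= exon_end and exon_start <= hit_end:
            partial = True
    return "partial" if partial else "none"


# Hypothetical transcript with two exons, and three 60-mer hits.
exons = [(100, 200), (300, 400)]
print(classify_overlap(120, 179, exons))  # inside the first exon
print(classify_overlap(180, 239, exons))  # runs past the end of the first exon
print(classify_overlap(210, 269, exons))  # falls in the intron
```

Under these assumptions, only transcripts for which a hit is "contained" would match the expectation of the former version; the other two cases are the ones BioMart also returns.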
Workflow description
We used RShell in the design process of a zebrafish microarray (supp. info Figure S1 and Figure S2). A microarray with 15k probes of 60-mer oligonucleotides was designed on gene sequences from Vega (http://vega.sanger.ac.uk/Danio_rerio) and Ensembl (http://www.ensembl.org/Danio_rerio/) that are also known in the Zebrafish Information Network (http://zfin.org) (for zebrafish, the Vega set is not a subset of the Ensembl set). To assess the genome DNA-sequence assemblies and to judge the agreement between the different assembly annotations, we mapped the Vega-designed probes onto the Ensembl assembly.
The workflow first performs an alignment using the BioMoby Blat and Blast services provided by WUR (www.bioinformatics.nl). Next, for each hit, it tries to find the corresponding transcripts and genes using a BioMart web service. The final task is an analysis step in RShell, which determines for each oligo the class to which it belongs:
0 no hit
1 single hit, single transcript, single gene
2 multiple hits, single transcript, single gene, intron spanning
3 multiple hits, single transcript, single gene, possible intron spanning
4 multiple hits, single transcript, single gene, no intron spanning
5 multiple hits, multiple transcripts, single gene, intron spanning
6 multiple hits, multiple transcripts, single gene, possible intron spanning
7 multiple hits, multiple transcripts, single gene, no intron spanning
8 single hit, does not meet additional criteria **
9 multiple hits, single transcript, do not meet additional criteria **
10 multiple hits, multiple transcripts, do not meet additional criteria **
11 multiple hits, multiple genes
12 no transcript found but hit(s) meet additional criteria **
13 no transcript found and hit(s) do not meet additional criteria **
14 multiple hits, single transcript, single gene plus hit without transcript found and hits meet additional criteria **
* Oligo below e-value cut-off 1e-12, but also intron spanning criteria met.
** Additional criteria: either e-value below 1e-12 or intron spanning.
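The "additional criteria" footnote and the class codes above can be expressed compactly in code. The sketch below is in Python for illustration only; the workflow itself performs this analysis in R through RShell, and its actual decision logic is not shown on this page. The names `E_VALUE_CUTOFF`, `CLASS_LABELS`, `meets_additional_criteria` and `summarize_classes` are hypothetical. How a hit is judged to be intron spanning is not defined here, so it is passed in as a boolean.

```python
from collections import Counter

# Cut-off taken from the footnotes above.
E_VALUE_CUTOFF = 1e-12

# Class codes and descriptions, copied from the list above.
CLASS_LABELS = {
    0: "no hit",
    1: "single hit, single transcript, single gene",
    2: "multiple hits, single transcript, single gene, intron spanning",
    3: "multiple hits, single transcript, single gene, possible intron spanning",
    4: "multiple hits, single transcript, single gene, no intron spanning",
    5: "multiple hits, multiple transcripts, single gene, intron spanning",
    6: "multiple hits, multiple transcripts, single gene, possible intron spanning",
    7: "multiple hits, multiple transcripts, single gene, no intron spanning",
    8: "single hit, does not meet additional criteria",
    9: "multiple hits, single transcript, do not meet additional criteria",
    10: "multiple hits, multiple transcripts, do not meet additional criteria",
    11: "multiple hits, multiple genes",
    12: "no transcript found but hit(s) meet additional criteria",
    13: "no transcript found and hit(s) do not meet additional criteria",
    14: "multiple hits, single transcript, single gene plus hit without "
        "transcript found and hits meet additional criteria",
}


def meets_additional_criteria(e_value, intron_spanning):
    """Footnote **: either e-value below the cut-off or intron spanning."""
    return e_value < E_VALUE_CUTOFF or intron_spanning


def summarize_classes(assignments):
    """Count oligos per class code; `assignments` maps oligo id -> code."""
    counts = Counter(assignments.values())
    return {code: counts.get(code, 0) for code in CLASS_LABELS}


# Hypothetical class assignments for four oligos.
example = {"oligo_a": 1, "oligo_b": 1, "oligo_c": 11, "oligo_d": 0}
for code, n in summarize_classes(example).items():
    if n:
        print(code, CLASS_LABELS[code], n)
```

A table like `CLASS_LABELS` keeps the numeric codes and their meaning in one place, which helps when tabulating the results for a full 15k-probe run.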
To run this workflow, a certificate for accessing www.bioinformatics.nl needs to be installed (some services use an SSL connection). See the link below for how to install this certificate.
http://www.myexperiment.org/files/148
The myExperiment pack http://www.myexperiment.org/packs/45 contains the workflow, the full input set and a test input set. The full input set is large: a run takes about 6 hours on a 3 GHz Linux PC with 24 GB of RAM. The test input set can be run on almost any computer with Taverna and R installed and takes approximately 10 minutes.
Run
Run this Workflow in the Taverna Workbench...
Option 1:
Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/603/download?version=5
Version 5 (of 7)