Mapping OligoNucleotides to an assembly
Run this Workflow in the Taverna Workbench...
Option 1:
Copy and paste this link into File > 'Open workflow location...'
http://myexperiment.org/workflows/603/download?version=3
Taverna is available from http://taverna.sourceforge.net/
If you are having problems downloading it in Taverna, you may need to provide your username and password in the URL so that Taverna can access the Workflow:
Replace http:// in the link above with http://yourusername:yourpassword@
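As a small illustration of the URL rewrite described above, the following Python sketch inserts a username and password into the download link. The function name `with_credentials` and the example credentials are hypothetical. Percent-encoding is applied so that characters such as `@` or `:` in a password do not break the URL.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url: str, username: str, password: str) -> str:
    """Return url with username:password@ inserted before the host."""
    parts = urlsplit(url)
    # Percent-encode so reserved characters in the credentials stay unambiguous.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    # Drop any credentials that are already present in the URL.
    host = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc=f"{userinfo}@{host}"))


link = "http://myexperiment.org/workflows/603/download?version=3"
print(with_credentials(link, "yourusername", "yourpassword"))
```

The resulting string can then be pasted into File > 'Open workflow location...' in place of the plain link.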
Workflow Components
Name | Description |
---|---|
DataBaseName | Database name (for example Danio_rerio_Genome) |
Sequences | Sequences in fasta format, for example >ENSDART00000061775 TTGTTTCCTCATCAACACAGCAGATCGAATCATTCGAGTTTACGACGGTCGAGAGATCCT >ENSDART00000100022 TGCTGTTCAGTGGTTATGTTGTTGTTTGAATAAATGTTAAGAGCCAGTGGATGGCACAAA >OTTDART00000006800 ATCTCTTAGCACTCTGCTGACTCACAACTTCTTCAGAAATGACTTTTTGGATATCATGAA >OTTDART00000002447 AAGACTGTACGACAAGACAGTGCAAATGGCACCATAGTAAATTCAACCGCTCACCAGGAA >ENSDART00000047499 CTCTATGACGTATATTGCTATGTGGAGAACATTCATGGGGAGGTTTTTCATGGCTCAACC >ENSDART00000093312 CAGAGGGTTGCAACCTCTTCATCTATCATCTACCACAAGAGTTTGGTGACAATGAGCTTA >OTTDART00000002445 AATGTTGCTGGTATCAGTGACCCCTTTCTGCAGGTGCGCATTCTTAGATTGCTAAGGATT >ENSDART00000093311 CGGGCCTTTCTGGAGAAACGCAAACCTGTGTGGAGCAACACAGACGACTGCATTCACTGA >ENSDART00000085701 AGAATGACAATGACTGTGGAGCTTTTGTTTTGGAGTACTGTAAGTGCCTGGCCTTCATGA >OTTDART00000002443 AAACCATCACGCTTTAATTAGTTTCCCCTGTTAACCATTGTCCCACAAGTCTTATGTGGA >OTTDART00000002442 CTGAAAGGCACTTGAGTTAATCAAATCCGCTTCTATGTAAGTGTTTTGTAAGAGCAGGCT |
Restart | Whether the workflow should start from scratch. If false, the workflow resumes instead of starting over. This is useful after a crash in Taverna, which sometimes happens when the data set is large. |
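To make the expected `Sequences` format concrete, here is a minimal Python sketch that splits FASTA text into records and groups them into chunks, which is roughly what the `Split_sequences` and `Create_Sequence_Chunks` processors listed further down appear to do. The actual Beanshell sources are not shown on this page, so the function names and the chunk size of 2 are assumptions. The sketch also assumes that each header line contains only the identifier, as in the example above.

```python
def split_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (identifier, sequence) pairs."""
    records = []
    for block in text.split(">")[1:]:   # text before the first '>' is ignored
        fields = block.split()          # splits on spaces and newlines alike
        if fields:
            # First token is the identifier; the rest is joined into the sequence.
            records.append((fields[0], "".join(fields[1:])))
    return records


def chunk(records: list, size: int) -> list[list]:
    """Group records into consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("chunk size must be at least 1")
    return [records[i:i + size] for i in range(0, len(records), size)]


fasta = (
    ">ENSDART00000061775\nTTGTTTCCTCATCAACACAGCAGATCGAATCATTCGAGTTTACGACGGTCGAGAGATCCT\n"
    ">ENSDART00000100022\nTGCTGTTCAGTGGTTATGTTGTTGTTTGAATAAATGTTAAGAGCCAGTGGATGGCACAAA\n"
    ">OTTDART00000006800\nATCTCTTAGCACTCTGCTGACTCACAACTTCTTCAGAAATGACTTTTTGGATATCATGAA\n"
)
records = split_fasta(fasta)
chunks = chunk(records, 2)
print(len(records), [len(c) for c in chunks])
```

Smaller chunks mean more service calls but less work lost if a single call fails, which may be why the chunk size is exposed as a constant.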
Name | Type | Description |
---|---|---|
Chunk_Size | stringconstant | |
BlastReport_Filename | stringconstant | |
OligosNotFound_Filename | stringconstant | |
BioMartReport_Filename | stringconstant | |
GeneratePlots | rshell | |
Create_Header_BlastReport | beanshell | Processor to add content to a (possibly existing) file. The content is added to the end of the file. Inputs: Filename: the name of the file (if the file does not exist, a new file is created); Content: the string to append; NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time). |
Read_BioMartReport | local | |
Read_BlastReport | local | |
Read_OligosNotFound | local | |
Create_Header_BioMartReport | beanshell | Processor to add content to a (possibly existing) file. The content is added to the end of the file. Inputs: Filename: the name of the file (if the file does not exist, a new file is created); Content: the string to append; NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time). |
Split_sequences | beanshell | |
Touch_OligosNotFound_File | beanshell | Processor to add content to a (possibly existing) file. The content is added to the end of the file. Inputs: Filename: the name of the file (if the file does not exist, a new file is created); Content: the string to append; NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time). |
Create_Sequence_Chunks | beanshell | |
Filter_Sequences_For_Blast | beanshell | |
Create_Semaphore | beanshell | |
Split_Blast_Report | beanshell | |
DoBioMart | workflow | |
BlatOrBlast | workflow | This workflow combines the blat and blast workflows. It takes as input a database name (for example, Danio_rerio_Genome for zebrafish) and a set of FASTA sequences. It first tries to perform a blat (at www.bioinformatics.nl). When this service returns nothing, a blast is done (also at www.bioinformatics.nl). The resulting reports are combined. |
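The blat-then-blast fallback described for `BlatOrBlast` can be sketched as follows. This is an illustration only: `run_blat` and `run_blast` are hypothetical stand-ins for the remote services at www.bioinformatics.nl, the `Hit` type is a simplified placeholder, and the sketch assumes the fallback is applied per sequence, which the description does not state explicitly.

```python
from typing import Callable

Hit = tuple[str, str]  # (oligo id, hit description); a simplified placeholder


def blat_or_blast(
    sequences: dict[str, str],
    run_blat: Callable[[dict[str, str]], list[Hit]],
    run_blast: Callable[[dict[str, str]], list[Hit]],
) -> tuple[list[Hit], list[str]]:
    """Try blat first, blast whatever blat missed, and report what is left."""
    blat_hits = run_blat(sequences)
    found = {oligo_id for oligo_id, _ in blat_hits}
    # Only sequences without a blat hit are sent on to blast.
    remaining = {k: v for k, v in sequences.items() if k not in found}
    blast_hits = run_blast(remaining) if remaining else []
    found |= {oligo_id for oligo_id, _ in blast_hits}
    not_found = [k for k in sequences if k not in found]
    return blat_hits + blast_hits, not_found


# Fake services for demonstration; real calls would go over the network.
def fake_blat(seqs):
    return [(k, "blat hit") for k in seqs if k == "oligoA"]


def fake_blast(seqs):
    return [(k, "blast hit") for k in seqs if k == "oligoB"]


report, missing = blat_or_blast(
    {"oligoA": "ACGT", "oligoB": "TTGA", "oligoC": "GGCC"}, fake_blat, fake_blast
)
print(report, missing)
```

In this sketch the combined report plays the role of the blast report file, and `missing` corresponds to the oligos written to the not-found file.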
Name | Description | Inputs | Outputs |
---|---|---|---|
Split_Report | | input | lines |
EmptyList | | | list |
Append_To_BioMartReport | Processor to add content to a (possibly existing) file. The content is added to the end of the file. Inputs: Filename: the name of the file (if the file does not exist, a new file is created); Content: the string to append; NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time). | Filename, Content, NewLine | |
Split_Blast_Record | | blastRecord | oligoId, blastIndex, chromosomeRegion, dstart, dstop |
Create_BioMart_Record | | geneId, transcriptId, transcriptStart, transcriptEnd | record |
checkIsInExon | | DStart, DStop, ExonStart, ExonStop | isInExon |
Add_OligoID_BlastIndex_Prefix | | BioMartRecord, OligoID, BlastIndex | FullRecord |
Download_Report_and_Filter | This Beanshell downloads a file to disk. The standard local Java download widgets do not handle URLs with HTTP(S) Basic Authentication, but this Beanshell can. When a web server uses Basic Auth, a login and password can be encoded in the URL using the following syntax: http(s)://login:password@www.some.website/my/great/tool/result.xml. This Beanshell extracts the login and password from the URL and supplies them automatically to the web server. This prevents Taverna from showing popup dialogs requesting the login and password from the user, which would be problematic for large workflows. Please note that the path where the downloaded file will be stored must be an absolute path to a folder, ending with a slash (a backslash on Windows, a forward slash on Linux/Unix/Mac OS X). The filename for the result is automatically extracted from the URL. | URL, eValue | blatResults |
isRunning | | status | isRunning |
Correct_Moby_Object | | inputXML | outputXML |
DownloadURLWithBasicAuth | This Beanshell downloads a file to disk. The standard local Java download widgets do not handle URLs with HTTP(S) Basic Authentication, but this Beanshell can. When a web server uses Basic Auth, a login and password can be encoded in the URL using the following syntax: http(s)://login:password@www.some.website/my/great/tool/result.xml. This Beanshell extracts the login and password from the URL and supplies them automatically to the web server. This prevents Taverna from showing popup dialogs requesting the login and password from the user, which would be problematic for large workflows. Please note that the path where the downloaded file will be stored must be an absolute path to a folder, ending with a slash (a backslash on Windows, a forward slash on Linux/Unix/Mac OS X). The filename for the result is automatically extracted from the URL. | theURL | blastResults |
Append_To_BlastReport | Processor to add content to a (possibly existing) file. The content is added to the end of the file. Inputs: Filename: the name of the file (if the file does not exist, a new file is created); Content: the string to append; NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time). | Filename, Content, NewLine | |
Join_Blat_Blast_Results | | list1, list2 | outputList |
Filter_Sequences_For_Blast | | sequences, blatResult | sequencesForBlast |
Append_To_OligosNotFound | Processor to add content to a (possibly existing) file. The content is added to the end of the file. Inputs: Filename: the name of the file (if the file does not exist, a new file is created); Content: the string to append; NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time). | Filename, Content, NewLine | |
Filter_Sequences_Not_Found | | Report, Sequences | SequencesNotFound |
Add_Indices_To_BlastReport | | record | record_with_index |
isEmpty | | string | isEmpty |
EmptyList | | | list |
Create_Header_BlastReport | Processor to add content to a (possibly existing) file. The content is added to the end of the file. Inputs: Filename: the name of the file (if the file does not exist, a new file is created); Content: the string to append; NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time). | Filename, Content, NewLine, Restart | |
Create_Header_BioMartReport | Processor to add content to a (possibly existing) file. The content is added to the end of the file. Inputs: Filename: the name of the file (if the file does not exist, a new file is created); Content: the string to append; NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time). | Filename, Content, NewLine, Restart | |
Split_sequences | | sequenceText | sequences |
Touch_OligosNotFound_File | Processor to add content to a (possibly existing) file. The content is added to the end of the file. Inputs: Filename: the name of the file (if the file does not exist, a new file is created); Content: the string to append; NewLine [default = true]: if true, a newline is added after the content (useful if you want to add one record at a time). | Filename, Content, NewLine | |
Create_Sequence_Chunks | | sequences, chunkSize | chunks |
Filter_Sequences_For_Blast | | sequences, filename | sequencesToDo |
Create_Semaphore | | | |
Split_Blast_Report | | BlastReport, filename | Records |
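The file-appending processors in the table above (`Append_To_BlastReport`, `Append_To_BioMartReport`, and so on) share one description. A minimal Python sketch of that behaviour, using the same three inputs, might look like this. The real processors are Beanshell scripts whose source is not reproduced here, and the file name and record contents in the example are made up.

```python
import os
import tempfile


def append_to_file(filename: str, content: str, new_line: bool = True) -> None:
    """Append content to filename, creating the file if it does not exist."""
    # Mode "a" creates a missing file and always writes at the end.
    with open(filename, "a", encoding="utf-8") as handle:
        handle.write(content)
        if new_line:
            handle.write("\n")


# Example: build a two-record report in a temporary directory.
with tempfile.TemporaryDirectory() as folder:
    report_path = os.path.join(folder, "blast_report.txt")
    append_to_file(report_path, "oligoA\thit 1")
    append_to_file(report_path, "oligoB\thit 2")
    with open(report_path, encoding="utf-8") as handle:
        report_text = handle.read()

print(report_text)
```

Because every call appends one record, a report built this way survives a crash up to the last completed record, which fits the role of the `Restart` input.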
Name | Description |
---|---|
BlastReport | The blast hits |
BioMartReport | The biomart genes and transcripts |
SequencesNotFound | The sequences not found by blast and blat |
BarPlot | A bar plot of the oligos per class |
Classes | The classes, same as bar plot |
Report | The total R report |
Source | Sink |
---|---|
DataBaseName | BlatOrBlast:DataBaseName |
BioMartReport_Filename:value | Create_Header_BioMartReport:Filename |
BioMartReport_Filename:value | DoBioMart:BioMartReport_Filename |
BioMartReport_Filename:value | Read_BioMartReport:fileurl |
BlastReport_Filename:value | BlatOrBlast:BlastReport_Filename |
BlastReport_Filename:value | Create_Header_BlastReport:Filename |
BlastReport_Filename:value | Read_BlastReport:fileurl |
Chunk_Size:value | Create_Sequence_Chunks:chunkSize |
Create_Sequence_Chunks:chunks | BlatOrBlast:Sequences |
OligosNotFound_Filename:value | BlatOrBlast:OligosNotFound_Filename |
OligosNotFound_Filename:value | Read_OligosNotFound:fileurl |
OligosNotFound_Filename:value | Touch_OligosNotFound_File:Filename |
Read_BioMartReport:filecontents | GeneratePlots:biomartfile |
Read_BlastReport:filecontents | GeneratePlots:blastresultfile |
Read_BlastReport:filecontents | Split_Blast_Report:BlastReport |
Read_OligosNotFound:filecontents | GeneratePlots:oligosNotFound |
Restart | Create_Header_BioMartReport:Restart |
Restart | Create_Header_BlastReport:Restart |
Sequences | Filter_Sequences_For_Blast:sequences |
BioMartReport_Filename:value | Split_Blast_Report:filename |
BlastReport_Filename:value | Filter_Sequences_For_Blast:filename |
Filter_Sequences_For_Blast:sequencesToDo | Split_sequences:sequenceText |
Split_Blast_Report:Records | DoBioMart:blastRecord |
GeneratePlots:BarPlot | BarPlot |
GeneratePlots:Classes | Classes |
GeneratePlots:Report | Report |
Read_BioMartReport:filecontents | BioMartReport |
Read_BlastReport:filecontents | BlastReport |
Read_OligosNotFound:filecontents | SequencesNotFound |
Split_sequences:sequences | Create_Sequence_Chunks:sequences |
Controller | Target |
---|---|
Create_Header_BlastReport | BlatOrBlast |
BlatOrBlast | Read_BlastReport |
BlatOrBlast | Read_OligosNotFound |
DoBioMart | Read_BioMartReport |
Create_Header_BioMartReport | DoBioMart |
Create_Header_BlastReport | Filter_Sequences_For_Blast |
Touch_OligosNotFound_File | BlatOrBlast |
Create_Semaphore | Create_Header_BlastReport |
Create_Semaphore | Touch_OligosNotFound_File |
Create_Semaphore | Create_Header_BioMartReport |
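The coordination links above say that a target processor may only start once its controller has finished. As an illustration, the following Python sketch feeds those ten links into the standard-library `graphlib.TopologicalSorter` to derive one valid execution order. It considers only the control links, not the data links, and it is not a description of how Taverna itself schedules processors.

```python
from graphlib import TopologicalSorter

# (controller, target) pairs copied from the table above.
control_links = [
    ("Create_Header_BlastReport", "BlatOrBlast"),
    ("BlatOrBlast", "Read_BlastReport"),
    ("BlatOrBlast", "Read_OligosNotFound"),
    ("DoBioMart", "Read_BioMartReport"),
    ("Create_Header_BioMartReport", "DoBioMart"),
    ("Create_Header_BlastReport", "Filter_Sequences_For_Blast"),
    ("Touch_OligosNotFound_File", "BlatOrBlast"),
    ("Create_Semaphore", "Create_Header_BlastReport"),
    ("Create_Semaphore", "Touch_OligosNotFound_File"),
    ("Create_Semaphore", "Create_Header_BioMartReport"),
]

sorter = TopologicalSorter()
for controller, target in control_links:
    # add(node, *predecessors): the target depends on its controller.
    sorter.add(target, controller)

order = list(sorter.static_order())
print(order[0])
```

`Create_Semaphore` is the only processor that no other processor controls, so it comes first in any valid order. The report files are read only after `BlatOrBlast` and `DoBioMart` have finished writing them.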
Version 3 (of 7)
In chronological order:
-
Created by Wassinki on Friday 19 December 2008 09:41:13 (UTC)
Revision comment:
----------------------------------------
The newest version only takes into account the probes whose blast hits map on exons. The BioMart sub-workflow has been modified to do this by adding an extra BioMart processor and a Beanshell processor that keeps only the blast hits that map on exons.
----------------------------------------
This workflow maps the input oligo set to an assembly.
It first performs an alignment using the BioMoby Blat and Blast services provided by WUR (www.bioinformatics.nl). Next, for each hit, it tries to find the corresponding transcripts and genes using a BioMart web service. The final task is an analysis task using RShell. It calculates for each oligo to which class it belongs:
1 single hit
2-4 multiple hits single transcript
5-7 multiple hits multiple transcripts
8 single hit, discarded
9 multiple hits single transcript, discarded
10 multiple transcripts, discarded*
11 multi gene, discarded
12 no transcript
13 no transcript, discarded
* classified on the criteria intron spanning only, possible intron spanning and no intron spanning.
* hit(s) do not meet high stringency threshold
* no transcript found but hit(s) meet high stringency threshold.
To run this workflow, a certificate to access www.bioinformatics.nl needs to be installed (some services use an SSL connection). See the link below for how to install this certificate.
http://www.myexperiment.org/files/148
The myExperiment pack http://www.myexperiment.org/packs/45 contains the workflow, the input and a test input. The whole input set is large: it takes about 6 hours on a 3 GHz Linux PC with 24 GB of RAM. The test input set can be run on almost any computer with Taverna and R installed; it takes approximately 10 minutes.
-
Created by Wassinki on Tuesday 03 February 2009 15:27:18 (UTC)
Last edited by Wassinki on Tuesday 03 February 2009 15:33:01 (UTC)
Revision comment: The former version of the workflow assumed that BioMart only reports a transcript when the query (the probe in our case) is entirely contained in an exon of that transcript. However, the BioMart service also returns a transcript when the query lies in the stretch of the assembly on which the transcript is defined but overlaps an exon only partially or not at all. This resulted in too many oligos being classified as having multiple transcripts or multiple genes.
-
Created by Wassinki on Friday 13 February 2009 09:03:37 (UTC)
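The exon filter mentioned in the revision comments keeps a blast hit only when it maps on an exon. A minimal Python sketch of such a check, using the port names of the `checkIsInExon` Beanshell (`DStart`, `DStop`, `ExonStart`, `ExonStop`), could look like the following. It assumes inclusive coordinates, that a hit must lie entirely within the exon, and that start and stop may arrive in either order. The actual Beanshell source is not shown on this page, so its exact rule may differ.

```python
def is_in_exon(d_start: int, d_stop: int, exon_start: int, exon_stop: int) -> bool:
    """Return True when the hit [d_start, d_stop] lies entirely inside the exon."""
    # Normalise so that start <= stop, in case a hit is reported in reverse order.
    hit_lo, hit_hi = sorted((d_start, d_stop))
    exon_lo, exon_hi = sorted((exon_start, exon_stop))
    return exon_lo <= hit_lo and hit_hi <= exon_hi


print(is_in_exon(120, 179, 100, 300))   # hit fully inside the exon
print(is_in_exon(280, 339, 100, 300))   # hit runs past the exon end
```

Under these assumptions, a hit that only partially overlaps an exon is rejected, which matches the problem described in the second revision comment.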