Extract_unique_proteins_from_blast_resultsblastFile00 The URL or file path location of the tab-delimited format of the Blast results. Type the path as a string (not 2011-03-24 15:20:42.61 GMT C:\Users\You 2011-03-24 15:20:32.41 GMT inputs the xml format of the blast results 2010-03-19 03:21:20.950 GMT The URL or file path location of the tab-delimited format of the Blast results. 2011-03-24 15:20:06.729 GMT The URL or file path location of the tab-delimited format of the Blast results. Type the path as a string (not a file location). 2011-03-24 15:20:55.164 GMT C:\Users\You\Documents\my_blast_results.tab 2011-03-24 15:21:18.503 GMT tfasta00 Fasta file of the target proteins to extract the sequences. 2011-03-24 15:21:32.450 GMT File : C:\Users\You\Documents\target_fasta.faa 2011-03-24 15:22:34.14 GMT File 2011-03-24 15:21:59.270 GMT Fasta file of the target proteins to extract the sequences. Add as file location. 2011-03-24 15:21:53.195 GMT fasta file of the target proteins to extract the sequences 2010-03-19 03:19:27.444 GMT unique_identifiers gi|321313668|ref|YP_004205955.1| gi|321312432|ref|YP_004204719.1| gi|321314996|ref|YP_004207283.1| 2011-03-24 15:28:18.998 GMT Unique identifiers that appear in the FASTA file but not in the BLAST file. Identifiers are separated by new lines. 2011-03-24 15:28:07.208 GMT blasted_identifiers Identifiers of the target genome from the BLAST results. These identifiers are those that are similar to the source proteome. 
2011-03-24 15:26:18.245 GMT gi|321313668|ref|YP_004205955.1| gi|321312432|ref|YP_004204719.1| gi|321314996|ref|YP_004207283.1| 2011-03-24 15:27:27.684 GMT Read_Text_Filefileurl0filecontents00net.sf.taverna.t2.activitieslocalworker-activity1.2net.sf.taverna.t2.activities.localworker.LocalworkerActivity net.sourceforge.taverna.scuflworkers.io.TextFileReader workflow java.lang.String true fileurl 0 'text/plain' 0 filecontents 0 'text/plain' net.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invokeextract_blast_idsxml_result0gi_lines_val00net.sf.taverna.t2.activitiesbeanshell-activity1.2net.sf.taverna.t2.activities.beanshell.BeanshellActivity workflow java.lang.String true xml_result 0 text/plain 0 gi_lines_val 0 net.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invokefind_unique_proteinsgi_val0tfasta_in0cfasta_out00net.sf.taverna.t2.activitiesbeanshell-activity1.2net.sf.taverna.t2.activities.beanshell.BeanshellActivity workflow java.lang.String true gi_val 0 text/plain 
java.lang.String true tfasta_in 0 text/plain 0 cfasta_out 0 net.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize 1 net.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBouncenet.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failovernet.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry 1.0 1000 5000 0 net.sf.taverna.t2.coreworkflowmodel-impl1.2net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.InvokeRead_Text_FilefileurlblastFileextract_blast_idsxml_resultRead_Text_Filefilecontentsfind_unique_proteinsgi_valextract_blast_idsgi_lines_valfind_unique_proteinstfasta_intfastaunique_identifiersfind_unique_proteinscfasta_outblasted_identifiersextract_blast_idsgi_lines_val 8192e733-05b6-4855-9246-997adf9d9da8 2011-03-25 20:01:56.11 GMT cc480ca8-ff58-4ff6-9949-17d47f4449ac 2011-03-29 17:40:41.746 BST A-Team 2011-04-01 11:32:13.835 BST 2010-03-19 03:14:00.733 GMT b066272d-8732-4505-a7b1-aa424a011c42 2011-03-29 17:37:52.118 BST The workflow parses the tab-delimited BLAST results to determine the unique proteins found in the target genome that have no similarity to the source genome. 2011-04-01 11:32:55.135 BST The workflow parses the BLAST results to determine the unique proteins found in the target genome that have no similarity to the source genome. Using these unique protein IDs and the original target protein FASTA file, a FASTA file of unique proteins is created. 
2010-03-19 03:23:47.653 GMT 7045bed3-10bd-4d68-9c66-0135b0ed2dde 2011-03-24 15:28:27.232 GMT 6d6d19ae-76d8-4e21-a247-3a200324d0be 2011-03-29 18:05:10.408 BST 60c8aba2-6a4b-4c8c-b66a-bbbf02104f94 2011-03-29 17:56:11.11 BST 737d3624-b4d2-4a08-b274-cc14742bce4c 2011-03-29 17:38:42.291 BST 133c1ed5-3710-4e6a-956e-265beacfa6c4 2011-03-28 11:06:19.156 BST c1f4908c-1669-4b80-bba1-13a2024ef9f4 2011-03-25 19:44:27.821 GMT 9adf3d21-e3ce-44cb-b1d1-f58b89a7045d 2011-03-29 18:02:43.52 BST This workflow allows you to configure a BioMart query to fetch the sequences you want from Ensembl. These sequences are retrieved and a BLAST database of them is created (by default, in the directory you ran Taverna from). Warning: This workflow assumes that you have blastall and formatdb installed on the machine, and that by default, these are both found or linked in /usr/local/bin. It also assumes that you have write permission to the directory you ran Taverna from. The beanshells "create_blastall_cmdArgs" and "create_formatdb_cmdArgs" are what you need to edit if the default locations are not appropriate for you. Shortcomings: The names of all the files created and used are hard-coded in this workflow. This means that if you run this workflow more than once without editing anything, you will overwrite files you have previously created. The workflow does not yet delete any of the files it creates in the working directory. Ideally, a user-selectable option would control whether the files are kept or deleted after use. 
2010-03-15 11:30:59.109 GMT nclteamc 2010-03-19 03:15:57.858 GMT Extract unique proteins from blast results 2010-03-19 03:16:29.89 GMT c39cf70b-850d-4f5a-a83e-d6570a9c468c 2011-03-28 11:10:21.821 BST Workflow outputs a list of proteins encoded by the target genomes that do not have sequence similarity to those encoded by the source genome 2010-03-19 03:18:26.823 GMT af914252-ebc4-432f-992e-e2b0d39017f9 2011-03-29 17:39:30.204 BST 93e32a3d-4e18-465b-8d3f-d2705d281a24 2011-03-29 17:44:59.100 BST 07a797c2-caa6-48b9-bbde-148dfc94b6bd 2011-03-29 17:51:41.951 BST 4626edc0-7e50-4e5d-8b94-7a2dfcd25175 2011-03-29 17:43:35.769 BST 782c8659-5afb-4518-a8c3-0376a0884816 2011-03-24 13:52:32.878 GMT 4d837fd7-5124-4e7c-964b-17de794236cc 2011-03-24 15:25:33.719 GMT fetchEnsemblSeqsAndBlast 2010-03-15 11:30:59.109 GMT a3deee76-a760-4d32-a267-c6b198eafc6f 2011-03-29 17:31:59.467 BST 498004e2-20fa-4e68-beb8-c37363ed1d7c 2011-03-29 19:06:48.198 BST 63e1f905-8094-472f-9564-694024af6d41 2011-03-29 17:51:04.462 BST 407de167-b803-49a3-8d88-330ea6dad2cd 2011-03-29 17:53:15.739 BST 150d4e84-1ab0-445e-a68f-59c9520fcb13 2011-03-29 18:00:45.354 BST 0bf98650-2884-406a-98a0-557f9de13e1c 2011-03-29 17:35:46.991 BST Bela Tiwari 2010-03-15 11:30:59.109 GMT 4de394ba-bb1d-40bb-9091-6814ed44a022 2011-03-29 18:03:07.210 BST 1b646021-1281-4a60-87d2-30ff347c440e 2011-03-29 17:32:31.887 BST 3bf00d4e-1435-4e32-b252-cd4db7d34aa4 2011-03-29 17:30:29.191 BST b2579890-b32e-4be4-975b-28659feea68f 2011-03-24 13:12:29.54 GMT 8dd7abc2-2a03-403f-b241-90fe56b28a30 2011-04-01 11:33:12.635 BST
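
The two Beanshell steps named in this workflow, extract_blast_ids and find_unique_proteins, are not reproduced in this text, so the following Python is only a minimal sketch of the logic the annotations describe: collect the target identifiers from the tab-delimited BLAST results, then list the FASTA identifiers that never appear among them. It assumes, without confirmation from the workflow itself, that the target identifier sits in the second tab-separated column and that a FASTA identifier is the first whitespace-delimited token after ">". The function bodies and the id_column parameter are hypothetical.

```python
def extract_blast_ids(blast_text, id_column=1):
    """Collect identifiers from tab-delimited BLAST results.

    Assumption: the target (subject) identifier is in the second
    column (index 1); change id_column if your output differs.
    """
    ids = set()
    for line in blast_text.splitlines():
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) > id_column:
            ids.add(fields[id_column])
    return ids


def find_unique_proteins(blasted_ids, fasta_text):
    """Return FASTA identifiers that are absent from the BLAST identifiers.

    Assumption: the identifier is the first whitespace-delimited
    token on each header line, with the leading '>' removed.
    """
    unique = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens and tokens[0] not in blasted_ids:
                unique.append(tokens[0])
    return unique


if __name__ == "__main__":
    # Made-up illustration data; only the identifier format follows the workflow's examples.
    blast = "query1\tgi|321313668|ref|YP_004205955.1|\t98.5\n"
    fasta = (
        ">gi|321313668|ref|YP_004205955.1| example protein\nMKV\n"
        ">gi|321312432|ref|YP_004204719.1| example protein\nMTA\n"
    )
    blasted = extract_blast_ids(blast)
    # Identifiers separated by new lines, as in the unique_identifiers output port.
    print("\n".join(find_unique_proteins(blasted, fasta)))
```

Passing the contents of the blastFile and tfasta inputs as strings mirrors the way the workflow reads the BLAST file into text before handing it to the identifier-extraction step.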