Data Refinement Workflow v17
The aim of the (Taxonomic) Data Refinement Workflow is to provide a streamlined workflow environment for preparing observational and specimen data sets for use in scientific analysis on the Taverna platform. The workflow has been designed so that it:
• accepts input data in a recognized format, but originating from various sources (e.g. services, local user data sets),
• includes a number of graphical user interfaces to view and interact with the data,
• produces output from each part that is compatible with the input of every other part, so that the user is free to choose a specific sequence of actions,
• allows for the use of custom-built as well as third-party applications and tools.
Currently, the data refinement workflow is made up of three distinct parts:
1. Taxonomic Name Resolution / Occurrence retrieval
2. Geo-temporal data selection
3. Data quality checks / filtering
Run
Run this Workflow in the Taverna Workbench...
Option 1:
Copy and paste this link into File > 'Open workflow location...'
http://myexperiment.org/workflows/2874/download?version=10
Taverna is available from http://taverna.sourceforge.net/
If you are having problems downloading it in Taverna, you may need to provide your username and password in the URL so that Taverna can access the Workflow:
Replace http:// in the link above with http://yourusername:yourpassword@
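The replacement described above can be sketched in Java as follows. This is only an illustration: the class name `WorkflowUrl` and the method `withCredentials` are hypothetical, and percent-encoding the user name and password is an assumption about what a URL parser expects, not something this page specifies.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the workflow or of Taverna.
final class WorkflowUrl {

    /** Inserts "user:password@" directly after the "scheme://" part of a URL. */
    static String withCredentials(String url, String user, String password) {
        int schemeEnd = url.indexOf("://");
        if (schemeEnd < 0) {
            throw new IllegalArgumentException("Not an absolute URL: " + url);
        }
        int hostStart = schemeEnd + 3;
        String userInfo = encode(user) + ":" + encode(password) + "@";
        return url.substring(0, hostStart) + userInfo + url.substring(hostStart);
    }

    // Assumption: reserved characters such as '@' or ':' must be percent-encoded.
    // URLEncoder writes spaces as '+', so those are rewritten to "%20".
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(withCredentials(
                "http://myexperiment.org/workflows/2874/download?version=10",
                "yourusername", "yourpassword"));
    }
}
```

The resulting string is what would be pasted into File > 'Open workflow location...'.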
Workflow Components
Cherian Mathew, Vera Hernandez |
Data Refinement Workflow |
DCWorkflow.jar |
json-simple-1.1.1.jar |
None
Name | Type | Description |
---|---|---|
Select_File | localworker |
Script:

    import java.awt.CardLayout;
    import java.awt.Image;
    import java.awt.Toolkit;
    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;
    import javax.swing.ImageIcon;
    import javax.swing.JEditorPane;
    import javax.swing.JFileChooser;
    import javax.swing.JLabel;
    import javax.swing.JPanel;
    import javax.swing.filechooser.FileFilter;

    class FileExtFilter extends FileFilter {
        public FileExtFilter(String ext, String label, boolean includeDir) {
            this.ext = ext;
            this.label = label;
            this.includeDir = includeDir;
        }

        public String getDescription() {
            return this.label;
        }

        public boolean accept(File file) {
            if (file.isDirectory() && includeDir) {
                return true;
            } else {
                return file.getName().endsWith(this.ext);
            }
        }

        String ext, label;
        boolean includeDir;
    }

    if (title == void) {
        title = null;
    }
    if ((fileExtensions == void) || (fileExtensions == null)) {
        fileExtensions = "";
    }
    if ((fileExtLabels == void) || (fileExtLabels == null)) {
        fileExtLabels = "";
    }

    JFileChooser chooser = new JFileChooser();
    chooser.setDialogTitle(title);

    String[] fileTypeList = fileExtensions.split(",");
    String[] filterLabelList = fileExtLabels.split(",");
    if (fileTypeList != null && filterLabelList != null && fileTypeList.length != filterLabelList.length) {
        throw new RuntimeException("The list of extensions and file filter labels must be the same length");
    }

    // create the file filters
    for (int i = 0; i < fileTypeList.length; i++) {
        FileExtFilter filter = new FileExtFilter(fileTypeList[i], filterLabelList[i], true);
        chooser.setFileFilter(filter);
    }

    chooser.showOpenDialog(null);
    File file = chooser.getSelectedFile();
    selectedFile = file.getAbsolutePath();
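The `accept` method in the script above compares the file name with `endsWith`, which is case-sensitive, so a file named `DATA.CSV` would not match the extension `.csv`. The following is a minimal sketch of a case-insensitive variant that can be exercised without opening a dialog. The class name `CaseInsensitiveExtFilter` is hypothetical and the lower-casing is a suggested change, not the workflow's behaviour.

```java
import java.io.File;
import java.util.Locale;
import javax.swing.filechooser.FileFilter;

// Hypothetical variant of the FileExtFilter class used in the workflow script.
final class CaseInsensitiveExtFilter extends FileFilter {
    private final String ext;
    private final String label;
    private final boolean includeDir;

    CaseInsensitiveExtFilter(String ext, String label, boolean includeDir) {
        this.ext = ext.toLowerCase(Locale.ROOT);
        this.label = label;
        this.includeDir = includeDir;
    }

    @Override
    public String getDescription() {
        return label;
    }

    @Override
    public boolean accept(File file) {
        // Same rule as the original script: directories pass when includeDir is set.
        if (file.isDirectory() && includeDir) {
            return true;
        }
        // Difference from the original: compare lower-cased names.
        return file.getName().toLowerCase(Locale.ROOT).endsWith(ext);
    }

    public static void main(String[] args) {
        FileFilter filter = new CaseInsensitiveExtFilter(".csv", "CSV files", true);
        for (String name : new String[] {"occurrences.csv", "OCCURRENCES.CSV", "notes.txt"}) {
            System.out.println(name + " accepted: " + filter.accept(new File(name)));
        }
    }
}
```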
title_value | stringconstant |
Value: Choose input file |
Read_Text_File | localworker |
Script:

    BufferedReader getReader(String fileUrl, String encoding) throws IOException {
        InputStreamReader reader;
        try {
            if (encoding == null) {
                reader = new FileReader(fileUrl);
            } else {
                reader = new InputStreamReader(new FileInputStream(fileUrl), encoding);
            }
        } catch (FileNotFoundException e) {
            // try a real URL instead
            URL url = new URL(fileUrl);
            if (encoding == null) {
                reader = new InputStreamReader(url.openStream());
            } else {
                reader = new InputStreamReader(url.openStream(), encoding);
            }
        }
        return new BufferedReader(reader);
    }

    StringBuffer sb = new StringBuffer(4000);
    if (encoding == void) {
        encoding = null;
    }
    BufferedReader in = getReader(fileurl, encoding);
    String str;
    String lineEnding = System.getProperty("line.separator");
    while ((str = in.readLine()) != null) {
        sb.append(str);
        sb.append(lineEnding);
    }
    in.close();
    filecontents = sb.toString();
Data_Cleaning_Worklow_Loop | workflow | Nested workflow which allows the user to plug in the output of any part of the workflow to the input of any other part of the workflow until the 'end workflow' choice is made. |
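The looping behaviour described for this nested workflow can be sketched in plain Java as below. This is only a conceptual model under stated assumptions: the class `RefinementLoop`, the part names, and the string transformations are all hypothetical stand-ins, and the real workflow collects the user's choice through a dialog rather than from a list.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Conceptual model only; none of these names exist in the workflow itself.
final class RefinementLoop {
    static final String END = "end workflow";

    /** Feeds the CSV through the chosen parts, in the chosen order, until END is chosen. */
    static String run(String csv, Iterator<String> choices,
                      Map<String, UnaryOperator<String>> parts) {
        while (choices.hasNext()) {
            String choice = choices.next();
            if (END.equals(choice)) {
                break;
            }
            UnaryOperator<String> part = parts.get(choice);
            if (part == null) {
                throw new IllegalArgumentException("Unknown part: " + choice);
            }
            // Every part takes CSV in and returns CSV out, so any order is valid.
            csv = part.apply(csv);
        }
        return csv;
    }

    public static void main(String[] args) {
        Map<String, UnaryOperator<String>> parts = new LinkedHashMap<>();
        parts.put("name resolution", csv -> csv + "resolved\n");
        parts.put("geo-temporal selection", csv -> csv + "selected\n");
        parts.put("quality checks", csv -> csv + "checked\n");

        List<String> choices = List.of("quality checks", "name resolution", "quality checks", END);
        System.out.print(run("header\n", choices.iterator(), parts));
    }
}
```

Because each part shares one input and output type, the same part can be run more than once and in any position, which is the property the description relies on.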
Write_Text_File | localworker |
Script:

    writeOK = "false";
    BufferedWriter out;
    if (encoding == void) {
        out = new BufferedWriter(new FileWriter(outputFile));
    } else {
        out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), encoding));
    }
    out.write(filecontents);
    out.flush();
    out.close();
    writeOK = "true";
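The Read_Text_File and Write_Text_File scripts both fall back to the platform default encoding when no `encoding` input is supplied, and both close their stream only if no exception occurs first. A minimal stand-alone sketch of the same read and write steps using `java.nio.file.Files`, which closes the file for the caller, might look like this. The class `TextFileIo` and its method names are hypothetical; unlike Read_Text_File, this version does not rewrite line endings and does not fall back to opening the path as a URL.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper mirroring the encoding handling of the two scripts.
final class TextFileIo {

    /** A null encoding means "use the platform default", as in the scripts. */
    static Charset charsetOrDefault(String encoding) {
        return encoding == null ? Charset.defaultCharset() : Charset.forName(encoding);
    }

    static void write(Path file, String contents, String encoding) throws IOException {
        Files.write(file, contents.getBytes(charsetOrDefault(encoding)));
    }

    static String read(Path file, String encoding) throws IOException {
        return new String(Files.readAllBytes(file), charsetOrDefault(encoding));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("refinement", ".csv");
        try {
            write(tmp, "scientificName,year\nQuercus robur,2014\n", "UTF-8");
            System.out.print(read(tmp, "UTF-8"));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Passing the same explicit encoding to both steps avoids depending on the default of whichever machine runs the workflow.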
Select_Output_CSV_File | localworker |
Script:

    import java.awt.CardLayout;
    import java.awt.Image;
    import java.awt.Toolkit;
    import java.io.File;
    import java.util.HashMap;
    import java.util.Map;
    import javax.swing.ImageIcon;
    import javax.swing.JEditorPane;
    import javax.swing.JFileChooser;
    import javax.swing.JLabel;
    import javax.swing.JPanel;
    import javax.swing.filechooser.FileFilter;

    class FileExtFilter extends FileFilter {
        public FileExtFilter(String ext, String label, boolean includeDir) {
            this.ext = ext;
            this.label = label;
            this.includeDir = includeDir;
        }

        public String getDescription() {
            return this.label;
        }

        public boolean accept(File file) {
            if (file.isDirectory() && includeDir) {
                return true;
            } else {
                return file.getName().endsWith(this.ext);
            }
        }

        String ext, label;
        boolean includeDir;
    }

    if (title == void) {
        title = null;
    }
    if ((fileExtensions == void) || (fileExtensions == null)) {
        fileExtensions = "";
    }
    if ((fileExtLabels == void) || (fileExtLabels == null)) {
        fileExtLabels = "";
    }

    JFileChooser chooser = new JFileChooser();
    chooser.setDialogTitle(title);

    String[] fileTypeList = fileExtensions.split(",");
    String[] filterLabelList = fileExtLabels.split(",");
    if (fileTypeList != null && filterLabelList != null && fileTypeList.length != filterLabelList.length) {
        throw new RuntimeException("The list of extensions and file filter labels must be the same length");
    }

    // create the file filters
    for (int i = 0; i < fileTypeList.length; i++) {
        FileExtFilter filter = new FileExtFilter(fileTypeList[i], filterLabelList[i], true);
        chooser.setFileFilter(filter);
    }

    chooser.showOpenDialog(null);
    File file = chooser.getSelectedFile();
    selectedFile = file.getAbsolutePath();
output_filechooser_title | stringconstant |
Value: Choose Output CSV File |
FirstElement_Depth1 | beanshell |
Script:

    if (stringlist.size > 0) {
        firstelement = stringlist.get(0);
    } else {
        firstelement = "";
    }
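The FirstElement_Depth1 script is written for Taverna's BeanShell interpreter and is shown as it appears in the workflow. In plain Java, `List` exposes no `size` property, so an equivalent would call `isEmpty()` or `size()`. A minimal sketch, with a hypothetical class name and an added null guard that the original script does not have:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical plain-Java counterpart of the FirstElement_Depth1 script.
final class FirstElement {

    /** Returns the first list entry, or an empty string when there is none. */
    static String of(List<String> stringlist) {
        // The null check is an addition; the script only tests the list size.
        if (stringlist == null || stringlist.isEmpty()) {
            return "";
        }
        return stringlist.get(0);
    }

    public static void main(String[] args) {
        System.out.println(of(Arrays.asList("first.csv", "second.csv")));
        System.out.println("[" + of(Collections.<String>emptyList()) + "]");
    }
}
```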
Name | Description | Inputs | Outputs |
---|---|---|---|
FirstElement_Depth1 | | stringlist | firstelement |
SynCheckGUI | | synreqres_list | names, synreqres |
DCSynExpInputParser | | csvData | synonymRequest, incorrectRecords |
GBIFCheckListParser | | gbifChkListJSON | gbifChkList |
DCSynExpInputDialog | | gbifChkLists, synonymRequest | colSynReq, gbifSynReq, gbifSelChkListIDs, colChosen, gbifChosen |
Syn_Credit_Checker | | col_copyright_conditional, colSynReq, gbif_data_use_conditional, gbifSynReq | colSynReq, gbifSynReq |
Merge_Syn_Responses | | colSynResList, gbifSynResList | synResList |
slw_filter_generator | | sciName | filter |
GBIF_Agreement_Conditional | | gbif_agreement_answer | gbif_agreement_conditional |
FirstElement_Depth1 | | stringlist | firstelement |
Parse_Data_Upload | | jsonStr | jsonErr, upload_ok |
passthrough | | csvin | csvout |
trimRESTurlResult | | url | resultUrl |
checkDataUpload | | status | dataUpload_ok, dataUpload_failed, uploadStatus |
Parse_Project | | jsonStr | jsonErr, projectID, percent |
nameParser | | taxonSearchJSON | synResponse |
Parse_Job | | jsonStr | jsonErr, jobID |
Format_Options | | jobID | options |
Delete_Data_Options | | projectID | options |
Export_Data_Conditional | | gref_answer | save_true, cancel_true |
FirstElement_Depth1 | | stringlist | firstelement |
Export_Data_Options | | projectID | options |
GBIF_Data_Use_Conditional | | copyright_answer | gbif_data_use_conditional |
NameStatusConditional | | nameStatus | synpass_flags, synfail_flags |
CreateID | | name | id_param |
FirstElement_From_XPath | | xpath, xmltext | nodeVal |
FirstElement_Depth1 | | stringlist | firstelement |
Col_Copyright_Conditional | | copyright_answer | col_copyright_conditional |
End_Workflow | | csvData | csv_output |
Empty_Response_Service | | datasetName, datasetID | emptyResponse |
gbifNameSearchParser | | gbifNameSerachJSON | taxonIDList, emptyTaxonIDList |
Split_GBIFChklist_Name_Id | | gbifChkListNameId | gbifChkListName, gbifChkListId |
FirstElement_Depth1 | | stringlist | firstelement |
OccTargetConditional | | sciNameList | gbifList, slwList, gbifChosen |
Occ_Credit_Checker | | gbif_agreement_conditional, gbif_names_list, slw_names_list | gbif_names_list, slw_names_list |
DC_Choose_Sub_Flow | | internalCSVData | synExpOccRetCSVData, dataSelCSVData, dataQualCSVData, endWFlowCSVData, endWFlow, csvData |
AssignInputOutput | | in | out |
gbifTaxonSearchParser | | gbifTaxonSerachJSON | synTaxonIDList, acceptedNameResponse, rank |
Concat_Response | | accNameRes, datasetName, synRes, datasetID | concatResponse |
FirstElement_From_XPath | | xpath, xmltext | nodeVal |
Name | Description |
---|---|
csv_output | Output of the workflow in CSV format. |
file_write_ok | Boolean flag indicating whether the output file was written successfully. |
endWFlow | |
Source | Sink |
---|---|
title_value:value | Select_File:title |
Select_File:selectedFile | Read_Text_File:fileurl |
Read_Text_File:filecontents | Data_Cleaning_Worklow_Loop:internalCSVData |
Select_Output_CSV_File:selectedFile | Write_Text_File:outputFile |
output_filechooser_title:value | Select_Output_CSV_File:title |
Data_Cleaning_Worklow_Loop:internalCSVData | FirstElement_Depth1:stringlist |
FirstElement_Depth1:firstelement | csv_output |
Write_Text_File:writeOK | file_write_ok |
Data_Cleaning_Worklow_Loop:endWFlow | endWFlow |
Controller | Target |
---|---|
FirstElement_Depth1 | Select_Output_CSV_File |
Workflow Type
Version 10 (of 17)
- biostif
- catalogue of life col
- data quality and filtering
- edit platform for cybertaxonomy
- gbif
- geo-temporal data selection and filtering
- google refine
- historical analysis
- occurrence retrieval
- openrefine
- pan-european species directories infrastructure pesi
- spatio-temporal analysis
- species distribution analysis
- species occurrence
- species richness and diversity
- species2000
- synonym expansion
- taxonomic data cleaning and refinement
- taxonomic name resolution
- taxonomy
- world register of marine species worms
In chronological order:
- Created by Cherian Mathew on Wednesday 11 April 2012 10:08:25 (UTC)
  Last edited by Cherian Mathew on Wednesday 11 April 2012 10:09:19 (UTC)
- Created by Cherian Mathew on Wednesday 13 June 2012 10:31:04 (UTC)
  Last edited by Cherian Mathew on Wednesday 13 June 2012 10:32:17 (UTC)
  Revision comment: Integrated new version of the Synonym Expansion / Occurrence Retrieval part of the workflow.
- Created by Cherian Mathew on Tuesday 26 June 2012 12:47:53 (UTC)
- Created by Cherian Mathew on Tuesday 26 June 2012 13:22:41 (UTC)
  Last edited by Cherian Mathew on Thursday 28 June 2012 08:00:16 (UTC)
  Revision comment: Added new version of the BioSTIF workflow, which calls the web service over HTTP, removing the need for credentials in Taverna.
- Created by Cherian Mathew on Thursday 26 July 2012 14:50:56 (UTC)
  Last edited by Cherian Mathew on Thursday 26 July 2012 14:54:16 (UTC)
  Revision comment: Added new version of the taxonomic name expansion nested workflow.
- Created by Cherian Mathew on Tuesday 07 August 2012 08:14:45 (UTC)
  Last edited by Cherian Mathew on Tuesday 07 August 2012 08:16:55 (UTC)
- Created by Cherian Mathew on Friday 28 September 2012 13:56:56 (UTC)
  Last edited by Cherian Mathew on Friday 28 September 2012 13:57:17 (UTC)
- Created by Cherian Mathew on Thursday 01 November 2012 09:15:21 (UTC)
- Created by Cherian Mathew on Thursday 08 November 2012 13:12:58 (UTC)
- Created by Cherian Mathew on Thursday 31 January 2013 09:02:16 (UTC)
- Created by Cherian Mathew on Tuesday 02 April 2013 11:47:04 (UTC)
  Revision comment: This version of the workflow is compatible with the newly created DRF Plugin, which installs the dependency jars using the plugin framework of Taverna.
- Created by Cherian Mathew on Thursday 01 August 2013 13:31:20 (UTC)
  Revision comment:
  * Added checks for Google Refine.
  * Changed copyright notices to be popups.
  * Added BGBM EDIT Platform as a target.
  * Misc. look and feel changes
- Created by Cherian Mathew on Tuesday 04 March 2014 11:13:25 (UTC)
  Revision comment: This new version of the data refinement workflow has:
  * Access to the new GBIF API for taxonomic name resolution and for retrieval of occurrence points.
  * Integration of the PESI name resolution web service.
  * Added search filter for aggregated name checklists
  * Minor changes in GUI, mainly in the TNRS subworkflow
  * Minor bugs fixed
- Created by Biodiversity eLaboratory on Thursday 08 May 2014 12:52:48 (UTC)
- Created by Biodiversity eLaboratory on Tuesday 02 September 2014 08:35:48 (UTC)
  Revision comment: This new version of the data refinement workflow has:
  * The inclusion of the World Register of Marine Species (WoRMS) as a new checklist to resolve species names in the Taxonomic Name Resolution subworkflow. The WoRMS checklist is queried by using the WoRMS web service created by VLIZ.
  * Minor bugs fixed
- Created by Biodiversity eLaboratory on Wednesday 17 December 2014 12:51:15 (UTC)
  Revision comment: Changed the BioSTIF services from Fraunhofer to EGI.