Data Refinement Workflow v17
Created: 2012-04-11 10:08:25
Last updated: 2014-12-17 12:51:15
The aim of the (Taxonomic) Data Refinement Workflow is to provide a streamlined workflow environment for preparing observational and specimen data sets for use in scientific analysis on the Taverna platform. The workflow has been designed so that it:
• accepts input data in a recognized format, originating from various sources (e.g. web services or local user data sets),
• includes a number of graphical user interfaces for viewing and interacting with the data,
• makes the output of each part compatible with the input of every other part, so the user is free to choose the sequence of actions,
• allows the use of custom-built as well as third-party applications and tools.
Currently, the data refinement workflow is made up of three distinct parts:
1. Taxonomic name resolution / occurrence retrieval
2. Geo-temporal data selection
3. Data quality checks / filtering
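The free ordering of these three parts can be pictured as a simple dispatch loop: every part consumes and produces the same tabular payload, so the user may chain the parts in any order until choosing to end the workflow. The `Part` interface, method names, and lambda bodies below are hypothetical illustrations of this idea, not the actual Taverna implementation.

```java
import java.util.Arrays;
import java.util.List;

// A minimal sketch of the refinement loop; everything here is illustrative.
public class RefinementLoopSketch {

    interface Part {
        String run(String csvData); // every part consumes and produces the same payload
    }

    // Apply the user-chosen sequence of parts; each output feeds the next input.
    static String refine(String csvData, List<Part> chosenSequence) {
        for (Part part : chosenSequence) {
            csvData = part.run(csvData);
        }
        return csvData;
    }

    public static void main(String[] args) {
        Part nameResolution = csv -> csv + "\n# names resolved";
        Part geoTemporalSelection = csv -> csv + "\n# geo-temporal subset";
        Part qualityFilter = csv -> csv + "\n# quality filtered";

        // Any order is valid, because each part's output is compatible
        // with every other part's input.
        String result = refine("scientificName,lat,lon,date",
                Arrays.asList(nameResolution, qualityFilter, geoTemporalSelection));
        System.out.println(result);
    }
}
```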
Workflow Components
Authors (1)
Cherian Mathew, Vera Hernandez
Titles (1)
[PLUGIN] Data Refinement Workflow
Descriptions (1)
The aim of the (Taxonomic) Data Refinement Workflow is to provide a streamlined workflow environment for preparing observational and specimen data sets for use in scientific analysis on the Taverna platform. The workflow has been designed so that it:
• accepts input data in a recognized format, originating from various sources (e.g. web services or local user data sets),
• includes a number of graphical user interfaces for viewing and interacting with the data,
• makes the output of each part compatible with the input of every other part, so the user is free to choose the sequence of actions,
• allows the use of custom-built as well as third-party applications and tools.
Currently, the data refinement workflow is made up of three distinct parts:
1. Taxonomic name resolution / occurrence retrieval
2. Geo-temporal data selection
3. Data quality checks / filtering
Dependencies (0)
Processors (5)
Name | Type | Description
Select_File | localworker
Script:
import java.awt.CardLayout;
import java.awt.Image;
import java.awt.Toolkit;
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import javax.swing.ImageIcon;
import javax.swing.JEditorPane;
import javax.swing.JFileChooser;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.filechooser.FileFilter;
class FileExtFilter extends FileFilter {
public FileExtFilter(String ext, String label, boolean includeDir) {
this.ext = ext;
this.label = label;
this.includeDir = includeDir;
}
public String getDescription() {
return this.label;
}
public boolean accept(File file) {
if (file.isDirectory() && includeDir) {
return true;
} else {
return file.getName().endsWith(this.ext);
}
}
String ext, label;
boolean includeDir;
}
if (title == void) {
title = null;
}
if ((fileExtensions == void) || (fileExtensions == null)) {
fileExtensions = "";
}
if ((fileExtLabels == void) || (fileExtLabels == null)) {
fileExtLabels = "";
}
JFileChooser chooser = new JFileChooser();
chooser.setDialogTitle(title);
String[] fileTypeList = fileExtensions.split(",");
String[] filterLabelList = fileExtLabels.split(",");
if (fileTypeList != null && filterLabelList != null && fileTypeList.length != filterLabelList.length) {
throw new RuntimeException("The list of extensions and file filter labels must be the same length");
}
// register each file filter (addChoosableFileFilter keeps all of them;
// setFileFilter would retain only the last one)
for (int i = 0; i < fileTypeList.length; i++) {
FileExtFilter filter = new FileExtFilter(fileTypeList[i], filterLabelList[i], true);
chooser.addChoosableFileFilter(filter);
}
chooser.showOpenDialog(null);
File file = chooser.getSelectedFile();
// guard against the dialog being cancelled (no file selected)
selectedFile = (file == null) ? "" : file.getAbsolutePath();
|
title_value |
stringconstant |
Value: Choose input file |
Read_Text_File |
localworker |
Script:
BufferedReader getReader (String fileUrl, String encoding) throws IOException {
InputStreamReader reader;
try {
if (encoding == null) {
reader = new FileReader(fileUrl);
} else {
reader = new InputStreamReader(new FileInputStream(fileUrl),encoding);
}
}
catch (FileNotFoundException e) {
// try a real URL instead
URL url = new URL(fileUrl);
if (encoding == null) {
reader = new InputStreamReader (url.openStream());
} else {
reader = new InputStreamReader (url.openStream(), encoding);
}
}
return new BufferedReader(reader);
}
StringBuffer sb = new StringBuffer(4000);
if (encoding == void) {
encoding = null;
}
BufferedReader in = getReader(fileurl, encoding);
String str;
String lineEnding = System.getProperty("line.separator");
while ((str = in.readLine()) != null) {
sb.append(str);
sb.append(lineEnding);
}
in.close();
filecontents = sb.toString();
|
Data_Cleaning_Worklow_Loop |
workflow |
Nested workflow that lets the user feed the output of any part of the workflow into the input of any other part, until the 'end workflow' choice is made. |
FirstElement_Depth1 |
beanshell |
Script:
// return the first element of the list, or an empty string if the list is empty
if (stringlist.size() > 0) {
firstelement = stringlist.get(0);
} else {
firstelement = "";
}
|
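Inside the nested loop, the DC_Choose_Sub_Flow beanshell (listed under Beanshells below) takes the current CSV payload and emits it on exactly one of its output ports, depending on which sub-workflow the user picks next. The sketch below is hypothetical: the output port names are taken from the component listing, but the choice strings and the switch itself are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of DC_Choose_Sub_Flow's routing behaviour.
public class ChooseSubFlowSketch {

    static Map<String, String> route(String choice, String internalCSVData) {
        Map<String, String> outputs = new HashMap<>();
        switch (choice) {
            case "taxonomic name resolution / occurrence retrieval":
                outputs.put("synExpOccRetCSVData", internalCSVData);
                break;
            case "geo-temporal data selection":
                outputs.put("dataSelCSVData", internalCSVData);
                break;
            case "data quality checks / filtering":
                outputs.put("dataQualCSVData", internalCSVData);
                break;
            case "end workflow":
                outputs.put("endWFlowCSVData", internalCSVData);
                outputs.put("endWFlow", "true"); // signals the loop to stop
                break;
            default:
                throw new IllegalArgumentException("unknown choice: " + choice);
        }
        return outputs;
    }
}
```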
Beanshells (40)
Name | Inputs | Outputs
FirstElement_Depth1 | stringlist | firstelement
NameStatusConditional | nameStatus | synpass_flags, synfail_flags
CreateID | name | id_param
FirstElement_From_XPath | xpath, xmltext | nodeVal
FirstElement_Depth1 | stringlist | firstelement
nameParser | taxonSearchJSON | synResponse
Col_Copyright_Conditional | copyright_answer | col_copyright_conditional
Parse_Job | jsonStr | jsonErr, jobID
Format_Options | jobID | options
Delete_Data_Options | projectID | options
Export_Data_Conditional | gref_answer | save_true, cancel_true
FirstElement_Depth1 | stringlist | firstelement
Parse_Project | jsonStr | jsonErr, projectID, percent
Parse_Data_Upload | jsonStr | jsonErr, upload_ok
gbifTaxonSearchParser | gbifTaxonSerachJSON | synTaxonIDList, acceptedNameResponse, rank
Concat_Response | accNameRes, datasetName, synRes, datasetID | concatResponse
DC_Choose_Sub_Flow | internalCSVData | synExpOccRetCSVData, dataSelCSVData, dataQualCSVData, endWFlowCSVData, endWFlow, csvData
AssignInputOutput | in | out
trimRESTurlResult | url | resultUrl
checkDataUpload | status | dataUpload_ok, dataUpload_failed, uploadStatus
Export_Data_Options | projectID | options
OccTargetConditional | sciNameList | gbifList, slwList, gbifChosen
Occ_Credit_Checker | gbif_agreement_conditional, gbif_names_list, slw_names_list | gbif_names_list, slw_names_list
slw_filter_generator | sciName | filter
passthrough | csvin | csvout
FirstElement_Depth1 | stringlist | firstelement
GBIF_Data_Use_Conditional | copyright_answer | gbif_data_use_conditional
Empty_Response_Service | datasetName, datasetID | emptyResponse
GBIF_Agreement_Conditional | gbif_agreement_answer | gbif_agreement_conditional
gbifNameSearchParser | gbifNameSerachJSON | taxonIDList, emptyTaxonIDList
Split_GBIFChklist_Name_Id | gbifChkListNameId | gbifChkListName, gbifChkListId
FirstElement_Depth1 | stringlist | firstelement
SynCheckGUI | synreqres_list | names, synreqres
DCSynExpInputParser | csvData | synonymRequest, incorrectRecords
GBIFCheckListParser | gbifChkListJSON | gbifChkList
DCSynExpInputDialog | gbifChkLists, synonymRequest | colSynReq, gbifSynReq, gbifSelChkListIDs, colChosen, gbifChosen
Syn_Credit_Checker | col_copyright_conditional, colSynReq, gbif_data_use_conditional, gbifSynReq | colSynReq, gbifSynReq
Merge_Syn_Responses | colSynResList, gbifSynResList | synResList
End_Workflow | csvData | csv_output
FirstElement_From_XPath | xpath, xmltext | nodeVal
Outputs (2)
Name | Description
csv_output | Output of the workflow in CSV format.
endWFlow |
Datalinks (6)
Source | Sink
title_value:value | Select_File:title
Select_File:selectedFile | Read_Text_File:fileurl
Read_Text_File:filecontents | Data_Cleaning_Worklow_Loop:internalCSVData
Data_Cleaning_Worklow_Loop:internalCSVData | FirstElement_Depth1:stringlist
FirstElement_Depth1:firstelement | csv_output
Data_Cleaning_Worklow_Loop:endWFlow | endWFlow
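The six datalinks form a single chain from the file chooser, through the nested loop, to the csv_output port. As a sanity check, that chain can be walked programmatically. The port names below are copied from the Datalinks table, but the resolver itself is a toy (it chains ports by processor-name prefix and picks the first matching source when a processor has several outputs); it is not how Taverna executes datalinks.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DatalinkChain {

    // source port -> sink port, exactly as in the Datalinks table
    static Map<String, String> links() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("title_value:value", "Select_File:title");
        m.put("Select_File:selectedFile", "Read_Text_File:fileurl");
        m.put("Read_Text_File:filecontents", "Data_Cleaning_Worklow_Loop:internalCSVData");
        m.put("Data_Cleaning_Worklow_Loop:internalCSVData", "FirstElement_Depth1:stringlist");
        m.put("FirstElement_Depth1:firstelement", "csv_output");
        m.put("Data_Cleaning_Worklow_Loop:endWFlow", "endWFlow");
        return m;
    }

    // Follow links from a source port to the terminal port, hopping from a
    // processor's input port to that same processor's first listed output.
    static String terminal(Map<String, String> links, String source) {
        String port = source;
        while (links.containsKey(port)) {
            String sink = links.get(port);
            port = sink;
            String processor = sink.split(":")[0];
            String next = null;
            for (String candidate : links.keySet()) {
                if (candidate.startsWith(processor + ":")) {
                    next = candidate;
                    break;
                }
            }
            if (next == null) {
                break; // sink is a workflow output port, not a processor input
            }
            port = next;
        }
        return port;
    }

    public static void main(String[] args) {
        System.out.println(terminal(links(), "title_value:value"));
    }
}
```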
Version 11 (of 17)