parse_csv_points
Created: 2013-12-24 11:41:36
Last updated: 2014-11-18 18:08:26
Parses CSV content with species occurrence points in the Darwin Core archive format, determining column indexes and returning the records as a list of points in openModeller format (XML). No distinction is made between presences and absences.
Workflow Components
Authors (1)
Alan R Williams & Renato De Giovanni
Titles (1)
Parse csv content with species occurrence points.
Descriptions (1)
Parses the csv content, determining column indexes and returning the records as a list of points in openModeller format (XML). All points are considered presence points.
Dependencies (0)
Inputs (1)
Name | Description
points_csv | Comma-separated list of values. Each line in the file corresponds to a different record. The first line must be a header containing column names, also separated by commas. The following columns are mandatory and must be spelled exactly as follows: occurrenceID, nameComplete, decimalLongitude and decimalLatitude. Other columns can be present, and columns can appear in any order, but the values on each subsequent line must follow the same order as the header.
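For illustration, a minimal input that meets these requirements could look like the following (hypothetical values):

occurrenceID,nameComplete,decimalLongitude,decimalLatitude
1,Puma concolor,-68.85,-11.15
2,Puma concolor,-64.70,-17.38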
Processors (4)
Merge_String_List_to_a_String | localworker
Script:
String seperatorString = "\n";
// "seperator" is the port's actual name in this Taverna local worker (misspelling preserved)
if (seperator != void) {
    seperatorString = seperator;
}
StringBuffer sb = new StringBuffer();
for (Iterator i = stringlist.iterator(); i.hasNext();) {
    String item = (String) i.next();
    sb.append(item);
    if (i.hasNext()) {
        sb.append(seperatorString);
    }
}
concatenated = sb.toString();
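A sketch of the expected behaviour, assuming stringlist arrives as a java.util.List and the optional seperator port is left unbound (BeanShell auto-imports java.util):

List stringlist = Arrays.asList("first line", "second line");
// Running the script above on this list would give:
// concatenated == "first line\nsecond line"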
parse_header | beanshell
Script:
import java.io.StringReader;
import au.com.bytecode.opencsv.CSVReader;

// Read only the header line and locate the mandatory columns.
CSVReader reader = new CSVReader(new StringReader(csv_content), ',', '"');
int name_idx = -1;
int id_idx = -1;
int long_idx = -1;
int lat_idx = -1;
String[] header = reader.readNext();
if (header != null) {
    List terms = Arrays.asList(header);
    name_idx = terms.indexOf("nameComplete");
    id_idx = terms.indexOf("occurrenceID");  // optional: -1 makes the extractor fall back to line numbers
    long_idx = terms.indexOf("decimalLongitude");
    lat_idx = terms.indexOf("decimalLatitude");
} else {
    throw new RuntimeException("The input file provided for species occurrence points is empty.");
}
if (name_idx == -1) {
    throw new RuntimeException("The column nameComplete is missing from the header of the input points file.");
}
if (long_idx == -1) {
    throw new RuntimeException("The column decimalLongitude is missing from the header of the input points file.");
}
if (lat_idx == -1) {
    throw new RuntimeException("The column decimalLatitude is missing from the header of the input points file.");
}
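For instance, with the hypothetical header from the Inputs section the indexes resolve as follows (a plain-Java sketch; split(",") stands in for the CSV parser):

String[] header = "occurrenceID,nameComplete,decimalLongitude,decimalLatitude".split(",");
List terms = Arrays.asList(header);
int id_idx = terms.indexOf("occurrenceID");        // 0
int name_idx = terms.indexOf("nameComplete");      // 1
int long_idx = terms.indexOf("decimalLongitude");  // 2
int lat_idx = terms.indexOf("decimalLatitude");    // 3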
get_first_taxon | beanshell
Script:
import java.io.StringReader;
import au.com.bytecode.opencsv.CSVReader;

// Skip the header line, then take the taxon name from the first data record.
CSVReader reader = new CSVReader(new StringReader(csv_content), ',', '"');
String taxon_name = "";
int name_idx_int = Integer.parseInt(name_idx);
String[] first_line = reader.readNext();   // header, discarded
String[] second_line = reader.readNext();  // first data record
if (second_line != null) {
    if (name_idx_int >= 0 && second_line.length > name_idx_int) {
        taxon_name = second_line[name_idx_int];
    }
} else {
    throw new RuntimeException("The input file provided for species occurrence points has no other lines after the header.");
}
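Continuing the hypothetical sample input, the script would trace as follows:

// first_line:  occurrenceID,nameComplete,decimalLongitude,decimalLatitude  (header, discarded)
// second_line: 1,Puma concolor,-68.85,-11.15                               (first data record)
// with name_idx = "1" from parse_header:
// taxon_name == "Puma concolor"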
extract_taxon_points | beanshell
Script:
import java.io.StringReader;
import au.com.bytecode.opencsv.CSVReader;

// Collect every record whose taxon name matches taxon_name (or every record
// if taxon_name is unbound), emitting one openModeller point element per record.
CSVReader reader = new CSVReader(new StringReader(csv_content), ',', '"');
int name_idx_int = Integer.parseInt(name_idx);
int id_idx_int = Integer.parseInt(id_idx);
int long_idx_int = Integer.parseInt(long_idx);
int lat_idx_int = Integer.parseInt(lat_idx);
int max_idx = Math.max(name_idx_int, Math.max(id_idx_int, Math.max(long_idx_int, lat_idx_int)));
ArrayList all_points = new ArrayList();
String id;
int i = 0;
String[] line;
while ((line = reader.readNext()) != null) {
    i++;
    if (i == 1) {
        continue;  // skip the header line
    }
    if (line.length > max_idx) {
        if (id_idx_int == -1) {
            id = String.valueOf(i);  // no occurrenceID column: use the line number instead
        } else {
            id = line[id_idx_int];
        }
        if (taxon_name == void || line[name_idx_int].equals(taxon_name)) {
            // The XML literal was stripped when the page was rendered as HTML; an
            // openModeller point element of the following form is assumed here.
            all_points.add("<Point Id=\"" + id + "\" X=\"" + line[long_idx_int] + "\" Y=\"" + line[lat_idx_int] + "\"/>");
        }
    }
}
num_points = all_points.size();
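Under the point-element assumption noted in the script, the hypothetical sample input would yield two entries in all_points and num_points == 2; after the merge step, points_xml would read:

<Point Id="1" X="-68.85" Y="-11.15"/>
<Point Id="2" X="-64.70" Y="-17.38"/>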
Beanshells (3)
Name | Inputs | Outputs
parse_header | csv_content | name_idx, id_idx, long_idx, lat_idx
get_first_taxon | csv_content, name_idx | taxon_name
extract_taxon_points | csv_content, name_idx, taxon_name, id_idx, long_idx, lat_idx | all_points, num_points
Outputs (6)
Name | Description
id_idx | Index of the occurrenceID field in the header (starting with 0).
long_idx | Index of the decimalLongitude field in the header (starting with 0).
lat_idx | Index of the decimalLatitude field in the header (starting with 0).
first_taxon_name | First taxon name found in the csv content.
points_xml | List of all points (separated by newlines), already in XML format for openModeller.
num_points | Number of points.
Datalinks (15)
Source | Sink
points_csv | parse_header:csv_content
points_csv | get_first_taxon:csv_content
points_csv | extract_taxon_points:csv_content
parse_header:name_idx | get_first_taxon:name_idx
parse_header:name_idx | extract_taxon_points:name_idx
parse_header:id_idx | extract_taxon_points:id_idx
parse_header:long_idx | extract_taxon_points:long_idx
parse_header:lat_idx | extract_taxon_points:lat_idx
parse_header:id_idx | id_idx
parse_header:long_idx | long_idx
parse_header:lat_idx | lat_idx
get_first_taxon:taxon_name | first_taxon_name
extract_taxon_points:all_points | Merge_String_List_to_a_String:stringlist
extract_taxon_points:num_points | num_points
Merge_String_List_to_a_String:concatenated | points_xml
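Read together, the links describe this overall dataflow:

points_csv -> parse_header -> name_idx, id_idx, long_idx, lat_idx
points_csv + name_idx -> get_first_taxon -> taxon_name (exposed as first_taxon_name)
points_csv + the four indexes + taxon_name -> extract_taxon_points -> all_points, num_points
all_points -> Merge_String_List_to_a_String -> concatenated (exposed as points_xml)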