ONB Web Archive Fits Characterisation using ToMaR
Created: 2013-12-09 15:58:54
Last updated: 2013-12-10 17:06:09
Hadoop-based workflow that applies FITS to the files contained in ARC web archive container files and ingests the FITS output into MongoDB using C3PO.
Dependencies:
- Spacip (https://github.com/shsdev/spacip)
- ToMaR (https://github.com/openplanets/tomar)
- C3PO (https://github.com/peshkira/c3po)
Parameters:
- hdfs_input_path: Path to a directory containing text file(s) with absolute HDFS paths to ARC files
- num_files_per_invokation: Number of items to be processed per invocation
- fits_local_tmp_dir: Local directory where the FITS output XML files will be stored
- c3po_collection_name: Name of the C3PO collection
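For illustration only, these inputs might be set to values such as the following (the paths and collection name are placeholders, not taken from the original workflow):
- hdfs_input_path = /user/wa/arc-path-lists/
- num_files_per_invokation = 50
- fits_local_tmp_dir = /tmp/fits_output/
- c3po_collection_name = onb_webarchive_test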
The workflow uses Spacip to unpack the ARC container files into HDFS and to create input files that can be used by ToMaR. After the mapper output files from Spacip are merged into a single file (MergeTomarInput), the FITS characterisation process is invoked by ToMaR as a MapReduce job. The tool invocation depends on a tool specification file which must be available in HDFS; this is explained in the ToMaR documentation.
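As a rough illustration of the kind of input ToMaR consumes, the merged control file produced by MergeTomarInput could contain one line per payload file, pairing a toolspec operation with HDFS input and output locations. The line syntax, toolspec name, and paths below are assumptions made for illustration; the authoritative format is defined by the tool specification file and the ToMaR documentation:

fits characterise --input="hdfs:///user/wa/unpacked/000001.html" --output="hdfs:///user/wa/fits/000001.fits.xml"
fits characterise --input="hdfs:///user/wa/unpacked/000002.pdf" --output="hdfs:///user/wa/fits/000002.fits.xml"

ToMaR distributes such lines to its map tasks, each of which invokes FITS on the referenced file according to the registered tool specification.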
Workflow Components
Authors (1)
Titles (1)
ONB Web Archive Fits Characterisation using ToMaR
Descriptions (1)
Hadoop-based workflow that applies FITS to the files contained in ARC web archive container files and ingests the FITS output into MongoDB using C3PO.
Dependencies (0)
Inputs (4)
Name | Description
hdfs_input_path | Path to a directory containing text file(s) with absolute HDFS paths to ARC files
num_files_per_invokation | Number of items to be processed per invocation
fits_local_tmp_dir | Local directory where the FITS output XML files will be stored
c3po_collection_name | Name of the C3PO collection
Processors (6)
Name | Type | Description
Spacip | externaltool |
MergeTomarInput | externaltool |
Tomar | externaltool |
CopyFitsOutput | externaltool |
C3POIngest | externaltool |
CleanUp | externaltool |
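The processor descriptions above are empty in the workflow metadata. As a hedged sketch only, externaltool processors of this kind typically wrap shell commands along the following lines; the actual command lines, HDFS paths, and C3PO options are assumptions and should be checked against the workflow definition and the C3PO documentation:

CopyFitsOutput: hadoop fs -copyToLocal <hdfs_fits_output_dir>/* <fits_local_tmp_dir>
C3POIngest: java -jar c3po.jar gather -c <c3po_collection_name> -r <fits_local_tmp_dir>
CleanUp: rm -rf <fits_local_tmp_dir>/*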
Outputs (2)
Name | Description
Tomar_STDOUT |
C3POIngest_STDOUT |
Datalinks (10)
Source | Sink
hdfs_input_path | Spacip:hdfs_input_path
num_files_per_invokation | Spacip:num_files_per_invokation
Spacip:STDOUT | MergeTomarInput:spacip_joboutput_hdfs_dir
MergeTomarInput:STDOUT | Tomar:merged_tomar_input
fits_local_tmp_dir | CopyFitsOutput:fits_local_tmp_dir
fits_local_tmp_dir | C3POIngest:fits_local_tmp_dir
c3po_collection_name | C3POIngest:c3po_collection_name
fits_local_tmp_dir | CleanUp:fits_local_tmp_dir
Tomar:STDOUT | Tomar_STDOUT
C3POIngest:STDOUT | C3POIngest_STDOUT
Coordinations (3)
Controller | Target
C3POIngest | CleanUp
CopyFitsOutput | C3POIngest
Tomar | CopyFitsOutput
Each target processor starts only after its controller has completed, so the post-processing steps run in the order Tomar, CopyFitsOutput, C3POIngest, CleanUp.