Workflows

Showing 3 results.
Tag: arc | User: Sven | Group: SCAPE

Workflow ARC to WARC Migration with CDX Index and w... (1)

Workflow for migrating ARC to WARC and comparing the CDX index files (Linux). The workflow has an input port “input_directory”, which is a local path to the directory containing the ARC files, and an input port “output_directory”, which is the directory where the workflow outputs are created. The files in the input directory are migrated using the “arc2warc_migration_cli” tool service component. The “cdx_creator_arc” and “cdx_creator_warc” tool service components creat...

Created: 2014-07-09

Credits: User Sven

Workflow ARC2WARC Hadoop Job (1)

A wrapper workflow for a Hadoop job that converts ARC files to WARC files.

Created: 2014-03-06

Credits: User Sven

Workflow ONB Web Archive Fits Characterisation usin... (2)

Hadoop-based workflow that applies FITS to the files contained in ARC web archive container files and ingests the FITS output into a MongoDB database using C3PO.
Dependencies:
- Spacip (https://github.com/shsdev/spacip)
- Tomar (https://github.com/openplanets/tomar)
- C3PO (https://github.com/peshkira/c3po)
Parameters:
- hdfs_input_path: Path to a directory that contains text file(s) with absolute HDFS paths to ARC files
- num_files_per_invokation: Number of items to be processed per invocation
- fits...

Created: 2013-12-09 | Last updated: 2013-12-10

Credits: User Sven
