Tag Results
Items tagged with "hive" (1)
Workflows (1)
Hadoop Large Document Collection Data Prep... (1)
Workflow for preparing large document collections for data analysis. Different types of Hadoop jobs (Hadoop Streaming API, Hadoop Map/Reduce, and Hive) are each used for a specific purpose.
The *PathCreator components use the Unix command 'find' to create text files listing absolute file paths. The workflow then uses 1) a Hadoop Streaming API component (HadoopStreamingExiftoolRead), based on a bash script, that reads image metadata using Exiftool, 2) the Map/Reduce component (HadoopHocrAvBlockWidthMapR...
Created: 2012-08-17 | Last updated: 2012-08-18
Credits: Sven
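The *PathCreator step described in this entry can be approximated in a few lines. This is a minimal Python sketch, not the workflow's actual component: the function name `create_path_file` and the optional `suffix` filter are hypothetical. It roughly mimics `find <dir> -type f > <file>`, writing one absolute path per line. A plain list of paths like this is the kind of line-oriented input a Hadoop Streaming job such as HadoopStreamingExiftoolRead could consume, though that usage is an assumption here.

```python
import os
import tempfile


def create_path_file(root_dir, out_file, suffix=None):
    """Write the absolute path of every file under root_dir to out_file.

    Roughly equivalent to `find <root_dir> -type f > <out_file>`.
    `suffix` (e.g. ".tif") is a hypothetical filter, not part of the
    original workflow. Returns the number of paths written.
    """
    count = 0
    with open(out_file, "w", encoding="utf-8") as out:
        # os.walk does not descend into symlinked directories by default,
        # which matches find's default behaviour.
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in sorted(filenames):
                if suffix is not None and not name.endswith(suffix):
                    continue
                out.write(os.path.abspath(os.path.join(dirpath, name)) + "\n")
                count += 1
    return count


if __name__ == "__main__":
    # Small demo: build a throwaway tree and list its .tif files.
    with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as outdir:
        os.makedirs(os.path.join(root, "sub"))
        for rel in ["a.tif", os.path.join("sub", "b.tif")]:
            with open(os.path.join(root, rel), "w") as fh:
                fh.write("x")
        paths_file = os.path.join(outdir, "paths.txt")
        print(create_path_file(root, paths_file, suffix=".tif"))
```

The output file is written outside the scanned directory so that it does not list itself.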