Slim Migrate And QA mp3 to Wav Using Hadoop Jobs.
This workflow migrates an input list (available on HDFS) of mp3 files (available on NFS) to wav files (written to an output directory on NFS) using an ffmpeg Hadoop job. The workflow then compares the content of each original mp3 and its migrated wav: both files are first converted to wav, the original mp3 with an mpg321 Hadoop job and the migrated wav via the identity function, and the resulting pair is then compared with an xcorrSound waveform-compare Hadoop job.
The needed Hadoop jobs are available from
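Each Hadoop map task essentially wraps a command-line tool invocation. The sketch below is an illustration only: the tool arguments, file names and especially the waveform-compare invocation are assumptions for illustration and are not taken from this workflow or the SCAPE Hadoop jobs.

    // Illustrative sketch only: the shell commands the Hadoop map tasks are
    // assumed to wrap. Paths, file names and the waveform-compare arguments
    // are hypothetical.
    import java.io.IOException;

    public class AudioMigrationSketch {

        // Run a command, stream its output to the console, return its exit code.
        static int run(String... cmd) throws IOException, InterruptedException {
            Process p = new ProcessBuilder(cmd).inheritIO().start();
            return p.waitFor();
        }

        public static void main(String[] args) throws Exception {
            String mp3 = "input.mp3";                       // original file on NFS (hypothetical name)
            String migratedWav = "output_ffmpeg/input.wav"; // migration result (hypothetical name)
            String controlWav = "output_mpg321/input.wav";  // QA conversion result (hypothetical name)

            // Migration: ffmpeg decodes the mp3 and writes a wav file.
            run("ffmpeg", "-i", mp3, migratedWav);

            // QA conversion: mpg321 decodes the same mp3 independently to wav.
            run("mpg321", "-w", controlWav, mp3);

            // Content comparison: xcorrSound waveform-compare cross-correlates the
            // two wav files (exact arguments assumed here); non-zero means mismatch.
            int rc = run("waveform-compare", migratedWav, controlWav);
            System.out.println(rc == 0 ? "match" : "MISMATCH");
        }
    }

In the actual workflow these invocations are executed inside Hadoop map tasks over the mp3 file list, not as a single local program.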
Run this Workflow in the Taverna Workbench...
Option 1: Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/4080/download?version=3
Taverna is available from http://taverna.sourceforge.net/
If Taverna has problems downloading the workflow, you may need to provide your myExperiment username and password in the URL so that Taverna can access it:
Replace http:// in the link above with http://yourusername:yourpassword@
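For example, applying this to the download link above gives:
http://yourusername:yourpassword@www.myexperiment.org/workflows/4080/download?version=3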
Workflow Components
Bolette A. Jurik, Statsbiblioteket & SCAPE
Name | Description |
---|---|
nfs_output_path | Output directory for the migrated wav files on NFS. |
mp3_list_on_hdfs_input_path | Path to the input file on HDFS containing the list of paths to mp3 files on NFS to be migrated. |
hdfs_output_path_2 | Output directory for preservation event files and other log files. |
mapreduce_output_path | Output directory for Hadoop job output. |
jar_input_path | The directory containing the jar file with the Hadoop jobs. |
max_split_size | The maximum input size for a Hadoop map task. The input to these Hadoop jobs is a file list, and a very small max-split-size is wanted so that each map task only gets a few files to process (a configuration sketch follows this table). |
remove_wav_files_really_remove | Set to "true" to delete the wav output files after the comparison; any other value is treated as "false". |
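As a minimal sketch of how such a split limit can be applied to a file-list input when configuring a Hadoop job (the job setup, class and argument layout here are illustrative assumptions, not code from the SCAPE jobs):

    // Illustrative sketch: a small max split size means each map task
    // receives only a few lines of the input file list. Names are hypothetical.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class SplitSizeSketch {
        public static void main(String[] args) throws Exception {
            long maxSplitSize = Long.parseLong(args[2]);  // corresponds to the max_split_size input
            Job job = Job.getInstance(new Configuration(), "mp3-to-wav-migration");
            FileInputFormat.addInputPath(job, new Path(args[0]));    // file list on HDFS
            FileOutputFormat.setOutputPath(job, new Path(args[1]));  // mapreduce output path
            // A small value here keeps each input split, and hence each map task, small.
            FileInputFormat.setMaxInputSplitSize(job, maxSplitSize);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

In the actual workflow the value is simply passed through to the FfmpegMigrate_Tavern, Mpg321Convert_Tavern and xcorrSound_waveform_ sub-workflows, as shown in the datalinks below.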
Name | Type | Description |
---|---|---|
FfmpegMigrate_Tavern | workflow | |
Mpg321Convert_Tavern | workflow | |
MakeWavFilePairsList | beanshell |
Script: List wavFilePathPairs = new ArrayList(); for (int i = 0; i < ffmpegMigratedWavPaths.size(); i++) { wavFilePathPairs.add(ffmpegMigratedWavPaths.get(i) + " " + mpg321ConvertedWavPaths.get(i)); } /* loop body reconstructed; the script is truncated in this listing and the exact pair format is an assumption */ |
OutputDirFfmpegJob | beanshell |
Script: String ffmpegHadoopJobOutputDir = outputdir; if (ffmpegHadoopJobOutputDir.endsWith("/")) { ffmpegHadoopJobOutputDir = ffmpegHadoopJobOutputDir.substring(0, ffmpegHadoopJobOutputDir.length()-1); } ffmpegHadoopJobOutputDir = ffmpegHadoopJobOutputDir + "_ffmpeg"; |
OutputDirMpg321Job | beanshell |
Script: String mpg321HadoopJobOutputDir = outputdir; if (mpg321HadoopJobOutputDir.endsWith("/")) { mpg321HadoopJobOutputDir = mpg321HadoopJobOutputDir.substring(0, mpg321HadoopJobOutputDir.length()-1); } mpg321HadoopJobOutputDir = mpg321HadoopJobOutputDir + "_mpg321"; |
Split_string_into_string_list_by_regular_expression | localworker |
Script: List split = new ArrayList(); if (!string.equals("")) { String regexString = ","; if (regex != void) { regexString = regex; } String[] result = string.split(regexString); for (int i = 0; i < result.length; i++) { split.add(result[i]); } } |
Formatter | beanshell |
Script: int returnCode = 0; String[] splitOnWhitespace; List formatted = new ArrayList(); for (int i = 0; i < lines.size(); i++) { String line = lines.get(i); if (line.matches("^\\d+\\s+.*")) { splitOnWhitespace = line.split("\\s+"); returnCode = Integer.parseInt(splitOnWhitespace[0]); line = line.replaceFirst("^\\d+\\s+","").trim(); } formatted.add(line); } Collections.sort(formatted); |
Formatter_2 | beanshell |
Script: int returnCode = 0; String[] splitOnWhitespace; List formatted = new ArrayList(); for (int i = 0; i < lines.size(); i++) { String line = lines.get(i); if (line.matches("^\\d+\\s+.*")) { splitOnWhitespace = line.split("\\s+"); returnCode = Integer.parseInt(splitOnWhitespace[0]); line = line.replaceFirst("^\\d+\\s+","").trim(); } formatted.add(line); } Collections.sort(formatted); |
regex_value | stringconstant |
Value: \n |
Split_string_into_string_list_by_regular_expression_2 | localworker |
Script: List split = new ArrayList(); if (!string.equals("")) { String regexString = ","; if (regex != void) { regexString = regex; } String[] result = string.split(regexString); for (int i = 0; i < result.length; i++) { split.add(result[i]); } } |
Merge_String_List_to_a_String | localworker |
Script: String seperatorString = "\n"; if (seperator != void) { seperatorString = seperator; } StringBuffer sb = new StringBuffer(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); sb.append(item); if (i.hasNext()) { sb.append(seperatorString); } } concatenated = sb.toString(); |
WriteFilePairListToHDFS | externaltool | |
Write_Text_File | localworker |
Script: BufferedWriter out; if (encoding == void) { out = new BufferedWriter(new FileWriter(outputFile)); } else { out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), encoding)); } out.write(filecontents); out.flush(); out.close(); outputFile = filecontents; |
utf8 | stringconstant |
Value: utf8 |
WavFilePairListFullPathOnNFS | localworker |
Script: output = string1 + "/" + string2; |
wavFilePairsList.txt | stringconstant |
Value: wavFilePairsList.txt |
OutputDirWaveformCompareJob | beanshell |
Script: String waveformcompareHadoopJobOutputDir = outputdir; if (waveformcompareHadoopJobOutputDir.endsWith("/")) { waveformcompareHadoopJobOutputDir = waveformcompareHadoopJobOutputDir.substring(0, waveformcompareHadoopJobOutputDir.length()-1); } waveformcompareHadoopJobOutputDir = waveformcompareHadoopJobOutputDir + "_waveform-compare"; |
xcorrSound_waveform_ | workflow | |
WavFilePairListFullPathOnHDFS | localworker |
Script: output = string1 + "/" + string2; |
remove_wav_files | beanshell |
Script: boolean really_remove_boolean = Boolean.parseBoolean(really_remove); if (really_remove_boolean) { boolean success = true; for (int i = 0; i < file_list.size(); i++) { success = new File((String) file_list.get(i)).delete() && success; } } /* loop body reconstructed; the script is truncated in this listing */ |
remove_wav_files_2 | beanshell |
Script: boolean really_remove_boolean = Boolean.parseBoolean(really_remove); if (really_remove_boolean) { boolean success = true; for (int i = 0; i < file_list.size(); i++) { success = new File((String) file_list.get(i)).delete() && success; } } /* loop body reconstructed; the script is truncated in this listing */ |
Really_really_remove | beanshell |
Script: if (really_remove.trim().equals("true") || really_remove.trim().equals("false")) { really_really_remove = really_remove; } else { really_really_remove = "false"; } |
Name | Description | Inputs | Outputs |
---|---|---|---|
MakeWavFilePairsList | | ffmpegMigratedWavPaths, mpg321ConvertedWavPaths | wavFilePathPairs |
OutputDirFfmpegJob | | outputdir | ffmpegHadoopJobOutputDir |
OutputDirMpg321Job | | outputdir | mpg321HadoopJobOutputDir |
Formatter | | lines | formatted |
Formatter_2 | | lines | formatted |
OutputDirWaveformCompareJob | | outputdir | waveformcompareHadoopJobOutputDir |
remove_wav_files | | file_list, really_remove | success |
remove_wav_files_2 | | file_list, really_remove | success |
Really_really_remove | | really_remove | really_really_remove |
Name | Description |
---|---|
xcorrSound_waveform__GetResultsFromHadoopJob_STDERR | |
xcorrSound_waveform__GetResultsFromHadoopJob_STDOUT | |
xcorrSound_waveform__HadoopJob_STDERR | |
xcorrSound_waveform__HadoopJob_STDOUT | |
WriteFilePairListToHDFS_STDERR | |
WriteFilePairListToHDFS_STDOUT | |
remove_wav_files_success | |
remove_wav_files_2_success |
Source | Sink |
---|---|
nfs_output_path | FfmpegMigrate_Tavern:nfs_output_path |
mp3_list_on_hdfs_input_path | FfmpegMigrate_Tavern:mp3_list_on_hdfs_input_path |
hdfs_output_path_2 | FfmpegMigrate_Tavern:hdfs_output_path_2 |
OutputDirFfmpegJob:ffmpegHadoopJobOutputDir | FfmpegMigrate_Tavern:mapreduce_output_path |
jar_input_path | FfmpegMigrate_Tavern:jar_input_path |
max_split_size | FfmpegMigrate_Tavern:max_split_size |
hdfs_output_path_2 | Mpg321Convert_Tavern:hdfs_output_path_2 |
nfs_output_path | Mpg321Convert_Tavern:nfs_output_path |
mp3_list_on_hdfs_input_path | Mpg321Convert_Tavern:mp3_list_on_hdfs_input_path |
OutputDirMpg321Job:mpg321HadoopJobOutputDir | Mpg321Convert_Tavern:mapreduce_output_path |
jar_input_path | Mpg321Convert_Tavern:jar_input_path |
max_split_size | Mpg321Convert_Tavern:max_split_size |
Formatter:formatted | MakeWavFilePairsList:ffmpegMigratedWavPaths |
Formatter_2:formatted | MakeWavFilePairsList:mpg321ConvertedWavPaths |
mapreduce_output_path | OutputDirFfmpegJob:outputdir |
mapreduce_output_path | OutputDirMpg321Job:outputdir |
regex_value:value | Split_string_into_string_list_by_regular_expression:regex |
FfmpegMigrate_Tavern:GetResultsFromHadoopJob_STDOUT | Split_string_into_string_list_by_regular_expression:string |
Split_string_into_string_list_by_regular_expression:split | Formatter:lines |
Split_string_into_string_list_by_regular_expression_2:split | Formatter_2:lines |
Mpg321Convert_Tavern:GetResultsFromHadoopJob_STDOUT | Split_string_into_string_list_by_regular_expression_2:string |
regex_value:value | Split_string_into_string_list_by_regular_expression_2:regex |
MakeWavFilePairsList:wavFilePathPairs | Merge_String_List_to_a_String:stringlist |
mapreduce_output_path | WriteFilePairListToHDFS:mapreduce_output_path |
WavFilePairListFullPathOnNFS:output | WriteFilePairListToHDFS:file_pair_list_on_nfs |
Merge_String_List_to_a_String:concatenated | Write_Text_File:filecontents |
utf8:value | Write_Text_File:encoding |
WavFilePairListFullPathOnNFS:output | Write_Text_File:outputFile |
wavFilePairsList.txt:value | WavFilePairListFullPathOnNFS:string2 |
nfs_output_path | WavFilePairListFullPathOnNFS:string1 |
mapreduce_output_path | OutputDirWaveformCompareJob:outputdir |
WavFilePairListFullPathOnHDFS:output | xcorrSound_waveform_:wav_file_pairs_list_on_hdfs_input_path |
nfs_output_path | xcorrSound_waveform_:nfs_output_path |
OutputDirWaveformCompareJob:waveformcompareHadoopJobOutputDir | xcorrSound_waveform_:mapreduce_output_path |
hdfs_output_path_2 | xcorrSound_waveform_:hdfs_output_path_2 |
jar_input_path | xcorrSound_waveform_:jar_input_path |
max_split_size | xcorrSound_waveform_:max_split_size |
wavFilePairsList.txt:value | WavFilePairListFullPathOnHDFS:string2 |
mapreduce_output_path | WavFilePairListFullPathOnHDFS:string1 |
Formatter:formatted | remove_wav_files:file_list |
Really_really_remove:really_really_remove | remove_wav_files:really_remove |
Formatter_2:formatted | remove_wav_files_2:file_list |
Really_really_remove:really_really_remove | remove_wav_files_2:really_remove |
remove_wav_files_really_remove | Really_really_remove:really_remove |
xcorrSound_waveform_:GetResultsFromHadoopJob_STDERR | xcorrSound_waveform__GetResultsFromHadoopJob_STDERR |
xcorrSound_waveform_:GetResultsFromHadoopJob_STDOUT | xcorrSound_waveform__GetResultsFromHadoopJob_STDOUT |
xcorrSound_waveform_:HadoopJob_STDERR | xcorrSound_waveform__HadoopJob_STDERR |
xcorrSound_waveform_:HadoopJob_STDOUT | xcorrSound_waveform__HadoopJob_STDOUT |
WriteFilePairListToHDFS:STDERR | WriteFilePairListToHDFS_STDERR |
WriteFilePairListToHDFS:STDOUT | WriteFilePairListToHDFS_STDOUT |
remove_wav_files:success | remove_wav_files_success |
remove_wav_files_2:success | remove_wav_files_2_success |
Controller | Target |
---|---|
Write_Text_File | WriteFilePairListToHDFS |
xcorrSound_waveform_ | remove_wav_files_2 |
WriteFilePairListToHDFS | xcorrSound_waveform_ |
xcorrSound_waveform_ | remove_wav_files |
Workflow Type
Version 3 (of 4)
Tags
- audio
- comparison
- conversion
- ffmpeg
- hadoop
- large scale digital repositories
- lsdr
- migration
- mpg321
- qa
- quality assurance
- scape
- significant properties
- testbed
- waveform-compare
- xcorrsound
Version history, in chronological order:
- Created by Bolette Jurik on Friday 21 February 2014 14:23:31 (UTC)
- Created by Bolette Jurik on Monday 30 June 2014 09:05:55 (UTC)
  Revision comment: updated with working optional deletion of output files
- Created by Bolette Jurik on Monday 30 June 2014 09:17:36 (UTC)
  Revision comment: updated with description
- Created by Bolette Jurik on Monday 30 June 2014 09:22:10 (UTC)
  Revision comment: updated description 2