Workflow: Workflow3
Output port: Write_Text_File_outputFile

Processor: ExtractFeatures
  Input ports: workingdir (0), threadscount (0), execpath (0)
  Description: This operation extracts SIFT features from the documents placed in the given directory and stores them in xml.gz archives in the same directory. The thread count controls parallel task execution.
  Date: 2012-07-27 11:52:23.916 UTC
  Activity: net.sf.taverna.t2.activities.externaltool.ExternalToolActivity (net.sf.taverna.t2.activities, external-tool-activity, 1.4)
  Invocation: 789663B8-DA91-428A-9F7D-B3F3DA185FD4, default local
  Invocation XML: <?xml version="1.0" encoding="UTF-8"?> <localInvocation><shellPrefix>/bin/sh -c</shellPrefix><linkCommand>/bin/ln -s %%PATH_TO_ORIGINAL%% %%TARGET_NAME%%</linkCommand></localInvocation>
  ID: 49ae7655-1567-4bfd-aabf-1d6a642c5fe1
  Command: python2.7 %%execpath%%FindDuplicates.py %%workingdir%% extract --threads %%threadscount%%
  Settings: 1200 1800
  Inputs: execpath, threadscount, workingdir
    threadscount (threadscount): false false false UTF-8 false false false
    workingdir (workingdir): false false false UTF-8 false false false
    execpath (execpath): false false false UTF-8 false false false
  Settings: false true true 0 false
  Dispatch layers (net.sf.taverna.t2.core, workflowmodel-impl, 1.4):
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize (1)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry (1.0, 1000, 5000, 0)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Processor: Train
  Input ports: threadscount (0), workingdir (0), execpath (0)
  Description: This operation analyzes the visual words of each document and builds a BoW (Bag of Words) vocabulary comprising about 1000 distinctive visual words. These key points are the most characteristic features of the analyzed collection. The thread count controls parallel task execution.
  Date: 2012-07-27 11:56:36.514 UTC
  Activity: net.sf.taverna.t2.activities.externaltool.ExternalToolActivity (net.sf.taverna.t2.activities, external-tool-activity, 1.4)
  Invocation: 789663B8-DA91-428A-9F7D-B3F3DA185FD4, default local
  Invocation XML: <?xml version="1.0" encoding="UTF-8"?> <localInvocation><shellPrefix>/bin/sh -c</shellPrefix><linkCommand>/bin/ln -s %%PATH_TO_ORIGINAL%% %%TARGET_NAME%%</linkCommand></localInvocation>
  ID: 49ae7655-1567-4bfd-aabf-1d6a642c5fe1
  Command: python2.7 %%execpath%%FindDuplicates.py %%workingdir%% train --threads %%threadscount%%
  Settings: 1200 1800
  Inputs: execpath, threadscount, workingdir
    threadscount (threadscount): false false false UTF-8 false false false
    workingdir (workingdir): false false false UTF-8 false false false
    execpath (execpath): false false false UTF-8 false false false
  Settings: false true true 0 false
  Dispatch layers (net.sf.taverna.t2.core, workflowmodel-impl, 1.4):
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize (1)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry (1.0, 1000, 5000, 0)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Processor: Threadscount
  Output port: value (0, 0)
  Description: This parameter defines how many threads should be used.
  Date: 2012-07-30 15:33:05.269 UTC
  Activity: net.sf.taverna.t2.activities.stringconstant.StringConstantActivity (net.sf.taverna.t2.activities, stringconstant-activity, 1.4)
  Value: 1
  Dispatch layers (net.sf.taverna.t2.core, workflowmodel-impl, 1.4):
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize (1)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry (1.0, 1000, 5000, 0)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Processor: Workingdir
  Output port: value (0, 0)
  Description: This is the working directory where temporary files are stored. These files comprise the SIFT features and BoW histograms for each associated document.
  Date: 2012-07-30 15:32:09.294 UTC
  Activity: net.sf.taverna.t2.activities.stringconstant.StringConstantActivity (net.sf.taverna.t2.activities, stringconstant-activity, 1.4)
  Value: /tmp/TestCollection
  Dispatch layers (net.sf.taverna.t2.core, workflowmodel-impl, 1.4):
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize (1)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry (1.0, 1000, 5000, 0)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Processor: BoW_histograms
  Input ports: workingdir (0), threadscount (0), execpath (0)
  Description: This operation counts the visual words that match one of the key visual words from the BoW dictionary and creates a BoW histogram file for the associated document. The thread count controls parallel task execution.
  Date: 2012-07-27 12:00:59.62 UTC
  Activity: net.sf.taverna.t2.activities.externaltool.ExternalToolActivity (net.sf.taverna.t2.activities, external-tool-activity, 1.4)
  Invocation: 789663B8-DA91-428A-9F7D-B3F3DA185FD4, default local
  Invocation XML: <?xml version="1.0" encoding="UTF-8"?> <localInvocation><shellPrefix>/bin/sh -c</shellPrefix><linkCommand>/bin/ln -s %%PATH_TO_ORIGINAL%% %%TARGET_NAME%%</linkCommand></localInvocation>
  ID: 49ae7655-1567-4bfd-aabf-1d6a642c5fe1
  Command: python2.7 %%execpath%%FindDuplicates.py %%workingdir%% bowhist --threads %%threadscount%%
  Settings: 1200 1800
  Inputs: execpath, threadscount, workingdir
    threadscount (threadscount): false false false UTF-8 false false false
    workingdir (workingdir): false false false UTF-8 false false false
    execpath (execpath): false false false UTF-8 false false false
  Settings: false true true 0 false
  Dispatch layers (net.sf.taverna.t2.core, workflowmodel-impl, 1.4):
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize (1)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry (1.0, 1000, 5000, 0)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Processor: Compare
  Input ports: workingdir (0), execpath (0), threadscount (0)
  Output port: STDOUT (0, 0)
  Description: This operation performs a nearest-neighbour search for each document in the collection. The main goal of the search is to find duplicates in the collection. The method compares the BoW histogram associated with a particular document with all BoW histograms stored in the collection. The result of each comparison is a structural similarity score between 0 and 1, where 0 stands for low similarity and 1 stands for high similarity. The thread count controls parallel task execution.
  Date: 2012-07-27 12:08:40.306 UTC
  Activity: net.sf.taverna.t2.activities.externaltool.ExternalToolActivity (net.sf.taverna.t2.activities, external-tool-activity, 1.4)
  Invocation: 789663B8-DA91-428A-9F7D-B3F3DA185FD4, default local
  Invocation XML: <?xml version="1.0" encoding="UTF-8"?> <localInvocation><shellPrefix>/bin/sh -c</shellPrefix><linkCommand>/bin/ln -s %%PATH_TO_ORIGINAL%% %%TARGET_NAME%%</linkCommand></localInvocation>
  ID: 49ae7655-1567-4bfd-aabf-1d6a642c5fe1
  Command: python2.7 %%execpath%%FindDuplicates.py %%workingdir%% compare --threads %%threadscount%%
  Settings: 1200 1800
  Inputs: execpath, threadscount, workingdir
    threadscount (threadscount): false false false UTF-8 false false false
    workingdir (workingdir): false false false UTF-8 false false false
    execpath (execpath): false false false UTF-8 false false false
  Settings: false true true 0 false
  Dispatch layers (net.sf.taverna.t2.core, workflowmodel-impl, 1.4):
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize (1)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry (1.0, 1000, 5000, 0)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Processor: Write_Text_File
  Input ports: outputFile (0), filecontents (0)
  Output port: outputFile (0, 0)
  Activity: net.sf.taverna.t2.activities.localworker.LocalworkerActivity (net.sf.taverna.t2.activities, localworker-activity, 1.4)
  Activity inputs:
    outputFile: 0, 'text/plain', java.lang.String, true
    filecontents: 0, 'text/plain', java.lang.String, true
    encoding: 0, 'text/plain', java.lang.String, true
  Activity output:
    outputFile: 0, 'text/plain', 0
  Settings: workflow, net.sourceforge.taverna.scuflworkers.io.TextFileWriter
  Dispatch layers (net.sf.taverna.t2.core, workflowmodel-impl, 1.4):
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize (1)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry (1.0, 1000, 5000, 0)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Processor: Duplicates_list
  Output port: value (0, 0)
  Description: This is the name of the resulting file that contains the list of duplicates.
  Date: 2012-07-30 15:30:58.133 UTC
  Activity: net.sf.taverna.t2.activities.stringconstant.StringConstantActivity (net.sf.taverna.t2.activities, stringconstant-activity, 1.4)
  Value: duplicates.txt
  Dispatch layers (net.sf.taverna.t2.core, workflowmodel-impl, 1.4):
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize (1)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry (1.0, 1000, 5000, 0)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Processor: Executable_path
  Output port: value (0, 0)
  Description: This is the path to the folder where the executables are stored.
  Date: 2012-07-30 15:32:35.468 UTC
  Activity: net.sf.taverna.t2.activities.stringconstant.StringConstantActivity (net.sf.taverna.t2.activities, stringconstant-activity, 1.4)
  Value: /usr/local/bin/
  Dispatch layers (net.sf.taverna.t2.core, workflowmodel-impl, 1.4):
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Parallelize (1)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.ErrorBounce
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Failover
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Retry (1.0, 1000, 5000, 0)
    net.sf.taverna.t2.workflowmodel.processor.dispatch.layers.Invoke

Data links (source -> sink):
  Workingdir.value -> ExtractFeatures.workingdir
  Threadscount.value -> ExtractFeatures.threadscount
  Executable_path.value -> ExtractFeatures.execpath
  Threadscount.value -> Train.threadscount
  Workingdir.value -> Train.workingdir
  Executable_path.value -> Train.execpath
  Workingdir.value -> BoW_histograms.workingdir
  Threadscount.value -> BoW_histograms.threadscount
  Executable_path.value -> BoW_histograms.execpath
  Workingdir.value -> Compare.workingdir
  Executable_path.value -> Compare.execpath
  Threadscount.value -> Compare.threadscount
  Duplicates_list.value -> Write_Text_File.outputFile
  Compare.STDOUT -> Write_Text_File.filecontents
  Write_Text_File.outputFile -> Write_Text_File_outputFile

Identifiers and dates:
  3d0d4616-4881-4e07-8443-c11684280063 2012-07-30 13:45:03.877 UTC
  3c16c610-efd9-4ccd-bb0b-5284cf83ebfd 2012-07-30 10:52:44.235 UTC
  a508f535-9d46-4ab4-8222-22800c3de7a4 2012-07-27 11:52:19.922 UTC
  c8200316-6687-4f7c-b810-78c71f5123ba 2012-07-30 14:04:59.790 UTC
  4d511337-d91c-4c13-b268-4be0eb4c68dd 2012-07-30 14:05:34.255 UTC
  131fe79d-75f3-4cbe-b07b-2f221bce5179 2012-07-30 15:16:24.486 UTC
  c879cb55-e325-4913-9ad2-559244242bfa 2012-07-27 10:11:35.973 UTC
  7323c22c-2564-4a94-8452-5412b06248fe 2012-07-27 11:56:36.691 UTC
  d6af897b-7641-4177-9cdb-b6ed15215baa 2012-07-30 15:33:05.460 UTC
  25df8257-8693-4f1a-81d0-ed66a6e1902b 2012-07-27 11:41:54.124 UTC
  0b000648-e97a-42f6-b1c6-150c31dbdae3 2012-07-27 12:08:40.504 UTC
  46440496-e71b-4e40-8ceb-9783963fe152 2012-07-30 14:20:19.318 UTC
  d999c44e-b988-4c56-82f0-2d1440e264d5 2012-07-30 13:23:35.35 UTC
  7197f3fe-2721-4377-a480-1b1263bb11d9 2012-07-30 13:48:32.843 UTC
  0654ef6a-bd82-42e2-96c4-fcb5586eebf4 2012-07-30 10:58:09.521 UTC
  1cb796e2-710a-47a5-8560-6a5e0739f6a8 2012-07-30 14:23:52.342 UTC
  32bfa39a-5e98-4024-a8d6-247e23b9c302 2012-07-30 13:49:07.765 UTC
  8772a31c-4aaf-47a4-9163-9134ddc47868 2012-07-30 15:22:20.171 UTC
  8772a31c-4aaf-47a4-9163-9134ddc47868 2012-07-30 14:25:30.505 UTC
  e22740d4-e377-41dc-9a9f-82cc5270871a 2012-07-27 12:16:38.754 UTC
  041ffb49-2dd0-486a-8776-a74ed5c05d35 2012-07-30 09:36:13.590 UTC
  46440496-e71b-4e40-8ceb-9783963fe152 2012-07-30 13:59:20.239 UTC
  6456b13b-d403-42c8-a0d9-b46578e9cc2f 2012-07-30 13:30:03.965 UTC
  34c0b22f-ed54-4485-9688-e6c2b64e1bef 2012-07-30 14:02:18.712 UTC
  6364fa9e-f77c-4840-acbf-7440a2e9338d 2012-07-30 14:06:23.285 UTC
  cb7e57ab-ea4c-4e2e-af17-f4698d8c7b32 2012-07-30 14:21:06.963 UTC
  6364fa9e-f77c-4840-acbf-7440a2e9338d 2012-07-30 14:03:29.505 UTC
  2c25b62f-4d9b-436d-8524-96c1a4e6e591 2012-07-27 12:00:59.250 UTC
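
The four external-tool processors share one command template that differs only in the stage name (extract, train, bowhist, compare). As a rough illustration of how the %%name%% placeholders resolve with the string constants defined in this workflow, the sketch below fills the template in Python. The helper names `fill` and `build_commands` are hypothetical. The stage order is an assumption taken from the processor descriptions, since the workflow text does not state an execution order. The sketch only builds the command strings and does not run FindDuplicates.py.

```python
import re

# Stage names taken from the four external-tool commands in the workflow.
# Assumption: they are meant to run in this order.
STAGES = ["extract", "train", "bowhist", "compare"]

# Command template as it appears in the workflow, with the stage left open.
TEMPLATE = ("python2.7 %%execpath%%FindDuplicates.py "
            "%%workingdir%% {stage} --threads %%threadscount%%")

# Values of the string-constant processors in the workflow.
CONSTANTS = {
    "execpath": "/usr/local/bin/",        # Executable_path
    "workingdir": "/tmp/TestCollection",  # Workingdir
    "threadscount": "1",                  # Threadscount
}


def fill(template, values):
    """Replace each %%name%% placeholder with values[name]."""
    return re.sub(r"%%(\w+)%%", lambda m: values[m.group(1)], template)


def build_commands(values=CONSTANTS):
    """Return one resolved command line per stage."""
    return [fill(TEMPLATE.format(stage=stage), values) for stage in STAGES]


if __name__ == "__main__":
    for command in build_commands():
        print(command)
```

In the workflow itself, only the standard output of the compare stage is passed on: the data links send Compare.STDOUT to Write_Text_File, which writes it to the file named by Duplicates_list (duplicates.txt).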