Find Duplicates using Matchbox command line tool

Created: 2012-07-31 11:00:21      Last updated: 2012-07-31 11:35:12

The workflow takes a list of digital documents as input, extracts SIFT features using image processing algorithms, creates a dictionary of visual words, generates BoW (Bag of Words) histograms, and finds duplicates. The number of parallel threads can be passed as a parameter. The search results are stored in a text file that lists possible duplicates with their associated similarity scores. Scores range from 0 (low similarity) to 1 (high similarity). Image comparison is performed by the Matchbox command line tool and its associated Python scripts. The sources of the Matchbox tool are located in
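The histogram and comparison stages described above can be illustrated with a minimal Python sketch. This is not Matchbox code: the function names, the toy 2-D descriptors (real SIFT descriptors have 128 dimensions), the use of histogram intersection as the similarity measure, and the 0.9 threshold are all assumptions chosen for illustration. The sketch only shows how a score between 0 and 1 can arise from comparing BoW histograms.

```python
import math

def nearest_word(descriptor, vocabulary):
    """Index of the visual word closest to the descriptor (Euclidean distance)."""
    return min(range(len(vocabulary)),
               key=lambda i: math.dist(descriptor, vocabulary[i]))

def bow_histogram(descriptors, vocabulary):
    """Count descriptors per visual word and normalise the counts to sum to 1."""
    counts = [0] * len(vocabulary)
    for d in descriptors:
        counts[nearest_word(d, vocabulary)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(vocabulary)

def similarity(h1, h2):
    """Histogram intersection (an assumed measure): 0 = disjoint, 1 = identical."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def find_duplicates(histograms, threshold=0.9):
    """Return (name_a, name_b, score) for pairs scoring at or above the threshold."""
    names = sorted(histograms)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = similarity(histograms[a], histograms[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: -p[2])

# Toy vocabulary of three visual words and hypothetical file names.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
histograms = {
    "a.png": bow_histogram([(0.1, 0.0), (0.9, 1.1), (1.0, 1.0), (5.0, 4.8)], vocabulary),
    "b.png": bow_histogram([(0.0, 0.2), (1.1, 1.0), (0.8, 0.9), (4.9, 5.1)], vocabulary),
    "c.png": bow_histogram([(0.0, 0.1), (0.2, 0.0)], vocabulary),
}
for a, b, score in find_duplicates(histograms):
    print(a, b, score)
```

In this toy run only the pair with matching word distributions is reported. A real pipeline would learn the vocabulary by clustering descriptors from the whole collection rather than fixing it by hand.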

Workflow Components

Authors (0)
Titles (0)
Descriptions (0)
Dependencies (0)
Inputs (0)
Processors (9)
Beanshells (0)
Outputs (1)
Datalinks (15)
Coordinations (3)

Workflow Type

Taverna 2

Version 1 (of 1)
