Finding all Examples that have duplicate values in certain attributes

Created: 2010-06-18 08:59:39

This process retrieves all examples that have identical values in a specific attribute. For testing, the following data can be written into the file that is read by the Read CSV operator:

CID,Value
3596,X
4054,X
4054,X
3000,S
3000,T
3000,U
32135,S

The goal of this process is to return the examples that share their value in the CID column with at least one other example. In the test data these are the examples with CID 4054 and CID 3000.

To achieve this, the Generate ID operator first assigns a real id to every example. After this, we have to find all duplicates: we remove duplicates based on the single attribute CID and subtract the result from the original example set, which leaves the values of CID that occur more than once. Since a value might still occur more than once in this remaining set, we apply Remove Duplicates again to have each value just once, so that, after defining CID to be the id, we can join on it.
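The steps above can be sketched outside RapidMiner as well. The following is a minimal Python illustration of the same logic, not the workflow itself: the function name find_duplicate_examples, the SAMPLE constant and the "id" column name are hypothetical, and the final join is assumed to be against the original example set.

```python
import csv
import io

# Test data from the description, one example per line.
SAMPLE = """CID,Value
3596,X
4054,X
4054,X
3000,S
3000,T
3000,U
32135,S
"""


def find_duplicate_examples(csv_text, key="CID"):
    """Return all rows whose `key` value occurs more than once."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Step 1: generate a real id for every example.
    for number, row in enumerate(rows, start=1):
        row["id"] = number

    # Step 2: remove duplicates on the key, keeping the first occurrence.
    seen_keys = set()
    kept_ids = set()
    for row in rows:
        if row[key] not in seen_keys:
            seen_keys.add(row[key])
            kept_ids.add(row["id"])

    # Step 3: subtract the de-duplicated set from the original set.
    leftover = [row for row in rows if row["id"] not in kept_ids]

    # Step 4: a key may appear several times in the leftover,
    # so reduce it to distinct key values.
    repeated_keys = {row[key] for row in leftover}

    # Step 5 (assumption): join the distinct keys back to the original rows.
    return [row for row in rows if row[key] in repeated_keys]


duplicates = find_duplicate_examples(SAMPLE)
for row in duplicates:
    print(row["id"], row["CID"], row["Value"])
```

On the test data this keeps the examples with CID 4054 and CID 3000 and drops those with CID 3596 and 32135.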


Workflow Type: RapidMiner
