RCOMM Challenge 2: Broken Iris

Created: 2010-09-17 08:55:16


At RComm 2010 (www.rcomm2010.org), an unusual competition was held. Titled "Who Wants to Be a Data Miner", it consisted of three challenges issued to the conference participants. In each challenge, participants had to design a RapidMiner process as quickly as possible. This is the winning process for Challenge 2, "Broken Iris", by Nico Piatkowski. The task was:

You are given a decision tree model (M) built on the well-known Iris data set and unlabelled data (U) to which the model is to be applied. Unfortunately, the unlabelled data set is missing one of the four original attributes (a4), so the model is not immediately applicable. Even worse, you cannot recreate the model, because the label column of the original labelled data set (L) was lost in last night's database crash. However, L still contains all four attribute columns. The task is: given M, U, and L, find a way to apply M to U and convince the audience that this does something useful. (The audience does not accept simply re-creating the missing column and filling it with constant or random values.)

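The solution uses L to train a regression model that predicts a4 from a1, a2, and a3. Because this model only relates the attributes to each other, the lost label column is not needed. The regression model is then applied to U to add the predicted attribute a4, after which M can be applied to U.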
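The idea can be illustrated outside RapidMiner with a minimal Python sketch. It makes several assumptions that are not taken from the winning process. It uses ordinary least-squares linear regression as the regression learner, because the description does not name the learner that was used. `model_m` is a hypothetical hand-written stand-in for the decision tree M. The rows of `L` and `U` are illustrative toy values in the style of Iris, not the challenge data.

```python
"""Sketch of the "Broken Iris" idea: impute a4 by regression, then apply M."""
from typing import List, Sequence

Row = List[float]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_linear_regression(features: Sequence[Row], target: Sequence[float]) -> Row:
    """Least squares via the normal equations (X^T X) w = X^T y; w[0] is the intercept."""
    x = [[1.0] + list(f) for f in features]  # prepend a constant column for the intercept
    k = len(x[0])
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, target)) for i in range(k)]
    return solve(xtx, xty)


def predict(weights: Row, features: Row) -> float:
    """Apply the fitted linear model to one example."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], features))


def model_m(a1: float, a2: float, a3: float, a4: float) -> str:
    """Hypothetical stand-in for the given tree M; this toy tree only tests a4."""
    if a4 < 0.8:
        return "Iris-setosa"
    if a4 < 1.75:
        return "Iris-versicolor"
    return "Iris-virginica"


# L: label column lost, but all four attributes a1..a4 present (toy values).
L = [
    [5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2], [4.7, 3.2, 1.3, 0.2],
    [7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5], [5.7, 2.8, 4.1, 1.3],
    [6.3, 3.3, 6.0, 2.5], [5.8, 2.7, 5.1, 1.9], [7.1, 3.0, 5.9, 2.1],
]
# U: data to classify, missing a4 (toy values).
U = [[5.0, 3.4, 1.5], [6.1, 2.9, 4.7], [6.7, 3.1, 5.6]]

# Step 1: learn a4 = f(a1, a2, a3) on L; no label is needed for this.
w = fit_linear_regression([row[:3] for row in L], [row[3] for row in L])

# Step 2: add the predicted a4 to every example in U.
U_completed = [row + [predict(w, row)] for row in U]

# Step 3: M is now applicable to U.
predictions = [model_m(*row) for row in U_completed]
print(predictions)
```

Unlike a constant or random fill, the imputed a4 values follow the relationship between a4 and the other attributes observed in L. This is what makes M's predictions on U meaningful. In RapidMiner itself, any regression learner could play the role of `fit_linear_regression` here.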

The myExperiment workflow "RCOMM Challenge 2: Broken Iris (Preparation)" creates the original inputs M, U, and L from RapidMiner's Iris sample data set.