RCOMM 2013 Challenge: 2. Solution (Re-infer potential attribute values from model)
This process is the solution for one of the RCOMM 2013 data mining challenge tasks which participants had to solve within 10 minutes. The task was this: Given
(1) a variant of the Golf data set (found in the //Samples/data folder) where the attribute Outlook is missing,
(2) a decision tree model built on the complete Golf data set, and
(3) a utility data set containing only the three distinct values of Outlook,
create an example set based on the incomplete data set from (1) containing all possible values for the Outlook attribute which are compatible with the given decision tree (2), i.e. for which the prediction made by the decision tree matches the true label. Provided the tree classifies every original example correctly, the result is a superset of the original Golf data set.
This solution is surprisingly simple. To run the process, first produce the required input by running the "Generate Input" process. The solution works by building the Cartesian product of the incomplete data set (1) with the possible values of Outlook (3). This yields an example set with three times as many rows as the incomplete data set, some of which are incompatible with the model. We then apply the model and use a "Filter Examples" operator which keeps only "correct_predictions".
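Outside RapidMiner, the same three steps (Cartesian product, apply model, keep correct predictions) can be sketched in plain Python. This is only an illustration under stated assumptions: the rows, the attribute names, and the hand-written `predict` function are hypothetical stand-ins, not the actual Golf data or the tree from the challenge.

```python
from itertools import product

# (3) Utility data: the three candidate values for the missing attribute.
OUTLOOK_VALUES = ["sunny", "overcast", "rain"]

# (1) Incomplete data: illustrative rows only, with Outlook missing.
incomplete = [
    {"humidity": 85, "wind": False, "play": "no"},
    {"humidity": 70, "wind": True, "play": "yes"},
    {"humidity": 80, "wind": True, "play": "no"},
]


def predict(row):
    """(2) Hypothetical stand-in for the decision tree model."""
    if row["outlook"] == "overcast":
        return "yes"
    if row["outlook"] == "sunny":
        return "yes" if row["humidity"] <= 77.5 else "no"
    # Remaining case: outlook == "rain"
    return "no" if row["wind"] else "yes"


def compatible_rows(rows, values, model):
    """Return every (row, value) combination the model predicts correctly."""
    # Cartesian product: each incomplete row paired with each candidate value.
    candidates = [{**row, "outlook": v} for row, v in product(rows, values)]
    # Equivalent of filtering on "correct_predictions".
    return [c for c in candidates if model(c) == c["play"]]


result = compatible_rows(incomplete, OUTLOOK_VALUES, predict)
for row in result:
    print(row)
```

With these three sample rows the product has nine candidates, and the filter keeps only those whose predicted label equals the recorded one.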