Description:
Sample workflow that exploits background knowledge in the form of links connecting attributes across different measurement datasets.
The input data is assumed to be labeled and consists of different interconnected biological levels. Additionally, a number of files defining links between attributes in the different measurement datasets are provided. This data was used in the KUP data mining challenge (http://tunedit.org/challenge/ON). Here we focus on two biological levels: protein data measured by LCMS, and miRNA data measured using specific pan-miRNA arrays. The goal is to build a regression model for predicting Pelvic Diameter that performs well on a holdout miRNA dataset.
This workflow first selects 20 proteins using ReliefF from the LCMS data labeled with Differential Renal Function. It then selects 20 features from the miRNA data labeled with Pelvic Diameter, and extends these 20 miRNAs with the miRNAs that are linked to the 20 selected proteins from LCMS. This procedure yields 54 miRNAs in total. Finally, we compare the performance of a simple linear SVM, evaluated on a holdout miRNA dataset, when trained on the miRNA data with three different feature sets: the 20 miRNA features alone, the 54 features obtained from both miRNA and LCMS, and 54 features selected only from miRNA.
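The link-based extension step can be illustrated with a minimal Python sketch. It is not the workflow's actual implementation: the function name `extend_with_links`, the dictionary layout of the link data, and all identifiers in the toy example are assumptions made for illustration only. The sketch shows just the step in which the selected miRNAs are extended by miRNAs linked to the selected proteins; the ReliefF selection and the SVM training are not covered.

```python
def extend_with_links(selected_mirnas, selected_proteins, protein_to_mirnas):
    """Return the selected miRNAs followed by any miRNAs linked to the
    selected proteins, preserving order and skipping duplicates.

    protein_to_mirnas is a hypothetical mapping: protein id -> list of
    linked miRNA ids (standing in for the link files described above).
    """
    extended = list(selected_mirnas)
    seen = set(extended)
    for protein in selected_proteins:
        # Proteins without an entry in the link data contribute nothing.
        for mirna in protein_to_mirnas.get(protein, ()):
            if mirna not in seen:
                seen.add(mirna)
                extended.append(mirna)
    return extended


# Toy example with made-up identifiers (not taken from the challenge data).
links = {"P1": ["miR-a", "miR-b"], "P2": ["miR-b", "miR-c"], "P3": []}
mirnas = ["miR-a", "miR-x"]
proteins = ["P1", "P2", "P4"]

extended = extend_with_links(mirnas, proteins, links)
print(extended)  # ['miR-a', 'miR-x', 'miR-b', 'miR-c']
```

Because linked miRNAs may already be among the selected ones, or may be shared by several proteins, the size of the extended set depends on the link data and is not simply the sum of the two selections.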