Group: e-LICO
e-LICO -- An e-Laboratory for Interdisciplinary Collaborative Research in Data Mining and Data-Intensive Science
The goal of the e-LICO project is to build a virtual laboratory for interdisciplinary collaborative research in data mining and data-intensive sciences. The proposed e-lab will comprise three layers: the e-science and data mining layers will form a generic research environment that can be adapted to different scientific domains by customizing the application layer.
The e-science layer, built on an open-source e-science infrastructure developed by one of the partners, will support content creation through collaboration at multiple scales and degrees of commitment — ranging from small, contract-bound teams to voluntary, constraint-free participation in dynamic virtual communities.
The data mining layer will be the distinctive core of e-LICO; it will provide comprehensive data mining tools for multimedia data (structured records, text, images, signals). Standard tools will be augmented with preprocessing or learning algorithms developed specifically to meet the challenges of data-intensive, knowledge-rich sciences, such as ultra-high dimensionality or undersampled data. Methodologically sound use of these tools will be ensured by a knowledge-driven data mining assistant, which will rely on a data mining ontology and knowledge base to plan the mining process and propose ranked workflows for a given application problem. Extensive e-lab monitoring facilities will automate the accumulation of experimental meta-data to support replication and comparison of data mining experiments. These meta-data will be used by a meta-miner, which will combine probabilistic reasoning with kernel-based learning from complex structures to incrementally improve the assistant's workflow recommendations.
e-LICO will be showcased in a systems biology task: biomarker discovery and molecular pathway modelling for diseases affecting the kidney and urinary pathways.
Created at: Monday 16 November 2009 08:24:00 (UTC)
Unique name: elico
-
Sebastian Land shared Using Remember / Recall for "tunneling" results
This process shows how the Remember and Recall operators can be used to pass results from one position in the process to another when a direct connection is impossible. This process introduces another advanced RapidMiner technique: macro handling. We have used the predefined ma …
Wednesday 08 May 2013 14:52:01 (UTC)
-
Simon Fischer shared Image Mining with RapidMiner
This is an image mining process using the image mining Web service provided by NHRF within e-LICO. It first uploads a set of images found in a directory, then preprocesses the images and visualizes the result. Furthermore, references to the uploaded images are stored in the local RapidMiner repositor …
Wednesday 08 May 2013 14:52:01 (UTC)
-
James Eales shared PDF to plain text
This workflow will extract the plain text content of PDF files supplied to the input port. You can connect the Load PDF from directory workflow to this workflow's input. We recommend you send the output from this workflow to the Clean plain text workflow, because the PDF to text process can add …
Wednesday 08 May 2013 14:52:00 (UTC)
-
James Eales shared Sentence splitting
This workflow will attempt to split up text into sentences, returning a list of sentences to the output port. The sentence splitting service makes use of the OpenNLP sentence detector and has been trained to work on English text. This workflow can be used to provide input to the Termine with c- …
Wednesday 08 May 2013 14:52:00 (UTC)
-
James Eales shared Termine with c-value threshold
This workflow accepts a list of sentences from a single document and returns the terms found by the TerMine web service. It also allows you to set a threshold c-value score so that only terms with a user-controlled probability (of being a real term) are returned as an output. To get sentences …
Wednesday 08 May 2013 14:52:00 (UTC)
-
James Eales shared Clean plain text (ASCII)
This workflow will remove any XML-invalid and non-ASCII characters (e.g. for sending to the ASCII-only Termine service) from any text supplied to the input port. This is a workflow component, designed to be used as a nested workflow inside a larger text mining or text processing workflow.
Wednesday 08 May 2013 14:52:00 (UTC)
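The cleaning step described in this entry can be illustrated outside Taverna. The following is a minimal Python sketch, not the workflow's actual implementation: it assumes "XML-invalid" means any character outside the `Char` production of XML 1.0, and that offending characters are dropped rather than replaced. The function names `clean_plain_text` and `clean_plain_text_ascii` are hypothetical.

```python
import re

# Assumption: "XML-invalid" means outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_XML_INVALID = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_plain_text(text: str) -> str:
    """Drop characters that are not allowed in an XML 1.0 document."""
    return _XML_INVALID.sub("", text)


def clean_plain_text_ascii(text: str) -> str:
    """Drop XML-invalid characters, then drop anything outside ASCII."""
    xml_safe = clean_plain_text(text)
    return "".join(ch for ch in xml_safe if ord(ch) < 128)


if __name__ == "__main__":
    sample = "caf\u00e9\x01 ok"
    print(repr(clean_plain_text(sample)))
    print(repr(clean_plain_text_ascii(sample)))
```

Dropping characters silently keeps the output short but loses information; a variant that substitutes a placeholder character would be equally reasonable if downstream tools need stable character offsets.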
-
James Eales shared Terms from collection of PDF files
This workflow will give you a set of candidate terms for each PDF document in a user-specified directory. You can also specify a c-value threshold that will restrict the terms to those with higher scores. This workflow was created using only nested workflows. These workflow components work on t …
Wednesday 08 May 2013 14:52:00 (UTC)
-
James Eales shared Clean plain text
This workflow will remove any XML-invalid characters (these characters often appear in the output of PDF to text software) from any text supplied to the input port. This is a workflow component, designed to be used as a nested workflow inside a larger text mining or text processing workflow.
Wednesday 08 May 2013 14:52:00 (UTC)
-
James Eales shared Terms from collection of text files
This workflow will give you a set of candidate terms for each text file in a user-specified directory. You can also specify a c-value threshold that will restrict the terms to those with higher scores. This workflow was created using only nested workflows. These workflow components work on thei …
Wednesday 08 May 2013 14:52:00 (UTC)
-
James Eales shared Load plain text from directory
This workflow will automate the reading of a set of text files stored in a single directory (the path to which should be supplied as a single input value). It will assume that the text files are saved using the default character encoding for the system that Taverna is running on. This is …
Wednesday 08 May 2013 14:52:00 (UTC)
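As a rough Python analogue of the behaviour described in this entry (the workflow itself runs in Taverna), the sketch below reads every regular file in one directory using the system's default character encoding. The function name `load_plain_text` is hypothetical, and reading all regular files without filtering by extension is an assumption.

```python
import locale
from pathlib import Path
from typing import Dict


def load_plain_text(directory: str) -> Dict[str, str]:
    """Return {file name: file contents} for each regular file in `directory`.

    Files are decoded with the system's preferred encoding, mirroring the
    default-encoding assumption described in the entry above.
    """
    encoding = locale.getpreferredencoding(False)
    texts: Dict[str, str] = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():  # assumption: subdirectories are skipped
            texts[path.name] = path.read_text(encoding=encoding)
    return texts
```

Relying on the platform default means the same directory can decode differently on different machines; passing an explicit encoding would be the safer choice when the files' origin is known.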
-
James Eales shared Load PDF from directory
This workflow will automate the reading of a set of PDF files stored in a single directory (the path to which should be supplied as a single input value). This is a workflow component, designed to be used as a nested workflow inside a larger text mining or text processing workflow.
Wednesday 08 May 2013 14:52:00 (UTC)