Concept Profile Generation Pipeline
Requirements:
Have MySQL installed with a database called mydb; create one if it does not exist. Have mysql-connector-java-5.1.28-bin.jar or newer placed in the local JAR files. Select this dependency by right-clicking a Beanshell script and applying it to the whole workflow.
Download the Peregrine SKOS CLI from: https://trac.nbic.nl/biosemantics/downloads
Install LVG2013Lite. See: https://trac.nbic.nl/biosemantics/wiki/Peregrine%20SKOS%20CLI
In lvg.properties, set LVG_DIR=/home/path/to/lvg2013lite/ This file can be found in:
/home/path/to/lvg2013lite/data/config/lvg.properties
Copy the properties file from production.properties, which can be found at: https://trac.nbic.nl/biosemantics/wiki/Peregrine%20SKOS%20CLI
Change the normalizer.lvg.properties and normalizer.lvg.binaryCache properties to point to the LVG installation path. This file can be obtained from https://trac.nbic.nl/biosemantics/wiki/Peregrine%20SKOS%20CLI
Have the Peregrine indexer installed locally and enter its correct path in the component named "Indexer_tool", which can be found in the nested workflow named Peregrine indexer.
If using a Windows machine, install Cygwin and add it to the PATH environment variable. In Taverna, go to File --> Preferences --> Tool invocation --> Edit default local, enter C:\Windows\system32\cmd.exe /c in the Shell field, and save.
When using Windows, set the JAVA_HOME environment variable to the JDK directory (C:\Program Files\jdk 1.7.0).
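The two normalizer properties are easy to leave pointing at the wrong place. The sketch below is not part of the workflow; it is a hypothetical helper (the class name `LvgConfigCheck` and its methods are assumptions) that loads the text of a properties file and reports which of the two keys named above are missing or do not point into the LVG directory. Note that `java.util.Properties` treats backslashes as escape characters, so Windows paths would need forward slashes or doubled backslashes.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper: checks that the Peregrine properties point into the LVG directory. */
final class LvgConfigCheck {

    // Property keys named in the requirements above.
    static final String[] REQUIRED_KEYS = {
        "normalizer.lvg.properties",
        "normalizer.lvg.binaryCache"
    };

    /** Returns a list of problems; an empty list means both keys point below lvgDir. */
    static List<String> check(String propertiesText, String lvgDir) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalStateException("could not parse properties text", e);
        }
        List<String> problems = new ArrayList<String>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                problems.add(key + " is not set");
            } else if (!value.trim().startsWith(lvgDir)) {
                problems.add(key + " does not point into " + lvgDir);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Example text; the second path is deliberately wrong.
        String text =
            "normalizer.lvg.properties=/home/path/to/lvg2013lite/data/config/lvg.properties\n"
          + "normalizer.lvg.binaryCache=/some/other/place/cache.bin\n";
        System.out.println(check(text, "/home/path/to/lvg2013lite/"));
    }
}
```

In practice the text would be read from the copied production.properties file before the workflow is started.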
Run
Run this Workflow in the Taverna Workbench...
Option 1:
Copy and paste this link into File > 'Open workflow location...'
http://myexperiment.org/workflows/4272/download?version=4
Taverna is available from http://taverna.sourceforge.net/
If you are having problems downloading it in Taverna, you may need to provide your username and password in the URL so that Taverna can access the Workflow:
Replace http:// in the link above with http://yourusername:yourpassword@
Workflow Components
Author: Amrish Mahes
Dependency: mysql-connector-java-5.1.28-bin.jar
Inputs (6)
Name | Description |
---|---|
Ontology_input | A SKOS-formatted dictionary/thesaurus of predefined concepts. |
User_password | Your SQL password, used to access the database as root. |
abstract_input_depth2 | A list of sublists. Each sublist consists of two elements, such as [ [ Document_URI, Abstract_URI ] ]. For an example, load the file named "input value for full CPGP", which can be found under the Files tab on myExperiment. |
Driver_value | The driver needed for all SQL queries to run. |
usr_id | The user ID, which in this case is root. |
url_value | The URL that SQL connects to. |
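To make the shape of abstract_input_depth2 concrete, here is a minimal Java sketch that builds such a depth-2 list and splits it into document URIs and abstract URIs. The class name, the example URIs and the JDBC URL are assumptions for illustration only; `com.mysql.jdbc.Driver` is the driver class shipped in MySQL Connector/J 5.1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of the workflow inputs; not taken from the workflow itself. */
final class WorkflowInputExample {

    // Plausible values for the connection inputs (assumptions, adjust to your setup).
    static final String DRIVER_VALUE = "com.mysql.jdbc.Driver";          // Connector/J 5.1 driver class
    static final String URL_VALUE = "jdbc:mysql://localhost:3306/mydb";  // assumed host and port
    static final String USR_ID = "root";

    /**
     * Splits [[Document_URI, Abstract_URI], ...] into two parallel lists:
     * element 0 holds the document URIs, element 1 the abstract URIs.
     */
    static List<List<String>> split(List<List<String>> pairs) {
        List<String> docUris = new ArrayList<String>();
        List<String> abstractUris = new ArrayList<String>();
        for (List<String> pair : pairs) {
            if (pair.size() != 2) {
                throw new IllegalArgumentException(
                    "expected [Document_URI, Abstract_URI], got " + pair);
            }
            docUris.add(pair.get(0));
            abstractUris.add(pair.get(1));
        }
        List<List<String>> result = new ArrayList<List<String>>();
        result.add(docUris);
        result.add(abstractUris);
        return result;
    }

    public static void main(String[] args) {
        // Placeholder URIs; real input comes from the example file on myExperiment.
        List<List<String>> input = new ArrayList<List<String>>();
        input.add(Arrays.asList("http://example.org/doc/1", "http://example.org/abstract/1"));
        input.add(Arrays.asList("http://example.org/doc/2", "http://example.org/abstract/2"));
        System.out.println(split(input));
    }
}
```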
Processors (6)
Name | Type | Description |
---|---|---|
Peregrine indexer | workflow | The nested workflow Peregrine indexer uses a Java peregrine skos cli.jar file to index the occurrence of concepts in documents. The Peregrine indexer program can be downloaded from trac.nbic.nl/biosemantics/downloads. Install LVG2013Lite; see https://trac.nbic.nl/biosemantics/wiki/Peregrine%20SKOS%20CLI Copy the properties file from production.properties, which can be downloaded from https://trac.nbic.nl/biosemantics/wiki/Peregrine%20SKOS%20CLI In lvg.properties, set LVG_DIR=/home/path/to/lvg2013lite/ (this file can be found in /home/path/to/lvg2013lite/data/config/lvg.properties). Change the normalizer.lvg.properties and normalizer.lvg.binaryCache properties to point to the LVG installation path: normalizer.lvg.properties=/home/path/to/lvg2013lite/data/config/lvg.properties and normalizer.lvg.binaryCache=/home/path/to/lvg2013lite/standartNormCache2013.bin Have MySQL installed with a database called mydb; create one if it does not exist. Have mysql-connector-java-5.1.28-bin.jar or newer placed in the local JAR files. |
Calculate Uncertainty Coefficient | workflow | A nested workflow to calculate the Uncertainty coefficient with the contingency values of the concept pairs. The calculated values are afterwards inserted into a SQL table. |
Calculate Contincengy table | workflow | A nested workflow that retrieves the values of a 2x2 contingency table and puts the co-occurrence of the generated concept pairs into the SQL table TableName_co_occurence. The concept pairs are created from the concepts retrieved from the SQL table TableName_occurence. For concepts A and B, the contingency table has rows "A" and "not A" and columns "B" and "not B", giving four cells: A and B co-occur (A B), A occurs without B (A not B), B occurs without A (not A B), and neither concept occurs in the literature (not A not B). This is calculated for every possible concept pair. |
Calculate Inner Product | workflow | A nested workflow to calculate the inner product of a concept profile. |
Create Tables | workflow | This nested workflow creates the necessary tables for inserting the generated values. The following tables are created: co-occurrence table, occurrence table, uncertainty coefficient table, inner product table, and big table. |
Summarize Big Table | workflow |
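The contingency correction and the uncertainty coefficient can be sketched in a few lines of Java. The page does not show the script the workflow actually uses, so this is only an illustration under stated assumptions: the correction subtracts the co-occurrence count from the raw counts for A and B, as described for the contingency table, and the coefficient is the textbook symmetric form U = 2 * (H(A) + H(B) - H(A,B)) / (H(A) + H(B)) computed from the four cells. The workflow's own script may use a different (for example asymmetric) variant. The class and method names are hypothetical.

```java
/** Hypothetical sketch: 2x2 contingency correction and a symmetric uncertainty coefficient. */
final class UncertaintyCoefficientSketch {

    /**
     * Returns {A and B, A not B, not A B, not A not B}.
     * rawA and rawB are counts that still include the co-occurrences.
     */
    static long[] correct(long coOccur, long rawA, long rawB, long neither) {
        return new long[] { coOccur, rawA - coOccur, rawB - coOccur, neither };
    }

    /** Shannon entropy (natural log) of the distribution given by the counts. */
    static double entropy(double... counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        if (total == 0.0) {
            return 0.0;
        }
        double h = 0.0;
        for (double c : counts) {
            if (c > 0.0) {
                double p = c / total;
                h -= p * Math.log(p);
            }
        }
        return h;
    }

    /** Symmetric uncertainty coefficient of a corrected 2x2 table; 0 if both marginals are constant. */
    static double symmetricUc(long[] table) {
        double ab = table[0], aOnly = table[1], bOnly = table[2], none = table[3];
        double hA = entropy(ab + aOnly, bOnly + none);   // A present vs. absent
        double hB = entropy(ab + bOnly, aOnly + none);   // B present vs. absent
        double hAB = entropy(ab, aOnly, bOnly, none);    // joint entropy
        double denominator = hA + hB;
        if (denominator == 0.0) {
            return 0.0;
        }
        return 2.0 * (hA + hB - hAB) / denominator;
    }

    public static void main(String[] args) {
        // Made-up counts: 10 co-occurrences, A in 30 documents, B in 20, neither in 60.
        long[] table = correct(10, 30, 20, 60);
        System.out.println(java.util.Arrays.toString(table));
        System.out.println(symmetricUc(table));
    }
}
```

With this definition a pair of concepts that always appear together scores 1, and a pair that occurs independently scores 0.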
Beanshells (20)
Name | Description | Inputs | Outputs |
---|---|---|---|
Beanshell_generate_SQL_table_inner_product | | timestamp | out1 |
Beanshell_SQL_generate_Get_Inner_Product | A Beanshell script that generates a SQL statement to calculate the inner product of two concept profiles. Only the concepts that both profiles have in common contribute: for each common concept, the UC values from the two profiles are multiplied, and the products are added up. This sum is the inner product. For example, concept profile A has the concepts A, B, C and D, and concept profile B has the concepts B, C and E. The common concepts of both profiles are therefore B and C. Their values are multiplied and summed. | concept_pairs, timestamp | out1 |
Beanshell_SQL_generate_INSERT_INTO_Inner_Product_table | A Beanshell script that generates a SQL statement to insert the calculated inner product values of two concept profiles into the inner product table. | concept_pairs, timestamp, inner_product_value | out1 |
Beanshell_SQL_query_generator_Concept_Doc_occurence | A Beanshell script that generates a SQL statement to INSERT the concept URI and document URI into the table named after the timestamp at which the workflow started. It uses a database called mydb. If there is no database/schema named mydb, please create one. | table_name, concept_occurence_document | out1 |
Beanshell_extract_uri_doc_and_abstract | A Beanshell script that loops through the input to separate the Document URIs from the Abstract URIs. | Input | doc_uri, abstracts |
Beanshell_remove_dot_and_put_correct_doc_uri_in_con_occ_doc | A Beanshell script that replaces the temporary file created by Taverna with the correct Document URI. | concept_occurence_document, doc_uri | out1 |
Beanshell_SQL_generate_table_UC | A Beanshell script that generates a table for the uncertainty coefficient values. | time_stamp | output |
Beanshell_SQL_create_table_co_occurence | | timestamp | out1 |
Beanshell_SQL_INSERT_UC_values_INTO_table | A Beanshell script that generates a SQL statement to insert the calculated uncertainty coefficient values per concept pair into the table. | timestamp, concept_pairs, UC_value | output_SQL |
Beanshell_calculate_Uncertainty_Coefficient | A Beanshell script that calculates the Uncertainty coefficient based on the script of Herman van Haagen. | corrected_contingency_values | UC_value |
Beanshell_SQL_generate_contingency_table | A Beanshell script that generates a SQL statement to retrieve four counts: how often two concepts co-occur, how often concept A occurs without concept B, how often concept B occurs without concept A, and how often neither concept occurs in the literature. The retrieved values for concept A only and concept B only still include the co-occurrences, so the co-occurrence count has to be subtracted from them. This correction is done in the Beanshell script called "Beanshell_correct_contincengy_table". | concept_pairs, table_name | out1 |
Beanshell_SQL_insert_into_co_occurence_table | A Beanshell script that generates a SQL statement to insert into the co-occurrence table the number of times two given concepts co-occur in the literature. | contingency_values, timestamp, concept_pairs | out1 |
Beanshell_correct_contincengy_table | A Beanshell script that corrects the contingency table retrieved by SQL by subtracting the co-occurrence value from the concept-A-only and concept-B-only values. | contingency_values | corrected_contingency_values |
Beanshell_generate_SQL_statement_Get_destinct_concept | A Beanshell script that retrieves every concept from the SQL table TableName_occurence. | table_name | out1 |
Beanshell_make_concept_pairs | A Beanshell script that takes a list of concepts as input and creates concept pairs. | concept_B, concept_A | concept_pair |
Beanshell_generate_SQL_occurence_concept_doc_table | A Beanshell script that generates a SQL statement to CREATE a TABLE. The table name is the timestamp input. The table consists of 4 columns: concept_occ_doc_id, concept_uri, occurence_uri and doc_uri. concept_occ_doc_id is the PRIMARY KEY and is auto-incremented. Every other column is a VARCHAR(200). | timeStamp | Table_sql |
Beanshell_get_timestamp | A Beanshell script that retrieves the time at which the program started, as year, month, day, hour, minutes and seconds in 24-hour time. | | timeStamp |
Beanshell_SQL_CREATE_last_big_table | | timestamp | out1 |
Beanshell_SQL_Generate_INSERT_INTO_big_table | A Beanshell script that generates SQL statements to insert two concepts and their co-occurrence, uncertainty coefficient and inner product values into the table. | timestamp, in1 | out1 |
Beanshell_SQL_statement_get_last_table_data_input | A Beanshell script that generates SQL statements to retrieve the inner product of two concept profiles, the co-occurrence of the concepts and the uncertainty coefficient. | timestamp | out1 |
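The inner product described for Beanshell_SQL_generate_Get_Inner_Product can be illustrated outside SQL. The workflow itself generates a SQL statement for this; the sketch below only restates the same arithmetic in plain Java, with each concept profile represented as a map from concept to UC value. The class name and the UC values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory version of the inner product between two concept profiles. */
final class InnerProductSketch {

    /** Sums the products of UC values over the concepts that both profiles contain. */
    static double innerProduct(Map<String, Double> profileA, Map<String, Double> profileB) {
        double sum = 0.0;
        for (Map.Entry<String, Double> entry : profileA.entrySet()) {
            Double other = profileB.get(entry.getKey());
            if (other != null) {            // concept is common to both profiles
                sum += entry.getValue() * other;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Profile A has concepts A, B, C, D; profile B has B, C, E (UC values are invented).
        Map<String, Double> profileA = new HashMap<String, Double>();
        profileA.put("A", 0.1);
        profileA.put("B", 0.2);
        profileA.put("C", 0.5);
        profileA.put("D", 0.9);

        Map<String, Double> profileB = new HashMap<String, Double>();
        profileB.put("B", 0.5);
        profileB.put("C", 0.4);
        profileB.put("E", 0.7);

        // Only B and C are shared, so the result is 0.2 * 0.5 + 0.5 * 0.4.
        System.out.println(innerProduct(profileA, profileB));
    }
}
```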
Outputs (1)
Name | Description |
---|---|
end |
Datalinks (35)
Source | Sink |
---|---|
Create Tables:timeStamp | Peregrine indexer:table_name |
url_value | Peregrine indexer:url |
User_password | Peregrine indexer:password |
Ontology_input | Peregrine indexer:ontology |
abstract_input_depth2 | Peregrine indexer:Input |
usr_id | Peregrine indexer:userid |
Driver_value | Peregrine indexer:driver |
usr_id | Calculate Uncertainty Coefficient:userid |
User_password | Calculate Uncertainty Coefficient:password |
Calculate Contincengy table:outputlist | Calculate Uncertainty Coefficient:concept_pairs |
Calculate Contincengy table:corrected_contingency_values | Calculate Uncertainty Coefficient:corrected_contingency_values |
url_value | Calculate Uncertainty Coefficient:url |
Driver_value | Calculate Uncertainty Coefficient:driver |
Create Tables:timeStamp | Calculate Uncertainty Coefficient:timestamp |
url_value | Calculate Contincengy table:url |
Driver_value | Calculate Contincengy table:driver |
User_password | Calculate Contincengy table:password |
Create Tables:timeStamp | Calculate Contincengy table:table_name |
usr_id | Calculate Contincengy table:userid |
usr_id | Calculate Inner Product:userid |
User_password | Calculate Inner Product:password |
Calculate Contincengy table:outputlist | Calculate Inner Product:concept_pairs |
Driver_value | Calculate Inner Product:driver |
Create Tables:timeStamp | Calculate Inner Product:timestamp |
url_value | Calculate Inner Product:url |
usr_id | Create Tables:userid |
url_value | Create Tables:url |
Driver_value | Create Tables:driver |
User_password | Create Tables:password |
Create Tables:timeStamp | Summarize Big Table:timestamp |
usr_id | Summarize Big Table:userid |
url_value | Summarize Big Table:url |
User_password | Summarize Big Table:password |
Driver_value | Summarize Big Table:driver |
Summarize Big Table:resultList | end |
Coordinations (3)
Controller | Target |
---|---|
Calculate Inner Product | Summarize Big Table |
Calculate Uncertainty Coefficient | Calculate Inner Product |
Peregrine indexer | Calculate Contincengy table |
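Taken together, the datalinks between nested workflows and the three coordinations fix the order in which the nested workflows can run. The following sketch is not how Taverna schedules processors internally; it is a small, hypothetical Java illustration that treats each "source processor to sink processor" datalink and each "controller to target" coordination listed above as a dependency edge and derives an order with a standard topological sort.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: derive a run order from the datalinks and coordinations listed above. */
final class ExecutionOrderSketch {

    // Processor-to-processor datalinks (deduplicated) followed by the three coordinations.
    static final String[][] EDGES = {
        {"Create Tables", "Peregrine indexer"},
        {"Create Tables", "Calculate Uncertainty Coefficient"},
        {"Create Tables", "Calculate Contincengy table"},
        {"Create Tables", "Calculate Inner Product"},
        {"Create Tables", "Summarize Big Table"},
        {"Calculate Contincengy table", "Calculate Uncertainty Coefficient"},
        {"Calculate Contincengy table", "Calculate Inner Product"},
        {"Calculate Inner Product", "Summarize Big Table"},
        {"Calculate Uncertainty Coefficient", "Calculate Inner Product"},
        {"Peregrine indexer", "Calculate Contincengy table"}
    };

    private static void register(String node, Map<String, List<String>> next,
                                 Map<String, Integer> indegree) {
        if (!next.containsKey(node)) {
            next.put(node, new ArrayList<String>());
            indegree.put(node, 0);
        }
    }

    /** Kahn's algorithm: repeatedly emit a node whose dependencies have all been emitted. */
    static List<String> order(String[][] edges) {
        Map<String, List<String>> next = new LinkedHashMap<String, List<String>>();
        Map<String, Integer> indegree = new LinkedHashMap<String, Integer>();
        for (String[] edge : edges) {
            register(edge[0], next, indegree);
            register(edge[1], next, indegree);
            next.get(edge[0]).add(edge[1]);
            indegree.put(edge[1], indegree.get(edge[1]) + 1);
        }
        Deque<String> ready = new ArrayDeque<String>();
        for (Map.Entry<String, Integer> entry : indegree.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> result = new ArrayList<String>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            result.add(node);
            for (String successor : next.get(node)) {
                int remaining = indegree.get(successor) - 1;
                indegree.put(successor, remaining);
                if (remaining == 0) {
                    ready.add(successor);
                }
            }
        }
        if (result.size() != next.size()) {
            throw new IllegalStateException("dependency cycle detected");
        }
        return result;
    }

    public static void main(String[] args) {
        for (String processor : order(EDGES)) {
            System.out.println(processor);
        }
    }
}
```

For the edges listed here the order is a single chain: Create Tables, Peregrine indexer, Calculate Contincengy table, Calculate Uncertainty Coefficient, Calculate Inner Product, Summarize Big Table.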
Workflow Type
Version 4 (of 9)
In chronological order:
- Created by Amrish Mahes on Wednesday 30 April 2014 19:14:47 (UTC)
- Created by Amrish Mahes on Wednesday 30 April 2014 19:18:01 (UTC)
- Created by Amrish Mahes on Wednesday 30 April 2014 19:18:42 (UTC)
- Created by Amrish Mahes on Wednesday 07 May 2014 15:33:38 (UTC)
  Revision comment: New version with comment and the right websites to download dependencies
- Created by Amrish Mahes on Monday 12 May 2014 15:34:00 (UTC)
- Created by Amrish Mahes on Wednesday 11 June 2014 13:23:36 (UTC)
- Created by Amrish Mahes on Sunday 15 June 2014 17:45:34 (UTC)
  Revision comment: Changed Table construction workflow. For Peregrine tool invocation it is now mandatory to give the path to production.properties and peregrine-skos-cli.jar
- Created by Amrish Mahes on Sunday 15 June 2014 17:50:32 (UTC)
- Created by Amrish Mahes on Sunday 15 June 2014 17:55:11 (UTC)
  Revision comment: Removed the LIMIT 50 in the sql_value file