Concept Profile Generation Pipeline
Requirements:
Install MySQL and make sure it has a database called mydb; create one if it does not exist. Place mysql-connector-java-5.1.28-bin.jar (or a later version) among the local JAR files.
Install the Peregrine indexer locally and set its path in the component named "Indexer_tool". Download the Peregrine SKOS CLI from: https://trac.nbic.nl/biosemantics_bet_dev/downloads Install LVG2013Lite; see: https://trac.nbic.nl/data-mining/wiki/Using%20plain%20jar%20files Copy the properties file from production.properties, then change the normalizer.lvg.properties and normalizer.lvg.binaryCache properties to point to the LVG installation path.
On a Windows machine, install Cygwin and add it to the PATH environment variable. In Taverna, go to File --> Preferences --> Tool invocation --> Edit default local, enter C:\Windows\system32\cmd.exe /c in the Shell field, and save.
Set the JAVA_HOME environment variable to the JDK installation directory, for example C:\Program Files\jdk 1.7.0.
Run this Workflow in the Taverna Workbench...
Option 1:
Copy and paste this link into File > 'Open workflow location...'
http://myexperiment.org/workflows/4272/download?version=3
Taverna is available from http://taverna.sourceforge.net/
If you are having problems downloading it in Taverna, you may need to provide your username and password in the URL so that Taverna can access the Workflow:
Replace http:// in the link above with http://yourusername:yourpassword@
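The substitution described above can be sketched in a few lines of Python. This is only an illustration: the function name `with_credentials` is hypothetical, and percent-encoding the credentials is an assumption that follows general URL rules rather than anything stated for Taverna.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_credentials(url, username, password):
    """Return `url` with `username:password@` placed before the host name."""
    parts = urlsplit(url)
    # Percent-encode characters such as '@' or ':' that would otherwise
    # be misread as URL delimiters (assumption: general URL syntax applies).
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{userinfo}@{parts.netloc}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


link = "http://myexperiment.org/workflows/4272/download?version=3"
print(with_credentials(link, "yourusername", "yourpassword"))
```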
Workflow Components
Author: Amrish Mahes
Dependency: mysql-connector-java-5.1.28-bin.jar
Inputs (6)
Name | Description |
---|---|
Ontology_input | A SKOS-formatted dictionary/thesaurus of predefined concepts. |
User_password | The SQL password used to access the database as root. |
abstract_input_depth2 | A list of sublists, each consisting of two elements: [ [ Document_URI, Abstract_URI ] ]. For an example, load the file named Input_value_for_test_upload_CPGP, which can be found under the Files tab on myExperiment. |
Driver_value | The driver needed for all SQL queries to run. |
usr_id | The user ID, which in this case is root. |
url_value | |
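The nested-list shape expected by abstract_input_depth2 can be illustrated with a short Python sketch. The workflow itself does this step in a Beanshell script; the function name `split_documents` and the example URIs below are hypothetical and only show how a depth-2 list separates into document URIs and abstract URIs.

```python
def split_documents(abstract_input):
    """Split [[document_uri, abstract_uri], ...] into two parallel lists."""
    doc_uris, abstract_uris = [], []
    for pair in abstract_input:
        if len(pair) != 2:
            raise ValueError(f"expected [Document_URI, Abstract_URI], got {pair!r}")
        doc_uri, abstract_uri = pair
        doc_uris.append(doc_uri)
        abstract_uris.append(abstract_uri)
    return doc_uris, abstract_uris


# Hypothetical URIs, for illustration only.
example_input = [
    ["http://example.org/doc/1", "http://example.org/abstract/1"],
    ["http://example.org/doc/2", "http://example.org/abstract/2"],
]
doc_uris, abstract_uris = split_documents(example_input)
```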
Processors (6)
Name | Type | Description |
---|---|---|
Peregrine indexer | workflow | |
Calculate Uncertainty Coefficient | workflow | |
Calculate Contincengy table | workflow | |
Calculate Inner Product | workflow | |
Create Tables | workflow | |
Summarize Big Table | workflow |
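As a rough illustration of what the Create Tables step produces, the Python sketch below builds a timestamp-based table name and a CREATE TABLE statement with the four columns listed in the description of Beanshell_generate_SQL_occurence_concept_doc_table. The workflow generates its SQL in Beanshell; the exact timestamp format, the INT type of the key column, and the function names here are assumptions.

```python
import re
from datetime import datetime


def make_table_name(now=None):
    """Format a start time as year, month, day, hour, minute, second (24-hour)."""
    now = now or datetime.now()
    return now.strftime("%Y%m%d%H%M%S")  # assumed format, e.g. 20140615175511


def create_table_sql(table_name):
    """Build a CREATE TABLE statement for the concept/occurrence/document table."""
    # Reject anything but letters, digits and underscores, because the name
    # is spliced into the statement rather than bound as a parameter.
    if not re.fullmatch(r"[0-9A-Za-z_]+", table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    # Backticks are needed because MySQL does not accept an unquoted
    # identifier that consists solely of digits.
    return (
        f"CREATE TABLE `{table_name}` ("
        "concept_occ_doc_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY, "
        "concept_uri VARCHAR(200), "
        "occurence_uri VARCHAR(200), "
        "doc_uri VARCHAR(200))"
    )


print(create_table_sql(make_table_name()))
```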
Beanshells (20)
Name | Description | Inputs | Outputs |
---|---|---|---|
Beanshell_SQL_create_table_co_occurence | | timestamp | out1 |
Beanshell_SQL_generate_contingency_table | A Beanshell script that generates a SQL statement to retrieve, for two concepts, the number of times both co-occur, concept A occurs without concept B, concept B occurs without concept A, and neither concept occurs in the literature. | concept_pairs, table_name | out1 |
Beanshell_SQL_insert_into_co_occurence_table | A Beanshell script that generates a SQL statement to insert into the co-occurrence table the number of times two given concepts co-occur in the literature. | contingency_values, timestamp, concept_pairs | out1 |
Beanshell_correct_contincengy_table | A Beanshell script that corrects the contingency table retrieved by the SQL query. | contingency_values | corrected_contingency_values |
Beanshell_generate_SQL_statement_Get_destinct_concept | A Beanshell script that generates a SQL statement to retrieve every distinct concept. | table_name | out1 |
Beanshell_make_concept_pairs | A Beanshell script that takes a list of concepts as input and creates concept pairs. | concept_B, concept_A | concept_pair |
Beanshell_SQL_INSERT_UC_values_INTO_table | A Beanshell script that generates a SQL statement to insert the calculated uncertainty coefficient of each concept pair into the table. | timestamp, concept_pairs, UC_value | output_SQL |
Beanshell_calculate_Uncertainty_Coefficient | A Beanshell script that calculates the uncertainty coefficient based on the work of Herman van Haagen. | corrected_contingency_values | UC_value |
Beanshell_SQL_generate_Get_Inner_Product | A Beanshell script that generates a SQL statement to retrieve the inner product of two concept profiles. | concept_pairs, timestamp | out1 |
Beanshell_SQL_generate_INSERT_INTO_Inner_Product_table | A Beanshell script that generates a SQL statement to insert the calculated inner product of two concept profiles into the inner product table. | concept_pairs, timestamp, inner_product_value | out1 |
Beanshell_SQL_query_generator_Concept_Doc_occurence | A Beanshell script that generates a SQL statement to INSERT the concept URI, occurrence URI and document URI into the table named after the timestamp at which the workflow started. It requires a database called mydb. | table_name, concept_occurence_document | out1 |
Beanshell_extract_uri_doc_and_abstract | A Beanshell script that loops through the input to separate the document URIs and abstract URIs. | Input | doc_uri, abstracts |
Beanshell_remove_dot_and_put_correct_doc_uri_in_con_occ_doc | A Beanshell script that replaces the temporary file created by Taverna with the correct document URI. | concept_occurence_document, doc_uri | out1 |
Beanshell_generate_SQL_occurence_concept_doc_table | A Beanshell script that generates a SQL statement to CREATE a TABLE. The table name is the input timestamp. The table consists of 4 columns: concept_occ_doc_id, concept_uri, occurence_uri and doc_uri, where concept_occ_doc_id is the auto-incremented PRIMARY KEY and every other column is a VARCHAR(200). | timeStamp | Table_sql |
Beanshell_SQL_generate_table_UC | A Beanshell script that generates a table for the uncertainty coefficient values. | time_stamp | output |
Beanshell_get_timestamp | A Beanshell script that retrieves the time the program started, as year, month, day, hour, minutes and seconds in 24-hour time. | | timeStamp |
Beanshell_SQL_Generate_INSERT_INTO_big_table | A Beanshell script that generates SQL statements to insert two concepts and their co-occurrence, uncertainty coefficient and inner product values into the table. | timestamp, in1 | out1 |
Beanshell_SQL_statement_get_last_table_data_input | A Beanshell script that generates SQL statements to retrieve the inner product of two concept profiles, the co-occurrence of the concepts and the uncertainty coefficient. | timestamp | out1 |
Beanshell_generate_SQL_table_inner_product | | timestamp | out1 |
Beanshell_SQL_CREATE_last_big_table | | timestamp | out1 |
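The contingency counts and the uncertainty coefficient mentioned in the table can be made concrete with a small Python sketch. It is not the workflow's Beanshell code: it counts the four cells in memory instead of through SQL, and it uses the textbook symmetric uncertainty coefficient (twice the mutual information divided by the sum of the two entropies). Whether the workflow uses this symmetric form or an asymmetric one, and what its correction step does, is not stated on this page, so treat both as assumptions. All function names are hypothetical.

```python
import math


def contingency_table(doc_concepts, concept_a, concept_b):
    """Count documents with both concepts, A only, B only, and neither."""
    both = a_only = b_only = neither = 0
    for concepts in doc_concepts.values():
        has_a = concept_a in concepts
        has_b = concept_b in concepts
        if has_a and has_b:
            both += 1
        elif has_a:
            a_only += 1
        elif has_b:
            b_only += 1
        else:
            neither += 1
    return both, a_only, b_only, neither


def entropy(counts):
    """Shannon entropy (natural log) of a list of counts; 0 * log(0) is taken as 0."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def uncertainty_coefficient(both, a_only, b_only, neither):
    """Symmetric uncertainty coefficient of a 2x2 contingency table (assumed variant)."""
    h_a = entropy([both + a_only, b_only + neither])   # concept A present / absent
    h_b = entropy([both + b_only, a_only + neither])   # concept B present / absent
    h_ab = entropy([both, a_only, b_only, neither])    # joint distribution
    if h_a + h_b == 0:
        return 0.0  # no variation in either concept, so nothing to explain
    mutual_information = h_a + h_b - h_ab
    return 2.0 * mutual_information / (h_a + h_b)


# Hypothetical documents and concepts, for illustration only.
docs = {"d1": {"A", "B"}, "d2": {"A"}, "d3": {"B"}, "d4": set(), "d5": {"A", "B"}}
table = contingency_table(docs, "A", "B")
print(table, uncertainty_coefficient(*table))
```

Under this definition, two concepts that always appear together score 1 and two concepts that occur independently score 0.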
Outputs (1)
Name | Description |
---|---|
end |
Datalinks (35)
Source | Sink |
---|---|
Create Tables:timeStamp | Peregrine indexer:table_name |
url_value | Peregrine indexer:url |
User_password | Peregrine indexer:password |
Ontology_input | Peregrine indexer:ontology |
abstract_input_depth2 | Peregrine indexer:Input |
usr_id | Peregrine indexer:userid |
Driver_value | Peregrine indexer:driver |
usr_id | Calculate Uncertainty Coefficient:userid |
User_password | Calculate Uncertainty Coefficient:password |
Calculate Contincengy table:outputlist | Calculate Uncertainty Coefficient:concept_pairs |
Calculate Contincengy table:corrected_contingency_values | Calculate Uncertainty Coefficient:corrected_contingency_values |
url_value | Calculate Uncertainty Coefficient:url |
Driver_value | Calculate Uncertainty Coefficient:driver |
Create Tables:timeStamp | Calculate Uncertainty Coefficient:timestamp |
url_value | Calculate Contincengy table:url |
Driver_value | Calculate Contincengy table:driver |
User_password | Calculate Contincengy table:password |
Create Tables:timeStamp | Calculate Contincengy table:table_name |
usr_id | Calculate Contincengy table:userid |
usr_id | Calculate Inner Product:userid |
User_password | Calculate Inner Product:password |
Calculate Contincengy table:outputlist | Calculate Inner Product:concept_pairs |
Driver_value | Calculate Inner Product:driver |
Create Tables:timeStamp | Calculate Inner Product:timestamp |
url_value | Calculate Inner Product:url |
usr_id | Create Tables:userid |
url_value | Create Tables:url |
Driver_value | Create Tables:driver |
User_password | Create Tables:password |
Create Tables:timeStamp | Summarize Big Table:timestamp |
usr_id | Summarize Big Table:userid |
url_value | Summarize Big Table:url |
User_password | Summarize Big Table:password |
Driver_value | Summarize Big Table:driver |
Summarize Big Table:resultList | end |
Coordinations (3)
Controller | Target |
---|---|
Calculate Inner Product | Summarize Big Table |
Peregrine indexer | Calculate Contincengy table |
Calculate Uncertainty Coefficient | Calculate Inner Product |
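Taken together, the datalinks between processors and the coordinations above constrain the order in which the six nested workflows can run. The Python sketch below derives that order with a topological sort. It is only an illustration: it assumes each coordination means the target waits for the controller to finish, and it ignores the links that come from workflow input ports.

```python
from graphlib import TopologicalSorter

# (upstream, downstream) pairs taken from the Datalinks table,
# restricted to links between processors.
DATA_LINKS = [
    ("Create Tables", "Peregrine indexer"),
    ("Create Tables", "Calculate Contincengy table"),
    ("Create Tables", "Calculate Uncertainty Coefficient"),
    ("Create Tables", "Calculate Inner Product"),
    ("Create Tables", "Summarize Big Table"),
    ("Calculate Contincengy table", "Calculate Uncertainty Coefficient"),
    ("Calculate Contincengy table", "Calculate Inner Product"),
]

# (controller, target) pairs taken from the Coordinations table.
CONTROL_LINKS = [
    ("Calculate Inner Product", "Summarize Big Table"),
    ("Peregrine indexer", "Calculate Contincengy table"),
    ("Calculate Uncertainty Coefficient", "Calculate Inner Product"),
]


def execution_order(data_links, control_links):
    """Return the processors in an order that respects every link."""
    predecessors = {}
    for upstream, downstream in data_links + control_links:
        predecessors.setdefault(upstream, set())
        predecessors.setdefault(downstream, set()).add(upstream)
    return list(TopologicalSorter(predecessors).static_order())


for step in execution_order(DATA_LINKS, CONTROL_LINKS):
    print(step)
```

With these links the constraints form a single chain, so only one order is possible: Create Tables, Peregrine indexer, Calculate Contincengy table, Calculate Uncertainty Coefficient, Calculate Inner Product, Summarize Big Table.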
Workflow Type
Version 3 (of 9)
Shared with Groups (0)
None
In chronological order:
- Created by Amrish Mahes on Wednesday 30 April 2014 19:14:47 (UTC)
- Created by Amrish Mahes on Wednesday 30 April 2014 19:18:01 (UTC)
- Created by Amrish Mahes on Wednesday 30 April 2014 19:18:42 (UTC)
- Created by Amrish Mahes on Wednesday 07 May 2014 15:33:38 (UTC)
  Revision comment: New version with comment and the right websites to download dependencies
- Created by Amrish Mahes on Monday 12 May 2014 15:34:00 (UTC)
- Created by Amrish Mahes on Wednesday 11 June 2014 13:23:36 (UTC)
- Created by Amrish Mahes on Sunday 15 June 2014 17:45:34 (UTC)
  Revision comment: Changed Table construction workflow. For Peregrine tool invocation it is now mandatory to give the path to production.properties and peregrine-skos-cli.jar
- Created by Amrish Mahes on Sunday 15 June 2014 17:50:32 (UTC)
- Created by Amrish Mahes on Sunday 15 June 2014 17:55:11 (UTC)
  Revision comment: Removed the LIMIT 50 in the sql_value file