Name | Type | Description
pubmed_database |
stringconstant |
Which database is being used. Value: pubmed |
extractPMID |
localworker |
This process extracts the PubMed IDs based on the eSearch run. Script:
import org.dom4j.Document;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
SAXReader reader = new SAXReader(false);
reader.setIncludeInternalDTDDeclarations(false);
reader.setIncludeExternalDTDDeclarations(false);
Document document = reader.read(new StringReader(xmltext));
List nodelist = document.selectNodes(xpath);
// Process the elements in the nodelist
ArrayList outputList = new ArrayList();
ArrayList outputXmlList = new ArrayList();
String val = null;
String xmlVal = null;
for (Iterator iter = nodelist.iterator(); iter.hasNext();) {
Node element = (Node) iter.next();
xmlVal = element.asXML();
val = element.getStringValue();
if (val != null && !val.equals("")) {
outputList.add(val);
outputXmlList.add(xmlVal);
}
}
nodelist = outputList;
nodelistAsXML = outputXmlList; |
xpath |
stringconstant |
Value: /*[local-name(.)='eSearchResult']/*[local-name(.)='IdList']/*[local-name(.)='Id'] |
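The `local-name()` predicates make this expression namespace-agnostic, so it matches the `Id` elements whether or not eSearch returns them in a namespace. A minimal standalone sketch using the JDK's built-in XPath API (the workflow itself uses dom4j; the sample XML below is a trimmed-down, hypothetical eSearchResult):

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;

public class LocalNameXPath {
    // The same expression the xpath string constant holds
    static final String XPATH =
        "/*[local-name(.)='eSearchResult']/*[local-name(.)='IdList']/*[local-name(.)='Id']";

    static List<String> extractIds(String xml) throws Exception {
        org.w3c.dom.Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(XPATH, doc, XPathConstants.NODESET);
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            ids.add(nodes.item(i).getTextContent());
        }
        return ids;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed-down sample of what eSearch returns
        String sample = "<eSearchResult><IdList><Id>19008416</Id><Id>18927361</Id></IdList></eSearchResult>";
        System.out.println(extractIds(sample)); // [19008416, 18927361]
    }
}
```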
run_eSearch |
wsdl |
This process runs eSearch, which extracts the IDs of the articles that match the query. Wsdl: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/soap/eutils.wsdl Wsdl Operation: run_eSearch |
parametersXML_eFecth |
xmlsplitter |
This process will create the parameters that can then be used by eSearch and eFetch. |
Retrive_abstracts |
workflow |
This nested workflow was part of Fisher's workflow, but has been reduced in size. This workflow stores the XML files from eFetch and doesn't need to extract the plain-text abstract like the original workflow did. |
LookAtWatch |
beanshell |
Time flies like an arrow; fruit flies like a banana.
This process receives the abstract XML, which triggers a time lookup. The result is then passed to the next process. Script:
import java.util.Date;
Date date = new Date();
stringy = "" + date;
CurrentTime = stringy |
CreateProvenance |
beanshell |
Provenance is important when you want to trace back your data. For this reason I added a process to the workflow that adds some basic provenance based on the work of the W3C (www.w3.org).
The process adds the following types of provenance:
ResearcherID - Use www.researcherid.org to get a ResearcherID. This can then be linked to your research.
ExtractionDate - The date and time the article was extracted.
MyExperimentID - The myExperiment ID of the used workflow.
WorkflowVersion - The version of the used workflow.
WorkflowDevelopers - A list of the developers of the workflow.
StartDate - The starting date of the article search; see the input port for more information.
EndDate - The ending date of the article search; see the input port for more information.
SearchTerm - The original search query; see the input port for more information.
MaximumArticles - The maximum number of articles that have been searched; see the input port for more information. Script:
Prov = "\n" + ResearcherID + "\n" + ExtractionDate + "\n" + MyExperimentID + "\n" + WorkflowVersion + "\n" + WorkflowDevelopers + "\n" + "\n" + StartDate + "\n" + EndDate + "\n" + SearchTerm + "\n" + MaximumArticles + "\n\n" + "\n" |
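The Script joins each provenance field on its own line. A simplified standalone sketch of the same idea (the field values below are placeholders, and the Beanshell script additionally inserts a leading newline and an extra blank separator line):

```java
import java.util.Arrays;

public class ProvenanceBlock {
    // Join each provenance field on its own line, as the CreateProvenance script does
    static String build(String researcherId, String extractionDate, String myExperimentId,
                        String workflowVersion, String workflowDevelopers,
                        String startDate, String endDate, String searchTerm, String maxArticles) {
        return String.join("\n", Arrays.asList(
            researcherId, extractionDate, myExperimentId, workflowVersion,
            workflowDevelopers, startDate, endDate, searchTerm, maxArticles));
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only
        System.out.println(build("A-1234-2008", "Mon Jan 01 12:00:00 CET 2024", "3659",
                "5", "Sander van Boom and Paul Fisher",
                "2008/01/01", "2024/01/01", "malaria", "500"));
    }
}
```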
Write_Text_File |
localworker |
This process writes the content of the workflow to a file. The location of the file is created in the CreateFileLocation process.
NOTE: You might want to change the working directory. This can be done by changing the CreateFileLocation process. Script:
import java.io.*;
BufferedWriter out;
if (encoding == void) {
out = new BufferedWriter(new FileWriter(outputFile));
}
else {
out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), encoding));
}
out.write(filecontents);
out.flush();
out.close();
outputFile = filecontents;
|
CreateListOfArticlesThatNeedExtracting |
beanshell |
This conditional branch splits the workflow in two directions:
IDsInDatabase - A list of IDs of articles that are already in the working directory. They should not be extracted.
ListOfArticlesThatNeedExtracting - The list of PubMed IDs that should be extracted and added to the database. Script:
//Distribute Lists
import java.util.*;
List IDsInDatabaseOut = new ArrayList();
List ListOfArticlesThatNeedExtracting = new ArrayList();
String booleanStatement = IDsInDatabase.toString();
if (booleanStatement.equals("File exists")) {
IDsInDatabaseOut.add(ExtractableIDs + "IDsInDatabase");
}
else{
ListOfArticlesThatNeedExtracting.add(ExtractableIDs);
}
|
CheckIfArticleIsInDatabase |
externaltool |
This process calls the command line and checks if the file at the file location exists. If this is not the case, the process returns the string false. |
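The downstream conditional branch compares the result against the string "File exists", so the external tool's check can be sketched in plain Java as follows (the exact "File exists"/"false" return strings are taken from the process descriptions; treating them as literals is an assumption about the external tool's output):

```java
import java.io.File;

public class CheckIfArticleIsInDatabase {
    // Returns "File exists" when the file is present, the string "false" otherwise,
    // matching what CreateListOfArticlesThatNeedExtracting tests for
    static String check(String fileLocation) {
        return new File(fileLocation).exists() ? "File exists" : "false";
    }

    public static void main(String[] args) throws Exception {
        File tmp = File.createTempFile("pubmed", ".xml");
        System.out.println(check(tmp.getAbsolutePath())); // File exists
        tmp.delete();
        System.out.println(check(tmp.getAbsolutePath())); // false
    }
}
```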
Flatten_List |
localworker |
We need to decrease the depth of the list by one level; otherwise we will get errors in the validation report. Script:
flatten(inputs, outputs, depth) {
for (i = inputs.iterator(); i.hasNext();) {
element = i.next();
if (element instanceof Collection && depth > 0) {
flatten(element, outputs, depth - 1);
} else {
outputs.add(element);
}
}
}
outputlist = new ArrayList();
flatten(inputlist, outputlist, 1); |
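The same recursive unwrapping can be written as typed Java; this is a standalone sketch of the local worker above, not the worker itself:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class FlattenList {
    // Recursively unwraps nested collections until 'depth' levels have been removed,
    // mirroring the Flatten List local worker
    static void flatten(Collection<?> inputs, List<Object> outputs, int depth) {
        for (Object element : inputs) {
            if (element instanceof Collection && depth > 0) {
                flatten((Collection<?>) element, outputs, depth - 1);
            } else {
                outputs.add(element);
            }
        }
    }

    public static void main(String[] args) {
        List<Object> nested = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        List<Object> flat = new ArrayList<>();
        flatten(nested, flat, 1); // remove one level of nesting
        System.out.println(flat); // [a, b, c]
    }
}
```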
CreateFileLocation_2 |
beanshell |
This process creates the location of the file. This can then be used to check whether the file exists in the next process (CheckIfArticleIsInDatabase).
NOTE: if you want to change the working directory, please change this process so it links to the correct directory. Script:
//You can change the working directory (default: "/home/") to another working directory of your liking.
//Make sure it exists.
FileLocation = Workspace + PubmedID + ".xml" |
MyExperimentID_value |
stringconstant |
This value stores the myExperiment ID of the workflow. If you re-upload this workflow with improvements, feel free to change this value. Value: 3659 |
WorkflowDevelopers_value |
stringconstant |
Sander van Boom and Paul Fisher created this workflow. If you've changed this workflow and uploaded it on myExperiment, feel free to add your name to this variable as well. Value: Sander van Boom and Paul Fisher |
WorkflowVersion_value |
stringconstant |
This value stores the current version of the workflow. Value: 5 |
AddProvenance |
beanshell |
This process fuses the provenance, the found abstracts and the header of the file.
The output is also sent to an output port for checking the values. Script:
AbstractWithProvenance = "\n" + Provenance + Abstract + "\n" + "" |
XPath_Service |
xpath |
This XPath service removes the header from the file, because we want to add provenance to the file later in the workflow.
After we've added the provenance, we add the header back to the file. XPath Expression: /default:eFetchResult/default:PubmedArticleSet |
Flatten_List_2 |
localworker |
We need to decrease the depth of the list by one level; otherwise we will get errors in the validation report. Script:
flatten(inputs, outputs, depth) {
for (i = inputs.iterator(); i.hasNext();) {
element = i.next();
if (element instanceof Collection && depth > 0) {
flatten(element, outputs, depth - 1);
} else {
outputs.add(element);
}
}
}
outputlist = new ArrayList();
flatten(inputlist, outputlist, 1); |
CreateFileLocation_2_2 |
beanshell |
This process creates the location of the file. This can then be used to check whether the file exists in the next process (CheckIfArticleIsInDatabase).
NOTE: if you want to change the working directory, please change this process so it links to the correct directory. Script:
//You can change the working directory (default: "/home/") to another working directory of your liking.
//Make sure it exists.
FileLocation = Workspace + PubmedID + ".xml" |
InformationExtractionAndSolrImport |
workflow |
Read a file, extract the content, extract the PubMed ID from the abstract, and write the file back to a new workspace. Then import it into Solr.
NOTE: Make sure that Solr is installed and the variable pathToPostJar points to the correct path of post.jar.
BONUS NOTE: If you want Solr to detect more than just the title and the ID, you should add extra XPaths and update the Solr schema accordingly. |
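The Solr import boils down to invoking post.jar on each written document. A sketch of how that command line could be assembled from the workflow's pathToPostJar constant (the document path "/home/19008416.xml" is a hypothetical example based on the default "/home/" workspace; actually running it requires a Solr instance listening on its default port):

```java
import java.util.Arrays;
import java.util.List;

public class SolrImport {
    // Builds the command line that ships a document to Solr via post.jar;
    // pathToPostJar and the document path come from the workflow's inputs
    static List<String> buildCommand(String pathToPostJar, String documentPath) {
        return Arrays.asList("java", "-jar", pathToPostJar, documentPath);
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
            "/run/media/sander/Second Space/Downloads/Solaria/solr-4.4.0/example/exampledocs/post.jar",
            "/home/19008416.xml");
        System.out.println(cmd);
        // To actually execute it (requires a running Solr):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```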
pathToPostJar |
stringconstant |
This is the path to the post.jar that Solr uses to import its documents.
NOTE: Please change this variable to your Solr directory. Value: /run/media/sander/Second Space/Downloads/Solaria/solr-4.4.0/example/exampledocs/post.jar |