Text-mining using OSCAR to obtain a list of Chemical names and Identifiers
Created: 2013-04-18 13:01:50
This service extracts chemical names from text and obtains identifiers for those names. It outputs an HTML string that can be opened in a browser, providing a table of information and links to ChemSpider.
Known issues
- Character limit ~3000
- Unable to produce InChIs or CSIDs for some names
- An error is sometimes encountered when a trivial name and a systematic name for the same compound are both used
- Some identifiers are recognised but cannot be processed
Requires access to an OSCAR3 service and to ChemSpider.
More information is contained in the text constant "Instructions".
Workflow Components
Authors (0)
Titles (0)
Descriptions (0)
Dependencies (0)
Inputs (2)
Name |
Description |
Content |
The text to be analysed by the service. The text must not contain any line breaks, and there is a character limit.
Bugs
- If two names refer to the same substance and both produce InChIs, the service will fail
|
output_Filepath |
The file path to save the output file to. End the file name with .html so that it opens easily in a browser.
|
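The Content input has two stated constraints: no line breaks, and a character limit of roughly 3000. A minimal sketch of preparing text before passing it to the workflow might look like the following. The class name `ContentInput`, its methods, and the exact limit of 3000 are assumptions for illustration and are not part of the workflow itself.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for preparing the "Content" input. */
final class ContentInput {
    // Assumed limit, taken from the "~3000" figure under Known issues.
    static final int MAX_CHARS = 3000;

    // One or more line-break characters, plus any whitespace around them.
    private static final Pattern LINE_BREAKS = Pattern.compile("\\s*[\\r\\n]+\\s*");

    /** Replaces each run of line breaks with a single space. */
    static String flatten(String text) {
        return LINE_BREAKS.matcher(text).replaceAll(" ").trim();
    }

    /** Reports whether the text is within the assumed character limit. */
    static boolean fitsLimit(String text) {
        return text.length() <= MAX_CHARS;
    }
}
```

Calling `ContentInput.flatten(text)` first and then checking `ContentInput.fitsLimit(...)` on the result keeps both constraints in one place; longer text would need to be split and submitted in several runs.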
Processors (17)
Name |
Type |
Description |
OSCAR_REST_Service |
rest |
|
Output_format |
stringconstant |
Value: saf |
Flowcommand |
stringconstant |
Value |
Name |
stringconstant |
Value |
XPath_Service_2 |
xpath |
XPath Expression: /saf/annot |
HTML_generator |
beanshell |
Script:
import java.util.ArrayList;
import java.util.HashMap;
ArrayList names = new ArrayList();
ArrayList inchis = new ArrayList();
ArrayList csids = new ArrayList();
ArrayList exNames = new ArrayList();
names.addAll(in1);
inchis.addAll(in2);
csids.addAll(in3);
exNames.addAll(in4);
String table = "ResultsSome names will unfortunately not produce InChIs in this service. CSID links will take you to the ChemSpider page for the substance. Name links will take you to a ChemSpider search for that name. "; // completes the html string
out1 = table;
|
InChI_filter |
beanshell |
Filters entries by confidence value, chemical tag and InChI tag.
Script:
import java.util.ArrayList;
import java.util.HashMap;
ArrayList oscarOutput = new ArrayList();
ArrayList oscarXML = new ArrayList();
ArrayList filteredResults = new ArrayList();
ArrayList names = new ArrayList();
ArrayList inchis = new ArrayList();
ArrayList extraResults = new ArrayList();
ArrayList extraNames = new ArrayList();
oscarOutput.addAll(in1);
int scoreStandard = Integer.parseInt(in2); // minimum confidence, e.g. 50
// Keep entries that carry the chemical tag and whose confidence meets the minimum
for(String output : oscarOutput){
    if(output.contains(">CM<")){
        if(output.contains("confidence")){
            int startOfScore = output.indexOf("confidence") + 12;
            String scoreString = output.substring(startOfScore); // makes a shorter string starting at the score
            String score = scoreString.substring(2,4); // gets confidence of chemical
            int scoreInt = Integer.parseInt(score);
            if(scoreInt >= scoreStandard){
                oscarXML.add(output);
            }
        }
    }
}
// Separate entries that carry an InChI from those that do not
for(String chemXML : oscarXML){
    if(chemXML.contains("\"InChI")){
        filteredResults.add(chemXML); // adds Oscar results that have InChI component
    } else {
        extraResults.add(chemXML);
    }
}
for(String result : filteredResults){
int indexa = result.indexOf("surface") + 9;
int index = result.indexOf(" |
InChIToCSID |
wsdl |
WSDL: http://www.chemspider.com/InChI.asmx?WSDL
WSDL Operation: InChIToCSID |
InChIToCSID_input |
xmlsplitter |
|
InChIToCSID_output |
xmlsplitter |
|
Remove_String_Duplicates |
localworker |
Script:
List strippedlist = new ArrayList();
for (Iterator i = stringlist.iterator(); i.hasNext();) {
    String item = (String) i.next();
    if (strippedlist.contains(item) == false) {
        strippedlist.add(item);
    }
}
|
Remove_String_Duplicates_2 |
localworker |
Script:
List strippedlist = new ArrayList();
for (Iterator i = stringlist.iterator(); i.hasNext();) {
    String item = (String) i.next();
    if (strippedlist.contains(item) == false) {
        strippedlist.add(item);
    }
}
|
Write_Text_File |
localworker |
Script:
BufferedWriter out;
if (encoding == void) {
    out = new BufferedWriter(new FileWriter(outputFile));
}
else {
    out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(outputFile), encoding));
}
out.write(filecontents);
out.flush();
out.close();
outputFile = filecontents;
|
encoding_value |
stringconstant |
Value: UTF-8 |
Remove_String_Duplicates_3 |
localworker |
Script:
List strippedlist = new ArrayList();
for (Iterator i = stringlist.iterator(); i.hasNext();) {
    String item = (String) i.next();
    if (strippedlist.contains(item) == false) {
        strippedlist.add(item);
    }
}
|
Instructions |
stringconstant |
This workflow is straightforward: insert the text and a file path, and the service does the rest. The service requires access to an OSCAR3 server. By default this is set to the local machine, but it may need to be changed if your server runs elsewhere. The OSCAR server can be downloaded from http://www.sourceforge.net/projects/oscar3-chem/. The workflow also requires access to the ChemSpider web services, which should be accessible as long as the user is online. The confidence setting can be changed if desired and is set to 50 by default.
Value: Click on the annotation on this service and the others for more information |
Confidence_Value |
stringconstant |
The confidence value that an entry must reach or exceed in order to be recognised by the service. This is a value assigned by OSCAR based on its confidence that an entry is correctly tagged. It must be given as two digits, e.g. 50 for 50%.
Value: 50 |
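The InChI_filter script above reads the confidence by fixed character offsets. As an alternative illustration, the sketch below extracts the value with a regular expression and compares it against the threshold as a percentage. It assumes each annotation holds the confidence as a decimal fraction in a slot such as `<slot name="confidence">0.53</slot>`; that layout is inferred from the offsets in the script and is not confirmed by the source. The class and method names are hypothetical. Note that this sketch rounds to the nearest percent, whereas the original script takes the first two decimal digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical regex-based variant of the confidence check. */
final class ConfidenceFilter {
    // Assumed layout: <slot name="confidence">0.53</slot>
    private static final Pattern CONFIDENCE =
            Pattern.compile("name=\"confidence\">\\s*([0-9]*\\.?[0-9]+)");

    /** Returns the confidence as a percentage, or -1 if none is found. */
    static int confidencePercent(String annotXml) {
        Matcher m = CONFIDENCE.matcher(annotXml);
        if (!m.find()) {
            return -1;
        }
        // Rounds to the nearest whole percent.
        return (int) Math.round(Double.parseDouble(m.group(1)) * 100);
    }

    /** Keeps chemical (CM) annotations at or above the threshold. */
    static List<String> keepConfident(List<String> annots, int thresholdPercent) {
        List<String> kept = new ArrayList<>();
        for (String annot : annots) {
            if (annot.contains(">CM<") && confidencePercent(annot) >= thresholdPercent) {
                kept.add(annot);
            }
        }
        return kept;
    }
}
```

With the default Confidence_Value of 50, `ConfidenceFilter.keepConfident(annotations, 50)` would retain only the chemical annotations scoring 50% or higher under the assumed layout.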
Beanshells (2)
Name |
Description |
Inputs |
Outputs |
HTML_generator |
|
in1
in2
in3
in4
|
out1
|
InChI_filter |
Filters entries by confidence value, chemical tag and inchi tag |
in1
in2
|
out1
out2
out3
|
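The HTML_generator beanshell listed above takes parallel lists and emits a single HTML string, but its markup did not survive in this listing. The following is a rough sketch of how such a table could be assembled, not a reconstruction of the original script. The class name, the column layout, and both URL patterns are assumptions made for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

/** Hypothetical sketch of building a results table from parallel lists. */
final class ResultsTable {
    // Both URL patterns are assumptions for illustration only.
    static final String COMPOUND_URL = "http://www.chemspider.com/Chemical-Structure.%s.html";
    static final String SEARCH_URL = "http://www.chemspider.com/Search.aspx?q=%s";

    /** Escapes characters that are special in HTML text and attributes. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** URL-encodes a value for use in a query string or path segment. */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    static String link(String urlPattern, String value) {
        return "<a href=\"" + String.format(urlPattern, encode(value)) + "\">"
                + escape(value) + "</a>";
    }

    /** Builds one table row per name/InChI/CSID triple. */
    static String build(List<String> names, List<String> inchis, List<String> csids) {
        if (names.size() != inchis.size() || names.size() != csids.size()) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        StringBuilder html = new StringBuilder("<html><body><table>");
        html.append("<tr><th>Name</th><th>InChI</th><th>CSID</th></tr>");
        for (int i = 0; i < names.size(); i++) {
            html.append("<tr><td>").append(link(SEARCH_URL, names.get(i)))
                .append("</td><td>").append(escape(inchis.get(i)))
                .append("</td><td>").append(link(COMPOUND_URL, csids.get(i)))
                .append("</td></tr>");
        }
        return html.append("</table></body></html>").toString();
    }
}
```

The explicit length check makes a mismatch between the lists fail with a clear message instead of an index error partway through the table.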
Datalinks (21)
Source |
Sink |
Output_format:value |
OSCAR_REST_Service:output |
Flowcommand:value |
OSCAR_REST_Service:flowcommand |
Name:value |
OSCAR_REST_Service:name |
Content |
OSCAR_REST_Service:contents |
OSCAR_REST_Service:responseBody |
XPath_Service_2:xml_text |
InChIToCSID_output:InChIToCSIDResult |
HTML_generator:in3 |
Remove_String_Duplicates_2:strippedlist |
HTML_generator:in2 |
Remove_String_Duplicates:strippedlist |
HTML_generator:in1 |
Remove_String_Duplicates_3:strippedlist |
HTML_generator:in4 |
XPath_Service_2:nodelistAsXML |
InChI_filter:in1 |
Confidence_Value:value |
InChI_filter:in2 |
InChIToCSID_input:output |
InChIToCSID:parameters |
Remove_String_Duplicates_2:strippedlist |
InChIToCSID_input:inchi |
InChIToCSID:parameters |
InChIToCSID_output:input |
InChI_filter:out1 |
Remove_String_Duplicates:stringlist |
InChI_filter:out2 |
Remove_String_Duplicates_2:stringlist |
encoding_value:value |
Write_Text_File:encoding |
HTML_generator:out1 |
Write_Text_File:filecontents |
output_Filepath |
Write_Text_File:outputFile |
InChI_filter:out3 |
Remove_String_Duplicates_3:stringlist |
Write_Text_File:outputFile |
html |
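The datalinks above route names (InChI_filter:out1) and InChIs (InChI_filter:out2) through separate Remove_String_Duplicates processors. One possible consequence, which is an inference from the wiring and not something the source states, is that two different names sharing one InChI would leave the two lists with different lengths. The sketch below shows one way to deduplicate while keeping names and InChIs associated. The class and method names are hypothetical, and it assumes the two input lists are aligned index by index.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical pair-preserving alternative to separate deduplication. */
final class PairDeduplicator {
    /**
     * Groups names under the InChI they produced, keeping first-seen order
     * and dropping repeated names within each group.
     */
    static Map<String, List<String>> groupNamesByInchi(List<String> names, List<String> inchis) {
        if (names.size() != inchis.size()) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            List<String> bucket = grouped.computeIfAbsent(inchis.get(i), k -> new ArrayList<>());
            if (!bucket.contains(names.get(i))) {
                bucket.add(names.get(i));
            }
        }
        return grouped;
    }
}
```

Each map entry then corresponds to one substance, so a later lookup per InChI yields exactly one identifier per entry regardless of how many names referred to it.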
Version 1
(of 1)