HTML Citation Extraction
Created: 2011-10-05 15:17:06
Last updated: 2011-10-05 15:21:06
This workflow uses a BeanShell script to locate and extract text within an HTML page. It was designed to pull citation information from the website http://robjhyndman.com/TSDL/.
Workflow Components
Authors (0)
Titles (0)
Descriptions (0)
Dependencies (0)
Processors (3)
Name | Type | Description |
URL_for_Citation | stringconstant | Value: http://robjhyndman.com/TSDL/ |
getPage_2_2 | localworker |
Script:
// Resolve url against base when a base is supplied (void is BeanShell's "undefined" value).
URL inputURL = null;
if (base != void) {
    inputURL = new URL(new URL(base), url);
} else {
    inputURL = new URL(url);
}
// Read the page one character at a time into a buffer.
URLConnection con = inputURL.openConnection();
InputStream in = con.getInputStream();
InputStreamReader isr = new InputStreamReader(in);
Reader inReader = new BufferedReader(isr);
StringBuffer buf = new StringBuffer();
int ch;
while ((ch = inReader.read()) > -1) {
    buf.append((char) ch);
}
inReader.close();
contents = buf.toString();
//String NEWLINE = System.getProperty("line.separator");
//
//URL inputURL = null;
//if (base != void) {
// inputURL = new URL(new URL(base), url);
//} else {
// inputURL = new URL(url);
//}
//StringBuffer result = new StringBuffer();
//BufferedReader reader = new BufferedReader(new InputStreamReader(inputURL.openStream()));
//String line = null;
//while ((line = reader.readLine()) != null) {
// result.append(line);
// result.append(NEWLINE);
//}
//
//contents = result.toString();
|
extractULElementForCitationFromTSDL | localworker | Extracts the citation line from the TSDL webpage. BeanShell written by Jeffery Adamus, copied and hacked from getImageLinks by Alan Williams.
Script:
// Copied and Hacked by Jeffery Adamus
// based on getImageLinks by Alan Williams
String lowerCaseContent = document.toLowerCase();
int index = 0;
int endUL = 0;
List ulElements = new ArrayList();
while ((index = lowerCaseContent.indexOf("<ul>", index)) != -1) {
    index += 4; // skip past the opening "<ul>" tag
    if ((endUL = lowerCaseContent.indexOf("</ul>", index)) == -1) {
        debugLog += " break" + endUL;
        break;
    }
    String strLink = document.substring(index, endUL);
    index = endUL;
if ( strLink.indexOf( " |
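The extraction script above is cut off in this listing, so the following is only a minimal Java sketch of the scanning technique it appears to use. It assumes the loop searches for `<ul>` and `</ul>` tags, as the processor name and the offset of 4 suggest. The class name `UlExtractor` and method name `extractUlContents` are hypothetical, and the filter that picks out the citation line is not reproduced because that part of the original is missing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper; the class and method names are illustrative only. */
final class UlExtractor {
    private static final String OPEN = "<ul>";
    private static final String CLOSE = "</ul>";

    /** Returns the text between each "<ul>" and the next "</ul>", matched case-insensitively. */
    static List<String> extractUlContents(String document) {
        // Search a lower-cased copy so that <UL> and <ul> both match,
        // but cut substrings from the original to preserve its casing.
        String lower = document.toLowerCase(Locale.ROOT);
        List<String> elements = new ArrayList<>();
        int index = 0;
        while ((index = lower.indexOf(OPEN, index)) != -1) {
            index += OPEN.length();               // move past the opening tag
            int end = lower.indexOf(CLOSE, index);
            if (end == -1) {
                break;                            // unclosed list: stop scanning
            }
            elements.add(document.substring(index, end));
            index = end;                          // continue after this element
        }
        return elements;
    }
}
```

A plain string search like this does not match tags that carry attributes (for example `<ul class="x">`) and does not handle nested lists; an HTML parser would be the more robust choice if the page structure changes.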
Datalinks (3)
Source | Sink |
URL_for_Citation:value | getPage_2_2:url |
getPage_2_2:contents | extractULElementForCitationFromTSDL:document |
extractULElementForCitationFromTSDL:citationLine | output |
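The three datalinks describe a straight pipeline: the constant URL feeds the page fetcher, the fetched contents feed the extractor, and the extractor's citation line becomes the workflow output. A rough Java sketch of that wiring follows. `CitationPipeline`, `readAll`, `getPage`, and `run` are hypothetical names, the UTF-8 charset is an assumption (the original script uses the platform default), and the extractor is passed in as a function rather than reimplemented.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

/** Hypothetical wiring of the three processors; all names here are illustrative. */
final class CitationPipeline {
    /** Reads a whole character stream into a string, character by character. */
    static String readAll(Reader reader) {
        StringBuilder buf = new StringBuilder();
        try (Reader in = new BufferedReader(reader)) {
            int ch;
            while ((ch = in.read()) > -1) {
                buf.append((char) ch);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toString();
    }

    /** Fetches a page over the network. UTF-8 is an assumption, not taken from the original. */
    static String getPage(String url) {
        try {
            return readAll(new InputStreamReader(
                    URI.create(url).toURL().openStream(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Chains the stages in the order given by the datalinks table. */
    static String run(String urlForCitation,
                      Function<String, String> getPage,
                      Function<String, String> extractCitationLine) {
        // URL_for_Citation:value -> getPage_2_2:url, producing getPage_2_2:contents
        String contents = getPage.apply(urlForCitation);
        // contents -> extractor's document port, producing citationLine (the output)
        return extractCitationLine.apply(contents);
    }
}
```

With a real extractor function in hand, a call might look like `CitationPipeline.run("http://robjhyndman.com/TSDL/", CitationPipeline::getPage, extractor)`. Passing the stages as functions keeps the wiring testable without network access.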
Uploader
License
All versions of this Workflow are
licensed under:
Version 1
(of 1)
Credits (1)
(People/Groups)
Attributions (0)
(Workflows/Files)
None
Attributed By (1)
(Workflows/Files)