Fetch today's Dinosaur Comic (http://qwantz.com)
Created: 2011-09-13 23:05:00
Last updated: 2011-09-14 15:58:54
This workflow retrieves the newest comic from http://qwantz.com. A string constant supplies the URL, and the workflow downloads the HTML code from that location. A list of links to images is then extracted from the HTML code. Using a regular expression supplied by another string constant, this list is then searched for the URL of the comic itself. The image file for the comic is then retrieved and displayed as the workflow output.
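The middle steps described above (extract image links, filter them with a regular expression, resolve the result against the page URL) can be sketched in plain Java. This is only an illustration under stated assumptions, not the workflow's own scripts: the class name `ComicLinkSketch`, its method names, the sample HTML, and the file name `example.png` are all made up, and the sketch works on a string so that it runs without network access.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; an offline sketch of the link-handling steps.
class ComicLinkSketch {

    // Captures the quoted src value of an <img> tag (unquoted values are ignored; a simplification).
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Collect every image link found in the HTML, in document order.
    static List<String> extractImageLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Keep only the links that the regular expression matches in full.
    static List<String> filter(List<String> links, String regex) {
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            if (link.matches(regex)) {
                kept.add(link);
            }
        }
        return kept;
    }

    // Resolve a possibly relative link against the URL of the page it came from.
    static String resolve(String pageUrl, String link) {
        return URI.create(pageUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        // Made-up HTML standing in for the downloaded page.
        String html = "<img src=\"/img/header.png\">"
                + "<IMG class=\"comic\" src=\"http://qwantz.com/comics/example.png\">";
        for (String link : filter(extractImageLinks(html), ".*/comics/.*")) {
            System.out.println(resolve("http://qwantz.com/index.php", link));
        }
    }
}
```

The sketch swaps in `java.util.regex` for the link extraction; the workflow itself uses a string-scanning script for that step.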
Workflow Components
Authors (1)
Tom Oinn, Stian Soiland-Reyes |
Titles (1)
Descriptions (1)
Use the local java plugins and some filtering operations to fetch the comic strip image from http://xkcd.com/
Based on the FetchDailyDilbert workflow. |
Dependencies (0)
Processors (6)
Name | Type | Description |
getPage | localworker | "getPage" takes the URL from "Dinosaur_Comics_url" as input and outputs all of the HTML code found at that URL.
Script:
URL inputURL = null;
if (base != void) {
inputURL = new URL(new URL(base), url);
}
else {
inputURL = new URL(url);
}
URLConnection con = inputURL.openConnection();
InputStream in = con.getInputStream();
InputStreamReader isr = new InputStreamReader(in);
Reader inReader = new BufferedReader(isr);
StringBuffer buf = new StringBuffer();
int ch;
while ((ch = inReader.read()) > -1) {
buf.append((char)ch);
}
inReader.close();
contents = buf.toString();
//String NEWLINE = System.getProperty("line.separator");
//
//URL inputURL = null;
//if (base != void) {
// inputURL = new URL(new URL(base), url);
//} else {
// inputURL = new URL(url);
//}
//StringBuffer result = new StringBuffer();
//BufferedReader reader = new BufferedReader(new InputStreamReader(inputURL.openStream()));
//String line = null;
//while ((line = reader.readLine()) != null) {
// result.append(line);
// result.append(NEWLINE);
//}
//
//contents = result.toString();
|
Dinosaur_Comics_url | stringconstant | This string constant holds the URL of the comic chosen for this workflow. The value is passed to the "getPage" service and, as the base URL, to the "getComicStrip" service.
Value: http://qwantz.com/index.php |
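getImageLinks | localworker | "getImageLinks" takes the HTML from "getPage" and finds links to images. A list of these image links is then output.
Script:
String lowerCaseContent = document.toLowerCase();
int index = 0;
List imagelinks = new ArrayList();
// Find each <img tag, then the "=" that follows its src attribute.
while ((index = lowerCaseContent.indexOf("<img", index)) != -1) {
  if ((index = lowerCaseContent.indexOf("src", index)) == -1) break;
  if ((index = lowerCaseContent.indexOf("=", index)) == -1) break;
  index++;
  // The link is the first token after "=", ending at a tab, line break, quote, ">" or "#".
  String remaining = document.substring(index);
  StringTokenizer st = new StringTokenizer(remaining, "\t\n\r\">#");
  String strLink = st.nextToken();
  imagelinks.add(strLink);
}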
|
findComicURL | localworker | "findComicURL" uses the regular expression from "comicURLRegex" to search for matches in the image links provided by "getImageLinks".
Script:
filteredlist = new ArrayList();
for (Iterator i = stringlist.iterator(); i.hasNext();) {
String item = (String) i.next();
if (item.matches(regex)) {
filteredlist.add(item);
}
}
|
comicURLRegex | stringconstant | This string constant provides the regular expression that "findComicURL" uses to pick the comic's URL out of the list of image links provided by "getImageLinks".
Value: .*/comics/.* |
getComicStrip | localworker | "getComicStrip" takes the comic URL from "findComicURL" and retrieves the image file at that URL. If all has gone well, this file will be the comic.
Script:
URL inputURL = null;
if (base != void) {
inputURL = new URL(new URL(base), url);
} else {
inputURL = new URL(url);
}
byte[] contents;
if (inputURL.openConnection().getContentLength() == -1) {
// Content size unknown, must read first...
byte[] buffer = new byte[1024];
int bytesRead = 0;
int totalBytesRead = 0;
InputStream is = inputURL.openStream();
while (bytesRead != -1) {
totalBytesRead += bytesRead;
bytesRead = is.read(buffer, 0, 1024);
}
contents = new byte[totalBytesRead];
} else {
contents = new byte[inputURL.openConnection().getContentLength()];
}
int bytesRead = 0;
int totalBytesRead = 0;
InputStream is = inputURL.openStream();
while (bytesRead != -1) {
bytesRead = is.read(contents, totalBytesRead, contents.length - totalBytesRead);
totalBytesRead += bytesRead;
if (contents.length==totalBytesRead) break;
}
image = contents;
|
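The "getComicStrip" script above contacts the same URL more than once (to ask for the content length, to count bytes when the length is unknown, and finally to read the image) and never closes its streams. A common alternative, shown here only as a sketch and not as the workflow's own code, is to read a single stream into a growing buffer. The class name `ReadAllBytesSketch` and the method `readFully` are hypothetical, and the example reads from an in-memory stream so that it runs offline; in the workflow the stream would come from the resolved comic URL.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical class; reads a stream of unknown length in one pass.
class ReadAllBytesSketch {

    // Copy everything from the stream into a byte array, 1024 bytes at a time.
    static byte[] readFully(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        int bytesRead;
        try {
            // read() returns -1 at end of stream, so that value is never added to the output.
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Made-up data standing in for the bytes of an image file.
        byte[] fakeImage = new byte[3000];
        for (int i = 0; i < fakeImage.length; i++) {
            fakeImage[i] = (byte) i;
        }
        // try-with-resources closes the stream even if reading fails.
        try (InputStream in = new ByteArrayInputStream(fakeImage)) {
            byte[] image = readFully(in);
            System.out.println("Read " + image.length + " bytes");
        }
    }
}
```

Because the buffer grows as needed, the content length never has to be known in advance, which removes the need for a separate counting pass.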
Outputs (1)
Name | Description |
todaysDinosaurComic | This workflow output should be the most recent comic from http://qwantz.com. Changing the URL provided in the "Dinosaur_Comics_url" string constant to the URL for another webcomic should provide the most recent comic from the newly input URL.
|
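One caveat when pointing the workflow at another webcomic: "findComicURL" filters with Java's `String.matches`, which succeeds only if the regular expression matches the entire string. The value `.*/comics/.*` therefore works only for sites whose comic images live under a `/comics/` path, so the "comicURLRegex" constant may need changing along with the URL. The sketch below illustrates the difference between a full match and a partial match; the class name `RegexMatchSketch` and the sample links are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical class; contrasts whole-string matching with substring matching.
class RegexMatchSketch {

    // Same test the workflow's filter applies: the whole link must match.
    static boolean fullMatch(String link, String regex) {
        return link.matches(regex);
    }

    // Looser test: the regex may match anywhere inside the link.
    static boolean partialMatch(String link, String regex) {
        return Pattern.compile(regex).matcher(link).find();
    }

    public static void main(String[] args) {
        // Made-up links; the second stands in for a site with a different layout.
        String qwantzStyle = "http://qwantz.com/comics/example.png";
        String otherSite = "http://example.org/strips/today.png";

        System.out.println(fullMatch(qwantzStyle, ".*/comics/.*"));
        System.out.println(fullMatch(otherSite, ".*/comics/.*"));
        // Without the surrounding ".*", a full match fails while a partial match succeeds.
        System.out.println(fullMatch(qwantzStyle, "/comics/"));
        System.out.println(partialMatch(qwantzStyle, "/comics/"));
    }
}
```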
Datalinks (7)
Source | Sink |
Dinosaur_Comics_url:value | getPage:url |
getPage:contents | getImageLinks:document |
comicURLRegex:value | findComicURL:regex |
getImageLinks:imagelinks | findComicURL:stringlist |
Dinosaur_Comics_url:value | getComicStrip:base |
findComicURL:filteredlist | getComicStrip:url |
getComicStrip:image | todaysDinosaurComic |
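The datalinks form a small dependency graph: a processor can run once everything feeding its inputs has produced a value. As an illustration only, the sketch below derives one valid running order from the seven links using a standard topological sort (Kahn's algorithm). It says nothing about how Taverna actually schedules processors; the class name `DatalinkOrderSketch` is hypothetical, and port names are dropped so that each link is just a source and a sink.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class; orders processors so that every source precedes its sinks.
class DatalinkOrderSketch {

    // The seven datalinks as {source, sink} pairs, with port names removed.
    static final String[][] LINKS = {
        {"Dinosaur_Comics_url", "getPage"},
        {"getPage", "getImageLinks"},
        {"comicURLRegex", "findComicURL"},
        {"getImageLinks", "findComicURL"},
        {"Dinosaur_Comics_url", "getComicStrip"},
        {"findComicURL", "getComicStrip"},
        {"getComicStrip", "todaysDinosaurComic"},
    };

    static List<String> executionOrder(String[][] links) {
        Map<String, Integer> pendingInputs = new LinkedHashMap<>();
        Map<String, List<String>> downstream = new LinkedHashMap<>();
        for (String[] link : links) {
            pendingInputs.putIfAbsent(link[0], 0);
            pendingInputs.merge(link[1], 1, Integer::sum);
            downstream.computeIfAbsent(link[0], k -> new ArrayList<>()).add(link[1]);
        }
        // Start with the nodes that have no incoming links (the two string constants).
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> entry : pendingInputs.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.remove();
            order.add(node);
            // Each finished node satisfies one input of every node it feeds.
            for (String next : downstream.getOrDefault(node, new ArrayList<>())) {
                if (pendingInputs.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        return order;
    }

    public static void main(String[] args) {
        for (String name : executionOrder(LINKS)) {
            System.out.println(name);
        }
    }
}
```

Any order this produces places both string constants before the processors they feed and ends with the workflow output `todaysDinosaurComic`.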
Version 1 (of 1)