Fetch today's Dinosaur Comic (http://qwantz.com)

Created: 2011-09-13 23:05:00 Last updated: 2011-09-14 15:58:54

Download Workflow

This workflow retrieves the newest comic from http://qwantz.com. A string constant supplies the URL, and the workflow fetches the HTML from that location. A list of image links is then extracted from the HTML. Using a regular expression supplied by another string constant, this list is then searched for the URL of the comic itself. Finally, the comic's image file is retrieved and returned as the workflow output.
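
The extract-then-filter steps described above can be sketched in plain Java without any network access. This is a minimal sketch, not the workflow's own script: the class name `ComicPipelineSketch`, the `IMG_SRC` pattern, the sample HTML, and the `example.org` URL are all assumptions made for illustration, and the filter pattern `.*/comics/.*` is an assumed regular expression of the kind the workflow's regex constant would hold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComicPipelineSketch {
    // Captures the src attribute of an <img> tag. This is a simplification
    // for illustration, not a full HTML parser.
    private static final Pattern IMG_SRC = Pattern.compile(
        "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
        Pattern.CASE_INSENSITIVE);

    // Step 1: collect every image link found in the HTML.
    static List<String> imageLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = IMG_SRC.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Step 2: keep only the links that match the regex in full.
    // String.matches requires the whole string to match, so the
    // pattern needs leading and trailing wildcards.
    static List<String> filter(List<String> links, String regex) {
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            if (link.matches(regex)) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical page content standing in for the fetched HTML.
        String html = "<html><body><img src=\"/header.png\">"
            + "<img class=\"comic\" src=\"http://example.org/comics/comic-0001.png\">"
            + "</body></html>";
        System.out.println(filter(imageLinks(html), ".*/comics/.*"));
    }
}
```

Because `String.matches` tests the entire string, a bare `/comics/` pattern would reject every full URL; the wildcards on both sides are what let a path fragment select the comic image.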

Preview

Download as scalable diagram (SVG)

Run

Run this Workflow in the Taverna Workbench...

Option 1:

Copy and paste this link into File > 'Open workflow location...'
http://www.myexperiment.org/workflows/2341/download?version=1

Workflow Components

Authors (1)

Titles (1)

Descriptions (1)

Dependencies (0)

Inputs (0)

Processors (6)

Name	Type	Description
getPage	localworker	"getPage" takes the URL from "Dinosaur_Comics_url" as input and outputs all of the HTML code for the URL provided. Script URL inputURL = null; if (base != void) { inputURL = new URL(new URL(base), url); } else { inputURL = new URL(url); } URLConnection con = inputURL.openConnection(); InputStream in = con.getInputStream(); InputStreamReader isr = new InputStreamReader(in); Reader inReader = new BufferedReader(isr); StringBuffer buf = new StringBuffer(); int ch; while ((ch = inReader.read()) > -1) { buf.append((char)ch); } inReader.close(); contents = buf.toString(); //String NEWLINE = System.getProperty("line.separator"); // //URL inputURL = null; //if (base != void) { // inputURL = new URL(new URL(base), url); //} else { // inputURL = new URL(url); //} //StringBuffer result = new StringBuffer(); //BufferedReader reader = new BufferedReader(new InputStreamReader(inputURL.openStream())); //String line = null; //while ((line = reader.readLine()) != null) { // result.append(line); // result.append(NEWLINE); //} // //contents = result.toString();
Dinosaur_Comics_url	stringconstant	This string constant service has as its value the URL for the comic I selected for this problem. This value is fed out to the "getPage" service. Value http://qwantz.com/index.php
getImageLinks	localworker	"getImageLinks" takes the HTML from "getPage" and finds links to images. A list of these image links is then output. Script String lowerCaseContent = document.toLowerCase(); int index = 0; List imagelinks = new ArrayList(); while ((index = lowerCaseContent.indexOf("#"); String strLink = st.nextToken(); imagelinks.add(strLink); }
findComicURL	localworker	"findComicURL" uses the regular expression from "comicURLRegex" to search for matches in the image-links provided by "getImageLinks". Script filteredlist = new ArrayList(); for (Iterator i = stringlist.iterator(); i.hasNext();) { String item = (String) i.next(); if (item.matches(regex)) { filteredlist.add(item); } }
comicURLRegex	stringconstant	This string constant provides a regular expression to be output to "findComicURL" and used to find the appropriate URL from the list of image-links provided by "getImageLinks". Value .*/comics/.*
getComicStrip	localworker	"getComicStrip" takes the comic URL from "findComicURL" and retrieves the specified image file from the URL. If all has gone well, this file will be the comic. Script URL inputURL = null; if (base != void) { inputURL = new URL(new URL(base), url); } else { inputURL = new URL(url); } byte[] contents; if (inputURL.openConnection().getContentLength() == -1) { // Content size unknown, must read first... byte[] buffer = new byte[1024]; int bytesRead = 0; int totalBytesRead = 0; InputStream is = inputURL.openStream(); while (bytesRead != -1) { totalBytesRead += bytesRead; bytesRead = is.read(buffer, 0, 1024); } contents = new byte[totalBytesRead]; } else { contents = new byte[inputURL.openConnection().getContentLength()]; } int bytesRead = 0; int totalBytesRead = 0; InputStream is = inputURL.openStream(); while (bytesRead != -1) { bytesRead = is.read(contents, totalBytesRead, contents.length - totalBytesRead); totalBytesRead += bytesRead; if (contents.length==totalBytesRead) break; } image = contents;
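
The "getComicStrip" script sizes its byte array up front, which means a first pass over the stream just to count bytes when the content length is unknown, followed by a second pass to read them. As an illustration of the same read loop, the following is a minimal single-pass sketch that grows a buffer as it reads. It is an alternative technique, not the workflow's script; the class name `StreamReadSketch` and the method `readAll` are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class StreamReadSketch {
    // Reads a stream of unknown length in one pass.
    // ByteArrayOutputStream grows as needed, so no content length is required.
    static byte[] readAll(InputStream in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        try {
            int n;
            // read() returns -1 at end of stream, otherwise the byte count read
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // An in-memory stream stands in for the image download.
        byte[] data = new byte[3000];
        byte[] copy = readAll(new ByteArrayInputStream(data));
        System.out.println(copy.length);
    }
}
```

In a real fetch, the stream passed to `readAll` would come from the opened URL, and the result would be assigned to the `image` output port.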

Beanshells (0)

Outputs (1)

Name	Description
todaysDinosaurComic	This workflow output should be the most recent comic from http://qwantz.com. Changing the URL provided in the "Dinosaur_Comics_url" string constant to the URL for another webcomic should provide the most recent comic from the newly input URL.

Datalinks (7)

Source	Sink
Dinosaur_Comics_url:value	getPage:url
getPage:contents	getImageLinks:document
comicURLRegex:value	findComicURL:regex
getImageLinks:imagelinks	findComicURL:stringlist
Dinosaur_Comics_url:value	getComicStrip:base
findComicURL:filteredlist	getComicStrip:url
getComicStrip:image	todaysDinosaurComic
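
The link from Dinosaur_Comics_url:value to getComicStrip:base matters when the page uses relative image paths: the script resolves `url` against `base` with `new URL(new URL(base), url)`. The sketch below shows the same idea using `java.net.URI.resolve`, which is a swapped-in technique rather than the script's own call. The class name `ResolveSketch` and the image file names are hypothetical.

```java
import java.net.URI;

class ResolveSketch {
    // Resolves a possibly relative link against the page URL.
    // An absolute link is returned unchanged.
    static String resolve(String base, String url) {
        return URI.create(base).resolve(url).toString();
    }

    public static void main(String[] args) {
        String base = "http://qwantz.com/index.php";
        // Hypothetical relative and absolute image links.
        System.out.println(resolve(base, "comics/comic-0001.png"));
        System.out.println(resolve(base, "http://example.org/comics/comic-0001.png"));
    }
}
```

With this wiring, the workflow should work whether the comic's image link on the page is absolute or relative to the page location.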

Coordinations (0)

Workflow Type

Taverna 2

Uploader

Chrisser

License

All versions of this Workflow are licensed under:

Version 1 (of 1)

Credits (0)

(People/Groups)

None

Attributions (1)

(Workflows/Files)

Fetch today's xkcd comic

Tags (3)

Uploader tags

Shared with Groups (1)

IST 600

Featured In Packs (0)

None

Attributed By (0)

(Workflows/Files)

None

Favourited By (0)

No one

Statistics

1418 viewings

1218 downloads

Citations (0)

None

Version History

In chronological order:

Fetch today's Dinosaur Comic (http://qwantz.com)

Created by Chrisser on Tuesday 13 September 2011 23:05:00 (UTC)

Last edited by Chrisser on Wednesday 14 September 2011 15:58:54 (UTC)