The AffyArrayNormalization web services normalize raw Affymetrix GeneChip data. They wrap Philip de Groot's normalization R script to provide remote, programmatic access. This example workflow demonstrates how to use the AffyArrayNormalization services.
The flow is as follows:
* A client executes the AffyArrayNormalization_submit service with two inputs: a User object and a collection of URLs to CEL files.
* The User object contains a user ID, a password and an e-mail address. Currently the user ID and password can be any combination of the characters [a-zA-Z0-9]. Just pick something; there is no need to register them first. They are only used to make sure that whoever tries to download the results is the same person who submitted the job.
* Your job will be submitted to the Sun Grid Engine on the NuGO R-server. The e-mail address is used by the Sun Grid Engine to notify you when your job is done. We might also use it to send you feedback in case something goes wrong with your job, but it won't be used for anything else and will only be stored for a maximum of 7 days (together with your job's results).
* The URLs to the CEL files must use either the http or https protocol, so you have to put the CEL files on a web server from which the AffyArrayNormalization_submit service can download them. You can restrict access to these URLs with basic authentication by putting the username and password in the URL. For example, if the user is pieter and the password is test, the URL could look like this: https://pieter:test@lab5.bioinformatics.nl/phenolink/home/TisMix_mix5a_01_v1_U133plus2.CEL.
* The AffyArrayNormalization_submit service returns a job ID and a link to the results. Once the job is done, this link can be used to download the results. Results remain available for 7 days, after which they are deleted automatically.
* The job ID is used to execute the AffyArrayNormalization_poll service inside a nested workflow. AffyArrayNormalization_poll returns the job status, and unless the status is "finished" the entire nested workflow fails. When the nested CheckStatus workflow fails, Taverna automatically retries it until it succeeds, which means the job has finished, or until the maximum number of retries is reached.
* The nested DownloadFile workflow depends on successful completion of the nested CheckStatus workflow. The name says it all: It downloads the result, which is a single ZIP file. This workflow does not take care of unzipping the archive. You have to do that yourself.
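The poll-and-retry behaviour described above can be sketched outside Taverna as a plain loop. This is only an illustration, not the workflow's actual implementation: the class name `PollSketch`, the method `waitUntilFinished` and the `Supplier` that stands in for a call to AffyArrayNormalization_poll are all hypothetical, and the sketch assumes Java 8 or later. Like the workflow's own status check, it treats the status codes "f" (finished) and "m" (missing or unknown) as done.

```java
import java.util.function.Supplier;

class PollSketch {
    // Polls the (hypothetical) status source until it reports "f" or "m".
    // Returns true when the job is done, false when maxRetries polls were
    // used up first or the waiting thread was interrupted.
    static boolean waitUntilFinished(Supplier<String> pollStatus, int maxRetries, long delayMillis) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            String status = pollStatus.get();
            if (status.equals("f") || status.equals("m")) {
                return true;
            }
            try {
                Thread.sleep(delayMillis); // wait before the next poll
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in for the poll service: pending, running, then finished.
        String[] fakeStatuses = {"p", "r", "f"};
        int[] next = {0};
        boolean done = waitUntilFinished(
                () -> fakeStatuses[Math.min(next[0]++, fakeStatuses.length - 1)], 5, 10L);
        System.out.println("Job finished: " + done);
    }
}
```

In the workflow itself the retry count and delay are configured on the nested CheckStatus processor rather than passed as arguments.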
AffyArrayNormalization services use a secure connection over HTTPS. To make this work you *must* import our SSL certificates into your local Java keystores. You can do this either manually, as described at https://www.bioinformatics.nl/phenolink/home/JavaAndHTTPS.html, or with a Java tool, as described at http://www.myexperiment.org/files/148
Please make sure that the path where the download should be saved ends with a slash (a backslash for Windows or a forward slash for Linux and Mac OS X).
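The download script builds the target file name by concatenating the path and the file name directly, which is why the trailing separator matters. If you generate the path programmatically, a small helper along these lines can add the separator when it is missing. The class and method names are hypothetical and not part of the workflow.

```java
class PathSketch {
    // Appends the separator unless the directory already ends with it.
    // An empty string is returned unchanged so that it keeps meaning
    // "relative to the current directory".
    static String withTrailingSeparator(String dir, char separator) {
        if (dir.isEmpty() || dir.charAt(dir.length() - 1) == separator) {
            return dir;
        }
        return dir + separator;
    }

    public static void main(String[] args) {
        System.out.println(withTrailingSeparator("/home/user/downloads", '/'));
        System.out.println(withTrailingSeparator("D:\\My Documents\\downloads\\", '\\'));
    }
}
```

Passing `java.io.File.separatorChar` as the second argument selects the separator of the operating system the code runs on.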
//
// Import modules.
//
import java.io.*;
import java.net.*;
import java.util.regex.*;
//
// Main script.
//
String vError = "";
Downloaded = "false";
try {
    // Get filename for the download from the URL.
    String vFile = "download";
    String vPattern = ".*?([^/]+)$";
    Pattern oPattern = Pattern.compile(vPattern);
    Matcher oMatcher = oPattern.matcher(URL);
    Boolean vHit = oMatcher.matches();
    if (vHit) {
        vFile = oMatcher.group(1);
    } else {
        vError = vError + "Error no filename found in URL.";
    }
    // Connect to URL.
    URL oURL = new URL(URL);
    URLConnection oURLConnection = oURL.openConnection();
    // Check if we are dealing with a site
    // that uses basic http(s) authentication.
    vPattern = "(\\w+://)??([^:]+):([^:@]+)@.*";
    oPattern = Pattern.compile(vPattern);
    oMatcher = oPattern.matcher(URL);
    vHit = oMatcher.matches();
    if (vHit) {
        //String vProtocol = oMatcher.group(1);
        String vUser = oMatcher.group(2);
        String vPass = oMatcher.group(3);
        String vAuth = vUser + ":" + vPass;
        String vEncodedUserPassword = new sun.misc.BASE64Encoder().encode(vAuth.getBytes());
        oURLConnection.setRequestProperty ("Authorization", "Basic " + vEncodedUserPassword);
    }
    // Pump data to disk.
    InputStream oIS = oURLConnection.getInputStream();
    String vFilePath = Path + vFile;
    OutputStream oOS = new FileOutputStream(vFilePath);
    synchronized (oIS) {
        synchronized (oOS) {
            byte[] oBuffer = new byte[256];
            while (true) {
                int vBytesRead = oIS.read(oBuffer);
                if (vBytesRead == -1) break;
                oOS.write(oBuffer, 0, vBytesRead);
            }
        }
    }
    oIS.close();
    oOS.close();
    Downloaded = "true";
} catch (MalformedURLException oError) {
    vError = vError + oError.getMessage();
} catch (FileNotFoundException oError) {
    vError = vError + "File not found.";
} catch (PatternSyntaxException oError) {
    vError = vError + "RegExp Error: ";
    vError = vError + oError.getMessage();
} catch (IOException oError) {
    vError = vError + "IO Error: ";
    vError = vError + oError.getMessage();
}
Message = vError;
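The script above extracts the credentials with a regular expression and encodes them with `sun.misc.BASE64Encoder`, an internal JDK class that newer Java releases no longer ship. As a hedged alternative, and not what the workflow itself does, the same Authorization header can be built with the public `java.net.URI` and `java.util.Base64` classes, assuming Java 8 or later. The class name `BasicAuthSketch` and its method are hypothetical.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicAuthSketch {
    // Returns the value for an "Authorization" request header when the URL
    // carries "user:password@" credentials, or null when it has none.
    static String authorizationHeader(String url) {
        String userInfo = URI.create(url).getUserInfo(); // e.g. "pieter:test"
        if (userInfo == null) {
            return null;
        }
        String encoded = Base64.getEncoder()
                .encodeToString(userInfo.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        String url = "https://pieter:test@lab5.bioinformatics.nl/phenolink/home/"
                + "TisMix_mix5a_01_v1_U133plus2.CEL";
        System.out.println(authorizationHeader(url));
    }
}
```

The returned string can be passed to `URLConnection.setRequestProperty("Authorization", ...)` in the same way the script does. Unlike the older encoder, `Base64.getEncoder()` never inserts line breaks into long values.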
URL
Path
Downloaded
Message
text/x-taverna-web-url
Processor to parse the datatype URL
http://moby.ucalgary.ca/moby/MOBY-Central.pl
URL
result
Processor to parse the datatype URL
A generic password object.
nugo-r.bioinformatics.nl
password
http://moby.ucalgary.ca/moby/MOBY-Central.pl
Password
A generic user object.
nugo-r.bioinformatics.nl
user
http://moby.ucalgary.ca/moby/MOBY-Central.pl
User
A Uniform Resource Locator (URL).
http://moby.ucalgary.ca/moby/MOBY-Central.pl
URL
A Uniform Resource Locator (URL).
http://moby.ucalgary.ca/moby/MOBY-Central.pl
URL
Processor to parse the datatype Object
http://moby.ucalgary.ca/moby/MOBY-Central.pl
Object
job_id
Processor to parse the datatype Object
An e-mail address object.
nugo-r.bioinformatics.nl
email
http://moby.ucalgary.ca/moby/MOBY-Central.pl
Email
A Uniform Resource Locator (URL).
http://moby.ucalgary.ca/moby/MOBY-Central.pl
URL
Asynchronous BioMOBY web service for quality assessment of raw Affymetrix GeneChip data. This service requires as input a collection of 3 or more URLs to CEL files from the same experiment. URLs must use the HTTP or HTTPS protocol. You will receive a Job ID. Use this Job ID with the AffyArrayNormalization_poll service from the same service provider to check the status of your job. Use the URL in the output of this service to fetch the results.
http://moby.ucalgary.ca/moby/MOBY-Central.pl
AffyArrayNormalization_submit
nugo-r.bioinformatics.nl
gcRMA(slow)
true
# Explanation of status types.
# SGE qstat status types:
p=>job is pending
r=>job is running
R=>job is restarting
s=>job is suspended
S=>queue is suspended and therefore job is suspended as well
t=>transferring job to cluster node
T=>job is suspended because suspension threshold of queue was exceeded
z=>zombie
h=>job was put on hold
u=>... by user
o=>... by operator
s=>... by system
j=>... because it depends on the results of other jobs which have not yet finished
a=>... because it was scheduled for execution at some time in the future
d=>deleting job
q=>job is queued
w=>job is waiting
E=>job is in error state
# Our own status types:
m=>job status is missing or unknown
f=>job has finished
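As an illustration of how a client might use the table above, the following sketch maps the top-level status codes to their descriptions. It is an assumption about client-side handling, not part of the services: the class name `StatusSketch` and its method are hypothetical, the hold sub-reasons (u, o, s, j, a) are left out because they qualify "h" rather than stand alone, and Java 8 or later is assumed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StatusSketch {
    // Top-level status codes copied from the table above.
    // Codes are case-sensitive: "r" and "R" mean different things.
    static final Map<String, String> DESCRIPTIONS = new LinkedHashMap<>();
    static {
        DESCRIPTIONS.put("p", "job is pending");
        DESCRIPTIONS.put("r", "job is running");
        DESCRIPTIONS.put("R", "job is restarting");
        DESCRIPTIONS.put("s", "job is suspended");
        DESCRIPTIONS.put("S", "queue is suspended and therefore job is suspended as well");
        DESCRIPTIONS.put("t", "transferring job to cluster node");
        DESCRIPTIONS.put("T", "job is suspended because suspension threshold of queue was exceeded");
        DESCRIPTIONS.put("z", "zombie");
        DESCRIPTIONS.put("h", "job was put on hold");
        DESCRIPTIONS.put("d", "deleting job");
        DESCRIPTIONS.put("q", "job is queued");
        DESCRIPTIONS.put("w", "job is waiting");
        DESCRIPTIONS.put("E", "job is in error state");
        DESCRIPTIONS.put("m", "job status is missing or unknown");
        DESCRIPTIONS.put("f", "job has finished");
    }

    // Returns the description for a code, or a fallback for unlisted codes.
    static String describe(String code) {
        return DESCRIPTIONS.getOrDefault(code, "unrecognised status code");
    }

    public static void main(String[] args) {
        System.out.println(describe("r"));
        System.out.println(describe("f"));
    }
}
```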
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
String job_finished = "false";
if (status.equals("f") || status.equals("m")) {
    job_finished = "true";
} else {
    job_finished = "false";
}
status
job_finished
Checks the status of an asynchronous AffyArrayNormalization_submit job.
http://moby.ucalgary.ca/moby/MOBY-Central.pl
AffyArrayNormalization_poll
nugo-r.bioinformatics.nl
Processor to parse the datatype Object
http://moby.ucalgary.ca/moby/MOBY-Central.pl
Object
status
Processor to parse the datatype Object
text/plain
Just pick something. There is no need to register first. User and password are only used to make sure that whoever tries to download the results is the same person who submitted the job in the first place.
The e-mail address is used by the Sun Grid Engine to notify you when your job is done. We might also use it to send you feedback in case something goes wrong with your job, but it won't be used for anything else and will only be stored for a maximum of 7 days (together with your job's results).
Just pick something. There is no need to register first. User and password are only used to make sure that whoever tries to download the results is the same person who submitted the job in the first place.
A URL to an Affy CEL file. For debugging or as an example you can use: https://lab5.bioinformatics.nl/phenolink/home/TisMix_mix5a_01_v1_U133plus2.CEL
A URL to an Affy CEL file. For debugging or as an example you can use: https://lab5.bioinformatics.nl/phenolink/home/TisMix_mix5a_02_v1_U133plus2.CEL
A URL to an Affy CEL file. For debugging or as an example you can use: https://lab5.bioinformatics.nl/phenolink/home/TisMix_mix5a_03_v1_U133plus2.CEL
Provide an absolute path to the directory where you want to store the downloaded results. Make sure the path ends with the path separator for your operating system: a forward slash for Linux, Unix and Mac OS X, or a backslash for Windows. For example /home/user/downloads/ or D:\My Documents\downloads\
text/xml
Completed
CheckStatus
DownloadFile
Scheduled
Running