Perform an InterProScan analysis of a protein sequence using the EBI’s WSInterProScan service (see http://www.ebi.ac.uk/Tools/webservices/services/interproscan). The input sequence and the user's e-mail address are workflow inputs; the other analysis parameters (see Job_params) keep their default values.
InterProScan searches a protein sequence against the protein family and domain signature databases integrated into InterPro (see http://www.ebi.ac.uk/interpro/). It returns a set of signature matches, each annotated with the InterPro entry and GO terms assigned to that signature.
Populate the input data structure with the input sequence and its data type.
sequence
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Unpack the byte[] form of the result into a string.
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
Wrap input data in a list.
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Unpack the byte[] form of the result into a string.
org.embl.ebi.escience.scuflworkers.java.ByteArrayToString
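The ByteArrayToString steps turn the byte[] results returned by the service into text. A minimal sketch of that conversion in Java, assuming the payload is UTF-8 encoded (the charset the Taverna worker actually uses is not stated here); the class and method names are hypothetical:

```java
import java.nio.charset.StandardCharsets;

public class ByteArrayToStringSketch {
    // Decode a raw result payload into text.
    // Assumption: the bytes are UTF-8; a null payload becomes an empty string.
    static String decode(byte[] result) {
        return result == null ? "" : new String(result, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] payload = "example result\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(decode(payload));
    }
}
```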
InterProScan job parameters.
1
p
1
1
org.embl.ebi.escience.scuflworkers.java.XMLInputSplitter
Using the text output of InterProScan generate GFF format (see http://www.sanger.ac.uk/Software/formats/GFF/) output.
import java.util.StringTokenizer;

// Convert the tab-delimited InterProScan text result into GFF.
// Field numbers below are 1-based column positions in each result line.
interproscan_gff = "";
// Split the result into lines
StringTokenizer tok1 = new StringTokenizer(interproscan_text, "\n");
while (tok1.hasMoreTokens()) {
    String feat1 = tok1.nextToken();
    // Split each line into tab-separated fields
    StringTokenizer tok2 = new StringTokenizer(feat1, "\t");
    int fieldCount = 0;
    String attributeStr = "";
    while (tok2.hasMoreTokens()) {
        fieldCount++;
        String fieldStr = tok2.nextToken();
        if (fieldCount == 1) { // First field is the sequence ID
            interproscan_gff += fieldStr;
        }
        // The tool (field 4), then feature, start and stop (fields 6-8)
        else if (fieldCount == 4 || (fieldCount > 5 && fieldCount < 9)) {
            interproscan_gff += "\t" + fieldStr;
        }
        // Score (field 9); "NA" becomes the GFF empty value "."
        else if (fieldCount == 9) {
            if (fieldStr.equals("NA")) {
                interproscan_gff += "\t.";
            } else {
                interproscan_gff += "\t" + fieldStr;
            }
        }
        // Matching InterPro entry accession (field 12)
        else if (fieldCount == 12 && !fieldStr.equals("NULL")) {
            attributeStr += fieldStr;
        }
        // Matching InterPro entry name (field 13)
        else if (fieldCount == 13 && !fieldStr.equals("NULL")) {
            attributeStr += " " + fieldStr;
        }
    }
    // Strand and frame are not applicable ("."); the last column starts with "InterProScan"
    interproscan_gff += "\t.\t.\tInterProScan";
    if (attributeStr.length() > 0) {
        interproscan_gff += " ; " + attributeStr;
    }
    interproscan_gff += "\n";
}
interproscan_text
interproscan_gff
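The same column mapping can be expressed as a standalone Java method that converts one result line at a time, which makes it easier to test outside Taverna. This is a sketch that reuses the 1-based field positions from the script above; unlike the script, it uses `String.split` with a limit of -1 so empty columns keep their positions, and it returns null for lines with fewer than nine columns. The class name and the sample line are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class InterProScanToGff {
    // Convert one tab-delimited InterProScan result line into one GFF line.
    // Column positions (1-based) follow the workflow script:
    // 1 = sequence ID, 4 = tool, 6-8 = feature/start/stop, 9 = score,
    // 12-13 = InterPro entry accession and name.
    static String toGff(String line) {
        String[] f = line.split("\t", -1); // limit -1 keeps empty fields
        if (f.length < 9) {
            return null; // too few columns for a GFF record
        }
        String score = f[8].equals("NA") ? "." : f[8];
        StringBuilder gff = new StringBuilder();
        gff.append(f[0]).append('\t')   // seqname
           .append(f[3]).append('\t')   // source (tool)
           .append(f[5]).append('\t')   // feature
           .append(f[6]).append('\t')   // start
           .append(f[7]).append('\t')   // end
           .append(score)               // score
           .append("\t.\t.\tInterProScan"); // strand, frame, last column
        // Attribute text from the InterPro entry columns, skipping "NULL" values
        List<String> attrs = new ArrayList<>();
        if (f.length > 11 && !f[11].equals("NULL")) {
            attrs.add(f[11]);
        }
        if (f.length > 12 && !f[12].equals("NULL")) {
            attrs.add(f[12]);
        }
        if (!attrs.isEmpty()) {
            gff.append(" ; ").append(String.join(" ", attrs));
        }
        return gff.toString();
    }

    public static void main(String[] args) {
        // Made-up example line with placeholder identifiers
        String line = "SEQ1\tCRC\t100\tHMMPfam\tPF_EXAMPLE\tExample family\t5\t60"
                + "\t1.0e-10\tT\tDATE\tIPR_EXAMPLE\tExample entry";
        System.out.println(toGff(line));
    }
}
```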
Get the plain text format result.
toolraw
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl
poll
Submit the InterProScan job.
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl
runInterProScan
Get the XML format result.
toolxml
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl
poll
Wait for the job to complete.
Check status of an InterProScan job.
Map the job status code to a true/false is_done flag.
if(job_status.equals("DONE")) {
is_done = "true";
} else {
is_done = "false";
}
job_status
is_done
If the job has not finished, fail the workflow.
org.embl.ebi.escience.scuflworkers.java.FailIfFalse
Get the current status of the InterProScan job.
http://www.ebi.ac.uk/Tools/webservices/wsdl/WSInterProScan.wsdl
checkStatus
EBI job identifier for the InterProScan job.
Status of the job.
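The polling sub-workflow checks the job status, maps "DONE" to true, and fails when the job is not finished. A minimal Java sketch of the same wait-for-completion logic follows. `StatusClient` is a hypothetical stand-in for the checkStatus operation (the real SOAP client signature is not shown here), and the attempt limit and delay are assumptions, since no retry settings appear in this workflow text.

```java
import java.util.Iterator;
import java.util.List;

public class PollJobSketch {
    // Hypothetical stand-in for the WSInterProScan checkStatus operation.
    interface StatusClient {
        String checkStatus(String jobId);
    }

    // Same mapping as the workflow script: only "DONE" counts as finished.
    static boolean isDone(String jobStatus) {
        return "DONE".equals(jobStatus);
    }

    // Check the status up to maxAttempts times, sleeping between checks;
    // throw (like FailIfFalse) if the job never reports "DONE".
    static void waitForJob(StatusClient client, String jobId, int maxAttempts, long delayMillis)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isDone(client.checkStatus(jobId))) {
                return;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delayMillis);
            }
        }
        throw new IllegalStateException(
                "Job " + jobId + " not finished after " + maxAttempts + " checks");
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake client that reports a non-DONE status twice, then DONE
        Iterator<String> states = List.of("RUNNING", "RUNNING", "DONE").iterator();
        StatusClient fake = jobId -> states.next();
        waitForJob(fake, "example-job-id", 5, 10);
        System.out.println("job finished");
    }
}
```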
User e-mail address.
Input protein sequence for analysis. This can be either the sequence itself (FASTA format recommended) or a database identifier in database:identifier format (e.g. uniprot:wap_rat).
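The sequence input accepts two forms. As a client-side illustration only, the sketch below guesses which form a string takes: a leading ">" means FASTA, a single "name:id" token means a database identifier, and anything else is treated as a raw sequence. The service makes its own decision; the regular expression and the class name are assumptions, not part of the WSInterProScan interface.

```java
import java.util.regex.Pattern;

public class SequenceInputKind {
    enum Kind { FASTA, DB_IDENTIFIER, RAW_SEQUENCE }

    // Assumed pattern for "database:identifier" inputs such as uniprot:wap_rat
    private static final Pattern DB_ID = Pattern.compile("^[A-Za-z0-9_]+:[A-Za-z0-9_.-]+$");

    static Kind classify(String input) {
        String s = input.trim();
        if (s.startsWith(">")) {
            return Kind.FASTA; // FASTA header line
        }
        if (DB_ID.matcher(s).matches()) {
            return Kind.DB_IDENTIFIER;
        }
        return Kind.RAW_SEQUENCE;
    }

    public static void main(String[] args) {
        System.out.println(classify("uniprot:wap_rat")); // DB_IDENTIFIER
        System.out.println(classify(">seq1\nMKVLAAGIV")); // FASTA
        System.out.println(classify("MKVLAAGIV"));        // RAW_SEQUENCE
    }
}
```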
InterProScan result in tab delimited plain text format.
application/xml
InterProScan result in XML format.
EBI job identifier for the InterProScan job.
InterProScan result in plain text GFF format (see http://www.sanger.ac.uk/Software/formats/GFF/).
Get_text_result stays Scheduled and moves to Running only after EBI_InterProScan_poll_job is Completed.
Get_XML_result stays Scheduled and moves to Running only after EBI_InterProScan_poll_job is Completed.