RTCalc Retention Time Prediction and Outlier Removal
Created: 2014-02-06 14:38:51
Last updated: 2014-02-06 14:43:24
This workflow takes as input a pepXML file from PeptideProphet, applies RTCalc and outputs a filtered list of peptides based on their retention time Z-scores.
Workflow Components
Authors (1)
Magnus Palmblad (n.m.palmblad@lumc.nl)
Sonja Holl
Titles (1)
RTCalc Retention Time Prediction and Outlier Removal
Descriptions (1)
This workflow trains a retention time predictor using the RTCalc algorithm supplied by the Trans-Proteomic Pipeline. The Rshell component calculates and plots RTCalc retention time Z-scores for all peptides in the pepXML file. There is no probability cutoff for the training of RTCalc, so for datasets with many false peptide identifications (or peptide-spectrum matches of low probability), a selection should be made in the XPaths, in the Join_and_Insert_Tabs component, or in an additional component at any point before the RTCalc_Train call.
This workflow identifies peptides from tandem mass spectra using X!Tandem as in a standard installation of the Trans-Proteomic Pipeline (TPP, version 4.6, but earlier versions should also work). The peptide-spectrum matches are validated using PeptideProphet (also from a standard installation of TPP), and peptides with a probability of at least 0.95 are used to train a linear retention time predictor (Palmblad et al., 2002), from which retention coefficients are also derived. These coefficients indirectly provide information on the chromatographic stationary and mobile phases (composition and pH) and on the gradient.
The two Rshell scripts visualize the predicted versus measured retention times and the amino acid coefficients, colored by physico-chemical properties (basic, acidic, polar, etc.).
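The linear retention time predictor mentioned above can be illustrated with a short sketch. It assumes a composition-only model, predicted RT = b0 + Σ count(a)·c(a) over the residues a of a peptide, fitted to measured retention times by ordinary least squares (normal equations solved by Gauss-Jordan elimination). This is not RTCalc's source code; the class RetentionCoefficientFit, its methods and the toy three-letter alphabet are hypothetical and only show how retention coefficients fall out of a linear fit.

```java
import java.util.Arrays;

// Hypothetical sketch of a composition-based linear retention time model:
//   RT ~ b0 + sum over residues a of count(a) * c[a]
// fitted by ordinary least squares. Not RTCalc's implementation.
final class RetentionCoefficientFit {

    // Design row [1, count(alphabet[0]), count(alphabet[1]), ...]
    static double[] features(String peptide, String alphabet) {
        double[] row = new double[alphabet.length() + 1];
        row[0] = 1.0; // intercept term
        for (char residue : peptide.toCharArray()) {
            int idx = alphabet.indexOf(residue);
            if (idx >= 0) row[idx + 1] += 1.0; // residues outside the alphabet are ignored
        }
        return row;
    }

    // Returns [b0, c_1, ..., c_k] minimizing the squared error against measured RTs.
    static double[] fit(String[] peptides, double[] rt, String alphabet) {
        int p = alphabet.length() + 1;
        double[][] a = new double[p][p + 1]; // augmented matrix [X^T X | X^T y]
        for (int n = 0; n < peptides.length; n++) {
            double[] x = features(peptides[n], alphabet);
            for (int i = 0; i < p; i++) {
                for (int j = 0; j < p; j++) a[i][j] += x[i] * x[j];
                a[i][p] += x[i] * rt[n];
            }
        }
        // Gauss-Jordan elimination with partial pivoting
        for (int col = 0; col < p; col++) {
            int pivot = col;
            for (int r = col + 1; r < p; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmp = a[col]; a[col] = a[pivot]; a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new IllegalArgumentException("singular system: too few or too similar peptides");
            }
            for (int r = 0; r < p; r++) {
                if (r == col) continue;
                double f = a[r][col] / a[col][col];
                for (int c = col; c <= p; c++) a[r][c] -= f * a[col][c];
            }
        }
        double[] coef = new double[p];
        for (int i = 0; i < p; i++) coef[i] = a[i][p] / a[i][i];
        return coef;
    }

    // Predicted retention time of a peptide from fitted coefficients.
    static double predict(double[] coef, String peptide, String alphabet) {
        double[] x = features(peptide, alphabet);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) sum += coef[i] * x[i];
        return sum;
    }

    public static void main(String[] args) {
        String alphabet = "ACD"; // toy alphabet; a real model would cover all residues
        String[] peptides = {"A", "C", "D", "AC", "AAD", "CCD"};
        double[] rt = {7, 8, 4, 10, 8, 10}; // generated from b0 = 5, A = 2, C = 3, D = -1
        System.out.println(Arrays.toString(fit(peptides, rt, alphabet)));
    }
}
```

With the synthetic data in main, the fit recovers the intercept 5 and the coefficients 2, 3 and -1 used to generate the retention times. A real training set needs enough peptides of varied composition and length; otherwise the normal equations are singular and the sketch throws an exception.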
Dependencies (0)
Inputs (2)
Name | Description
pepXML_File | The input peptide identifications in pepXML format.
Max_Abs_Z_Score | The maximum absolute Z-score for a peptide to be included in the Filtered_Peptide_List.
Processors (9)
Name | Type | Description
Read_Text_File | localworker | This script reads the X!Tandem/PeptideProphet results (pepXML) to feed the XPath extractors below.
Script:
// Return a reader for a local file or, failing that, for a URL
BufferedReader getReader (String fileUrl, String encoding) throws IOException {
    InputStreamReader reader;
    try {
        if (encoding == null) {
            reader = new FileReader(fileUrl);
        } else {
            reader = new InputStreamReader(new FileInputStream(fileUrl), encoding);
        }
    }
    catch (FileNotFoundException e) {
        // try a real URL instead
        URL url = new URL(fileUrl);
        if (encoding == null) {
            reader = new InputStreamReader(url.openStream());
        } else {
            reader = new InputStreamReader(url.openStream(), encoding);
        }
    }
    return new BufferedReader(reader);
}
StringBuffer sb = new StringBuffer(4000);
// BeanShell idiom: "void" means the optional encoding input was not supplied
if (encoding == void) {
    encoding = null;
}
// Read the file line by line into one string, using the platform line separator
BufferedReader in = getReader(fileurl, encoding);
String str;
String lineEnding = System.getProperty("line.separator");
while ((str = in.readLine()) != null) {
    sb.append(str);
    sb.append(lineEnding);
}
in.close();
filecontents = sb.toString();
Extract_Peptides | xpath | This XPath extracts the amino acid sequences of the identified peptides.
XPath expression: /default:msms_pipeline_analysis/default:msms_run_summary/default:spectrum_query/default:search_result/default:search_hit/@peptide
Extract_Scan_Numbers | xpath | This XPath extracts the corresponding scan numbers for the identified peptides.
XPath expression: /default:msms_pipeline_analysis/default:msms_run_summary/default:spectrum_query/@start_scan
Join_and_Insert_Tabs | beanshell | This Beanshell formats the data for RTCalc.
Script:
String temp_training = new String();
String temp_validation_flat = new String();
for (i=0; i
RTCalc_Train | externaltool | This component trains the RTCalc predictor.
RTCalc_Predict | externaltool | This tool applies the RTCalc predictor.
Compare_Z_Scores_and_Probabilities | rshell | This Rshell script calculates and plots RTCalc retention time Z-scores for all peptides in the pepXML file. There is no probability cutoff for the training of RTCalc, so for datasets with many false peptide identifications (or peptide-spectrum matches of low probability), a selection should be made in the XPaths, in the Join_and_Insert_Tabs component, or in an additional component at any point before the RTCalc_Train call.
Script:
file_split=unlist(strsplit(measured, split="\\n")) # split the training data into lines
peptides=sapply(strsplit(file_split,"\\t"), function(x) x[1]) # field 1: peptide sequence
filtered_peptide_list<-"peptide\tprobability"; # header line of the filtered output
measured_RT=as.numeric(sapply(strsplit(file_split,"\\t"), function(x) x[2])) # field 2: measured retention time (scan number)
file_split=unlist(strsplit(predicted, split="\\n")) # split the RTCalc predictions into lines
predicted_RT=as.numeric(sapply(strsplit(file_split,"\\t"), function(x) x[2])) # field 2: predicted retention time
file_split=unlist(strsplit(probabilities, split="\\n")) # split the probability table into lines
probabilities_P=as.numeric(sapply(strsplit(file_split,"\\t"), function(x) x[1])) # field 1: PeptideProphet probability
rmsd2<-sqrt(sum((measured_RT-predicted_RT)^2)/length(measured_RT)) # RMSD over all peptides
Z_score<-(measured_RT-predicted_RT)/rmsd2; # retention time Z-score of each peptide
png("Z_scores_vs_probabilities.png", width=700, height=700); # open the PNG plot
plot(log10(1-probabilities_P), Z_score, pch=20, xlab="log10(1-probability)", ylab="RTCalc Z-score");
for(i in seq_along(probabilities_P)) {
  if (abs(Z_score[i])>max_abs_Z_score) points(log10(1-probabilities_P[i]), Z_score[i], pch=4, col="red") # mark outliers in red
  if (abs(Z_score[i])<=max_abs_Z_score) { # keep peptides within the Z-score limit
    new_line<-paste(peptides[i], probabilities_P[i], sep = "\t");
    filtered_peptide_list<-paste(filtered_peptide_list, new_line, sep = "\n");
  }
}
temp<-dev.off(); # close the PNG device
R server: localhost:6311
Extract_Probabilities | xpath | This XPath extracts the corresponding PeptideProphet probabilities for the identified peptides.
XPath expression: /default:msms_pipeline_analysis/default:msms_run_summary/default:spectrum_query/default:search_result/default:search_hit/default:analysis_result/default:peptideprophet_result/@probability
Make_Table | beanshell | This Beanshell formats the probabilities for the Rshell (Compare_Z_Scores_and_Probabilities).
Script:
// can be removed by connecting the nodelist to a numerical vector input
String temp_probabilities = new String();
for (i=0; i
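The three XPath processors (Extract_Peptides, Extract_Scan_Numbers and Extract_Probabilities) can be reproduced outside Taverna with the standard javax.xml.xpath API. The sketch below assumes that the `default:` prefix stands for the pepXML namespace http://regis-web.systemsbiology.net/pepXML; the class name PepXmlExtract is hypothetical, while the three expressions are copied from the processor table.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: evaluate the workflow's XPath expressions against pepXML text.
final class PepXmlExtract {
    // Assumption: the "default:" prefix is bound to the pepXML namespace.
    static final String PEPXML_NS = "http://regis-web.systemsbiology.net/pepXML";

    static final String PEPTIDES = "/default:msms_pipeline_analysis/default:msms_run_summary"
        + "/default:spectrum_query/default:search_result/default:search_hit/@peptide";
    static final String SCANS = "/default:msms_pipeline_analysis/default:msms_run_summary"
        + "/default:spectrum_query/@start_scan";
    static final String PROBABILITIES = "/default:msms_pipeline_analysis/default:msms_run_summary"
        + "/default:spectrum_query/default:search_result/default:search_hit"
        + "/default:analysis_result/default:peptideprophet_result/@probability";

    // Resolves the "default" prefix to the pepXML namespace during XPath evaluation.
    private static final NamespaceContext PEPXML_CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "default".equals(prefix) ? PEPXML_NS : XMLConstants.NULL_NS_URI;
        }
        @Override
        public String getPrefix(String namespaceURI) {
            return PEPXML_NS.equals(namespaceURI) ? "default" : null;
        }
        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            return PEPXML_NS.equals(namespaceURI)
                ? Collections.singletonList("default").iterator()
                : Collections.<String>emptyIterator();
        }
    };

    // Returns the string values of all selected nodes (the processor's "nodelist").
    static List<String> evaluate(String xmlText, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // prefixed steps only match with namespaces enabled
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xmlText)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(PEPXML_CONTEXT);
            NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getNodeValue()); // attribute value
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not evaluate XPath on pepXML text", e);
        }
    }
}
```

Because the peptide and probability expressions select every search_hit while the scan expression selects every spectrum_query, the three lists only line up one-to-one if each spectrum_query contains a single search_hit.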
Beanshells (2)
Name | Description | Inputs | Outputs
Join_and_Insert_Tabs | This Beanshell formats the data for RTCalc. | in1, in2 | training, validation_flat
Make_Table | This Beanshell formats the probabilities for the Rshell (Compare_Z_Scores_and_Probabilities). | in1 | probabilities
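The loops in Join_and_Insert_Tabs and Make_Table are truncated on this page, so the following is only a hypothetical reconstruction of what the two Beanshells produce. The formats are inferred from how the Rshell script parses its inputs: the training data as tab-separated lines whose first field is the peptide and second field the retention time, and the probabilities as one value per line. The one-peptide-per-line layout of validation_flat is an assumption, as is the class name RtCalcFormatting.

```java
import java.util.List;

// Hypothetical reconstruction of the Join_and_Insert_Tabs and Make_Table formatting.
// Layouts are inferred from how the Rshell script parses its inputs.
final class RtCalcFormatting {

    // training (assumed layout): "peptide<TAB>scan" per line; the Rshell reads
    // field 1 as the peptide and field 2 as the measured retention time.
    static String training(List<String> peptides, List<String> scans) {
        if (peptides.size() != scans.size()) {
            throw new IllegalArgumentException("peptide and scan lists differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < peptides.size(); i++) {
            sb.append(peptides.get(i)).append('\t').append(scans.get(i)).append('\n');
        }
        return sb.toString();
    }

    // validation_flat (assumption): one peptide sequence per line, for RTCalc_Predict.
    static String validationFlat(List<String> peptides) {
        return oneValuePerLine(peptides);
    }

    // Make_Table: one PeptideProphet probability per line; the Rshell reads field 1.
    static String probabilities(List<String> probabilities) {
        return oneValuePerLine(probabilities);
    }

    private static String oneValuePerLine(List<String> values) {
        StringBuilder sb = new StringBuilder();
        for (String v : values) {
            sb.append(v).append('\n');
        }
        return sb.toString();
    }
}
```

The explicit length check mirrors the implicit assumption in the workflow that the peptide and scan-number lists from the two XPaths are aligned element by element.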
Outputs (4)
Name | Description
RTCalc_model | The trained RTCalc model.
RTCalc_RMSD | The root-mean-square deviation (RMSD) between measured and predicted ("correct") retention times, in scan numbers.
Z_Scores_vs_Probabilities | A PNG plot of the Z-score as a function of log10(1 - PeptideProphet probability), with the removed PSMs indicated in red.
Filtered_Peptide_List | The PSMs whose absolute Z-score does not exceed Max_Abs_Z_Score, as a flat tab-separated string (peptide and probability).
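The arithmetic behind RTCalc_RMSD and Filtered_Peptide_List is short enough to port from the Rshell script. The sketch below does so in Java for illustration; the class RtZScoreFilter and its method names are hypothetical, and the plotting step is left out.

```java
// Sketch of the outlier-removal arithmetic in Compare_Z_Scores_and_Probabilities,
// ported from its R script. Names are hypothetical; plotting is omitted.
final class RtZScoreFilter {

    // Root-mean-square deviation between measured and predicted retention times.
    static double rmsd(double[] measured, double[] predicted) {
        double sum = 0.0;
        for (int i = 0; i < measured.length; i++) {
            double d = measured[i] - predicted[i];
            sum += d * d;
        }
        return Math.sqrt(sum / measured.length);
    }

    // Z = (measured - predicted) / RMSD for every peptide.
    static double[] zScores(double[] measured, double[] predicted) {
        double r = rmsd(measured, predicted);
        double[] z = new double[measured.length];
        for (int i = 0; i < z.length; i++) {
            z[i] = (measured[i] - predicted[i]) / r;
        }
        return z;
    }

    // Header line plus one "peptide<TAB>probability" row per peptide with |Z| <= maxAbsZ.
    static String filteredPeptideList(String[] peptides, double[] probabilities,
                                      double[] z, double maxAbsZ) {
        StringBuilder sb = new StringBuilder("peptide\tprobability");
        for (int i = 0; i < peptides.length; i++) {
            if (Math.abs(z[i]) <= maxAbsZ) {
                sb.append('\n').append(peptides[i]).append('\t').append(probabilities[i]);
            }
        }
        return sb.toString();
    }
}
```

As in the R script, the RMSD is computed over all peptides, including the outliers that are subsequently removed, so a few large deviations widen the Z-score scale for every peptide.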
Datalinks (18)
Source | Sink
pepXML_File | Read_Text_File:fileurl
Read_Text_File:filecontents | Extract_Peptides:xml_text
Read_Text_File:filecontents | Extract_Scan_Numbers:xml_text
Extract_Peptides:nodelist | Join_and_Insert_Tabs:in1
Extract_Scan_Numbers:nodelist | Join_and_Insert_Tabs:in2
Join_and_Insert_Tabs:training | RTCalc_Train:in1
RTCalc_Train:out1 | RTCalc_Predict:in2
Join_and_Insert_Tabs:validation_flat | RTCalc_Predict:in1
RTCalc_Predict:out1 | Compare_Z_Scores_and_Probabilities:predicted
Join_and_Insert_Tabs:training | Compare_Z_Scores_and_Probabilities:measured
Make_Table:probabilities | Compare_Z_Scores_and_Probabilities:probabilities
Max_Abs_Z_Score | Compare_Z_Scores_and_Probabilities:max_abs_Z_score
Read_Text_File:filecontents | Extract_Probabilities:xml_text
Extract_Probabilities:nodelist | Make_Table:in1
RTCalc_Train:out1 | RTCalc_model
Compare_Z_Scores_and_Probabilities:rmsd2 | RTCalc_RMSD
Compare_Z_Scores_and_Probabilities:Z_scores_vs_probabilities | Z_Scores_vs_Probabilities
Compare_Z_Scores_and_Probabilities:filtered_peptide_list | Filtered_Peptide_List
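Because a processor can only run after data has arrived on its input ports, the datalinks define a directed acyclic graph over the nine processors. The sketch below derives one valid execution order from that graph with Kahn's topological sort. It illustrates the dependency structure only; a dataflow engine such as Taverna may run independent processors (for example the three XPath extractors) concurrently. The class DatalinkOrder is hypothetical; the link pairs are copied from the table.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch: one valid processor order implied by the datalinks (Kahn's algorithm).
final class DatalinkOrder {

    static final Set<String> PROCESSORS = new TreeSet<>(Arrays.asList(
        "Read_Text_File", "Extract_Peptides", "Extract_Scan_Numbers", "Extract_Probabilities",
        "Join_and_Insert_Tabs", "Make_Table", "RTCalc_Train", "RTCalc_Predict",
        "Compare_Z_Scores_and_Probabilities"));

    // {source, sink} pairs from the Datalinks table.
    static final String[][] LINKS = {
        {"pepXML_File", "Read_Text_File:fileurl"},
        {"Read_Text_File:filecontents", "Extract_Peptides:xml_text"},
        {"Read_Text_File:filecontents", "Extract_Scan_Numbers:xml_text"},
        {"Extract_Peptides:nodelist", "Join_and_Insert_Tabs:in1"},
        {"Extract_Scan_Numbers:nodelist", "Join_and_Insert_Tabs:in2"},
        {"Join_and_Insert_Tabs:training", "RTCalc_Train:in1"},
        {"RTCalc_Train:out1", "RTCalc_Predict:in2"},
        {"Join_and_Insert_Tabs:validation_flat", "RTCalc_Predict:in1"},
        {"RTCalc_Predict:out1", "Compare_Z_Scores_and_Probabilities:predicted"},
        {"Join_and_Insert_Tabs:training", "Compare_Z_Scores_and_Probabilities:measured"},
        {"Make_Table:probabilities", "Compare_Z_Scores_and_Probabilities:probabilities"},
        {"Max_Abs_Z_Score", "Compare_Z_Scores_and_Probabilities:max_abs_Z_score"},
        {"Read_Text_File:filecontents", "Extract_Probabilities:xml_text"},
        {"Extract_Probabilities:nodelist", "Make_Table:in1"},
        {"RTCalc_Train:out1", "RTCalc_model"},
        {"Compare_Z_Scores_and_Probabilities:rmsd2", "RTCalc_RMSD"},
        {"Compare_Z_Scores_and_Probabilities:Z_scores_vs_probabilities", "Z_Scores_vs_Probabilities"},
        {"Compare_Z_Scores_and_Probabilities:filtered_peptide_list", "Filtered_Peptide_List"},
    };

    static List<String> order(String[][] links, Set<String> processors) {
        Map<String, Set<String>> successors = new TreeMap<>();
        Map<String, Integer> indegree = new TreeMap<>();
        for (String p : processors) {
            successors.put(p, new TreeSet<>());
            indegree.put(p, 0);
        }
        for (String[] link : links) {
            String source = link[0].split(":")[0]; // drop the port name
            String sink = link[1].split(":")[0];
            // Links to or from workflow inputs/outputs do not constrain the order.
            if (!processors.contains(source) || !processors.contains(sink)) continue;
            if (successors.get(source).add(sink)) indegree.merge(sink, 1, Integer::sum);
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String p = ready.poll();
            result.add(p);
            for (String q : successors.get(p)) {
                if (indegree.merge(q, -1, Integer::sum) == 0) ready.add(q);
            }
        }
        if (result.size() != processors.size()) {
            throw new IllegalStateException("datalinks contain a cycle");
        }
        return result;
    }
}
```

With these links the order starts at Read_Text_File and ends at Compare_Z_Scores_and_Probabilities, with RTCalc_Train necessarily preceding RTCalc_Predict.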
Coordinations (0)
Version 1
(of 1)