RTCalc Retention Time Prediction and Outlier Removal

Created: 2014-02-06 14:38:51 Last updated: 2014-02-06 14:43:24


This workflow takes a pepXML file from PeptideProphet as input, applies RTCalc, and outputs a list of peptides filtered by their retention time Z-scores.


Run

To run this workflow in the Taverna Workbench, copy and paste this link into File > 'Open workflow location...':
http://www.myexperiment.org/workflows/4042/download?version=1

Workflow Components


Inputs (2)

Name	Description
pepXML_File	The input peptide identifications, in pepXML format.
Max_Abs_Z_Score	The maximum absolute retention time Z-score a PSM may have to be included in the Filtered_Peptide_List.
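
The inclusion rule controlled by Max_Abs_Z_Score can be sketched in Java, which is close to the BeanShell used elsewhere in this workflow. This is a minimal port of the loop at the end of the Compare_Z_Scores_and_Probabilities R script, not the workflow's own code. The class and method names (ZScoreFilter, filter) and the example PSMs are made up for illustration. A PSM is kept when |Z| <= Max_Abs_Z_Score, so the bound is inclusive.

```java
import java.util.List;

// Hypothetical sketch: Java port of the Max_Abs_Z_Score rule from the
// Compare_Z_Scores_and_Probabilities R script.
class ZScoreFilter {

    // Builds a Filtered_Peptide_List-style string: a "peptide<TAB>probability"
    // header, then one line per PSM whose |Z| does not exceed maxAbsZ.
    public static String filter(List<String> peptides, double[] probabilities,
                                double[] zScores, double maxAbsZ) {
        StringBuilder sb = new StringBuilder("peptide\tprobability");
        for (int i = 0; i < zScores.length; i++) {
            // Inclusive bound, matching abs(Z_score[i]) <= max_abs_Z_score in R
            if (Math.abs(zScores[i]) <= maxAbsZ) {
                sb.append('\n').append(peptides.get(i))
                  .append('\t').append(probabilities[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up example PSMs
        List<String> peptides = List.of("PEPTIDEK", "SAMPLER", "LVNELTEFAK");
        double[] probabilities = {0.99, 0.42, 0.97};
        double[] zScores = {0.3, -3.1, 1.8};
        System.out.println(filter(peptides, probabilities, zScores, 2.0));
    }
}
```

Number formatting can differ slightly from R's `paste`. For example, Java prints a probability of 1 as `1.0`, whereas R prints `1`.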

Processors (9)

Name	Type	Description
Read_Text_File	localworker	This local worker reads the X!Tandem/PeptideProphet results (the pepXML file) and feeds them to the XPath extractors below. Script (BeanShell):
    // Open fileUrl as a local file; if it is not found, treat it as a URL
    BufferedReader getReader(String fileUrl, String encoding) throws IOException {
        InputStreamReader reader;
        try {
            if (encoding == null) {
                reader = new FileReader(fileUrl);
            } else {
                reader = new InputStreamReader(new FileInputStream(fileUrl), encoding);
            }
        } catch (FileNotFoundException e) {
            // try a real URL instead
            URL url = new URL(fileUrl);
            if (encoding == null) {
                reader = new InputStreamReader(url.openStream());
            } else {
                reader = new InputStreamReader(url.openStream(), encoding);
            }
        }
        return new BufferedReader(reader);
    }

    StringBuffer sb = new StringBuffer(4000);
    // encoding is an optional input; in BeanShell, void means it was not supplied
    if (encoding == void) {
        encoding = null;
    }
    BufferedReader in = getReader(fileurl, encoding);
    String str;
    String lineEnding = System.getProperty("line.separator");
    // Copy the file line by line into the filecontents output
    while ((str = in.readLine()) != null) {
        sb.append(str);
        sb.append(lineEnding);
    }
    in.close();
    filecontents = sb.toString();
Extract_Peptides	xpath	This XPath extracts the amino acid sequences of the identified peptides. XPath expression: /default:msms_pipeline_analysis/default:msms_run_summary/default:spectrum_query/default:search_result/default:search_hit/@peptide
Extract_Scan_Numbers	xpath	This XPath extracts the corresponding scan numbers for the identified peptides. XPath expression: /default:msms_pipeline_analysis/default:msms_run_summary/default:spectrum_query/@start_scan
Join_and_Insert_Tabs	beanshell	This Beanshell formats the data for RTCalc. Script String temp_training = new String(); String temp_validation_flat = new String(); for (i=0; i
RTCalc_Train	externaltool	This component trains the RTCalc predictor.
RTCalc_Predict	externaltool	This tool applies the trained RTCalc model from RTCalc_Train to predict retention times for the peptides.
Compare_Z_Scores_and_Probabilities	rshell	This Rshell script calculates and plots RTCalc retention time Z-scores for all peptides in the pepXML file. RTCalc is trained without any probability cutoff. For datasets with many false peptide identifications (or many low-probability peptide-spectrum matches), a selection should therefore be made before the RTCalc_Train call: in the XPaths, in the Join_and_Insert_Tabs component, or in an additional component. R server: localhost:6311. Script (R):
    file_split=unlist(strsplit(measured, split="\\n")) # split data into lines
    peptides=lapply(strsplit(file_split,"\\t"), function(x) x[1]) # first column: peptide sequences
    filtered_peptide_list<-"peptide\tprobability" # header of the filtered output
    measured_RT=as.numeric(lapply(strsplit(file_split,"\\t"), function(x) x[2])) # second column: measured retention times
    file_split=unlist(strsplit(predicted, split="\\n")) # split data into lines
    predicted_RT=as.numeric(lapply(strsplit(file_split,"\\t"), function(x) x[2])) # second column: predicted retention times
    file_split=unlist(strsplit(probabilities, split="\\n")) # split data into lines
    probabilities_P=as.numeric(lapply(strsplit(file_split,"\\t"), function(x) x[1])) # first column: PeptideProphet probabilities
    # RMSD between measured and predicted retention times, then per-PSM Z-scores
    rmsd2<-sqrt(sum(((measured_RT-predicted_RT)^2)/length(measured_RT)))
    Z_score<-(measured_RT-predicted_RT)/rmsd2
    png("Z_scores_vs_probabilities.png", width=700, height=700) # make PNG plot
    plot(log10(1-probabilities_P), Z_score, pch=20, xlab="log10(1-probability)", ylab="RTCalc Z-score")
    for(i in 1:length(probabilities_P)) {
        # outlier: mark with a red cross
        if (abs(Z_score[i])>max_abs_Z_score) points(log10(1-probabilities_P[i]), Z_score[i], pch=4, col="red")
        # within threshold: append "peptide<TAB>probability" to the filtered list
        if (abs(Z_score[i])<=max_abs_Z_score) {
            new_line<-paste(peptides[i], probabilities_P[i], sep = "\t")
            filtered_peptide_list<-paste(filtered_peptide_list, new_line, sep = "\n")
        }
    }
    temp<-dev.off()
Extract_Probabilities	xpath	This XPath extracts the corresponding PeptideProphet probabilities for the identified peptides. XPath expression: /default:msms_pipeline_analysis/default:msms_run_summary/default:spectrum_query/default:search_result/default:search_hit/default:analysis_result/default:peptideprophet_result/@probability
Make_Table	beanshell	This Beanshell formats the probabilities for the Rshell below. Script //can be removed, connecting the nodelist to a numerical vector input String temp_probabilities = new String(); for (i=0; i
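
The retention time arithmetic in Compare_Z_Scores_and_Probabilities can be ported to Java as a sketch. It assumes, as the R script does, that `measured` (the training string) and `predicted` (the RTCalc_Predict output) are newline-separated lines with the retention time in the second tab-separated column. It also assumes that both list the peptides in the same order. RetentionTimeZScores and its methods are hypothetical names, and the numbers in `main` are made up.

```java
import java.util.Arrays;

// Hypothetical sketch: Java port of the rmsd2 and Z_score lines of the R script.
class RetentionTimeZScores {

    // Parses column `col` of newline-separated, tab-delimited text as doubles.
    // Indices are 0-based here, whereas x[2] in the R script is 1-based.
    public static double[] column(String text, int col) {
        String[] lines = text.split("\n"); // trailing empty lines are dropped
        double[] values = new double[lines.length];
        for (int i = 0; i < lines.length; i++) {
            values[i] = Double.parseDouble(lines[i].split("\t")[col].trim());
        }
        return values;
    }

    // rmsd2 = sqrt(sum((measured - predicted)^2) / n)
    public static double rmsd(double[] measured, double[] predicted) {
        double sumSq = 0.0;
        for (int i = 0; i < measured.length; i++) {
            double d = measured[i] - predicted[i];
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / measured.length);
    }

    // Z = (measured - predicted) / rmsd2 for every PSM
    public static double[] zScores(double[] measured, double[] predicted) {
        double r = rmsd(measured, predicted);
        double[] z = new double[measured.length];
        for (int i = 0; i < z.length; i++) {
            z[i] = (measured[i] - predicted[i]) / r;
        }
        return z;
    }

    public static void main(String[] args) {
        // Made-up strings in the "peptide<TAB>retention time" layout the R script reads
        String measured = "PEPTIDEK\t1200\nSAMPLER\t1500\nLVNELTEFAK\t1800";
        String predicted = "PEPTIDEK\t1250\nSAMPLER\t1450\nLVNELTEFAK\t1800";
        double[] m = column(measured, 1);
        double[] p = column(predicted, 1);
        System.out.println("RMSD = " + rmsd(m, p));
        System.out.println(Arrays.toString(zScores(m, p)));
    }
}
```

The RMSD is computed from the same uncentered residuals that it normalizes, so the Z-scores always have a root-mean-square of exactly 1. A Max_Abs_Z_Score of 2 therefore removes PSMs whose deviation is more than twice the typical one.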

Beanshells (2)

Name	Description	Inputs	Outputs
Join_and_Insert_Tabs	This Beanshell formats the data for RTCalc.	in1 in2	training validation_flat
Make_Table	This Beanshell formats the probabilities for the Rshell below.	in1	probabilities
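
Both Beanshell scripts are truncated on this page. The following is therefore only a hypothetical Java sketch of the string layouts that the Compare_Z_Scores_and_Probabilities R script reads: `training` as one "peptide<TAB>scan" line per PSM, and `probabilities` as one value per line. It does not reconstruct `validation_flat`, whose format is not shown. It also assumes one search_hit per spectrum_query, so that the peptide, scan and probability lists returned by the XPaths line up. FormatForRTCalc and its methods are made-up names.

```java
import java.util.List;

// Hypothetical sketch of the layouts produced by Join_and_Insert_Tabs
// (training output only) and Make_Table.
class FormatForRTCalc {

    // One "peptide<TAB>scan" line per PSM: the layout read as `measured` in R.
    public static String joinAndInsertTabs(List<String> peptides, List<String> scans) {
        if (peptides.size() != scans.size()) {
            // Mismatched lists would misalign peptides and retention times
            throw new IllegalArgumentException("peptide and scan lists differ in length");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < peptides.size(); i++) {
            if (i > 0) sb.append('\n');
            sb.append(peptides.get(i)).append('\t').append(scans.get(i));
        }
        return sb.toString();
    }

    // One probability per line: the layout read as `probabilities` in R.
    public static String makeTable(List<String> probabilities) {
        return String.join("\n", probabilities);
    }

    public static void main(String[] args) {
        // Made-up XPath results
        System.out.println(joinAndInsertTabs(List.of("PEPTIDEK", "SAMPLER"), List.of("1200", "1500")));
        System.out.println(makeTable(List.of("0.99", "0.42")));
    }
}
```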

Outputs (4)

Name	Description
RTCalc_model	This holds the trained RTCalc model.
RTCalc_RMSD	The root-mean-square deviation (RMSD) between the measured and the RTCalc-predicted retention times, in scan numbers.
Z_Scores_vs_Probabilities	A PNG plot of each PSM's RTCalc Z-score against log10(1 - PeptideProphet probability), with the removed PSMs marked in red.
Filtered_Peptide_List	The PSMs whose absolute Z-score does not exceed Max_Abs_Z_Score, as a flat tab-separated string with the columns peptide and probability.
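
For reference, the quantities behind RTCalc_RMSD, the Z-scores and the filter are written out below. They follow the rmsd2 and Z_score lines of the R script. For PSM i of n, t_i^meas and t_i^pred are the measured and predicted retention times in scan numbers:

```latex
\mathrm{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(t_i^{\mathrm{meas}} - t_i^{\mathrm{pred}}\right)^2},
\qquad
Z_i = \frac{t_i^{\mathrm{meas}} - t_i^{\mathrm{pred}}}{\mathrm{RMSD}},
\qquad
\text{PSM } i \text{ is kept} \iff |Z_i| \le \texttt{Max\_Abs\_Z\_Score}
```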

Datalinks (18)

Source	Sink
pepXML_File	Read_Text_File:fileurl
Read_Text_File:filecontents	Extract_Peptides:xml_text
Read_Text_File:filecontents	Extract_Scan_Numbers:xml_text
Extract_Peptides:nodelist	Join_and_Insert_Tabs:in1
Extract_Scan_Numbers:nodelist	Join_and_Insert_Tabs:in2
Join_and_Insert_Tabs:training	RTCalc_Train:in1
RTCalc_Train:out1	RTCalc_Predict:in2
Join_and_Insert_Tabs:validation_flat	RTCalc_Predict:in1
RTCalc_Predict:out1	Compare_Z_Scores_and_Probabilities:predicted
Join_and_Insert_Tabs:training	Compare_Z_Scores_and_Probabilities:measured
Make_Table:probabilities	Compare_Z_Scores_and_Probabilities:probabilities
Max_Abs_Z_Score	Compare_Z_Scores_and_Probabilities:max_abs_Z_score
Read_Text_File:filecontents	Extract_Probabilities:xml_text
Extract_Probabilities:nodelist	Make_Table:in1
RTCalc_Train:out1	RTCalc_model
Compare_Z_Scores_and_Probabilities:rmsd2	RTCalc_RMSD
Compare_Z_Scores_and_Probabilities:Z_scores_vs_probabilities	Z_Scores_vs_Probabilities
Compare_Z_Scores_and_Probabilities:filtered_peptide_list	Filtered_Peptide_List
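
The datalinks above define a directed graph among the nine processors. As an illustration only, and not a description of how Taverna schedules work internally, the Java sketch below keeps just the processor-to-processor links. It drops port names and the workflow-level inputs and outputs, then derives one valid execution order with Kahn's topological sort. DatalinkOrder and LINKS are hypothetical names.

```java
import java.util.*;

// Hypothetical sketch: one valid execution order for the processors,
// derived from the Datalinks table with Kahn's algorithm.
class DatalinkOrder {

    // Processor-to-processor links from the Datalinks table (ports dropped)
    public static final String[][] LINKS = {
        {"Read_Text_File", "Extract_Peptides"},
        {"Read_Text_File", "Extract_Scan_Numbers"},
        {"Read_Text_File", "Extract_Probabilities"},
        {"Extract_Peptides", "Join_and_Insert_Tabs"},
        {"Extract_Scan_Numbers", "Join_and_Insert_Tabs"},
        {"Join_and_Insert_Tabs", "RTCalc_Train"},
        {"Join_and_Insert_Tabs", "RTCalc_Predict"},
        {"RTCalc_Train", "RTCalc_Predict"},
        {"RTCalc_Predict", "Compare_Z_Scores_and_Probabilities"},
        {"Join_and_Insert_Tabs", "Compare_Z_Scores_and_Probabilities"},
        {"Extract_Probabilities", "Make_Table"},
        {"Make_Table", "Compare_Z_Scores_and_Probabilities"},
    };

    public static List<String> order(String[][] links) {
        Map<String, List<String>> successors = new LinkedHashMap<>();
        Map<String, Integer> inDegree = new LinkedHashMap<>();
        for (String[] link : links) {
            successors.computeIfAbsent(link[0], k -> new ArrayList<>()).add(link[1]);
            successors.computeIfAbsent(link[1], k -> new ArrayList<>());
            inDegree.putIfAbsent(link[0], 0);
            inDegree.merge(link[1], 1, Integer::sum); // count incoming links
        }
        // Start with processors that have no upstream processor
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((node, degree) -> { if (degree == 0) ready.add(node); });
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String node = ready.poll();
            result.add(node);
            for (String next : successors.get(node)) {
                // A processor becomes ready once all of its inputs are produced
                if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
            }
        }
        if (result.size() != inDegree.size()) {
            throw new IllegalStateException("cycle in datalinks");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(order(LINKS));
    }
}
```

Any order this returns places RTCalc_Train before RTCalc_Predict, and both before Compare_Z_Scores_and_Probabilities. Processors with no path between them, such as the three XPath extractors, have no required order relative to each other, which leaves a dataflow engine free to run them concurrently.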