This workflow takes the intron(s), the effect(s) and the TP53 somatic mutation database as input and retrieves the full TP53 somatic mutation description(s). It first retrieves two different outputs:
- first output: a list of unique TP53 somatic mutation database IDs associated with the input intron (done via a call to the getP53MutationIdsByIntron web service)
- second output: a list of unique TP53 somatic mutation database IDs associated with the input effect (done via a call to the getP53MutationIdsByEffect web service)
It then uses the IDs to retrieve the full TP53 somatic mutation descriptions (done via a call to the getP53MutationsByIds web service).
All these web services are provided by the Soaplab system at http://bioinformatics.istge.it:8080/axis/services
<br>
Several local string-list operations on the two outputs are also required:
- the returned IDs come in a single string, which must be split into a list (done by the 'Split_string_into_string_list_by_regular_expression' and 'Split_string_into_string_list_by_regular_expression_2' processors, both implemented with the Split_string_into_string_list_by_regular_expression local processor)
- the two resulting lists are compared to identify their common subset (done by the 'String_list_intersection' processor, implemented with the String_list_intersection local processor)
- the returned IDs include the catalogue name, which must be removed before the IDs are used for further processing (done by the 'Filter_list_of_strings_extracting_match_to_a_regex' processor, implemented with the Filter_list_of_strings_extracting_match_to_a_regex local processor)
<br>
Special requirements on input data are:
- introns must be numbers in the range 2-11,
- one or more of the following effects can be specified: 'fs' (frameshift), 'missense', 'na' (not available), 'nonsense', 'other', 'silent', 'splice'. Other values may lead to errors,
- when specifying more than one intron or effect, they must be given in a single input string, one per line.
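The input requirements above can be checked before the workflow is run. The Python sketch below is a hypothetical pre-validation helper, not part of the workflow; the names `parse_introns`, `parse_effects` and `VALID_EFFECTS` are illustrative, and it assumes that blank lines in the input string can simply be ignored.

```python
# Hypothetical pre-validation of workflow inputs (not part of the workflow itself).

# Effect values accepted by the workflow, as listed in its description.
VALID_EFFECTS = {"fs", "missense", "na", "nonsense", "other", "silent", "splice"}
# Documented intron range, inclusive on both ends.
INTRON_MIN, INTRON_MAX = 2, 11


def parse_introns(text: str) -> list[int]:
    """Split a one-per-line intron string and check each value lies in 2-11."""
    introns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        try:
            value = int(line)
        except ValueError:
            raise ValueError(f"intron is not a number: {line!r}") from None
        if not INTRON_MIN <= value <= INTRON_MAX:
            raise ValueError(f"intron {value} is outside {INTRON_MIN}-{INTRON_MAX}")
        introns.append(value)
    return introns


def parse_effects(text: str) -> list[str]:
    """Split a one-per-line effect string and reject values not in VALID_EFFECTS."""
    effects = [line.strip() for line in text.splitlines() if line.strip()]
    unknown = [effect for effect in effects if effect not in VALID_EFFECTS]
    if unknown:
        raise ValueError(f"unsupported effect(s): {unknown}")
    return effects


if __name__ == "__main__":
    print(parse_introns("2\n5\n11"))          # [2, 5, 11]
    print(parse_effects("missense\nsplice"))  # ['missense', 'splice']
```

Raising early on an out-of-range intron or an unknown effect avoids sending values to the web services that, as noted above, may lead to errors.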
This regular expression specifies the format of a TP53 somatic mutation ID in the O2I SRS network service.
The regular expression ([A-Z0-9]*:)([0-9]*) specifies that:
- the first part of the ID is the catalogue name, a run of uppercase letters and digits (the character class [A-Z0-9] does not include underscores), followed by a colon (':')
- the second part of the ID is the mutation code and consists of digits only.
([A-Z0-9]*:)([0-9]*)
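To illustrate how the two parts are captured, the Python sketch below applies the same pattern to a hypothetical ID `TP53:12345` (real IDs returned by the service may look different). It uses a whole-string match; whether the Taverna processors anchor the pattern this way is an assumption, and `split_id` is an illustrative name.

```python
import re

# Documented ID format: group 1 = catalogue name plus ':', group 2 = mutation code.
ID_PATTERN = re.compile(r"([A-Z0-9]*:)([0-9]*)")


def split_id(mutation_id: str) -> tuple[str, str]:
    """Return (catalogue_with_colon, code) if the whole ID matches the pattern."""
    match = ID_PATTERN.fullmatch(mutation_id)
    if match is None:
        raise ValueError(f"not a TP53 somatic mutation ID: {mutation_id!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(split_id("TP53:12345"))                  # ('TP53:', '12345')
    print(ID_PATTERN.fullmatch("TP53_SOMATIC:1"))  # None: '_' is not in [A-Z0-9]
```

Because both quantifiers are `*`, the pattern also accepts an empty catalogue name or an empty code (for example the string `:`); using `+` instead would require at least one character in each part.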
This string ('\n') specifies that TP53 somatic mutation IDs are separated by an end-of-line character.
It is used as the regular expression separator for splitting a text string that contains several TP53 somatic mutation IDs into a list of strings.
\n
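The splitting step can be sketched in Python as a regex split on this separator. This is an illustration, not the SplitByRegex processor's code: the name `split_ids` is hypothetical, and dropping empty entries (such as the one produced by a trailing newline) is an assumption about the processor's behaviour.

```python
import re

# Documented separator: one ID per line.
SEPARATOR = r"\n"


def split_ids(text: str, separator: str = SEPARATOR) -> list[str]:
    """Split a plain-text service response into a list of IDs.

    Empty entries (e.g. from a trailing newline) are dropped; whether the
    SplitByRegex processor does the same is an assumption.
    """
    return [part for part in re.split(separator, text) if part]


if __name__ == "__main__":
    print(split_ids("TP53:1\nTP53:2\n"))  # ['TP53:1', 'TP53:2']
```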
This data specifies that the mutation code is the second part of the ID (i.e. the second group of the regular expression specified by the 'regex_id_separator' string).
2
This processor performs the following local string operation.
IDs returned by the getP53MutationIdsByEffect call are contained in a plain text string.
They must be transformed into a list of strings.
This task is implemented by using a Split_string_into_string_list_by_regular_expression local processor.
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
This processor performs the following local string operation.
IDs returned by the String_list_intersection local processor include the catalogue name.
The catalogue name must be removed before the IDs are used for further processing.
This task is implemented by using a Filter_list_of_strings_extracting_match_to_a_regex local processor.
org.embl.ebi.escience.scuflworkers.java.RegularExpressionStringList
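The filtering step combines the ID regular expression with the group index 2 described earlier. The Python sketch below shows the idea; it assumes unanchored, find-style matching and that non-matching strings are discarded, which may not exactly mirror the RegularExpressionStringList processor. `extract_codes` and the sample IDs are hypothetical.

```python
import re

# Documented constants: the ID regex and the group holding the mutation code.
ID_PATTERN = re.compile(r"([A-Z0-9]*:)([0-9]*)")
CODE_GROUP = 2


def extract_codes(ids: list[str], pattern: re.Pattern = ID_PATTERN,
                  group: int = CODE_GROUP) -> list[str]:
    """Keep only strings containing a match and return the chosen group of each."""
    codes = []
    for item in ids:
        match = pattern.search(item)  # assumption: unanchored, find-style matching
        if match is not None:
            codes.append(match.group(group))
    return codes


if __name__ == "__main__":
    print(extract_codes(["TP53:101", "TP53:202", "no id here"]))  # ['101', '202']
```

The resulting bare codes are what the getP53MutationsByIds call receives in place of the full catalogue-prefixed IDs.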
This processor performs the following local string operation.
IDs returned by the getP53MutationIdsByIntron call are contained in a plain text string.
They must be transformed into a list of strings.
This task is implemented by using a Split_string_into_string_list_by_regular_expression local processor.
org.embl.ebi.escience.scuflworkers.java.SplitByRegex
This processor performs the following local string operation.
The two lists of strings returned by the Split_string_into_string_list_by_regular_expression and Split_string_into_string_list_by_regular_expression_2 local processors are compared to identify their common subset.
This task is implemented by using a String_list_intersection local processor.
org.embl.ebi.escience.scuflworkers.java.StringSetIntersection
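A minimal Python sketch of the intersection step follows. StringSetIntersection treats the lists as sets, so its output order is not specified here; this sketch keeps the order of the first list and removes duplicates as an illustrative choice. `intersect_lists` and the sample IDs are hypothetical.

```python
def intersect_lists(first: list[str], second: list[str]) -> list[str]:
    """Return strings present in both lists, in the order of `first`, without duplicates."""
    second_set = set(second)  # O(1) membership checks
    seen = set()
    common = []
    for item in first:
        if item in second_set and item not in seen:
            common.append(item)
            seen.add(item)
    return common


if __name__ == "__main__":
    by_intron = ["TP53:1", "TP53:2", "TP53:3"]
    by_effect = ["TP53:3", "TP53:1", "TP53:9"]
    print(intersect_lists(by_intron, by_effect))  # ['TP53:1', 'TP53:3']
```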
This data specifies the (constant) name of the TP53 somatic mutation database to be queried.
tp53_somatic
Get TP53 gene mutation IDs by effect from the IARC TP53 Database catalogue (see srs.o2i.it/srs71/)
http://bioinformatics.istge.it:8080/axis/services/o2i.getP53MutationIdsByEffect
Get TP53 gene mutations by IDs from the IARC TP53 Database (see http://srs.o2i.it/srs71/)
http://bioinformatics.istge.it:8080/axis/services/o2i.getP53MutationsByIds
Get TP53 gene mutation IDs by intron from the IARC TP53 Database catalogue (see srs.o2i.it/srs71/)
http://bioinformatics.istge.it:8080/axis/services/o2i.getP53MutationIdsByIntron