Affymetrix microarray: part two (perm)

Created: 2008-12-05 14:06:06

We use a permutation test to assess the significance of the differentially expressed genes found by the ANOVA analysis. Permutation tests are computationally intensive, requiring at least 1000 permutations per gene to obtain acceptable results. For large experiments, the built-in R/MAANOVA feature for running on a single cluster may not be enough. We are using the WS-VLAM workflow management system [2] to create a Grid-enabled R/MAANOVA workflow (Grid-MAANOVA) that will run simultaneously on multiple clusters. This Grid implementation will have two levels of parallelization.
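As a rough illustration of what a permutation test computes for a single gene, the Python sketch below shuffles the sample labels and counts how often the permuted F statistic reaches the observed one. It assumes a simplified one-way layout and is not the R/MAANOVA implementation; the function names `f_statistic` and `permutation_p_value` and the fixed seed are hypothetical choices made for this example.

```python
import random


def f_statistic(values, labels):
    """One-way ANOVA F statistic for `values` grouped by `labels`."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(values, labels, n_perm=1000, seed=0):
    """Estimate a p-value by shuffling the labels `n_perm` times (illustrative only)."""
    rng = random.Random(seed)
    observed = f_statistic(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(values, shuffled) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (exceed + 1) / (n_perm + 1)
```

The cost grows linearly with the number of permutations and with the number of genes, which is why the work is worth distributing.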

Level 1: Job-farming the genes. The probe-level R/MAANOVA scheme is intrinsically parallel: the F statistics for each group of probe-sets can be computed independently. However, variations in group size and probe-set size may lead to unbalanced computation times. This problem can be solved by splitting the groups with longer computation times into smaller groups.
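One way to picture the splitting of expensive groups is the Python sketch below. It assumes that the cost of a group can be approximated by the total number of probes it contains, which is an assumption of this example rather than something the workflow specifies; the function name `split_large_groups`, the `max_cost` threshold, and the data layout are likewise hypothetical.

```python
def split_large_groups(groups, max_cost):
    """Split each group into jobs whose estimated cost stays within `max_cost`.

    `groups` maps a group name to a list of (probe_set_id, n_probes) pairs.
    The cost of a job is taken to be the sum of its probe counts (assumption).
    A single probe-set larger than `max_cost` still becomes its own job.
    """
    jobs = []
    for name, probe_sets in groups.items():
        chunk, cost = [], 0
        for probe_set_id, n_probes in probe_sets:
            # Start a new job when adding this probe-set would exceed the limit.
            if chunk and cost + n_probes > max_cost:
                jobs.append((name, chunk))
                chunk, cost = [], 0
            chunk.append(probe_set_id)
            cost += n_probes
        if chunk:
            jobs.append((name, chunk))
    return jobs
```

Each resulting job can then be farmed out independently, since the F statistics of different probe-sets do not depend on one another.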

Level 2: Job-farming the permutations. If the computation time for a subset of genes is still too long, the permutation F test will be parallelized by submitting jobs that each contain an appropriate number of permutations. This will be done in three steps.
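As a rough illustration of the general idea of job-farming permutations (not of the three steps referred to above), the Python sketch below divides a permutation budget into jobs and pools the counts they return. The function names `plan_permutation_jobs` and `combine_counts`, the per-job seed scheme, and the result format are all assumptions made for this example.

```python
def plan_permutation_jobs(total_perms, perms_per_job):
    """Divide `total_perms` permutations into jobs of at most `perms_per_job`."""
    jobs = []
    remaining = total_perms
    while remaining > 0:
        count = min(perms_per_job, remaining)
        # Give each job its own seed so jobs do not repeat the same shuffles.
        jobs.append({"seed": len(jobs), "n_perm": count})
        remaining -= count
    return jobs


def combine_counts(results):
    """Pool (exceed_count, n_perm) pairs from all jobs into one p-value."""
    exceed = sum(e for e, _ in results)
    total = sum(n for _, n in results)
    return (exceed + 1) / (total + 1)
```

Because each job only reports how many of its permuted statistics reached the observed value, the partial results can be merged by simple addition regardless of which cluster ran them.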