Correlation-based variable selection for differential gene expression analysis
Presenter
November 15, 2011
Keywords:
- Selection
MSC:
- 54C65
Abstract
Variable selection is useful for identifying the principal drivers of differential
gene response under one or more treatments, phenotypes, or conditions. Once identified, such drivers can be targeted as potential
knockouts or enhancers in drug discovery or diagnostic testing. In high-throughput data such as gene or
protein expression, the large number of variables has made it impractical to implement all but the
simplest univariate methods for variable selection, e.g., detecting significant shifts in t-test or Wilcoxon test statistics.
We propose an alternative approach based on detecting significant shifts in the connectivity
patterns of genes in a correlation graph or concentration graph. Remarkably, it is precisely when the sample size is small
that the approach is scalable, e.g., to whole-genome analysis. Furthermore, a statistical performance analysis establishes phase transition behaviors and tight approximations to the false discovery rate that can be used for error control. We will illustrate the approach on
several gene expression datasets.
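
As a rough illustration of the idea of a connectivity shift in a correlation graph, the following minimal sketch thresholds the sample correlation matrix under two conditions and compares each gene's degree (number of neighbors). It is not the method presented in the talk: the function names `correlation_degrees` and `degree_shift`, the threshold `rho`, and the samples-by-genes data layout are all assumptions made for this example, and no false discovery rate control or phase transition threshold is included.

```python
import numpy as np

def correlation_degrees(X, rho):
    """Degree of each gene in the thresholded sample correlation graph.

    X   : array of shape (n_samples, n_genes)  (assumed layout)
    rho : correlation magnitude threshold in (0, 1)  (hypothetical parameter)
    """
    R = np.corrcoef(X, rowvar=False)      # genes are columns
    np.fill_diagonal(R, 0.0)              # ignore self-correlation
    return (np.abs(R) >= rho).sum(axis=1)

def degree_shift(X_a, X_b, rho):
    """Per-gene change in degree from condition A to condition B."""
    return correlation_degrees(X_b, rho) - correlation_degrees(X_a, rho)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 20, 50                         # few samples, many genes
    X_a = rng.standard_normal((n, p))     # condition A: independent genes
    X_b = rng.standard_normal((n, p))     # condition B: genes 1..5 track gene 0
    X_b[:, 1:6] = X_b[:, [0]] + 0.1 * rng.standard_normal((n, 5))
    shift = degree_shift(X_a, X_b, rho=0.9)
    print("genes with largest degree shift:", np.argsort(-shift)[:6])
```

In this toy setup, genes whose degree changes sharply between conditions would be the candidates for selection; a real analysis would choose `rho` from a statistical criterion rather than fixing it by hand.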