Oregon State University



Event Details

The Negative Binomial-P Distribution for Assessing Differential Gene Expression

Dan Schafer, Dept of Statistics, OSU

Monday, February 8, 2010 4:00 PM - 5:00 PM

RNA-Seq is a sequencing technology that can be used to measure the expression levels of genes in a biological sample. Although RNA-Seq has been dubbed a revolutionary tool for transcriptomics and is believed by many to represent "the beginning of the end" of microarray analysis, there is currently no standard statistical technique for assessing differential gene expression from its output.


The expression level for a gene is indicated by the relative frequency of occurrence of "reads" attributed to that gene. For each gene, the statistical problem is simply a two-sample comparison of counts. A difference in the count distribution between the two groups is taken as an indication of differential gene expression. There are three major obstacles, though: there is a huge multiple testing problem (in our data, for example, we test for differential expression in about 30,000 genes), run times and costs of RNA-Seq technology preclude large numbers of replicates (in our example, the group sample size is 3), and there is extreme variation in the counts. At first glance, therefore, the statistician should probably decline to help and should refer the biologist to the psychic hotline instead. There are, however, reasons to believe that useful scientific conclusions can be drawn. The obstacles, though, force more-than-usual attention to power and robustness tradeoffs. In addition, the small number of biological replicates may preclude the use of standard asymptotic tests for count data.


Other statisticians, in assessing differential gene expression from serial analysis of gene expression (SAGE) technology, proposed an exact two-sample test based on a negative binomial (NB) model and an assumption that the dispersion parameter is constant for all genes. While this is an attractive solution for our problem, the assumption is not met. There is, however, a parameterization of the negative binomial distribution, recently labeled the negative binomial-P (NBP) distribution, which includes an additional parameter for more flexible modeling of the mean-variance relationship, and this model does fit. Our research, which is still in its early stages, involves estimation and testing based on the NBP distribution, with specific interest in its potential implementation for RNA-Seq technology and a clarification of sample size requirements. We are also interested more generally in NBP log-linear regression. This is joint work with Jeff Chang and Jason Cumbie of the Botany and Plant Pathology Department, and Yanming Di of the Statistics Department.
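
As a rough illustration of the mean-variance flexibility described above, the sketch below assumes the commonly cited NBP variance form Var(Y) = mu + phi * mu^p, which the abstract itself does not spell out. The function names and the example values of mu, phi, and p are hypothetical. Under this assumed form, p = 2 gives the usual NB model with constant dispersion phi, while other values of p let the implied dispersion change with the mean.

```python
def nbp_variance(mu, phi, p):
    """Variance under the assumed NBP form: mu + phi * mu**p."""
    return mu + phi * mu ** p


def implied_nb_dispersion(mu, phi, p):
    """Dispersion d such that the variance equals mu + d * mu**2.

    Equating mu + d * mu**2 with mu + phi * mu**p gives d = phi * mu**(p - 2),
    which is constant in mu only when p = 2.
    """
    return phi * mu ** (p - 2)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show how the dispersion moves with the mean.
    for mu in (10.0, 100.0, 1000.0):
        print(mu, nbp_variance(mu, 0.1, 1.5), implied_nb_dispersion(mu, 0.1, 1.5))
```

With p below 2, the implied dispersion shrinks as the mean count grows, which is one way a constant-dispersion assumption across genes could fail.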


The talk will focus on the statistical problem, with very little attention to the background biology, and will be geared towards an audience that has taken, or is taking, a first-year graduate-level sequence in statistical theory.



Kidder Hall
Maggie Neel
1 541 737 1981
neel at science.oregonstate.edu
Statistics (Science)