next up previous contents
Next: Validation Up: How Many Genes? Transcript Previous: Introduction   Contents


Methods

Which diversity estimator is most reliable depends on the underlying frequency distribution of transcripts, so the choice is best guided by informed experiments [32,112]. Appendix A demonstrates that, in general, the functional form of an accumulation curve also depends on that distribution.
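To illustrate this dependence, the following minimal sketch computes the expected accumulation curve for two hypothetical libraries. It assumes independent draws with replacement, under which the expected number of distinct transcripts after n draws is the sum over transcripts of 1 - (1 - p_i)^n. The function names and both frequency distributions are illustrative assumptions, not taken from Appendix A.

```python
def expected_distinct(freqs, n):
    """Expected number of distinct transcripts seen in n independent draws.

    freqs holds relative abundances (any positive scale); they are
    normalized to probabilities p_i before use.
    """
    total = sum(freqs)
    return sum(1.0 - (1.0 - f / total) ** n for f in freqs)


def accumulation_curve(freqs, sample_sizes):
    """Expected distinct-transcript count at each sample size."""
    return [expected_distinct(freqs, n) for n in sample_sizes]


# Two hypothetical libraries with the same true diversity (100 transcripts).
uniform = [1.0] * 100                              # all equally abundant
skewed = [1.0 / rank for rank in range(1, 101)]    # assumed Zipf-like skew

sizes = [10, 100, 1000]
uniform_curve = accumulation_curve(uniform, sizes)
skewed_curve = accumulation_curve(skewed, sizes)

for n, u, s in zip(sizes, uniform_curve, skewed_curve):
    print(f"n={n:5d}  uniform={u:7.2f}  skewed={s:7.2f}")
```

Both libraries contain the same number of transcripts, yet the skewed library accumulates new transcripts more slowly at every sample size, so a single asymptotic model need not fit both curves equally well.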

In early attempts to estimate library diversity, we considered nonlinear regression of accumulation curves, fitting the parameters of an asymptotic model. Several asymptotic functions have been used to infer diversity as a function of sample size, each making different assumptions about the sampling process [32]. The difficulty lay in choosing an appropriate regression model. An alternative to fitting asymptotic parameters to an accumulation curve is to use one of several non-parametric estimators, such as those derived in [28].
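As a concrete example of a non-parametric estimator, the sketch below implements the bias-corrected Chao1 estimator, which uses only the numbers of transcripts observed exactly once (f1) and exactly twice (f2). Chao1 is chosen here as a widely used representative from the ecological literature; the text does not state which estimators [28] derives, so this should not be read as the estimator used in this work. The read labels are made up for illustration.

```python
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1 estimate of total diversity.

    counts: iterable of per-transcript observation counts.
    Returns S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)), where f1 and f2 are
    the numbers of transcripts seen exactly once and exactly twice.
    """
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)
    f1 = sum(1 for c in observed if c == 1)
    f2 = sum(1 for c in observed if c == 2)
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


# Hypothetical sample: each label stands for one sequenced clone.
reads = ["a", "a", "a", "b", "b", "c", "d", "e"]
counts = list(Counter(reads).values())
estimate = chao1(counts)
print(estimate)  # 5 observed, f1 = 3, f2 = 1, so 5 + 3*2/(2*2) = 6.5
```

No regression model has to be chosen: the estimate follows directly from the observed counts, and it reduces to the observed diversity when no transcript is seen only once.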

Ultimately, it is notoriously difficult to test the statistical hypothesis that an observed distribution originated from one distribution and not another [32,38]. We therefore chose non-parametric diversity estimators developed in the ecological literature. Before estimating diversity in empirical libraries, we conducted a pilot validation study to test whether non-parametric estimators yield accurate, unbiased diversity estimates.
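One way such a check could be set up is sketched below: sample repeatedly from a simulated library of known diversity, apply an estimator to each sample, and compare the mean estimate with the true value. This is only an illustration under stated assumptions and not the validation design used in this work. The bias-corrected Chao1 estimator, the uniform library of 50 transcripts, the sample size, and the number of replicates are all hypothetical choices.

```python
import random
from collections import Counter


def chao1(counts):
    """Bias-corrected Chao1 estimate (assumed stand-in estimator)."""
    observed = [c for c in counts if c > 0]
    f1 = sum(1 for c in observed if c == 1)
    f2 = sum(1 for c in observed if c == 2)
    return len(observed) + f1 * (f1 - 1) / (2.0 * (f2 + 1))


def simulate_bias(true_freqs, n_draws, n_reps, seed=0):
    """Mean estimate minus true diversity over n_reps simulated samples."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    ids = range(len(true_freqs))       # one id per transcript in the library
    estimates = []
    for _ in range(n_reps):
        # Draw n_draws clones with replacement, weighted by abundance.
        sample = rng.choices(ids, weights=true_freqs, k=n_draws)
        estimates.append(chao1(Counter(sample).values()))
    mean_estimate = sum(estimates) / n_reps
    return mean_estimate - len(true_freqs)


# Hypothetical library: 50 equally abundant transcripts, 200 clones sampled.
bias = simulate_bias([1.0] * 50, n_draws=200, n_reps=200)
print(f"estimated bias: {bias:+.2f} transcripts")
```

A bias near zero indicates that the estimator recovers the known diversity on average for this library; repeating the simulation with skewed frequency distributions or smaller samples would show where the estimator begins to underestimate.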



Peter T. Hraber 2001-06-13