

Sampling Distributions

Figure 3.2: Four frequency distributions used to test diversity estimators. Portion of total is shown as a function of transcript abundance. The zero-abundance class was eliminated from frequency calculations. The remaining net diversity (S) and number of individuals (N) are summarized for each distribution.

We validated the diversity estimators against four frequency distributions (Figure 3.2), chosen to model ecological distributions [76,89]. The distributions come from four families, each producing a long-tailed curve with many rare and a few common individuals [38,89]:

  1. Poisson, a discrete distribution with $x \ge 1$, $g(x)=e^{-\lambda} \lambda^x / x!$, $\lambda=1$, which simplifies to $(e \cdot x!)^{-1}$;

  2. exponential, a continuous distribution with $g(x)=e^{-x}$;

  3. log-normal, with parameters log-mean=1 and log-variance=0.7; and

  4. negative binomial, with parameters size=1 and probability=0.5.
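
As a quick check on the Poisson entry above, the following minimal Python sketch compares the general form $g(x)=e^{-\lambda} \lambda^x / x!$ at $\lambda=1$ with the simplified form $(e \cdot x!)^{-1}$. The function names are hypothetical. The last helper assumes that dropping the zero class means renormalizing the remaining probabilities by $1-e^{-1}$; that is one plausible reading of the text, not a confirmed detail of the original analysis.

```python
import math

def poisson_pmf(x, lam=1.0):
    """General Poisson probability g(x) = exp(-lam) * lam**x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def poisson_pmf_unit_rate(x):
    """Simplified form for lam = 1: g(x) = 1 / (e * x!)."""
    return 1.0 / (math.e * math.factorial(x))

def zero_truncated_pmf(x):
    """ASSUMPTION: with the zero class removed, the remaining
    probabilities are renormalized by 1 - g(0) = 1 - exp(-1)."""
    return poisson_pmf_unit_rate(x) / (1.0 - math.exp(-1.0))

# The two forms agree for every abundance class checked here.
for x in range(1, 8):
    assert math.isclose(poisson_pmf(x), poisson_pmf_unit_rate(x))
```

Under that assumption, the renormalized probabilities over $x \ge 1$ sum to one, so they can be compared directly with observed portions of the total.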

The validation distributions were generated with the R functions rpois, rexp, rlnorm, and rnbinom (R version 1.1.1 [62]).

To make the continuous distributions discrete, sampled values were rounded down in R to the nearest integer (e.g., 0.999 becomes 0). The zero-abundance (null) class was eliminated from frequency calculations because it is not observable: a class with no representatives is not counted in diversity (cf. [38]). Samples were drawn from $g(x)$ as $m \times n$ values, with $m=20$ replicates of $n=500$ individuals each, with replacement between samples.
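
The discretization and replicate sampling described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the original R code: the function names are hypothetical, the exponential rate of 1 is assumed, and Python's standard `random` module stands in for R's generators, so the draws will not reproduce the original samples.

```python
import math
import random
from collections import Counter

def abundance_frequencies(values):
    """Round each value down, drop the zero class, and return
    {abundance: portion of total} over the remaining classes."""
    counts = Counter(math.floor(v) for v in values)
    counts.pop(0, None)  # the zero-abundance class is not observable
    total = sum(counts.values())
    if total == 0:
        return {}
    return {x: c / total for x, c in sorted(counts.items())}

def draw_replicates(draw, m=20, n=500, seed=0):
    """Return m independent replicates of n draws each.
    `draw` is a callable taking a random.Random instance."""
    rng = random.Random(seed)
    return [[draw(rng) for _ in range(n)] for _ in range(m)]

# ASSUMPTION: exponential with rate 1, standing in for R's rexp.
replicates = draw_replicates(lambda rng: rng.expovariate(1.0))
frequencies = [abundance_frequencies(rep) for rep in replicates]
```

Each entry of `frequencies` maps an abundance class to its portion of the nonzero draws in one replicate. Passing a different `draw` callable (for example one built on `rng.lognormvariate`) would cover the other continuous family in the same way.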

The true underlying frequency distribution of transcripts in a cell is unknown, but expression assay results indicate that the general form of the distribution is likely to vary across cell types [74].


Peter T. Hraber 2001-06-13