To test the accuracy of the diversity estimators, for each of the four
validation distributions (a-d) we calculated the deviation $D$ of the
estimated diversity $\hat{S}$ from the true diversity $S$. We calculated
the mean and standard deviation of $D$ over 20 replicate samples, in each
of which 500 individuals were sampled before the estimators were
calculated, to emulate sampling random clones when preparing a cDNA
library. To identify
reliable estimators, we tested the null hypothesis that the deviation
is zero on average with two two-tailed, one-sample tests: Student's t
test and the non-parametric Wilcoxon signed-rank test, each applied to
the 20 replicate deviations.
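The replicate-sampling step can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's code. It defines the deviation as the relative error (Ŝ − S)/S between estimated diversity Ŝ and true diversity S. It uses the bias-corrected Chao1 estimator as a stand-in for whichever estimators were compared, and it draws from a hypothetical geometric abundance distribution. None of these three choices is specified in this section. Each replicate draws 500 clones with replacement in proportion to abundance, and the mean and standard deviation of the deviation are then taken over 20 replicates.

```python
import random
import statistics
from collections import Counter

def chao1(sample):
    """Bias-corrected Chao1 richness estimate (stand-in estimator) from observed clone types."""
    counts = Counter(sample)
    s_obs = len(counts)
    f1 = sum(1 for c in counts.values() if c == 1)  # types seen exactly once
    f2 = sum(1 for c in counts.values() if c == 2)  # types seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

def relative_deviation(estimate, true_value):
    """ASSUMED deviation measure: (estimated - true) / true."""
    return (estimate - true_value) / true_value

def replicate_deviations(abundances, n_individuals=500, n_replicates=20, seed=1):
    """Draw n_individuals clones per replicate (with replacement, proportional to
    abundance) and return the deviation of the Chao1 estimate for each replicate."""
    rng = random.Random(seed)
    types = range(len(abundances))
    true_s = len(abundances)  # every type in the distribution has nonzero abundance
    deviations = []
    for _ in range(n_replicates):
        sample = rng.choices(types, weights=abundances, k=n_individuals)
        deviations.append(relative_deviation(chao1(sample), true_s))
    return deviations

# Hypothetical validation distribution: 300 transcript types with geometric abundances.
abundances = [0.97 ** i for i in range(300)]
devs = replicate_deviations(abundances)
print(f"mean = {statistics.mean(devs):.3f}, sd = {statistics.stdev(devs):.3f}")
```

To compare several estimators on the same replicates, generate each `sample` once per replicate and pass that same sample to every estimator. Any differences between estimators then come from the estimators themselves and not from sampling noise.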
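P-values were interpreted as indicating significant differences at the
95% and 99% experiment-wide confidence levels, that is, at an
experiment-wide test size of $\alpha = 0.05$ or $\alpha = 0.01$. To
maintain this experiment-wide confidence across twenty multiple
comparisons, the significance level applied to each individual
comparison was corrected downward from $\alpha$.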
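Two standard ways to derive a per-comparison level from an experiment-wide size α for m = 20 comparisons are the Bonferroni correction, α/m, and the Šidák correction, 1 − (1 − α)^(1/m). This section does not say which correction was applied, so the minimal sketch below computes both for the two confidence levels. The function names are illustrative.

```python
def bonferroni_level(alpha, m):
    """Per-comparison significance level alpha / m (Bonferroni)."""
    return alpha / m

def sidak_level(alpha, m):
    """Per-comparison level 1 - (1 - alpha)**(1/m) (Sidak); exact for independent tests."""
    return 1 - (1 - alpha) ** (1 / m)

# A comparison is declared significant when its P-value falls below the per-comparison level.
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: Bonferroni {bonferroni_level(alpha, 20):.6f}, "
          f"Sidak {sidak_level(alpha, 20):.6f}")
```

At α = 0.05 the Bonferroni level is 0.0025 and the Šidák level is about 0.00256. Šidák is exact for independent tests and slightly less conservative. Bonferroni bounds the experiment-wide error rate under any dependence between the comparisons.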
The tests were performed with the t.test and wilcox.test functions in R, version 1.1.1 [62].
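R's t.test and wilcox.test carry out both tests directly. The standard-library Python sketch below shows what each test computes on a list of replicate deviations, for readers working outside R. The one-sample t test yields a t statistic, with a two-sided p-value obtained by numerically integrating the Student t density (Simpson's rule). The Wilcoxon signed-rank test yields W+, with an exact two-sided p-value from its sign-flip null distribution. The function names are hypothetical. When absolute deviations are tied, this sketch uses the exact distribution conditional on the mid-ranks. Current versions of R's wilcox.test switch to a normal approximation in that case, so the p-values can differ slightly.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_t_p(t, df, steps=2000):
    """Two-sided p-value: 1 - 2 * integral of the t density over [0, |t|] (Simpson's rule)."""
    h = abs(t) / steps  # steps must be even for Simpson's rule
    s = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return min(1.0, max(0.0, 1 - 2 * s * h / 3))

def one_sample_t_test(xs, mu0=0.0):
    """Student's one-sample t test of H0: mean == mu0; returns (t, p)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample SD; must be > 0
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, two_sided_t_p(t, n - 1)

def wilcoxon_signed_rank(xs, mu0=0.0):
    """Exact two-sided signed-rank test of H0: symmetric about mu0; returns (W+, p)."""
    d = [x - mu0 for x in xs if x != mu0]  # zero differences are dropped
    n = len(d)
    order = sorted(range(n), key=lambda i: abs(d[i]))
    rank2 = [0] * n  # doubled mid-ranks, so tied ranks stay integers
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        for k in range(i, j + 1):
            rank2[order[k]] = (i + 1) + (j + 1)  # 2 * mean of ranks i+1 .. j+1
        i = j + 1
    w2 = sum(r for r, x in zip(rank2, d) if x > 0)
    # Null distribution: each rank is positive or negative with probability 1/2.
    total = sum(rank2)
    counts = [1] + [0] * total  # counts[s] = number of sign patterns with doubled W+ == s
    for r in rank2:
        for s in range(total, r - 1, -1):
            counts[s] += counts[s - r]
    n_patterns = 2 ** n
    p_low = sum(counts[: w2 + 1]) / n_patterns
    p_high = sum(counts[w2:]) / n_patterns
    return w2 / 2, min(1.0, 2 * min(p_low, p_high))
```

Both functions take the 20 replicate deviations for one estimator on one distribution. The resulting p-values would then be compared against the per-comparison level rather than against α itself.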
We used the most accurate, least-biased estimators, as judged by this experiment, to infer diversity in empirical transcript libraries.