Next: Library Complementarity and Similarity Up: Results Previous: Validation   Contents

Estimated Diversity in Empirical Libraries

Figure 3.6: Estimated diversity in pure and mixed-culture M. truncatula libraries. The sample size (n) is 2125. Library diversity was estimated at three identity thresholds. Results from two estimators, (A) ACE and (B) Chao 1, are shown. The means and standard deviations of estimated diversity were calculated from 50 resampled replicates.
\begin{figure}\begin{center}
\leavevmode
\epsfig{file=diversity/figures/2125.eps,height=7in}\end{center}\end{figure}


Table 3.2: Estimated diversity in pure M. truncatula and mixed-culture libraries. Shown are the number of sequences sampled from each library (N) and, for each of three identity thresholds, the observed diversity ($S_{obs}$), the estimated diversity ($\hat{S}$, the mean of the ACE and Chao 1 estimates), and the ratio of observed to estimated diversity ($q$, in percent). All estimates were computed from $m = 50$ replicate samples of $n = 2125$ sequences.
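\begin{center}
\begin{tabular}{l r rrr rrr rrr}
\hline
 & & \multicolumn{3}{c}{90\% identity} & \multicolumn{3}{c}{70\% identity} & \multicolumn{3}{c}{50\% identity} \\
library & $N$ & $S_{obs}$ & $\hat{S}$ & $q$ (\%) & $S_{obs}$ & $\hat{S}$ & $q$ (\%) & $S_{obs}$ & $\hat{S}$ & $q$ (\%) \\
\hline
DSIR & 2284 & 1893 & 10584 & 17.9 & 1798 & 7776 & 23.1 & 1765 & 8124 & 21.7 \\
MHAM & 3017 & 2487 & 11833 & 21.0 & 2360 & 9480 & 24.9 & 2311 & 9053 & 25.5 \\
KV0 & 2491 & 2112 & 13240 & 15.9 & 2005 & 9749 & 20.6 & 1963 & 8797 & 22.3 \\
KV3 & 2173 & 1931 & 14674 & 13.2 & 1839 & 10888 & 16.9 & 1800 & 9979 & 18.0 \\
NF root & 2142 & 1798 & 8294 & 21.7 & 1694 & 6349 & 26.7 & 1636 & 5554 & 29.5 \\
NF nod & 2689 & 2264 & 13860 & 16.3 & 2165 & 10265 & 21.1 & 2094 & 9081 & 23.1 \\
\hline
\end{tabular}
\end{center}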

Figure 3.7: Effect of increasing sample size on estimated diversity of four pure and mixed-culture M. truncatula libraries, at varied stringency (percent identity threshold). Means and standard deviations were computed from 50 resampled replicates. Results from two estimators, (A) ACE and (B) Chao 1, are shown.
\begin{figure}\begin{center}
\leavevmode
\epsfig{file=diversity/figures/n.eps,height=7in}\end{center}\end{figure}

Estimated diversity generally increases with stringency, that is, with the percent identity threshold (Figure 3.6 and Table 3.2): the higher the identity required for two sequences to be considered redundant, the more distinct sequence classes result. Estimated diversity also increases with sample size (Figure 3.7). Fortunately, the increase diminishes as sample size grows, because the estimates approach the limit of true diversity. As Figure 3.4 showed, estimated diversity approaches true diversity far more rapidly than observed diversity does.
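To make the effect of the identity threshold concrete, here is a toy sketch. It assumes equal-length, pre-aligned sequences, measures percent identity as the fraction of matching positions, and groups sequences by single-linkage clustering. The sequences and the functions `percent_identity` and `count_clusters` are hypothetical illustrations, not the procedure used to build Table 3.2.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent of positions that match; assumes equal-length, pre-aligned toy sequences."""
    if len(a) != len(b):
        raise ValueError("toy example assumes equal-length, pre-aligned sequences")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def count_clusters(seqs: list[str], threshold: float) -> int:
    """Single-linkage clustering: sequences linked at >= threshold percent identity share a class."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if percent_identity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)  # merge the two classes
    return len({find(i) for i in range(len(seqs))})


# Hypothetical toy sequences, not data from these libraries.
seqs = ["ACGTACGTAC", "ACGTACGTTT", "ACGAAAGTTT", "TTTTTCGTAC"]
for t in (50, 70, 90):
    print(t, count_clusters(seqs, t))  # 1, 2, then 4 distinct classes
```

Under single linkage, raising the threshold can only remove links between sequences, so the number of classes cannot decrease as stringency rises; other clustering rules need not behave this monotonically.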

The two diversity estimators yield similar results, with ACE giving slightly larger values than Chao 1 (Figures 3.6 and 3.7). In the discussion that follows, we take the mean of the two estimators as the estimated diversity, $\hat{S}$.
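A minimal sketch of the two estimators and the resampling scheme follows, assuming each sequence has already been assigned a class label at a given identity threshold, and writing $F_i$ for the number of classes represented by exactly $i$ sequences. Several choices here are assumptions rather than the settings used for this chapter: the bias-corrected form of Chao 1, $S_{obs} + F_1(F_1-1)/(2(F_2+1))$; ACE with a rare-class cutoff of 10; and replicate samples drawn without replacement. The names `chao1`, `ace`, and `resampled_estimates` are hypothetical.

```python
import random
import statistics
from collections import Counter


def freq_counts(labels):
    """Map abundance i -> F_i, the number of classes represented by exactly i sequences."""
    return Counter(Counter(labels).values())


def chao1(labels):
    """Bias-corrected Chao 1 (assumed variant): S_obs + F1(F1 - 1) / (2(F2 + 1))."""
    f = freq_counts(labels)
    s_obs = sum(f.values())
    f1, f2 = f.get(1, 0), f.get(2, 0)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def ace(labels, rare_cutoff=10):
    """Abundance-based Coverage Estimator; classes seen <= rare_cutoff times count as rare."""
    f = freq_counts(labels)
    s_rare = sum(c for i, c in f.items() if i <= rare_cutoff)
    s_abund = sum(c for i, c in f.items() if i > rare_cutoff)
    n_rare = sum(i * c for i, c in f.items() if i <= rare_cutoff)
    f1 = f.get(1, 0)
    if n_rare == 0:
        return float(s_abund)  # no rare classes, nothing to extrapolate
    if f1 == n_rare:
        raise ValueError("ACE is undefined when every rare class is a singleton")
    c_ace = 1 - f1 / n_rare  # estimated sample coverage of the rare classes
    sum_ii = sum(i * (i - 1) * c for i, c in f.items() if i <= rare_cutoff)
    # Squared coefficient of variation of rare-class abundances, floored at zero.
    gamma2 = max((s_rare / c_ace) * sum_ii / (n_rare * (n_rare - 1)) - 1, 0.0)
    return s_abund + s_rare / c_ace + (f1 / c_ace) * gamma2


def resampled_estimates(labels, n, m=50, seed=0):
    """Mean and standard deviation of each statistic over m subsamples of size n."""
    rng = random.Random(seed)
    stats = {"S_obs": [], "ACE": [], "Chao1": [], "S_hat": []}
    for _ in range(m):
        sample = rng.sample(labels, n)  # drawn without replacement (assumption)
        a, c = ace(sample), chao1(sample)
        stats["S_obs"].append(len(set(sample)))
        stats["ACE"].append(a)
        stats["Chao1"].append(c)
        stats["S_hat"].append((a + c) / 2)  # mean of the two estimators
    return {k: (statistics.mean(v), statistics.stdev(v)) for k, v in stats.items()}
```

Given a hypothetical list `cluster_ids` holding one class label per sequence in a library, `resampled_estimates(cluster_ids, n=2125, m=50)` would return (mean, standard deviation) pairs of the kind plotted in Figure 3.6.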

Estimated diversity in DSIR, the Phytophthora-infected library, is lower than in every other library except NF root, which has the lowest estimated diversity at all three identity thresholds (Table 3.2, and Figures 3.6 and 3.7). Diversity in the KV0 axenic root library, by contrast, is comparable to that found in the mycorrhizal and nodulated root libraries.

Comparing libraries that were prepared and sequenced independently from similar tissue types, we might expect the KV0 and NF root libraries to have similar diversity, and likewise the KV3 and NF nod libraries. This appears to hold in the latter case but not in the former (Table 3.2). In both nodulating root libraries, estimated diversity is about 14,000 distinct transcripts at the 90% identity threshold, 10,500 at 70% identity, and 9500 at 50% identity. Diversity estimates in the two axenic root libraries, however, differ more dramatically, with the KV0 library having about 60% greater estimated diversity than the NF root library.
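The comparisons in this paragraph can be checked directly against the $\hat{S}$ columns of Table 3.2. The sketch below, with hypothetical names `S_HAT` and `pct_excess`, computes the mean of the two nodulating libraries and the percent by which one library's estimate exceeds its counterpart's at each threshold.

```python
# Estimated diversity (S_hat) from Table 3.2 at 90%, 70%, and 50% identity.
S_HAT = {
    "KV0": (13240, 9749, 8797),
    "NF root": (8294, 6349, 5554),
    "KV3": (14674, 10888, 9979),
    "NF nod": (13860, 10265, 9081),
}
THRESHOLDS = (90, 70, 50)


def pct_excess(a: float, b: float) -> float:
    """Percent by which a exceeds b."""
    return 100.0 * (a / b - 1.0)


comparisons = {}
for i, t in enumerate(THRESHOLDS):
    comparisons[t] = {
        "nodulating mean": (S_HAT["KV3"][i] + S_HAT["NF nod"][i]) / 2,
        "KV3 over NF nod (%)": pct_excess(S_HAT["KV3"][i], S_HAT["NF nod"][i]),
        "KV0 over NF root (%)": pct_excess(S_HAT["KV0"][i], S_HAT["NF root"][i]),
    }

for t, row in comparisons.items():
    print(t, {k: round(v, 1) for k, v in row.items()})
```

At the 90% threshold the nodulating mean is 14,267 and KV0 exceeds NF root by about 60%. The two nodulating libraries differ by less than 10% at every threshold, while KV0 exceeds NF root by more than 50% at every threshold.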

What fraction of the expected diversity has been sequenced? The ratio of observed to estimated diversity, $q = S_{obs}/\hat{S}$, gives the proportion of the expected total diversity already observed in each library. It ranges from about 13% for KV3 at the 90% identity threshold to about 30% for the NF root library at the 50% identity threshold (Table 3.2).
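The range quoted above can be recomputed from the observed and estimated diversity columns of Table 3.2. This sketch (names hypothetical) forms $q = S_{obs}/\hat{S}$ as a percentage for every library and threshold and reports the extremes. Because the tabulated $q$ may have been computed from replicate-level values rather than from the rounded means, a recomputed value can differ in the last digit: for KV0 at 90% identity, 2112/13240 gives 15.95%, against the tabulated 15.9.

```python
# (S_obs, S_hat) pairs from Table 3.2, at 90%, 70%, and 50% identity.
TABLE_3_2 = {
    "DSIR": [(1893, 10584), (1798, 7776), (1765, 8124)],
    "MHAM": [(2487, 11833), (2360, 9480), (2311, 9053)],
    "KV0": [(2112, 13240), (2005, 9749), (1963, 8797)],
    "KV3": [(1931, 14674), (1839, 10888), (1800, 9979)],
    "NF root": [(1798, 8294), (1694, 6349), (1636, 5554)],
    "NF nod": [(2264, 13860), (2165, 10265), (2094, 9081)],
}
THRESHOLDS = (90, 70, 50)

# q in percent, keyed by (library, identity threshold).
q = {
    (lib, t): 100.0 * s_obs / s_hat
    for lib, rows in TABLE_3_2.items()
    for t, (s_obs, s_hat) in zip(THRESHOLDS, rows)
}

lowest = min(q, key=q.get)
highest = max(q, key=q.get)
print(f"lowest q:  {lowest} = {q[lowest]:.1f}%")
print(f"highest q: {highest} = {q[highest]:.1f}%")
```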


Peter T. Hraber 2001-06-13