Which estimators accurately predict diversity?
| DISTRIBUTION | MEAN | SD | t | P (t) | SIGNED RANK | P (signed rank) |
|---|---|---|---|---|---|---|
| ACE ESTIMATOR | ||||||
| a | 0.00326 | 0.00402 | 3.630 | 0.0018* | 192 | 0.0005** |
| b | 0.00035 | 0.00161 | 0.980 | 0.3395 | 127 | 0.4221 |
| c | 0.00379 | 0.00367 | 4.620 | 0.0002* | 193 | 0.0004** |
| d | -0.00058 | 0.00126 | -2.068 | 0.0526 | 56 | 0.0702 |
| ICE ESTIMATOR | ||||||
| a | 0.00624 | 0.00406 | 6.866 | 0.0001** | 209 | 0.0001** |
| b | 0.00113 | 0.00163 | 3.126 | 0.0056 | 173 | 0.0094 |
| c | 0.00554 | 0.00375 | 6.607 | 0.0001** | 203 | 0.0001** |
| d | -0.00034 | 0.00126 | -1.201 | 0.2444 | 76 | 0.2789 |
| CHAO1 ESTIMATOR | ||||||
| a | -0.00211 | 0.00394 | -2.391 | 0.0273 | 50 | 0.0399 |
| b | -0.00003 | 0.00187 | -0.082 | 0.9360 | 103 | 0.9563 |
| c | -0.00123 | 0.00476 | -1.158 | 0.2614 | 84 | 0.4524 |
| d | -0.00039 | 0.00132 | -1.329 | 0.1995 | 72 | 0.2250 |
| CHAO2 ESTIMATOR | ||||||
| a | -0.00219 | 0.00388 | -2.529 | 0.0204 | 44 | 0.0215 |
| b | -0.00004 | 0.00185 | -0.094 | 0.9258 | 102 | 0.9273 |
| c | -0.00124 | 0.00479 | -1.154 | 0.2627 | 85 | 0.4749 |
| d | -0.00039 | 0.00132 | -1.329 | 0.1997 | 72 | 0.2250 |
| MM ESTIMATOR | ||||||
| a | 0.23480 | 0.00702 | 149.674 | 0.0001** | 210 | 0.0001** |
| b | 0.17495 | 0.00575 | 136.163 | 0.0001** | 210 | 0.0001** |
| c | 0.20930 | 0.00430 | 217.704 | 0.0001** | 210 | 0.0001** |
| d | 0.12596 | 0.00339 | 165.916 | 0.0001** | 210 | 0.0001** |
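The column relationships in the table above are consistent with two tests on 20 replicate deviations per estimator and distribution. The t values match a one-sample t-test of the mean against zero (t = mean/(SD/√20); for ACE, distribution a, 0.00326/(0.00402/√20) ≈ 3.63). The rank column never exceeds 210 = 20·21/2, which is what a Wilcoxon signed-rank statistic W+ would reach if every replicate overestimated. The text does not state n or the test names explicitly, so treat both as assumptions. The function names below are hypothetical; this is a sketch, not the study's analysis code.

```python
import math
from statistics import mean, stdev


def one_sample_t(deviations):
    """t statistic for H0: mean deviation == 0 (df = n - 1)."""
    n = len(deviations)
    return mean(deviations) / (stdev(deviations) / math.sqrt(n))


def signed_rank_w_plus(deviations):
    """Wilcoxon signed-rank statistic W+: sum of ranks of |d| over d > 0.

    Zero deviations are dropped; tied |d| values receive their average rank.
    """
    nonzero = [d for d in deviations if d != 0]
    order = sorted(range(len(nonzero)), key=lambda i: abs(nonzero[i]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute values.
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j hold 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return sum(r for r, d in zip(ranks, nonzero) if d > 0)


# Reproduce the t column for ACE, distribution a, from its mean and SD (n = 20 assumed).
t_ace_a = 0.00326 / (0.00402 / math.sqrt(20))
print(round(t_ace_a, 2))  # 3.63

# If all 20 replicates overestimate, W+ reaches its maximum n(n+1)/2.
print(signed_rank_w_plus([0.1 + 0.01 * i for i in range(20)]))  # 210.0
```

Applied to the raw per-replicate deviations, these two functions would give the t and rank columns; the P values then follow from the t distribution with 19 degrees of freedom and the signed-rank null distribution.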
Accumulation curves (Figure 3.4) show observed and estimated transcript diversity as a function of the number of samples. They also reveal biases at small sample sizes, which diminish rapidly for most estimators. Estimator variance is low. ACE and ICE were correlated, as were Chao 1 and Chao 2, so we generally refer to ACE and Chao 1 (Figure 3.4).
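An observed-diversity accumulation curve of this kind can be sketched by pooling samples in random order, counting distinct transcripts after each addition, and averaging over many orderings. This is a minimal sketch assuming each sample is represented as a set of transcript identifiers; the function name, the permutation count, and the toy data are illustrative choices, not details taken from the study. An estimated-diversity curve would apply an estimator such as Chao 1 to the pooled data at each step instead of counting distinct transcripts.

```python
import random


def accumulation_curve(samples, permutations=100, seed=0):
    """Mean number of distinct transcripts observed after pooling 1..m samples.

    `samples` is a list of sets, one set of transcript IDs per sample.
    Sample order is shuffled `permutations` times and the curves are averaged.
    """
    rng = random.Random(seed)
    m = len(samples)
    totals = [0] * m
    for _ in range(permutations):
        order = samples[:]  # shuffle a copy, leaving the input untouched
        rng.shuffle(order)
        seen = set()
        for k, sample in enumerate(order):
            seen |= sample
            totals[k] += len(seen)
    return [t / permutations for t in totals]


samples = [{"t1", "t2"}, {"t2", "t3"}, {"t1", "t4"}]
curve = accumulation_curve(samples)
print(curve[0], curve[-1])  # 2.0 4.0: one sample vs. all samples pooled
```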
The least biased estimators were Chao 1 (Figure 3.5C and Table 3.1) and Chao 2. The coverage estimators ACE and ICE appear unbiased and converge rapidly on the diversity limit (Figure 3.4). However, formal tests indicate that these estimators were accurate in only two of the four test distributions (Table 3.1): (b) exponential and (d) negative binomial. The result for the negative binomial is consistent with Chao and Lee's investigations [29], which used that distribution. The coverage estimators overestimate diversity by about 0.4% to 0.8% (Figure 3.5A and B), consistent with Colwell and Coddington's evaluation of empirical species counts from seed-bank samples [32].
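For reference, the abundance-based forms of Chao 1 and ACE can be sketched as below, together with a relative-bias helper matching the percentage overestimates quoted above. This assumes transcript abundances arrive as a list of per-transcript counts and uses a rare-transcript threshold of 10, a common default for ACE; the function names and threshold are assumptions rather than the study's implementation. The incidence-based Chao 2 and ICE follow the same pattern with per-sample incidences in place of counts.

```python
from collections import Counter


def chao1(counts):
    """Chao 1: S_obs + F1^2 / (2 F2), or S_obs + F1(F1 - 1)/2 when F2 == 0."""
    abundances = [c for c in counts if c > 0]
    s_obs = len(abundances)
    f = Counter(abundances)  # f[i] = number of transcripts seen exactly i times
    f1, f2 = f[1], f[2]
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2  # bias-corrected form avoids dividing by zero


def ace(counts, rare_threshold=10):
    """Abundance-based Coverage Estimator with a rare/abundant split."""
    abundances = [c for c in counts if c > 0]
    rare = [c for c in abundances if c <= rare_threshold]
    s_abund = len(abundances) - len(rare)
    s_rare = len(rare)
    if s_rare == 0:
        return float(s_abund)
    n_rare = sum(rare)
    f1 = sum(1 for c in rare if c == 1)
    if f1 == n_rare:  # every rare transcript is a singleton: coverage is zero
        raise ValueError("sample coverage is zero; ACE is undefined")
    c_ace = 1 - f1 / n_rare  # estimated sample coverage of rare transcripts
    f = Counter(rare)
    sum_term = sum(i * (i - 1) * f[i] for i in range(1, rare_threshold + 1))
    # Squared coefficient of variation of rare-transcript frequencies, floored at 0.
    gamma2 = max(s_rare / c_ace * sum_term / (n_rare * (n_rare - 1)) - 1, 0.0)
    return s_abund + s_rare / c_ace + f1 / c_ace * gamma2


def relative_bias(estimate, true_richness):
    """(estimate - true) / true; positive values mean overestimation."""
    return (estimate - true_richness) / true_richness


counts = [1, 1, 2, 3, 5, 12]
print(chao1(counts))         # 8.0
print(round(ace(counts), 3))  # 7.655
```

With a known true diversity, as in simulated test distributions, `relative_bias` turns each estimate into the kind of proportional deviation discussed in the text.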
Jackknife 1, Jackknife 2, and Bootstrap were biased (not shown). The Michaelis-Menten maximum likelihood estimator was inaccurate (Figure 3.5D and Table 3.1), overestimating diversity by 12% to 24%. The Michaelis-Menten model is a parametric estimator [32,94,112]: it assumes the transcript frequency distribution has properties that are not necessarily satisfied in all cases, as demonstrated in Appendix A.
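To illustrate what "parametric" means here: the Michaelis-Menten approach assumes the accumulation curve follows the hyperbola S(n) = S_max·n/(B + n) and extrapolates S_max as the asymptote. The sketch below fits that curve with a closed-form estimate on the Eadie-Hofstee linearization S = S_max − B·(S/n), treating both variables as subject to error. It is an assumed illustration, not necessarily the maximum likelihood procedure used in this study, and the function name is hypothetical. On a noise-free hyperbola it recovers the parameters exactly; when the accumulation curve departs from this shape, the extrapolated S_max is biased.

```python
def michaelis_menten_fit(curve):
    """Fit S(n) = s_max * n / (b + n) to an accumulation curve.

    `curve[k]` is the observed richness after k + 1 samples. Uses the
    Eadie-Hofstee form S = s_max - b * (S / n) with a closed-form
    errors-in-both-variables estimate of b.
    """
    ys = list(curve)
    xs = [y / n for n, y in enumerate(ys, start=1)]  # X = S / n
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = (x_bar * syy - y_bar * sxy) / (y_bar * sxx - x_bar * sxy)
    s_max = y_bar + b * x_bar  # the asymptote: extrapolated total diversity
    return s_max, b


# Noise-free curve generated with s_max = 100 and b = 5: the fit recovers both.
exact = [100 * n / (5 + n) for n in range(1, 21)]
s_max, b = michaelis_menten_fit(exact)
print(round(s_max, 6), round(b, 6))  # 100.0 5.0
```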
In this case, the preferred estimators are Chao 1 and Chao 2, with ACE and ICE as possible alternatives. What do they tell us about transcript diversity in pure and mixed tissue cultures?