
Synopsis

Advances in sequencing technology have allowed many researchers to acquire expressed sequence tags (ESTs) and begin characterizing the genes expressed in the organisms they study. However, sequencing the same EST repeatedly is an inefficient use of resources. One goal of an EST sequencing project should be to obtain as many distinct tags as possible, with minimal redundancy. Knowing how many genes a library contains before all of them have been sequenced helps in deciding when to normalize the library, and allows computation of the library coverage: the proportion of the expected diversity that has already been sampled. This chapter describes the use of redundancy rates from random EST sequencing to predict the number of distinct transcripts in a library. We first test which non-parametric diversity estimators are least biased when inferring diversity for a variety of distributions, and then apply these estimators to several plant libraries that have been partially sequenced.
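As a rough illustration of how redundancy in a random sample can be turned into a diversity estimate, the following minimal sketch uses the bias-corrected Chao1 estimator. This is only an assumption for the sake of example: the synopsis does not say which non-parametric estimators the chapter evaluates, and the function names and the toy input are hypothetical. Coverage is computed here as the observed number of distinct transcripts divided by the estimated total, following the definition given above.

```python
from collections import Counter


def chao1(cluster_sizes):
    """Bias-corrected Chao1 estimate of the total number of distinct classes.

    cluster_sizes: iterable of positive integers, one per distinct transcript
    observed, giving how many ESTs were assigned to it.
    (Chao1 is an illustrative choice, not necessarily the chapter's estimator.)
    """
    sizes = [n for n in cluster_sizes if n > 0]
    observed = len(sizes)
    singletons = sum(1 for n in sizes if n == 1)  # transcripts seen once
    doubletons = sum(1 for n in sizes if n == 2)  # transcripts seen twice
    # The bias-corrected form stays defined even when there are no doubletons.
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def library_coverage(cluster_sizes):
    """Proportion of the estimated diversity that has already been sampled."""
    sizes = [n for n in cluster_sizes if n > 0]
    if not sizes:
        return 0.0
    return len(sizes) / chao1(sizes)


# Hypothetical toy sample: each label stands for the transcript an EST maps to.
ests = ["a", "a", "a", "b", "b", "c", "d", "e"]
sizes = list(Counter(ests).values())
estimate = chao1(sizes)
coverage = library_coverage(sizes)
print(f"observed={len(sizes)} estimated={estimate:.1f} coverage={coverage:.2f}")
```

Many singletons relative to doubletons push the estimate well above the observed count, which signals low coverage; a sample with few singletons yields an estimate close to what has already been seen.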



Peter T. Hraber 2001-06-13