Some of the libraries were obtained from M. truncatula root
tissues grown in pure culture, while other libraries were prepared
from infected root cultures. We expect some redundancy in gene
content found in various libraries, as all are obtained from the same
living tissue, plant roots. To quantify the extent to which the
transcript composition of one library resembles another, we calculated
the degree of similarity between each pair of libraries A and B
using the Jaccard similarity index $J_{AB}$ [32,88].
This similarity measure is related to the degree of complementarity
$C_{AB}$, or distinctness in composition between two libraries, as
$J_{AB} = 1 - C_{AB}$.
The complementarity
$C_{AB} = (S_A + S_B - 2V_{AB}) / (S_A + S_B - V_{AB})$
is the ratio of the number of distinct
transcripts unique to either library ($S_A + S_B - 2V_{AB}$)
to the total diversity in both libraries combined
($S_A + S_B - V_{AB}$), where $S_A$ and $S_B$ are the numbers of
distinct transcripts in libraries A and B, and $V_{AB}$
is the number of distinct
transcripts (or quasispecies) present in both libraries
[32]. Transcripts present in both libraries were
identified with a BLASTN search, where B is the set of
query sequences and A is the subject set.
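The relationship between complementarity and Jaccard similarity described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names and the arguments `s_a`, `s_b` (numbers of distinct transcripts in libraries A and B) and `shared` (number of distinct transcripts present in both) are hypothetical, and the counts in the example are invented for demonstration only.

```python
def complementarity(s_a: int, s_b: int, shared: int) -> float:
    """Fraction of the combined distinct transcripts found in only one library.

    s_a, s_b : number of distinct transcripts in libraries A and B (assumed names)
    shared   : number of distinct transcripts present in both libraries
    """
    if shared < 0 or shared > min(s_a, s_b):
        raise ValueError("shared must lie between 0 and min(s_a, s_b)")
    total = s_a + s_b - shared        # distinct transcripts in both libraries combined
    if total == 0:
        raise ValueError("both libraries are empty")
    unique = s_a + s_b - 2 * shared   # transcripts found in exactly one library
    return unique / total


def jaccard_similarity(s_a: int, s_b: int, shared: int) -> float:
    """Jaccard similarity as the complement of complementarity."""
    return 1.0 - complementarity(s_a, s_b, shared)


# Invented example counts: 120 and 80 distinct transcripts, 50 of them shared.
example_c = complementarity(120, 80, 50)
example_j = jaccard_similarity(120, 80, 50)
print(f"complementarity = {example_c:.3f}, Jaccard similarity = {example_j:.3f}")
```

With these invented counts, 150 distinct transcripts occur in the two libraries combined and 100 of them are unique to one library, so the similarity equals the shared fraction, 50/150.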
Thus, Jaccard similarity summarizes the proportion of the distinct transcripts in the two libraries combined that are found in both. It ranges from 0%, when no transcripts are shared between the two libraries, to 100%, when the two contain exactly the same transcript quasispecies. An advantage of the Jaccard index is that its complement, the complementarity, possesses the properties of a true distance metric [32]. The average similarity across all pairs of libraries can also be used as an index of beta diversity.
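As a rough sketch of the averaging step, the snippet below computes the mean Jaccard similarity over all pairs of libraries. It makes a simplifying assumption: each library is represented as a Python set of transcript identifiers, so that shared transcripts are found by exact set intersection rather than by the BLASTN search used in the study. The library names and identifiers are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Shared identifiers divided by all distinct identifiers in a and b."""
    union = a | b
    if not union:
        raise ValueError("both libraries are empty")
    return len(a & b) / len(union)


def mean_pairwise_similarity(libraries: dict) -> float:
    """Average Jaccard similarity over every unordered pair of libraries."""
    pairs = list(combinations(libraries.values(), 2))
    if not pairs:
        raise ValueError("need at least two libraries")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical libraries with made-up transcript identifiers.
libraries = {
    "lib1": {"t1", "t2", "t3"},
    "lib2": {"t2", "t3", "t4"},
    "lib3": {"t5"},
}
mean_similarity = mean_pairwise_similarity(libraries)
print(f"mean pairwise Jaccard similarity = {mean_similarity:.3f}")
```

Here only lib1 and lib2 share transcripts (2 of 4 distinct identifiers), so the three pairwise similarities are 0.5, 0, and 0. A low average indicates high turnover in transcript composition among libraries.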