Comparison curves present dissimilarity test results as normalized, cumulative distributions rather than as individual values. The point at which a comparison curve intersects a given threshold value of t indicates the proportion of sequences in a library likely to have originated from taxon A, as described in Section 2.4. Figure 4.3A summarizes calibration curves, confidence curves, and comparison curves for comparisons between fungi and plants. An unexpected finding is that the axenic plant library (Mt Long) contains a lower proportion of putative plant transcripts than any of the three libraries prepared from G. intraradices tissues. Even at a conservative value for t, such as 400, a greater proportion of transcripts with hexamer compositions resembling those of plants is present in the libraries from G. intraradices than in the library prepared from axenic M. truncatula root hairs. The extent of the disparity is quantified below.
Comparison results between fungi and rhizobacteria are shown in Figure 4.3B. Only one library, the Lammers library from germinating G. intraradices spores (black line), contains a proportion of rhizobacteria-like sequences that approaches significance, though none of these sequences is significant at P < 0.05.
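Reading a proportion off a comparison curve at a threshold can be sketched as follows. This is a minimal illustration, not the procedure of Section 2.4: the function names and the toy scores are hypothetical, and it assumes that the curve accumulates sequences whose t is at or above the threshold.

```python
from typing import List, Sequence, Tuple


def proportion_at_or_above(t_values: Sequence[float], threshold: float) -> float:
    """Fraction of sequences whose score t is at or above the threshold.

    Assumption: larger t means a stronger resemblance to the non-fungal taxon.
    """
    if not t_values:
        raise ValueError("t_values must not be empty")
    hits = sum(1 for t in t_values if t >= threshold)
    return hits / len(t_values)


def comparison_curve(
    t_values: Sequence[float], thresholds: Sequence[float]
) -> List[Tuple[float, float]]:
    """Normalized cumulative curve as (threshold, proportion) pairs."""
    return [(thr, proportion_at_or_above(t_values, thr)) for thr in thresholds]


# Made-up scores for illustration only; these are not values from any library.
toy_scores = [-250.0, -40.0, 15.0, 120.0, 410.0, 530.0, -90.0, 405.0]

print(proportion_at_or_above(toy_scores, 400))  # 3 of 8 scores are >= 400
print(comparison_curve(toy_scores, [0, 200, 400]))
```

Evaluating the same function at several thresholds traces the curve, so a more conservative threshold can never yield a larger proportion.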
In contrast to the overall properties of a library, the characteristics of individual sequences may also be of interest. One approach is to identify individual sequences as more closely resembling the hexamer compositions of the fungal, plant, or rhizobacterial training sets (Figure 4.4). By extension, one may visualize the hexamer dissimilarity test results of sequences from several different libraries at once. Transcripts more like the fungal training sets should appear in the lower-left quadrant, because t < 0 when compared with both plants and rhizobacteria. By the same reasoning, plant-like sequences should appear in the lower-right quadrant (t > 0 compared with plants; t < 0 compared with rhizobacteria), and sequences resembling rhizobacteria should appear in the upper-left quadrant (t < 0 compared with plants; t > 0 compared with rhizobacteria).
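The quadrant reasoning can be expressed as a small classifier. This is a sketch under stated assumptions: the function and label names are hypothetical, the horizontal axis is taken to be t from the fungus-plant comparison and the vertical axis t from the fungus-rhizobacteria comparison, and the upper-right quadrant and points lying on an axis, which the text does not assign, are labeled "unassigned".

```python
def classify_quadrant(t_vs_plants: float, t_vs_rhizobacteria: float) -> str:
    """Assign a sequence to a quadrant from its two dissimilarity scores.

    t_vs_plants:        t from the fungus-versus-plant comparison (x axis)
    t_vs_rhizobacteria: t from the fungus-versus-rhizobacteria comparison (y axis)
    """
    if t_vs_plants < 0 and t_vs_rhizobacteria < 0:
        return "fungus-like"         # lower-left quadrant
    if t_vs_plants > 0 and t_vs_rhizobacteria < 0:
        return "plant-like"          # lower-right quadrant
    if t_vs_plants < 0 and t_vs_rhizobacteria > 0:
        return "rhizobacteria-like"  # upper-left quadrant
    # Upper-right quadrant or a score of exactly zero: not covered by the text.
    return "unassigned"


# Made-up score pairs for illustration only.
for pair in [(-120.0, -80.0), (340.0, -60.0), (-45.0, 30.0), (10.0, 10.0)]:
    print(pair, classify_quadrant(*pair))
```

A sign-only rule like this ignores significance; a point close to the origin changes label with a small change in t, which is why the confidence calculations matter for any individual call.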
On inspection of the results plotted in Figure 4.4, sequences from Medicago truncatula (solid green circles) generally lie in the lower-right quadrant. Shorter sequences fall near the origin and longer sequences lie farther from it, because t generally increases with sequence length. Sequences from Glomus spp. (solid magenta circles) generally lie in the lower-left quadrant, with exceptions as noted for Table 4.3 and one sequence whose hexamer composition more closely resembles rhizobacteria than fungi. This sequence is a long (L > 1500 nt) fungal homeobox gene (accession AF110198) with a relatively high GC content (56.1%).
Consistent with Figure 4.3, the plant library has a greater proportion of putative fungal sequences than the fungal libraries. This is apparent from the large number of transcripts from the Long root-hair-enriched library (open green circles) that appear in the lower-left quadrant, whereas transcripts from the fungal libraries lie mostly in the plant (lower-right) quadrant.
In light of the confidence calculations described above, none of the plant or fungal transcripts that resemble rhizobacteria more closely than fungi should be considered significantly non-fungal (P > 0.05). However, in the case of fungal-plant comparisons, a non-trivial proportion of both plant and fungal transcripts appears to resemble plants strongly enough to reject the null hypothesis (P < 0.05). For a critical test value of t = 312, 11% of 899 transcripts in the Long plant library, 10% of 363 transcripts in the Harrison library, 25% of 182 transcripts in the Lammers library, and 10% of 165 transcripts in the Sawaki library have hexamer compositions that significantly resemble plants, when compared with fungi.
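For a sense of scale, the reported percentages can be converted into approximate transcript counts. The sketch below uses only the library sizes and percentages quoted above; because those percentages are already rounded, the resulting counts are estimates rather than reported values, and the helper name is hypothetical.

```python
# (library size, reported percentage of plant-like transcripts at t = 312)
libraries = {
    "Long": (899, 11),
    "Harrison": (363, 10),
    "Lammers": (182, 25),
    "Sawaki": (165, 10),
}


def approx_count(total: int, percent: int) -> int:
    """Nearest whole number of transcripts, using integer half-up rounding."""
    return (total * percent + 50) // 100


approx_counts = {name: approx_count(n, pct) for name, (n, pct) in libraries.items()}

for name, count in approx_counts.items():
    total, pct = libraries[name]
    print(f"{name}: about {count} of {total} transcripts ({pct}%)")
```

Integer arithmetic is used so that borderline cases such as 10% of 165 are rounded the same way on every platform.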
To control for the tendency of t to increase with sequence length, one can readily rescale t and plot the transformed data (Figure 4.5). In addition to the transformed data from the previous figure, values obtained from hypothetical repeated-hexamer sequences of increasing length, from 64 to 1024 nt, are also shown. These illustrate the influence of individual hexamers on t. Sequences consisting only of the hexamer AAAAAA, TTTTTT, or GAGAGA strongly resemble plant sequences, lying in the lower-right quadrant, and sequences consisting of GCGCGC strongly resemble rhizobacteria. The degree of resemblance increases with sequence length, but no sequence changed quadrants.
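Hypothetical repeated-hexamer sequences of this kind are simple to construct. The sketch below is illustrative only: the function names are hypothetical, the intermediate lengths (doubling from 64 to 1024 nt) are an assumption, and hexamers are counted in overlapping windows, which may differ from the counting scheme used for the dissimilarity test.

```python
from collections import Counter


def repeated_hexamer_sequence(hexamer: str, length: int) -> str:
    """Repeat a hexamer and truncate the result to exactly `length` nucleotides."""
    if len(hexamer) != 6:
        raise ValueError("hexamer must be exactly 6 nucleotides long")
    repeats = -(-length // 6)  # ceiling division
    return (hexamer * repeats)[:length]


def hexamer_counts(sequence: str) -> Counter:
    """Count hexamers over all overlapping 6-nt windows (an assumed scheme)."""
    return Counter(sequence[i:i + 6] for i in range(len(sequence) - 5))


lengths = [64, 128, 256, 512, 1024]  # assumed doubling series

for hexamer in ("AAAAAA", "TTTTTT", "GAGAGA", "GCGCGC"):
    for length in lengths:
        counts = hexamer_counts(repeated_hexamer_sequence(hexamer, length))
        print(hexamer, length, dict(counts))
```

Under overlapping counting, a homopolymer such as AAAAAA contributes a single distinct hexamer, whereas a dinucleotide repeat such as GAGAGA contributes two (GAGAGA and AGAGAG) in nearly equal numbers.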