Figure 4.6 summarizes GC content in the training sets, in validation sequences from Glomus and Medicago spp., and in the test libraries. In the training sets, plant sequences have lower GC content than fungal sequences, which in turn have lower GC content than rhizobacterial sequences. The same holds for the validation sequences, though the separation is smaller than in the training sets. Curiously, this pattern is reversed for the axenic plant and fungal test sequences. The Mt Long library has GC content comparable to that of validation sequences sampled from M. truncatula, intermediate between the plant and fungal training sets. Libraries from Glomus intraradices have about 10% lower GC content than the fungal training set and more closely resemble the plant training set. Overlap in GC content with the fungal training set is greatest for the Lammers library, smaller for the Harrison library, and absent for the Sawaki library.
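The GC comparison described above can be illustrated with a minimal sketch. The text does not state how GC content was computed, so this assumes the common definition: the fraction of G and C among unambiguous bases, averaged per sequence. The function names `gc_content` and `mean_gc` are hypothetical.

```python
def gc_content(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T).

    Assumption: ambiguous symbols such as N are ignored rather than counted.
    """
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    return gc / total if total else 0.0


def mean_gc(seqs) -> float:
    """Average the per-sequence GC fraction over a collection of sequences."""
    values = [gc_content(s) for s in seqs]
    return sum(values) / len(values) if values else 0.0
```

Applied to each training set, validation set, and test library in turn, `mean_gc` gives one number per set. A plot like Figure 4.6 would also need the per-sequence values from `gc_content` to show how much the distributions overlap.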

[Figure 4.6: GC content of training sets, validation sequences, and test libraries]

If we compare GC content (Figure 4.6) with the most abundant hexamers (Tables 4.4 and 4.5), we note that the two measures of composition are related within a set of sequences. The common hexamers in test sequences are AT-rich, and G and C residues appear more often in the Long plant library than in the fungal libraries (Table 4.5). This is the opposite of the pattern in the training sets (Table 4.4), where common plant hexamers are dominated by T residues and common fungal hexamers are less biased in composition.
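A ranking of abundant hexamers could be produced along the following lines. This is only a sketch: it assumes overlapping windows on a single strand and reports each hexamer as a fraction of all hexamers counted. The source does not state the counting scheme or the units of the tabulated values, and the names `hexamer_frequencies` and `top_hexamers` are hypothetical.

```python
from collections import Counter


def hexamer_frequencies(seqs, k=6):
    """Relative frequency of each overlapping k-mer across all sequences.

    Assumption: windows containing symbols other than A, C, G, T are skipped.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


def top_hexamers(seqs, n=10, k=6):
    """Return the n most frequent k-mers as (k-mer, frequency) pairs."""
    freqs = hexamer_frequencies(seqs, k)
    # Sort by descending frequency, breaking ties alphabetically.
    return sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Calling `top_hexamers` on each training set or test library yields one ranked list per set, in the same rank-by-column layout as the tables.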

Table 4.4. Abundant hexamers in the training sets.

| RANK | FUNGI | | PLANTS | | RHIZOBACTERIA | |
|---|---|---|---|---|---|---|
| 1 | 0.158 | CAAGAA | 0.191 | TTTTTT | 0.143 | CGCCGC |
| 2 | 0.152 | TGGTAT | 0.177 | TATTTT | 0.142 | GGCGGC |
| 3 | 0.151 | TCAAGG | 0.168 | TTTTAT | 0.142 | GCGGCG |
| 4 | 0.148 | GTCAAG | 0.166 | TTATTT | 0.142 | GCCGGC |
| 5 | 0.147 | TCAAGA | 0.162 | ATTTTT | 0.137 | CGGCGC |
| 6 | 0.146 | CAAGGA | 0.161 | TGTTTT | 0.133 | CGGCGA |
| 7 | 0.128 | CTGGTA | 0.159 | TTTGTT | 0.130 | TCGCCG |
| 8 | 0.127 | ATCAAG | 0.157 | TTTATT | 0.129 | TCGGCG |
| 9 | 0.126 | GCTGGT | 0.155 | TTGTTT | 0.127 | GCGCCG |
| 10 | 0.126 | CGTCAA | 0.154 | TTAATT | 0.127 | CCGGCG |

Table 4.5. Abundant hexamers in the test libraries.

| RANK | MT LONG | | GI HARRISON | | GI LAMMERS | | GI SAWAKI | |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.173 | GAAGAA | 0.247 | AAAGAA | 0.225 | TTATTA | 0.263 | AAGAAA |
| 2 | 0.149 | AAGAAG | 0.230 | AAAAGA | 0.221 | TTTATT | 0.246 | AAAGAA |
| 3 | 0.143 | AAGAAA | 0.215 | AAAAAT | 0.221 | ATTATT | 0.227 | GAAAAA |
| 4 | 0.126 | TTCTTC | 0.212 | AAGAAA | 0.212 | ATTTTT | 0.223 | AAAAAT |
| 5 | 0.122 | TGAAGA | 0.207 | AAAGAT | 0.192 | TATTAT | 0.222 | AAAAGA |
| 6 | 0.122 | TTGTTG | 0.204 | CAAAAA | 0.183 | TTATTT | 0.215 | AATAAA |
| 7 | 0.120 | AAAGAA | 0.202 | AAAAAG | 0.183 | TATTTT | 0.205 | AAAATT |
| 8 | 0.116 | AGAAGA | 0.195 | TGATGA | 0.177 | AATTTT | 0.200 | AAATTA |
| 9 | 0.115 | TGATGA | 0.191 | AAAATT | 0.172 | TTTTAT | 0.197 | ATTATT |
| 10 | 0.113 | TTTGTT | 0.189 | ACAAAA | 0.172 | AATAAT | 0.195 | TTTATT |
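
The compositional claims about the tabulated hexamers can be checked directly from the two tables. The sketch below pools the ten hexamers listed for each set and computes the fraction of residues that are G or C (or T). It is unweighted, which is an assumption: the numeric value attached to each hexamer is ignored. `base_fraction` is a hypothetical helper.

```python
# Hexamers transcribed from the training-set and test-library tables, in rank order.
TRAINING = {
    "FUNGI": ["CAAGAA", "TGGTAT", "TCAAGG", "GTCAAG", "TCAAGA",
              "CAAGGA", "CTGGTA", "ATCAAG", "GCTGGT", "CGTCAA"],
    "PLANTS": ["TTTTTT", "TATTTT", "TTTTAT", "TTATTT", "ATTTTT",
               "TGTTTT", "TTTGTT", "TTTATT", "TTGTTT", "TTAATT"],
    "RHIZOBACTERIA": ["CGCCGC", "GGCGGC", "GCGGCG", "GCCGGC", "CGGCGC",
                      "CGGCGA", "TCGCCG", "TCGGCG", "GCGCCG", "CCGGCG"],
}
TEST = {
    "MT LONG": ["GAAGAA", "AAGAAG", "AAGAAA", "TTCTTC", "TGAAGA",
                "TTGTTG", "AAAGAA", "AGAAGA", "TGATGA", "TTTGTT"],
    "GI HARRISON": ["AAAGAA", "AAAAGA", "AAAAAT", "AAGAAA", "AAAGAT",
                    "CAAAAA", "AAAAAG", "TGATGA", "AAAATT", "ACAAAA"],
    "GI LAMMERS": ["TTATTA", "TTTATT", "ATTATT", "ATTTTT", "TATTAT",
                   "TTATTT", "TATTTT", "AATTTT", "TTTTAT", "AATAAT"],
    "GI SAWAKI": ["AAGAAA", "AAAGAA", "GAAAAA", "AAAAAT", "AAAAGA",
                  "AATAAA", "AAAATT", "AAATTA", "ATTATT", "TTTATT"],
}


def base_fraction(hexamers, bases="GC"):
    """Fraction of residues in the pooled hexamers that belong to `bases`.

    Unweighted: every listed hexamer counts equally, whatever its rank.
    """
    pooled = "".join(hexamers)
    return sum(pooled.count(b) for b in bases) / len(pooled)


if __name__ == "__main__":
    for label, table in (("training", TRAINING), ("test", TEST)):
        for name, hexamers in table.items():
            print(f"{label:8s} {name:13s} "
                  f"GC={base_fraction(hexamers):.2f} "
                  f"T={base_fraction(hexamers, 'T'):.2f}")
```

On the tabulated hexamers, the pooled GC fraction increases from plants to fungi to rhizobacteria in the training sets, and is higher for the Mt Long library than for any of the three G. intraradices libraries, consistent with the pattern described in the text.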