We wrote a Perl program to calculate the GC content of each sequence as the proportion of guanine and cytosine residues among all unambiguous (non-N) nucleotides in that sequence. The hist function in R, version 1.1.1 [62], aggregated these continuous percentages into discrete histogram bins, each 2 percentage points of GC wide, with inclusive lower and exclusive upper bin boundaries. The lm function tested for a linear correlation between the dissimilarity test statistic t and GC content.
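The GC calculation and the binning rule can be illustrated with a minimal Python sketch. This is not the Perl program or the R calls used in the study; the function names gc_percent and gc_bin are hypothetical. The sketch assumes that only A, C, G, T and U count as unambiguous bases, so N and any other ambiguity codes are excluded from the denominator, and that bins start at 0% GC.

```python
import math


def gc_percent(sequence):
    """Return GC content as a percentage of unambiguous bases.

    Assumption: only A, C, G, T and U are treated as unambiguous;
    N and every other character are ignored. Returns None when the
    sequence contains no unambiguous bases.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    if total == 0:
        return None
    return 100.0 * gc / total


def gc_bin(percent, width=2.0):
    """Return the half-open bin [lower, upper) that contains percent.

    The lower boundary is inclusive and the upper boundary exclusive,
    so with width 2 a value of exactly 50 falls into [50, 52).
    Assumption: bin boundaries are multiples of width, starting at 0.
    """
    lower = math.floor(percent / width) * width
    return (lower, lower + width)


if __name__ == "__main__":
    pct = gc_percent("GGCCAATTNN")  # 4 GC out of 8 unambiguous bases
    print(pct, gc_bin(pct))
```

Under this rule a sequence with exactly 100% GC falls into a bin of its own, [100, 102), because the upper boundary of [98, 100) is exclusive.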