The main mechanisms invoked so far to explain codon usage bias within genomes concern either transcriptional efficiency (), or the stability of the nucleic acids or of the encoded proteins (1986). Bernardi and Bernardi (1986) showed that codon usage in genomes is determined by compositional constraints: the G/C content at the third, degenerate position of the codons in a gene is correlated with the overall G/C content of the genome compartment in which the gene resides. These constraints in turn relate to the stability of both nucleic acids and proteins, which depends on environmental pressures. Warm-blooded vertebrates have a higher G/C content in their genes, which correlates with the stability of mRNA molecules. The amino acid replacements that result from increasing G/C content have also been shown to yield more thermodynamically stable proteins ().
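The two quantities compared in the compositional-constraint argument, overall G/C content and G/C content at the third codon position, can be computed as in the following minimal sketch. The function names `gc_fraction` and `gc3_fraction` are illustrative only, and the sketch assumes an in-frame coding sequence written in the DNA alphabet.

```python
def gc_fraction(seq):
    """Fraction of G or C nucleotides in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(coding_seq):
    """G/C fraction at the third position of each complete codon.

    Assumes the sequence starts in frame. Slicing from index 2 in
    steps of 3 visits only third positions of complete codons, so a
    trailing partial codon is ignored automatically.
    """
    third_positions = coding_seq.upper()[2::3]
    return gc_fraction(third_positions)
```

For the toy sequence `ATGGCCAAA` (codons ATG, GCC, AAA), the third positions are G, C and A, so `gc3_fraction` returns 2/3 while `gc_fraction` returns 4/9.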
With this mechanism in mind, I performed the following test. I used
the empirical mutability model described in chapter
to
calculate an average mutability per nucleotide for a number of
non-immunoglobulin genes. I also determined the A/T content of the
genes. Surprisingly, mutability is positively correlated with the
A/T content of the genes. Thus, adjustments in the nucleotide
composition of non-immunoglobulin genes that are associated with
higher stability of the DNA and mRNA seem to correlate with low
apparent mutability under somatic hypermutation. This result raises
the interesting hypothesis that somatic hypermutation may involve
mispairing of nucleotides during DNA synthesis (an event that is more
probable for A and T nucleotides), with the resulting lesion escaping
repair. Alternatively, it may reflect a bias in the repair
mechanism.
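The test described above can be sketched as follows. This is only an illustration under stated assumptions: it supposes that the mutability of a nucleotide is looked up from the triplet centred on it, and the lookup table `triplet_mutability` is a hypothetical stand-in for the empirical model, whose actual values are not reproduced here. The uniform table in the example is a placeholder, not data.

```python
from itertools import product


def at_content(seq):
    """Fraction of A or T nucleotides in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


def mean_mutability(seq, triplet_mutability):
    """Average mutability per nucleotide of a sequence.

    Assumption: each interior nucleotide is scored by the triplet
    centred on it; the first and last nucleotides have no complete
    context and are skipped.
    """
    seq = seq.upper()
    scores = [triplet_mutability[seq[i - 1:i + 2]]
              for i in range(1, len(seq) - 1)]
    if not scores:
        raise ValueError("sequence shorter than one triplet")
    return sum(scores) / len(scores)


# Placeholder table: every triplet gets the same value. A real analysis
# would substitute the empirically inferred mutabilities.
uniform_table = {"".join(t): 1.0 for t in product("ACGT", repeat=3)}
```

With the placeholder table every sequence has a mean mutability of 1.0, so any dependence on A/T content can only come from the values of the real table.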
For this study, I extracted a set of 140 human non-immunoglobulin
genes from GenBank (Appendix
). I performed a pairwise
alignment of all amino acid sequences to ensure that no two
sequences were closely related. The aim is to
assess the significance of the biases found in random genes
in the genome, and these biases should not be due to genealogical
relationships between sequences. I first determined the total
mutability (both silent and replacement) per nucleotide for all these
sequences. As shown in Fig.
, the mutability of a
sequence is anti-correlated with the G/C content of the sequence. The
correlation becomes even more significant when I calculate the
replacement mutability rather than total mutability of a nucleotide in
each sequence (Fig.
). This correlation can be
predicted qualitatively from the mutability matrix that I used. The
A/T content of individual triplets (which takes discrete values: 0,
1/3, 2/3, and 1) is already predictive of mutability. However, the
correlation is considerably stronger in real genes. Table
summarizes the results of the correlation test that
I performed on triplet mutability, and total and replacement
mutability per nucleotide for non-immunoglobulin sequences.
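The sketch below illustrates two points from this paragraph: the A/T content of a triplet can take only four values, and the association between composition and mutability across genes can be summarised by a correlation coefficient. Pearson's r is used here as an assumption, since the text does not state which statistic was applied; the function names are illustrative.

```python
import math
from fractions import Fraction
from itertools import product


def triplet_at_levels():
    """Distinct A/T fractions taken over all 64 nucleotide triplets."""
    levels = {
        Fraction(sum(base in "AT" for base in triplet), 3)
        for triplet in product("ACGT", repeat=3)
    }
    return sorted(levels)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)
```

Here `xs` would hold the G/C content of each gene and `ys` its mutability per nucleotide; a value of r close to -1 corresponds to the anti-correlation described above.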
This is precisely what previous studies of mutations that occur spontaneously during genome evolution have reported (Bernardi and Bernardi, 1986; Li, 1997; Wolfe et al., 1989). It is also what we would expect for the somatic mutation mechanism that I studied, given that adenine was the most frequently mutated nucleotide in the database of mutations from which the mutability values were inferred (1996). It is not, however, a general finding in somatic hypermutation studies. M. Flajnik (personal communication, 1998), for example, did not find a significant bias in mutation frequencies at different nucleotides, and still others found a higher mutation frequency at G-C nucleotides (). In at least one of these cases (1992), however, the effect of selection could not be ruled out. The sequence specificity of the mutator, that is, the mutability of a nucleotide in the context of the surrounding ones, was also not studied rigorously. What my result shows is that, in at least one model of somatic mutation in non-selected sequences (1996), the sequence specificity of the mutator induces a negative correlation between mutability and the G/C content of a sequence.