
In non-immunoglobulin genes, predicted mutability is correlated with A/T content

The main mechanisms invoked so far to explain codon usage bias within genomes concern either transcriptional efficiency (), or the stability of the nucleic acids or of the encoded proteins (1986). Bernardi and Bernardi (1986) showed that codon usage in genomes is determined by compositional constraints: the G/C content at the third, degenerate position of the codons in a gene is correlated with the overall G/C content of the genome compartment in which the gene resides. These constraints in turn reflect the stability of both nucleic acids and proteins, which depends on environmental pressures. Warm-blooded vertebrates have a higher G/C content in their genes, which correlates with the stability of mRNA molecules. The amino acid replacements that result from increasing G/C content have also been shown to yield more thermodynamically stable proteins ().
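The two compositional quantities mentioned above can be made concrete with a minimal sketch. The function names `gc_fraction` and `gc3_fraction` are illustrative choices, not taken from the original work, and the sketch assumes the input is a coding sequence that starts in frame.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in counted) / len(counted)


def gc3_fraction(cds: str) -> float:
    """G/C fraction at the third position of each complete codon in cds.

    Assumes cds starts at the first base of a codon; a trailing partial
    codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # length covered by complete codons
    third_positions = cds[2:usable:3]  # bases at codon position 3
    return gc_fraction(third_positions)
```

For the toy sequence `ATGGCCAAA`, the third codon positions are G, C and A, so `gc3_fraction` gives 2/3 while `gc_fraction` over the whole sequence gives 4/9.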

With this compositional mechanism in mind, I performed the following test. I used the empirical mutability model described in chapter [*] to calculate an average mutability per nucleotide for a number of non-immunoglobulin genes, and I also determined the A/T content of these genes. Surprisingly, mutability is positively correlated with the A/T content of the genes. Thus, adjustments in the nucleotide composition of non-immunoglobulin genes that are associated with higher stability of the DNA and mRNA seem to correlate with low apparent mutability under somatic hypermutation. This result raises the interesting hypothesis that somatic hypermutation may involve mispairing of nucleotides during DNA synthesis (an event that is more probable for A and T nucleotides), with the resulting lesion failing to be repaired. Alternatively, it may reflect a bias in the repair mechanism.
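As a rough sketch of the kind of calculation described here, the following assumes that the mutability model can be represented as a lookup table from a triplet to the mutability of its central nucleotide. That representation, the function names, and the values in `TOY_TABLE` are all assumptions made for illustration; the actual empirical model is the one described in chapter [*].

```python
from itertools import product
from typing import Mapping


def at_fraction(seq: str) -> float:
    """Fraction of A or T among the bases of seq (assumes only A, C, G, T)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


def mean_mutability(seq: str, triplet_mutability: Mapping[str, float]) -> float:
    """Average mutability per nucleotide over the interior positions of seq.

    Each interior position is scored by the table entry for the triplet
    centred on it; the first and last bases have no full context and are
    skipped.
    """
    seq = seq.upper()
    if len(seq) < 3:
        raise ValueError("need at least one complete triplet")
    scores = [triplet_mutability[seq[i - 1:i + 2]] for i in range(1, len(seq) - 1)]
    return sum(scores) / len(scores)


# HYPOTHETICAL placeholder table, not the empirical values of the model:
# it simply scores triplets with a central A or T higher than the rest.
TOY_TABLE = {
    "".join(t): (1.5 if t[1] in "AT" else 1.0)
    for t in product("ACGT", repeat=3)
}
```

With the placeholder table, `mean_mutability("GACG", TOY_TABLE)` averages the scores of the triplets GAC and ACG. Applying both functions to each gene would give one (A/T content, mean mutability) pair per sequence, which is the form of data the correlation test needs.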

For this study, I extracted a set of 140 human non-immunoglobulin genes from GenBank (Appendix [*]). I performed a pairwise alignment of all amino acid sequences to ensure that no two sequences were closely related. I did this because I want to assess the significance of the biases found in random genes in the genome, and these biases should not be due to the genealogical relationship between sequences. I first determined the total mutability (both silent and replacement) per nucleotide for all these sequences. As shown in Fig. [*], the mutability of a sequence is anti-correlated with its G/C content. The correlation becomes even stronger when I calculate the replacement mutability rather than the total mutability per nucleotide in each sequence (Fig. [*]). This correlation can be predicted qualitatively from the mutability matrix that I used: the A/T content of individual triplets (which takes the discrete values 0, 1/3, 2/3, and 1) is already predictive of mutability. However, the correlation is considerably stronger in real genes. Table [*] summarizes the results of the correlation tests that I performed on triplet mutability, and on total and replacement mutability per nucleotide for non-immunoglobulin sequences.
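The discrete A/T classes of triplets mentioned above can be enumerated directly. This is a small sketch with illustrative names; it only counts triplets and says nothing about their mutabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def triplet_at_content(triplet: str) -> Fraction:
    """A/T content of a triplet as an exact fraction: 0, 1/3, 2/3 or 1."""
    if len(triplet) != 3:
        raise ValueError("expected exactly three bases")
    return Fraction(sum(base in "AT" for base in triplet.upper()), 3)


# All 64 possible triplets over the alphabet A, C, G, T.
TRIPLETS = ["".join(bases) for bases in product("ACGT", repeat=3)]

# Number of triplets falling into each A/T-content class.
CLASS_SIZES = Counter(triplet_at_content(t) for t in TRIPLETS)
```

The 64 triplets split into classes of 8, 24, 24 and 8 for A/T contents of 0, 1/3, 2/3 and 1. Because the A/T content takes only four distinct values at the triplet level, any rank-based correlation computed on triplets involves heavily tied ranks.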


Figure: [...] sequence. Each data point represents one non-immunoglobulin sequence.


  
Figure 4.2: Average replacement mutability per nucleotide versus the G/C content of the sequence. Each data point represents one non-immunoglobulin sequence.


 
Table 4.1: Correlation between mutability and A/T content

Data set                                    Pearson correlation (P-value)   Spearman correlation (P-value)
Triplets                                    0.293 (0.0187)                  0.347 (0.0058)
Non-Ig sequences - total mutability         0.758 (0)                       0.82 (0)
Non-Ig sequences - replacement mutability   0.846 (0)                       0.873 (0)
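For readers who want to reproduce this kind of test, the two coefficients in the table can be computed as in the sketch below. It uses only the standard library, the function names are illustrative, and it computes the coefficients only, not the P-values. It is not the software used for the original analysis, which the text does not specify.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient of x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Here `x` would hold the A/T content of each triplet or sequence and `y` the corresponding mutability. Ties are handled by average ranks, which matters for the triplet data set, where the A/T content takes only four distinct values.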

This pattern is precisely what previous studies of mutations that occur spontaneously during genome evolution reported (Bernardi and Bernardi, 1986; Li, 1997; Wolfe et al., 1989). It is also what we would expect for the somatic mutation mechanism that I studied, given that adenine was the most frequently mutated nucleotide in the database of mutations from which the mutability values were inferred (1996). It is not, however, a general finding in somatic hypermutation studies. M. Flajnik (personal communication, 1998), for example, did not find a significant bias in mutation frequencies at different nucleotides, and still others found a higher mutation frequency at G-C nucleotides (). In at least one of these cases (1992), however, the effect of selection could not be ruled out. The sequence specificity of the mutator, that is, the mutability of a nucleotide in the context of the surrounding ones, was also not studied rigorously. What my result shows is that, in at least one model of somatic mutation in non-selected sequences (1996), the sequence specificity of the mutator induces a negative correlation between mutability and the G/C content of a sequence.


Mihaela Oprea
1999-04-11