In the previous chapter, I showed that the codon usage of framework
and complementarity-determining regions of immunoglobulin genes is
biased, inducing lower mutability of a FR nucleotide compared to a CDR
nucleotide. This can be inferred by comparing the mutability of the
germline sequence with a set of variants with identical amino acid
sequence, but unbiased codon usage. I will apply a similar technique
to the set of non-immunoglobulin sequences. Briefly, for each sequence
in the data set, I generate a set of $10^4$ variants as follows. I
translate the nucleotide sequence into its corresponding amino acid
sequence. Then, for each amino acid, I choose, with uniform
probability, one of the codons that can encode it. Repeating this
procedure yields the $10^4$ variants for each non-immunoglobulin
sequence in the initial data
set. I calculate their average replacement mutability per nucleotide,
and then determine the rank of the mutability of the germline sequence
relative to its translationally neutral variants. Fig.
shows the frequency distribution of the normalized
ranks of the 140 genes. Approximately half of the genes in the set
have a mutability that falls in the lowest 5% relative to their variants
with the same amino acid translation but unbiased codon usage. As I
mentioned, I ruled out any obvious genealogical relationship between
these sequences. If the codon usage of the genes were unbiased, we
would expect the distribution of ranks to be uniform. The fact
that it is not could indicate two things:
I attempted to decide between these alternatives using the following test, which generates a different codon usage bias. Let $C_{ij}$ denote the $j$-th codon encoding amino acid $i$, and let $P_i$ be a random permutation of the codons encoding amino acid $i$. To construct a sequence under this new codon usage bias, I replace each codon $C_{ij}$ in the sequence by $C_{iP_i(j)}$. The set of permutations $P_i$, with $i = 1, \dots, 20$, constitutes the new codon usage bias. For each codon bias thus constructed, I regenerate the set of 140 gene sequences and calculate their replacement mutability under somatic mutation. Due to computational constraints, I generated only 100 different permutations of the codons.
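The two resampling steps described so far (uniform synonymous re-encoding, and a per-amino-acid permutation of codons) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: it uses the standard genetic code, leaves stop codons untouched, and the function names `neutral_variant`, `random_codon_bias`, and `apply_bias` are hypothetical, not taken from the original analysis. The mutability calculation itself is not reproduced here.

```python
import random
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; '*' marks stops.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group synonymous codons by the amino acid they encode.
SYNONYMS = defaultdict(list)
for codon, aa in CODON_TO_AA.items():
    SYNONYMS[aa].append(codon)


def codons(seq):
    """Split an in-frame nucleotide sequence into complete codons."""
    return [seq[k:k + 3] for k in range(0, len(seq) - len(seq) % 3, 3)]


def neutral_variant(seq, rng):
    """Re-encode seq, choosing each codon uniformly among its synonyms."""
    return "".join(rng.choice(SYNONYMS[CODON_TO_AA[c]]) for c in codons(seq))


def random_codon_bias(rng):
    """Draw one permutation P_i per amino acid; return a codon -> codon map.

    Assumption: stop codons are mapped to themselves.
    """
    mapping = {}
    for aa, syn in SYNONYMS.items():
        shuffled = syn[:]
        if aa != "*":
            rng.shuffle(shuffled)
        mapping.update(zip(syn, shuffled))
    return mapping


def apply_bias(seq, mapping):
    """Replace every codon C_ij in seq by C_{i P_i(j)}."""
    return "".join(mapping[c] for c in codons(seq))
```

Both operations preserve the amino acid translation by construction. Amino acids with a single codon (methionine and tryptophan in the standard code) are unaffected by either step.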
As I showed previously (Fig.
), 73 of the 140
non-immunoglobulin sequences that I studied have codon usage that
places them in the lowest 5% in mutability among their
translationally invariant variants. In fact, 66 of the 140 sequences
are in the lowest 1% among their neutral variants. I generate similar
sets of translationally neutral variants for each sequence under each
codon usage bias. I then determine how many of these codon usage
biases yield as many sequences of significantly low mutability as the
original bias. It turns
out that if I set the significance level at a normalized rank of
1% among the neutral variants, none of the codon usage biases
produces as many low-mutability sequences as the original codon usage
bias.
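The counting step of this test can be sketched as follows. The sketch assumes that the mutability values have already been computed elsewhere, and that the normalized rank is the fraction of neutral variants with mutability strictly below that of the sequence itself. The exact rank definition and all function names here are illustrative assumptions.

```python
def normalized_rank(observed, variant_values):
    """Fraction of neutral variants whose mutability is below the observed one."""
    below = sum(v < observed for v in variant_values)
    return below / len(variant_values)


def count_low(ranks, threshold=0.01):
    """Number of sequences whose normalized rank is at or below the threshold."""
    return sum(r <= threshold for r in ranks)


def biases_matching_original(original_count, permuted_counts):
    """Number of permuted codon biases yielding at least as many
    low-mutability sequences as the original codon usage bias."""
    return sum(c >= original_count for c in permuted_counts)
```

For comparison, if the ranks were uniformly distributed, about 0.05 × 140 = 7 sequences would be expected in the lowest 5%, and about 1.4 in the lowest 1%.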
This result allows me to conclude that the somatic hypermutation mechanism does not single out an arbitrary codon usage bias in these sequences. It responds specifically to the codon bias present in the set of germline genes that I used for this study. Thus, there is a significant correlation between the sequence specificity of the somatic hypermutation mechanism and the codon bias present in human genes. This may be due to: