The contribution of various factors to the mutability of a whole set
of V region sequences cannot be assessed from the information that I
have generated for individual sequences. These sequences arose from
gene duplications and probably undergo gene conversion, so we expect
their mutabilities to be correlated. To get around this problem, I
designed another
test. Instead of constructing independent permutations of the
sequences, I construct permutations that do not alter the correlations
of codon usage in various genes. This may be achieved by aligning the
sequences and then permuting whole columns in the alignment. A column
should be one nucleotide in width if we are to construct variants that
preserve the nucleotide composition. Similarly, a column should span a
whole codon (3 nucleotides), if we want to obtain variants that
preserve the codon composition. Finally, for the translationally
neutral variants, I take each codon column in the alignment, and
identify the amino acids that appear at the position in the
alignment. For each of these amino acids I construct a permutation of
codons. Then, going through all sequences, I replace the codon that
is present at that position in the germline sequence with the one
that corresponds to it in the permutation. I repeat this
process for each amino acid position in the alignment. I constructed
10^4 variant sets for each of these tests, determined the predicted
FR and CDR mutability for each of the sequences in the set, and then
averaged these quantities over the set. The contour plot of the set
average of replacement mutability per FR and CDR nucleotide is shown
in the accompanying figure. These tests allowed me to conclude that the
nucleotide composition of CDRs creates motifs with higher replacement
mutability than that of the FRs. The set of CDR codons, which is a
subset of the motifs that can be created given the nucleotide
frequencies, is also a highly mutable subset. Also, the amino acid
sequence of human VH genes has on average higher CDR replacement
mutability, regardless of what the codon usage of these genes might
be.
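
The three kinds of variant sets described above can be sketched as follows. This is a minimal illustration, not the code used for the analysis: it assumes a gap-free alignment of equal-length, in-frame sequences, the standard genetic code, and that each amino acid's codons are permuted over its full synonymous set. The function names (permute_columns, neutral_variant) are hypothetical. The text does not say whether FR and CDR columns were permuted separately; to do so, pass each region's block of the alignment on its own.

```python
import random

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def columns(seq, width):
    """Split a sequence into consecutive columns of the given width."""
    return [seq[i:i + width] for i in range(0, len(seq), width)]


def translate(seq):
    """Translate an in-frame nucleotide sequence into amino acids."""
    return "".join(CODON_TABLE[c] for c in columns(seq, 3))


def permute_columns(alignment, width, rng):
    """Apply one shared random column order to every aligned sequence.

    width=1 preserves nucleotide composition, width=3 codon composition.
    Sharing the order keeps the correlations between genes intact.
    """
    n_cols = len(alignment[0]) // width
    order = list(range(n_cols))
    rng.shuffle(order)
    return ["".join(columns(seq, width)[j] for j in order)
            for seq in alignment]


def neutral_variant(alignment, rng):
    """Build a translationally neutral variant of the whole alignment.

    At each codon column, every amino acid present gets its own random
    permutation of synonymous codons (assumed: the full synonymous set),
    and that same codon-to-codon map is applied to all sequences.
    """
    rows = [[] for _ in alignment]
    for col in zip(*(columns(seq, 3) for seq in alignment)):
        mapping = {}
        for aa in sorted({CODON_TABLE[c] for c in col}):
            shuffled = SYNONYMS[aa][:]
            rng.shuffle(shuffled)
            mapping.update(zip(SYNONYMS[aa], shuffled))
        for row, codon in zip(rows, col):
            row.append(mapping[codon])
    return ["".join(row) for row in rows]
```

Because the column order (or the codon map) is drawn once per variant and reused for every sequence, two genes that share a codon at a given position still share one in the variant, which is the property the test relies on.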
Where does the set of real VH sequences stand with respect to these
variants (the average FR/CDR mutability of the germline sequence is
represented in the figure
by the red dot)? It has
significantly higher CDR mutability than would be predicted from the
CDR nucleotide composition: If we do a rank test of the average FR and
CDR mutability, the normalized rank values that we obtain are 0.6233
for FR, and 0.9925 for CDR. It is not significantly different from
the sets with identical codon composition, since the CDR codon
composition already renders these regions highly mutable (normalized
ranks 0.4246 for FR, and 0.1612 for CDR). This last test also tells us
that the exact order in which the codons follow each other in the
sequence does
not play a significant role in FR or CDR mutability. Finally, given
their amino acid sequence, the germline genes show clear codon usage
bias, for both FRs and CDRs. We find evidence for both FR mutability
minimization (normalized rank of the germline sequence set 0.0061) and
for CDR mutability maximization (normalized rank of the germline
sequence set 0.9566).
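
The rank test used here can be sketched in a few lines. The text does not spell out how the normalized rank is computed, so the definition below is an assumption: the fraction of variant sets whose set-averaged mutability lies below that of the germline set, with ties counted as one half. The function name normalized_rank is hypothetical.

```python
def normalized_rank(observed, variant_values):
    """Fraction of variant values below the observed one (ties count half).

    Assumed definition: values near 0 mean the observed set is less
    mutable than almost all variants, values near 1 that it is more
    mutable than almost all of them.
    """
    below = sum(v < observed for v in variant_values)
    ties = sum(v == observed for v in variant_values)
    return (below + 0.5 * ties) / len(variant_values)
```

Under this definition, a normalized rank close to 0 for FR together with one close to 1 for CDR is what the reading of minimized FR mutability and maximized CDR mutability corresponds to.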
I can also clarify the effect of the serine codons on the mutability of human VH sequences. All amino acids, with the sole exception of serine, are encoded by codons that are accessible from one another via a sequence of single point mutations, each of which preserves the amino acid. Thus, codon bias may evolve without changing the functionality of the protein product. For serine, this is not possible. This amino acid is encoded by six codons, of the type TCN and AGY (A, C, G, T being the four nucleotides, N standing for any of the four, and Y for the pyrimidines, C and T). To go from a TCN codon to an AGY codon requires at least two point mutations. Thus, if in an ancestral sequence serine is encoded by a TCN codon, changing this into an AGY codon requires going through a non-serine amino acid. The consequence is that for serine we cannot disentangle selection for the specific amino acid from the development of a codon bias. Leaving out the serine codons in calculating the mutability of FRs and CDRs, I perform the same rank test of the germline sequence set with respect to its translationally neutral variant sets. I find that the predicted FR mutability remains significantly lower than the average of the variant sets with the same translation (normalized rank 0.0035), whereas the predicted mutability of the CDRs decreases considerably (normalized rank 0.1158). The CDR mutability nevertheless remains quite high, but this effect is not due to codon usage bias. Other factors that seem to be responsible for this high CDR mutability are the use of amino acids whose codons are highly mutable motifs, such as tyrosine, and the preferential use of two-fold degenerate amino acids, that is, amino acids that are encoded by only two codons.
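
The connectivity argument for serine can be checked directly against the standard genetic code. The sketch below groups the codons of each amino acid into sets connected by single point mutations that keep the amino acid unchanged, and reports which amino acids split into more than one set. The helper names are hypothetical, and the standard code is assumed.

```python
# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def synonymous_components(codons):
    """Group codons into sets linked by chains of single point mutations."""
    remaining = set(codons)
    components = []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            current = stack.pop()
            near = {c for c in remaining if hamming(current, c) == 1}
            remaining -= near
            component |= near
            stack.extend(near)
        components.append(component)
    return components


def disconnected_amino_acids():
    """Amino acids whose synonymous codons form more than one component."""
    by_aa = {}
    for codon, aa in CODON_TABLE.items():
        by_aa.setdefault(aa, []).append(codon)
    result = {}
    for aa, codons in by_aa.items():
        components = synonymous_components(codons)
        if len(components) > 1:
            result[aa] = components
    return result
```

Calling disconnected_amino_acids() returns a single entry, for serine, with the TCN codons in one component and AGC and AGT in the other.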