
Contribution of nucleotide composition, codon composition and codon usage bias to the predicted FR and CDR replacement mutability of human VH sequences

The contribution of various factors to the mutability of a whole set of V region sequences cannot be assessed from the information that I have generated for individual sequences. These sequences arose from gene duplications and probably undergo gene conversion, so we expect their mutabilities to be correlated. To get around this problem, I designed another test. Instead of constructing independent permutations of the sequences, I construct permutations that do not alter the correlations of codon usage among the genes. This can be achieved by aligning the sequences and then permuting whole columns of the alignment. A column should be one nucleotide wide if we are to construct variants that preserve the nucleotide composition. Similarly, a column should span a whole codon (3 nucleotides) if we want variants that preserve the codon composition. For the translationally neutral variants, I take each codon column in the alignment and identify the amino acids that appear at that position. For each of these amino acids I construct a permutation of codons. Then, going through all sequences, I replace the codon present at that position in the germline sequence with the one that corresponds to it in the permutation. I repeat this process for each amino acid position in the alignment.

I constructed 10^4 variant sets for each of these tests, determined the predicted FR and CDR mutability for each of the sequences in a set, and then averaged these quantities over the set. The contour plot of the set average of replacement mutability per FR and CDR nucleotide is shown in Fig. [*]. These tests allowed me to conclude that the nucleotide composition of CDRs creates motifs with higher replacement mutability than that of the FRs. The set of CDR codons, which is a subset of the motifs that can be created given the nucleotide frequencies, is also a highly mutable subset. In addition, the amino acid sequence of human VH genes has on average higher CDR replacement mutability, regardless of the codon usage of these genes.
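The three kinds of variant sets described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the analysis: the alignment is assumed to be ungapped, in frame and written in upper-case A, C, G, T; the function names (`permute_columns`, `synonymous_variant`) are hypothetical; and "a permutation of codons" is read here as a random one-to-one mapping of the full synonymous codon set of an amino acid onto itself, redrawn at every codon position.

```python
import random

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def split_codons(seq):
    """Cut an in-frame sequence into consecutive codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate an in-frame nucleotide sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[c] for c in split_codons(seq))


def permute_columns(alignment, width, rng):
    """Shuffle alignment columns of `width` nucleotides.

    The same column order is applied to every sequence, so whatever the
    sequences share at a given column is still shared after the shuffle.
    width=1 preserves nucleotide composition, width=3 codon composition.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment) or length % width:
        raise ValueError("need equal-length sequences, length divisible by width")
    starts = list(range(0, length, width))
    rng.shuffle(starts)
    return ["".join(s[i:i + width] for i in starts) for s in alignment]


def synonymous_variant(alignment, rng):
    """Build one translationally neutral variant of the whole set.

    Assumption: at each codon position, and for each amino acid seen there,
    draw a random bijection on that amino acid's synonymous codons and apply
    it to every sequence.
    """
    rows = [split_codons(s) for s in alignment]
    out = [[] for _ in rows]
    for pos in range(len(rows[0])):
        mapping = {}
        for aa in sorted({CODON_TABLE[r[pos]] for r in rows}):
            synonyms = [c for c in CODONS if CODON_TABLE[c] == aa]
            shuffled = synonyms[:]
            rng.shuffle(shuffled)
            mapping.update(zip(synonyms, shuffled))
        for k, row in enumerate(rows):
            out[k].append(mapping[row[pos]])
    return ["".join(codons) for codons in out]


# Toy two-sequence alignment, made up for illustration (not real VH data).
toy = ["AGCTATTCT", "AGTTACTCC"]
rng = random.Random(0)
print(permute_columns(toy, 1, rng))   # same nucleotide composition per sequence
print(permute_columns(toy, 3, rng))   # same codon composition per sequence
print(synonymous_variant(toy, rng))   # same translation per sequence
```

Because every sequence receives the same column order, or the same codon mapping at a given position, two genes that carry the same codon at a position still agree after the transformation. This is what keeps the variants of different genes correlated in the way the germline genes are.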

Where does the set of real VH sequences stand with respect to these variants (the average FR/CDR mutability of the germline sequence set is represented in Fig. [*] by the red dot)? It has significantly higher CDR mutability than would be predicted from the CDR nucleotide composition: in a rank test of the average FR and CDR mutability, the normalized ranks are 0.6233 for FR and 0.9925 for CDR. It is not significantly different from the sets with identical codon composition, since the CDR codon composition already renders these regions highly mutable (normalized ranks 0.4246 for FR and 0.1612 for CDR). This last test also tells us that the exact order in which the codons follow each other in the sequence does not play a significant role in FR or CDR mutability. Finally, given their amino acid sequence, the germline genes show clear codon usage bias in both FRs and CDRs: we find evidence for both FR mutability minimization (normalized rank of the germline sequence set 0.0061) and CDR mutability maximization (normalized rank of the germline sequence set 0.9566).
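A normalized rank of this kind can be computed as in the following sketch. The exact normalization and the treatment of ties are not spelled out in the text, so the definition used here (the fraction of variant sets whose set average lies strictly below the germline value) is an assumption, as are the function names. It does match the direction of the reported values: a rank near 0 means the germline set is less mutable than almost all variants, and a rank near 1 means it is more mutable.

```python
def set_average(mutabilities):
    """Average a per-sequence mutability over all sequences in one set."""
    return sum(mutabilities) / len(mutabilities)


def normalized_rank(observed, variant_values):
    """Fraction of variant-set averages strictly below the observed value.

    Assumed definition: ties are not counted as 'below'.
    """
    below = sum(1 for v in variant_values if v < observed)
    return below / len(variant_values)


# Toy numbers, made up purely to show the call (not results from the study).
variant_set_means = [0.10, 0.20, 0.30, 0.40]
germline_mean = set_average([0.30, 0.40])
print(normalized_rank(germline_mean, variant_set_means))  # 0.75
```

The same function would be applied once to the FR averages and once to the CDR averages, each against its own distribution of variant-set averages.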


Figure: ... 10, and 100 sequences. The germline sequence set is shown in red.

I can also clarify the effect of the serine codons on the mutability of human VH sequences. All amino acids, with the sole exception of serine, are encoded by codons that are accessible from one another via a sequence of synonymous single point mutations. Thus, codon bias may evolve without changing the functionality of the protein product. For serine, this is not possible. This amino acid is encoded by six codons, of the types TCN and AGY (A, C, G, T being the four nucleotides, N standing for any of the four, and Y for the pyrimidines, C and T). Going from a TCN codon to an AGY codon requires two point mutations. Thus, if in an ancestral sequence serine is encoded by a TCN codon, changing it into an AGY codon requires passing through a non-serine amino acid. The consequence is that for serine we cannot disentangle selection for the specific amino acid from the development of a codon bias. Leaving out the serine codons when calculating the mutability of FRs and CDRs, I perform the same rank test of the germline sequence set with respect to its translationally neutral variant sets. I find that the predicted FR mutability remains significantly lower than the average of the variant sets with the same translation (normalized rank 0.0035), whereas the predicted mutability of the CDRs decreases considerably (normalized rank 0.1158). The CDR mutability nevertheless remains quite high, but this effect is not due to codon usage bias. Other factors that seem to be responsible for this high CDR mutability are the use of amino acids whose codons are highly mutable motifs, such as tyrosine, and the preferential use of two-fold degenerate amino acids, that is, amino acids encoded by only two codons.
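The special status of serine can be checked directly against the standard genetic code. The sketch below (helper names are hypothetical) groups the synonymous codons of each amino acid into sets connected by single point mutations, and lists the codons that lie one mutation away from both a TCN and an AGY codon.

```python
# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def hamming(a, b):
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def synonymous_components(aa):
    """Split the codons of `aa` into groups linked by single point mutations."""
    unvisited = {c for c in CODONS if CODON_TABLE[c] == aa}
    components = []
    while unvisited:
        stack = [unvisited.pop()]
        component = set(stack)
        while stack:
            current = stack.pop()
            neighbours = [c for c in unvisited if hamming(current, c) == 1]
            for c in neighbours:
                unvisited.remove(c)
                component.add(c)
                stack.append(c)
        components.append(sorted(component))
    return sorted(components)


def intermediates(src, dst):
    """Codons exactly one point mutation away from both `src` and `dst`."""
    return sorted(c for c in CODONS
                  if hamming(c, src) == 1 and hamming(c, dst) == 1)


# Amino acids whose synonymous codons are not all mutually reachable.
split = [aa for aa in sorted(set(AMINO)) if len(synonymous_components(aa)) > 1]
print(split)                       # ['S']
print(synonymous_components("S"))  # the AGY pair and the four TCN codons

# Every two-step path from TCT to AGT passes through a non-serine codon.
for codon in intermediates("TCT", "AGT"):
    print(codon, CODON_TABLE[codon])
```

For the TCT to AGT example the only intermediates are ACT (threonine) and TGT (cysteine), so a switch between the two serine codon families cannot be made by synonymous steps alone.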


Mihaela Oprea
1999-04-11