
Statistical analysis on the level of individual sequences

One of the problems that limit the power of gene sequence analysis is the lack of appropriate controls. For example, for a given immunoglobulin sequence we do not have a set of variants with identical amino acid translation that we could expose to somatic hypermutation in order to compare their relative propensity to undergo amino acid replacements. Thus, previous studies of the differential mutability of immunoglobulin FRs and CDRs had to use large sequence sets, and no information could be derived at the level of individual sequences. Such is the case with the segregation of serine codon sets pointed out by Wagner et al. (1995) and, more generally, with the segregation of more mutable codons in CDRs (1997). Moreover, these studies were restricted to the mutability of in-frame triplets, and did not use all the mutational information that might be present in the database of non-selected mutations. All these problems are circumvented by the approach that I introduced above.

The information concerning one sequence may be represented visually in the following way. Each variant sequence corresponds to a point in the plane of FR/CDR mutability. The minimum and maximum FR and CDR mutabilities achieved by the variant sequences delimit a rectangle in this plane. I divided this rectangle into 100 by 100 smaller rectangles, which I call bins. Counting the number of variant sequences that fall into each bin yields a two-dimensional histogram. Sectioning this histogram at the levels of 1, 10, and 100 sequences per bin gives the contour plots shown in the figures. The outermost contour line corresponds to a density of 1 sequence per bin, and the innermost one to a density of 100 sequences per bin.
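The binning step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `bin_index`, `mutability_histogram`, and `bins_at_level` are hypothetical, the maximum value is assigned to the last bin, and a contour at a given level is approximated by the set of bins holding at least that many sequences.

```python
def bin_index(value, lo, hi, bins):
    """Index of the bin holding `value` when [lo, hi] is cut into `bins` equal parts."""
    if hi == lo:
        return 0  # degenerate range: everything falls into one bin
    # The maximum value would land on index `bins`, so clamp it into the last bin.
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def mutability_histogram(fr, cdr, bins=100):
    """Count variant sequences per bin of the FR/CDR mutability plane.

    `fr` and `cdr` are equal-length lists of the FR and CDR mutabilities of
    the variants; the rectangle is spanned by their minima and maxima.
    """
    fr_lo, fr_hi = min(fr), max(fr)
    cdr_lo, cdr_hi = min(cdr), max(cdr)
    counts = [[0] * bins for _ in range(bins)]
    for f, c in zip(fr, cdr):
        i = bin_index(f, fr_lo, fr_hi, bins)
        j = bin_index(c, cdr_lo, cdr_hi, bins)
        counts[i][j] += 1
    return counts


def bins_at_level(counts, level):
    """Set of (i, j) bins that hold at least `level` sequences."""
    return {(i, j)
            for i, row in enumerate(counts)
            for j, n in enumerate(row)
            if n >= level}
```

The regions returned for levels 1, 10, and 100 are nested, which is why the contour at 100 sequences per bin lies inside the contour at 1 sequence per bin.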

The possibility of analyzing the mutability of individual sequences allowed me to attempt a more detailed understanding of the selection pressures that operate on individual genes. For example, take a light chain sequence, V$_\kappa$A2, the predominant germline gene used in the immune response to Haemophilus influenzae in humans (1998). Its predicted average FR and CDR nucleotide mutabilities are 1.2% and 1.57%, respectively; thus a CDR nucleotide is expected to undergo a replacement mutation 1.3 times more often than a FR nucleotide.
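As a check on the arithmetic, writing $\mu_F$ and $\mu_C$ for the average FR and CDR mutabilities and using the rounded percentages quoted above:

\[
\frac{\mu_C}{\mu_F} = \frac{1.57}{1.20} \approx 1.31 .
\]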

  \begin{figure}% latex2html id marker 1154
\centerline{\epsfxsize=8cm \epsfbox{VK...
...wn at 1, 10, and 100
sequences. The germline gene is shown in red.}\end{figure}

Fig. [*] shows a contour plot of the distribution of three sets of variants of this sequence in the FR-CDR mutability space. The set of sequences with similar FR/CDR nucleotide composition is represented in black, the set of sequences with similar codon composition in blue, and the translationally invariant set in green. The position of the observed germline sequence is represented by the red dot. Of all the artificial data sets, the set with identical codon composition has a mean FR/CDR mutability that is most similar to that of the germline sequence. This allows me to conclude that the mutability pattern of the observed V$_\kappa$A2 is best predicted by its codon composition. The CDR mutability is slightly higher than one could predict from its nucleotide sequence, and I find codon usage bias consistent with low FR and high CDR mutability. Insel and Varade (1998) found a low number of mutations in the complementarity-determining regions of V$_\kappa$A2. Their analysis concluded that this was not due to an intrinsically low propensity of this sequence to mutate, and they conjectured that mutations must be negatively selected. My results support this hypothesis, as I also find that V$_\kappa$A2 CDRs do not have a low propensity to undergo somatic mutation.
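The three kinds of variant sets can be illustrated with a short Python sketch. The exact procedure was defined earlier in the text and is not reproduced here, so the following rests on assumptions: nucleotide variants are random permutations of the nucleotides, codon variants are random permutations of the in-frame codons, and translationally invariant variants replace each codon by a uniformly chosen synonymous codon of the standard genetic code. All function names are hypothetical, and each function is meant to be applied to one region (FR or CDR) at a time.

```python
import random

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Map each amino acid (or stop, '*') to the list of codons encoding it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def codons(seq):
    """Split an in-frame sequence into triplets."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Amino acid translation of an in-frame sequence."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def nucleotide_permutation(seq, rng):
    """Variant with the same nucleotide composition (assumed: random shuffle)."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def codon_permutation(seq, rng):
    """Variant with the same codon composition (assumed: shuffle of triplets)."""
    triplets = codons(seq)
    rng.shuffle(triplets)
    return "".join(triplets)


def translationally_invariant(seq, rng):
    """Variant with the same translation (assumed: uniform synonymous choice)."""
    return "".join(rng.choice(SYNONYMS[CODON_TABLE[c]]) for c in codons(seq))
```

For example, with `rng = random.Random(1)` and the made-up fragment `"GAGGTGCAGCTGGTGGAG"`, every output of `translationally_invariant` translates to the same peptide as the input, while the two permutation functions preserve nucleotide and codon counts, respectively.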


  \begin{figure}% latex2html id marker 1162
\centerline{\epsfxsize=8cm \epsfbox{VH...
...wn at 1, 10, and
100 sequences. The germline gene is shown in red.}\end{figure}

As another example, consider a VH germline sequence, VH1-18. The predicted average replacement mutabilities of a FR and a CDR nucleotide from this sequence are 1.28% and 1.85%, respectively. A CDR nucleotide is thus 1.45 times more likely to undergo a replacement mutation than a FR nucleotide. As shown in Fig. [*], the difference in composition between FR and CDR is reflected in their mutability. All variant sets have, on average, higher CDR than FR nucleotide mutability. The amino acid sequence of VH1-18 is such that, regardless of specific codon usage, most of the translationally neutral variants of this sequence would have higher replacement mutability of CDR nucleotides than of FR nucleotides. Moreover, the specific codons that are used in the CDRs would be extremely mutable regardless of the order in which they appear.


  \begin{figure}% latex2html id marker 1168
\centerline{\epsfxsize=8cm \epsfbox{VH...
...wn at 1, 10, and
100 sequences. The germline gene is shown in red.}\end{figure}

The picture changed dramatically when I analyzed a VH2 family gene, VH2-26 (Fig. [*]). The FR and CDR mutability values of this sequence, 1.24% and 1.32%, respectively, are well predicted by its nucleotide composition. Moreover, the frequencies of the different codons used in this sequence seem to be well predicted by the nucleotide composition of the sequence. The amino acid sequence of VH2-26, on the other hand, would lead, on average, to lower CDR than FR nucleotide mutability. Thus, as was the case with V$_\kappa$A2 and VH1-18, the mutability of VH2-26 is best predicted by its codon composition. In contrast with the previous two sequences, though, the amino acid sequence of VH2-26 would result in lower CDR than FR mutability if the codon usage were unbiased. Thus, for this sequence, the codon bias is crucial for the CDR-FR mutability difference.


  \begin{figure}% latex2html id marker 1174
\centerline{\epsfxsize=8cm \epsfbox{VH...
...wn at 1, 10, and
100 sequences. The germline gene is shown in red.}\end{figure}

Insel and Varade (1998) analyzed the pattern of somatic mutations in non-productive rearrangements of VH6-1, the only member of the 6th VH family, and argued that the CDRs of this sequence are inherently more mutable. My analysis confirms this result (Fig. [*]). The average replacement mutability of a CDR nucleotide in VH6-1 (1.77%) is 1.6 times higher than the average replacement mutability of a FR nucleotide (1.1%). Moreover, the CDR amino acid sequence would have high replacement mutability regardless of codon usage. The codon usage, however, further enhances the CDR-FR mutability difference, mainly through low FR mutability. I thus conclude that selection pressure for low FR mutability operates on VH6-1.

The results of this type of analysis on all human VH sequences are summarized in Table 3.1, which lists the normalized rank of the observed germline sequence among the $10^5$ variants of each type. I denote by $\mu_F$ the average FR mutability of a sequence, by $\mu_C$ the average CDR mutability, and by $\mu_C/\mu_F$ the ratio of these two quantities. Note the significant codon usage bias of VH6-1, leading to low predicted FR mutability.
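A normalized rank of this kind can be computed as sketched below. The text does not spell out the convention, so this is an assumption: the rank is taken to be the fraction of variants whose value lies strictly below that of the observed sequence, so that values near 0 mean the germline sequence is among the least mutable variants and values near 1 mean it is among the most mutable. The function name `normalized_rank` is hypothetical.

```python
def normalized_rank(observed, variant_values):
    """Fraction of variants with a value strictly below `observed`.

    `observed` is the quantity for the germline sequence (for instance its
    average FR mutability) and `variant_values` holds the same quantity for
    every variant in one set. Ties are not counted as below (an assumption).
    """
    if not variant_values:
        raise ValueError("need at least one variant")
    below = sum(1 for v in variant_values if v < observed)
    return below / len(variant_values)
```

Under this convention a rank of 0.041 for $\mu_F$ would mean that about 4% of the variants in that set have a lower average FR mutability than the germline sequence.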


 
Table 3.1: Normalized ranks of individual VH sequences.
 
Gene   Nucleotide permutations   Codon permutations   Translationally invariant
  $\mu_F$ $\mu_C$ $\mu_C/\mu_F$   $\mu_F$ $\mu_C$ $\mu_C/\mu_F$   $\mu_F$ $\mu_C$ $\mu_C/\mu_F$
human VH1 genes
IGHV1-18 0.783 0.976 0.914   0.396 0.074 0.134   0.154 0.536 0.729
IGHV1-2 0.725 0.776 0.662   0.582 0.168 0.17   0.093 0.107 0.331
IGHV1-24 0.617 0.225 0.207   0.236 0.11 0.184   0.0745 0.6 0.797
IGHV1-3 0.861 0.92 0.783   0.639 0.078 0.077   0.287 0.478 0.598
IGHV1-45 0.896 0.84 0.66   0.735 0.241 0.177   0.351 0.118 0.175
IGHV1-46 0.805 0.969 0.898   0.134 0.359 0.566   0.191 0.709 0.824
IGHV1-58 0.815 0.779 0.637   0.474 0.477 0.485   0.221 0.552 0.65
IGHV1-69 0.878 0.885 0.731   0.49 0.561 0.563   0.19 0.787 0.863
IGHV1-8 0.837 0.47 0.313   0.257 0.07 0.142   0.154 0.137 0.292
IGHV1-f 0.931 0.8 0.547   0.66 0.187 0.163   0.344 0.529 0.61
human VH2 genes
IGHV2-26 0.549 0.177 0.193   0.445 0.04 0.082   0.034 0.783 0.943
IGHV2-5 0.463 0.484 0.508   0.548 0.317 0.311   0.014 0.742 0.94
IGHV2-70 0.426 0.571 0.599   0.365 0.243 0.343   0.007 0.907 0.991
human VH3 genes
IGHV3-11 0.74 0.95 0.872   0.564 0.762 0.717   0.055 0.922 0.977
IGHV3-13 0.829 0.866 0.725   0.906 0.477 0.284   0.42 0.776 0.79
IGHV3-15 0.592 0.554 0.498   0.415 0.192 0.238   0.24 0.61 0.708
IGHV3-16 0.159 0.216 0.397   0.472 0.286 0.307   0.019 0.579 0.826
IGHV3-19 0.227 0.221 0.364   0.507 0.294 0.308   0.061 0.564 0.76
IGHV3-20 0.178 0.363 0.555   0.516 0.07 0.081   0.018 0.6 0.895
IGHV3-21 0.481 0.995 0.987   0.504 0.787 0.752   0.03 0.991 0.999
IGHV3-23 0.815 0.953 0.858   0.721 0.322 0.256   0.126 0.941 0.971
IGHV3-30.3 0.766 0.849 0.687   0.678 0.438 0.371   0.088 0.956 0.987
IGHV3-30 0.679 0.752 0.634   0.608 0.576 0.515   0.097 0.938 0.975
IGHV3-33 0.594 0.862 0.789   0.606 0.542 0.488   0.026 0.916 0.984


 
Table 3.1: Normalized ranks of individual VH sequences (continued).
Gene   Nucleotide permutations   Codon permutations   Translationally invariant
  $\mu_F$ $\mu_C$ $\mu_C/\mu_F$   $\mu_F$ $\mu_C$ $\mu_C/\mu_F$   $\mu_F$ $\mu_C$ $\mu_C/\mu_F$
IGHV3-35 0.254 0.211 0.342   0.367 0.279 0.345   0.037 0.571 0.799
IGHV3-38 0.523 0.819 0.785   0.37 0.224 0.294   0.039 0.856 0.954
IGHV3-43 0.388 0.731 0.753   0.467 0.367 0.393   0.058 0.867 0.961
IGHV3-47 0.614 0.981 0.952   0.434 0.715 0.727   0.131 0.986 0.994
IGHV3-48 0.76 0.995 0.966   0.663 0.924 0.864   0.136 0.995 0.997
IGHV3-49 0.627 0.888 0.815   0.678 0.096 0.09   0.204 0.767 0.854
IGHV3-53 0.495 0.983 0.97   0.341 0.527 0.605   0.022 0.878 0.975
IGHV3-64 0.77 0.955 0.872   0.734 0.447 0.356   0.177 0.955 0.971
IGHV3-66 0.629 0.978 0.944   0.395 0.479 0.533   0.059 0.899 0.967
IGHV3-7 0.741 0.759 0.603   0.737 0.516 0.411   0.071 0.89 0.962
IGHV3-72 0.197 0.966 0.977   0.269 0.686 0.77   0.038 0.893 0.979
IGHV3-73 0.504 0.967 0.943   0.656 0.53 0.456   0.022 0.863 0.969
IGHV3-74 0.555 0.957 0.92   0.657 0.635 0.552   0.063 0.971 0.992
IGHV3-9 0.306 0.813 0.848   0.639 0.258 0.226   0.036 0.966 0.994
IGHV3-d 0.574 0.742 0.692   0.383 0.226 0.291   0.124 0.82 0.909
human VH4 genes
IGHV4-28 0.415 0.936 0.923   0.293 0.357 0.489   0.001 0.59 0.933
IGHV4-301 0.625 0.964 0.923   0.35 0.005 0.049   0.031 0.472 0.779
IGHV4-302 0.545 0.79 0.751   0.228 0.034 0.121   0.019 0.493 0.775
IGHV4-304 0.477 0.971 0.952   0.254 0.009 0.095   0.01 0.63 0.912
IGHV4-31 0.64 0.962 0.918   0.336 0.006 0.05   0.041 0.466 0.76
IGHV4-34 0.359 0.837 0.851   0.146 0.224 0.427   0.006 0.686 0.931
IGHV4-39 0.424 0.993 0.984   0.312 0.459 0.577   0.006 0.806 0.976
IGHV4-4 0.245 0.985 0.986   0.294 0.7 0.767   0.001 0.644 0.961
IGHV4-59 0.308 0.977 0.975   0.236 0.22 0.406   0.003 0.543 0.907
IGHV4-61 0.31 0.992 0.99   0.227 0.125 0.327   0.003 0.76 0.972
IGHV4-b 0.31 0.987 0.985   0.127 0.192 0.461   0.004 0.685 0.953
human VH5 genes
IGHV5-51 0.791 0.996 0.977   0.599 0.934 0.903   0.065 0.917 0.975
IGHV5-a 0.888 0.997 0.967   0.596 0.938 0.901   0.21 0.975 0.981
human VH6 gene
IGHV6-1 0.041 0.966 0.994   0.214 0.356 0.573   0.008 0.855 0.987
human VH7 genes
IGHV7-41 0.453 0.886 0.869   0.24 0.131 0.264   0.012 0.193 0.601
IGHV7-81 0.893 0.515 0.319   0.522 0.005 0.014   0.26 0.015 0.043


Mihaela Oprea
1999-04-11