One of the problems that limits the power of analysis of gene
sequences is the lack of appropriate controls. For example, for a
given immunoglobulin sequence we do not have a set of variants with
identical amino acid translation that we can expose to somatic
hypermutation to compare their relative propensity to undergo amino
acid replacements. Thus, previous studies on the differential mutability
of FRs and CDRs of immunoglobulins had to use large sequence sets, and no
information could be derived at the level of individual
sequences. Such is the case with the segregation of serine codon sets
pointed out by Wagner et al. (1995) and, more generally, with the
segregation of more mutable codons in CDRs
(1997). Moreover, these studies were restricted to the
mutability of in-frame triplets, and did not use all the mutational
information that might be present in the database of non-selected
mutations. All these problems are circumvented in the approach that I
introduced above.
The information concerning one sequence may be visually represented in
the following way. Each variant sequence corresponds to a point in the
plane of FR/CDR mutability. By taking the minimum and maximum FR and
CDR mutability achieved by the variant sequences, one can isolate a
rectangle in this plane. I divided this rectangle into 100 by 100
smaller rectangles, which I call bins. If one counts the number of
variant sequences falling into each of the bins, one obtains a
two-dimensional histogram. By sectioning the two-dimensional histogram
at the level of 1, 10, and 100 sequences per bin, one obtains the
contour plots that are shown in the figures. The outermost contour line
corresponds to densities of 1 sequence per bin, and the innermost one
to densities of 100 sequences per bin.
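The binning procedure described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `histogram_2d` and `bins_at_level` are hypothetical, the variant mutabilities are random numbers rather than values computed from real sequences, and the contour levels are approximated by listing the bins that hold at least 1, 10, or 100 sequences.

```python
import random


def histogram_2d(points, n_bins=100):
    """Count (FR, CDR) points on an n_bins x n_bins grid.

    The grid spans the rectangle defined by the minimum and maximum
    FR and CDR mutability found among the points.
    """
    fr_values = [p[0] for p in points]
    cdr_values = [p[1] for p in points]
    fr_min, fr_max = min(fr_values), max(fr_values)
    cdr_min, cdr_max = min(cdr_values), max(cdr_values)

    def bin_index(value, low, high):
        if high == low:
            return 0
        # The maximum value goes into the last bin instead of overflowing.
        return min(int((value - low) / (high - low) * n_bins), n_bins - 1)

    counts = [[0] * n_bins for _ in range(n_bins)]
    for fr, cdr in points:
        i = bin_index(fr, fr_min, fr_max)
        j = bin_index(cdr, cdr_min, cdr_max)
        counts[i][j] += 1
    return counts


def bins_at_level(counts, level):
    """Return the set of (i, j) bins holding at least `level` sequences."""
    return {
        (i, j)
        for i, row in enumerate(counts)
        for j, count in enumerate(row)
        if count >= level
    }


# Hypothetical variant mutabilities (percent), random and for illustration only.
rng = random.Random(0)
variants = [(rng.gauss(1.2, 0.05), rng.gauss(1.5, 0.08)) for _ in range(100000)]
counts = histogram_2d(variants)
for level in (1, 10, 100):
    print(level, len(bins_at_level(counts, level)))
```

The regions returned for the three levels are nested, which is why the contour for 100 sequences per bin lies inside the one for 10, which in turn lies inside the one for 1.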
The possibility of analyzing the mutability of individual sequences
allowed me to attempt a more detailed understanding of the selection
pressures that operate on individual genes. For example, take a light
chain sequence, VKA2, the predominant germline gene used in the
immune response to Haemophilus influenzae in humans
(1998). Its predicted average FR and CDR nucleotide
mutabilities are 1.2% and 1.57%, respectively; thus, a CDR
nucleotide is expected to undergo a replacement mutation 1.3 times
more often than a FR nucleotide.
The corresponding figure shows a contour plot of the distribution of three sets
of variants of this sequence in the FR-CDR mutability space. The set
of sequences with similar FR/CDR nucleotide composition is represented
in black, the set of sequences with similar codon composition in blue,
and the translationally invariant set in green. The position of the
observed germline sequence is represented by the red dot. Of all the
artificial data sets, the set with identical codon composition has a
mean FR/CDR mutability that is most similar to that of the germline
sequence. This allows me to conclude that the mutability pattern of
the observed VKA2 is best predicted by its codon composition. The
CDR mutability is slightly higher than one could predict from its
nucleotide sequence, and I find codon usage bias consistent with low
FR and high CDR mutability. Insel and Varade (1998) found a low number of
mutations in the complementarity-determining regions of VKA2. They
concluded that this was not due to an intrinsically low mutation
propensity of this sequence, and conjectured that mutations must be
negatively selected. My results support this hypothesis, as I also
find that VKA2 CDRs do not have a low propensity to undergo
somatic mutation.
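To make the three kinds of variant sets more concrete, the following Python sketch generates one variant of each kind for a single in-frame region. It is an assumption-laden illustration, not the procedure used for the results above: the exact construction of the sets (for instance, how FR and CDR segments are handled) is defined earlier in the text and is not reproduced here, the function names are hypothetical, and the example region is an arbitrary string rather than a real germline segment. Only the standard genetic code is taken as given.

```python
import random
from itertools import product

BASES = "TCAG"
# Standard genetic code, with codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
# Group codons by the amino acid (or stop signal) they encode.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def codons(seq):
    """Split an in-frame sequence into codons (a trailing partial codon is dropped)."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def translate(seq):
    return "".join(CODON_TABLE[c] for c in codons(seq))


def nucleotide_permutation(seq, rng):
    """Shuffle nucleotides: same nucleotide composition, different sequence."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def codon_permutation(seq, rng):
    """Shuffle whole codons: same codon composition, different order."""
    shuffled = codons(seq)
    rng.shuffle(shuffled)
    return "".join(shuffled)


def translationally_invariant(seq, rng):
    """Replace every codon by a random synonymous codon: same translation."""
    return "".join(rng.choice(SYNONYMS[CODON_TABLE[c]]) for c in codons(seq))


rng = random.Random(1)
region = "AGCTGGGTGCGACAG"  # hypothetical in-frame region, for illustration only
print(translate(region))
print(nucleotide_permutation(region, rng))
print(codon_permutation(region, rng))
print(translationally_invariant(region, rng))
```

Each generator preserves a different property of the original region (nucleotide composition, codon composition, or amino acid translation), which is what lets the three sets act as controls of increasing stringency.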
As another example, consider a VH germline sequence, VH1-18. The
predicted average replacement mutabilities of a FR and a CDR
nucleotide from this sequence are 1.28% and 1.85%, respectively. A
CDR nucleotide is thus 1.45 times more likely to undergo a replacement
mutation than a FR nucleotide. As shown in the corresponding figure, the
difference in composition between FR and CDR is reflected in their
mutability. All variant sets have, on average, higher CDR than FR
nucleotide mutability. The amino acid sequence of VH1-18 is such that,
regardless of specific codon usage, most of the translationally
neutral variants of this sequence would have higher replacement
mutability of CDR nucleotides than of FR nucleotides. Moreover, the
specific codons that are used in the CDRs would be extremely mutable,
regardless of their order in the sequence.
The picture changed dramatically when I analyzed a VH2 family gene,
VH2-26 (see the corresponding figure). The FR and CDR mutability values of this
sequence, 1.24% and 1.32%, respectively, are well predicted by its
nucleotide composition. Moreover, the frequencies of the different
codons used in this sequence seem to be well predicted by the
nucleotide composition of the sequence. The amino acid sequence of
VH2-26, on the other hand, would lead, on average, to lower CDR than FR
nucleotide mutability. Thus, as was the case with VKA2 and
VH1-18, the mutability of VH2-26 is best predicted by its codon
composition. In contrast with the previous two sequences, though, the
amino acid sequence of VH2-26 would result in lower CDR than FR
mutability if the codon usage were unbiased. Thus, for this sequence,
the codon bias is crucial for the CDR-FR mutability difference.
Insel and Varade (1998) analyzed the pattern of somatic mutations in
non-productive rearrangements of VH6-1, the only member of the
6th VH family, and argued that the CDRs of this sequence are
inherently more mutable. My analysis confirms this result (see the
corresponding figure). The average replacement mutability of a CDR nucleotide
in VH6-1 (1.77%) is 1.6 times higher than the average replacement
mutability of a FR nucleotide (1.1%). Moreover, the CDR amino acid
sequence would have high replacement mutability regardless of codon
usage. The codon usage, however, further enhances the CDR-FR
mutability difference, mainly through low FR mutability. I thus
conclude that selection pressure for low FR mutability operates on
VH6-1.
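The CDR-to-FR ratios quoted for the four example sequences follow directly from the reported average mutabilities. The short Python check below recomputes them; the percentages are the ones given in the text, while the dictionary layout and variable names are merely illustrative.

```python
# Reported average replacement mutabilities in percent: (FR, CDR).
reported = {
    "VKA2": (1.2, 1.57),
    "VH1-18": (1.28, 1.85),
    "VH2-26": (1.24, 1.32),
    "VH6-1": (1.1, 1.77),
}

# Ratio of CDR to FR mutability for each sequence.
ratios = {gene: cdr / fr for gene, (fr, cdr) in reported.items()}

for gene, ratio in ratios.items():
    print(f"{gene}: CDR/FR = {ratio:.2f}")
```

VH2-26 stands out in this comparison: its ratio is close to 1, whereas the other three sequences have clearly higher CDR than FR mutability.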
The results of this type of analysis on all human VH sequences are
summarized in Table 3.1, which lists the
normalized rank of the observed germline sequence among the 10^5
variants of each type. For each type of variant set, the table gives
the rank with respect to the average FR mutability
of a sequence, the average CDR mutability, and
the ratio of these two quantities. Note the
significant codon usage bias of VH6-1, leading to low predicted FR
mutability.
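A normalized rank of this kind can be sketched in Python as follows. The definition used here, the fraction of variants whose value lies strictly below the observed one, is an assumption: the treatment of ties and the exact normalization are not spelled out in this section, and the function name `normalized_rank` and the example values are hypothetical.

```python
import random


def normalized_rank(observed, variant_values):
    """Fraction of variant values strictly below the observed value.

    A result near 0 means the observed sequence is less mutable than
    almost all variants; a result near 1 means it is more mutable.
    """
    below = sum(1 for value in variant_values if value < observed)
    return below / len(variant_values)


# Hypothetical FR mutabilities (percent) of 10^5 variants, random for illustration.
rng = random.Random(0)
variant_fr = [rng.gauss(1.3, 0.05) for _ in range(100000)]
observed_fr = 1.1  # hypothetical observed value for the germline sequence
print(normalized_rank(observed_fr, variant_fr))
```

Under this reading, a small rank in an FR column indicates that the germline sequence has lower FR mutability than nearly all variants of that type.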
Table 3.1: Normalized ranks of individual VH sequences.

| Gene | Nucleotide permutations, FR | Nucleotide permutations, CDR | Nucleotide permutations, CDR/FR | Codon permutations, FR | Codon permutations, CDR | Codon permutations, CDR/FR | Translationally invariant, FR | Translationally invariant, CDR | Translationally invariant, CDR/FR |
|---|---|---|---|---|---|---|---|---|---|
| human VH1 genes | | | | | | | | | |
| IGHV1-18 | 0.783 | 0.976 | 0.914 | 0.396 | 0.074 | 0.134 | 0.154 | 0.536 | 0.729 |
| IGHV1-2 | 0.725 | 0.776 | 0.662 | 0.582 | 0.168 | 0.17 | 0.093 | 0.107 | 0.331 |
| IGHV1-24 | 0.617 | 0.225 | 0.207 | 0.236 | 0.11 | 0.184 | 0.0745 | 0.6 | 0.797 |
| IGHV1-3 | 0.861 | 0.92 | 0.783 | 0.639 | 0.078 | 0.077 | 0.287 | 0.478 | 0.598 |
| IGHV1-45 | 0.896 | 0.84 | 0.66 | 0.735 | 0.241 | 0.177 | 0.351 | 0.118 | 0.175 |
| IGHV1-46 | 0.805 | 0.969 | 0.898 | 0.134 | 0.359 | 0.566 | 0.191 | 0.709 | 0.824 |
| IGHV1-58 | 0.815 | 0.779 | 0.637 | 0.474 | 0.477 | 0.485 | 0.221 | 0.552 | 0.65 |
| IGHV1-69 | 0.878 | 0.885 | 0.731 | 0.49 | 0.561 | 0.563 | 0.19 | 0.787 | 0.863 |
| IGHV1-8 | 0.837 | 0.47 | 0.313 | 0.257 | 0.07 | 0.142 | 0.154 | 0.137 | 0.292 |
| IGHV1-f | 0.931 | 0.8 | 0.547 | 0.66 | 0.187 | 0.163 | 0.344 | 0.529 | 0.61 |
| human VH2 genes | | | | | | | | | |
| IGHV2-26 | 0.549 | 0.177 | 0.193 | 0.445 | 0.04 | 0.082 | 0.034 | 0.783 | 0.943 |
| IGHV2-5 | 0.463 | 0.484 | 0.508 | 0.548 | 0.317 | 0.311 | 0.014 | 0.742 | 0.94 |
| IGHV2-70 | 0.426 | 0.571 | 0.599 | 0.365 | 0.243 | 0.343 | 0.007 | 0.907 | 0.991 |
| human VH3 genes | | | | | | | | | |
| IGHV3-11 | 0.74 | 0.95 | 0.872 | 0.564 | 0.762 | 0.717 | 0.055 | 0.922 | 0.977 |
| IGHV3-13 | 0.829 | 0.866 | 0.725 | 0.906 | 0.477 | 0.284 | 0.42 | 0.776 | 0.79 |
| IGHV3-15 | 0.592 | 0.554 | 0.498 | 0.415 | 0.192 | 0.238 | 0.24 | 0.61 | 0.708 |
| IGHV3-16 | 0.159 | 0.216 | 0.397 | 0.472 | 0.286 | 0.307 | 0.019 | 0.579 | 0.826 |
| IGHV3-19 | 0.227 | 0.221 | 0.364 | 0.507 | 0.294 | 0.308 | 0.061 | 0.564 | 0.76 |
| IGHV3-20 | 0.178 | 0.363 | 0.555 | 0.516 | 0.07 | 0.081 | 0.018 | 0.6 | 0.895 |
| IGHV3-21 | 0.481 | 0.995 | 0.987 | 0.504 | 0.787 | 0.752 | 0.03 | 0.991 | 0.999 |
| IGHV3-23 | 0.815 | 0.953 | 0.858 | 0.721 | 0.322 | 0.256 | 0.126 | 0.941 | 0.971 |
| IGHV3-30.3 | 0.766 | 0.849 | 0.687 | 0.678 | 0.438 | 0.371 | 0.088 | 0.956 | 0.987 |
| IGHV3-30 | 0.679 | 0.752 | 0.634 | 0.608 | 0.576 | 0.515 | 0.097 | 0.938 | 0.975 |
| IGHV3-33 | 0.594 | 0.862 | 0.789 | 0.606 | 0.542 | 0.488 | 0.026 | 0.916 | 0.984 |
Table 3.1: Normalized ranks of individual VH sequences (continued).

| Gene | Nucleotide permutations, FR | Nucleotide permutations, CDR | Nucleotide permutations, CDR/FR | Codon permutations, FR | Codon permutations, CDR | Codon permutations, CDR/FR | Translationally invariant, FR | Translationally invariant, CDR | Translationally invariant, CDR/FR |
|---|---|---|---|---|---|---|---|---|---|
| IGHV3-35 | 0.254 | 0.211 | 0.342 | 0.367 | 0.279 | 0.345 | 0.037 | 0.571 | 0.799 |
| IGHV3-38 | 0.523 | 0.819 | 0.785 | 0.37 | 0.224 | 0.294 | 0.039 | 0.856 | 0.954 |
| IGHV3-43 | 0.388 | 0.731 | 0.753 | 0.467 | 0.367 | 0.393 | 0.058 | 0.867 | 0.961 |
| IGHV3-47 | 0.614 | 0.981 | 0.952 | 0.434 | 0.715 | 0.727 | 0.131 | 0.986 | 0.994 |
| IGHV3-48 | 0.76 | 0.995 | 0.966 | 0.663 | 0.924 | 0.864 | 0.136 | 0.995 | 0.997 |
| IGHV3-49 | 0.627 | 0.888 | 0.815 | 0.678 | 0.096 | 0.09 | 0.204 | 0.767 | 0.854 |
| IGHV3-53 | 0.495 | 0.983 | 0.97 | 0.341 | 0.527 | 0.605 | 0.022 | 0.878 | 0.975 |
| IGHV3-64 | 0.77 | 0.955 | 0.872 | 0.734 | 0.447 | 0.356 | 0.177 | 0.955 | 0.971 |
| IGHV3-66 | 0.629 | 0.978 | 0.944 | 0.395 | 0.479 | 0.533 | 0.059 | 0.899 | 0.967 |
| IGHV3-7 | 0.741 | 0.759 | 0.603 | 0.737 | 0.516 | 0.411 | 0.071 | 0.89 | 0.962 |
| IGHV3-72 | 0.197 | 0.966 | 0.977 | 0.269 | 0.686 | 0.77 | 0.038 | 0.893 | 0.979 |
| IGHV3-73 | 0.504 | 0.967 | 0.943 | 0.656 | 0.53 | 0.456 | 0.022 | 0.863 | 0.969 |
| IGHV3-74 | 0.555 | 0.957 | 0.92 | 0.657 | 0.635 | 0.552 | 0.063 | 0.971 | 0.992 |
| IGHV3-9 | 0.306 | 0.813 | 0.848 | 0.639 | 0.258 | 0.226 | 0.036 | 0.966 | 0.994 |
| IGHV3-d | 0.574 | 0.742 | 0.692 | 0.383 | 0.226 | 0.291 | 0.124 | 0.82 | 0.909 |
| human VH4 genes | | | | | | | | | |
| IGHV4-28 | 0.415 | 0.936 | 0.923 | 0.293 | 0.357 | 0.489 | 0.001 | 0.59 | 0.933 |
| IGHV4-301 | 0.625 | 0.964 | 0.923 | 0.35 | 0.005 | 0.049 | 0.031 | 0.472 | 0.779 |
| IGHV4-302 | 0.545 | 0.79 | 0.751 | 0.228 | 0.034 | 0.121 | 0.019 | 0.493 | 0.775 |
| IGHV4-304 | 0.477 | 0.971 | 0.952 | 0.254 | 0.009 | 0.095 | 0.01 | 0.63 | 0.912 |
| IGHV4-31 | 0.64 | 0.962 | 0.918 | 0.336 | 0.006 | 0.05 | 0.041 | 0.466 | 0.76 |
| IGHV4-34 | 0.359 | 0.837 | 0.851 | 0.146 | 0.224 | 0.427 | 0.006 | 0.686 | 0.931 |
| IGHV4-39 | 0.424 | 0.993 | 0.984 | 0.312 | 0.459 | 0.577 | 0.006 | 0.806 | 0.976 |
| IGHV4-4 | 0.245 | 0.985 | 0.986 | 0.294 | 0.7 | 0.767 | 0.001 | 0.644 | 0.961 |
| IGHV4-59 | 0.308 | 0.977 | 0.975 | 0.236 | 0.22 | 0.406 | 0.003 | 0.543 | 0.907 |
| IGHV4-61 | 0.31 | 0.992 | 0.99 | 0.227 | 0.125 | 0.327 | 0.003 | 0.76 | 0.972 |
| IGHV4-b | 0.31 | 0.987 | 0.985 | 0.127 | 0.192 | 0.461 | 0.004 | 0.685 | 0.953 |
| human VH5 genes | | | | | | | | | |
| IGHV5-51 | 0.791 | 0.996 | 0.977 | 0.599 | 0.934 | 0.903 | 0.065 | 0.917 | 0.975 |
| IGHV5-a | 0.888 | 0.997 | 0.967 | 0.596 | 0.938 | 0.901 | 0.21 | 0.975 | 0.981 |
| human VH6 gene | | | | | | | | | |
| IGHV6-1 | 0.041 | 0.966 | 0.994 | 0.214 | 0.356 | 0.573 | 0.008 | 0.855 | 0.987 |
| human VH7 genes | | | | | | | | | |
| IGHV7-41 | 0.453 | 0.886 | 0.869 | 0.24 | 0.131 | 0.264 | 0.012 | 0.193 | 0.601 |
| IGHV7-81 | 0.893 | 0.515 | 0.319 | 0.522 | 0.005 | 0.014 | 0.26 | 0.015 | 0.043 |
Mihaela Oprea
1999-04-11