Next: Somatic hypermutation targets the Up: Shape space coverage with Previous: The fitness and structure

   
Implications for random antibody libraries

Although I do not have a formal proof, it seems that evolving the antibody libraries allows us to reach higher fitness values than we would have with random libraries, though the functional form of the dependency between fitness and library size does not change. Let us then explore what this functional form might be for a random library, under assumptions about the fitness of individual antigen-antibody interactions that may have biological relevance.

I again assume the random energy model, with all antibody-antigen interactions being characterized by a bond strength distributed according to a density function, g. The cumulative distribution of a single bond strength will then be denoted by G. For example, assume that the bond strength of an antigen-antibody interaction is exponentially distributed, meaning that most interactions are of low energy, with higher-energy interactions being progressively rarer. Then $G(x) = 1 - e^{-\alpha x},$ with $\alpha$ constant. Correspondingly, $G^{-1}(x) = - \frac{1}{\alpha} \log(1-x).$ Let us denote $y^{\frac{1}{A}}$ by z. Then $y = z^A$, $\frac{dy}{dz} = A z^{A-1}$, and the average fitness over the complete pathogen space will be given by

\begin{displaymath}f = - \frac{1}{\alpha} \int_0^1 A z^{A-1} \log(1 - z) dz =
\frac{1}{\alpha} \left(\frac{d}{dA} \log(\Gamma(A+1)) + \gamma\right), \end{displaymath}

which, for large $A$, is approximated by $\frac{1}{\alpha} (\log(A) + \gamma),$ with $\gamma$ being Euler's constant. Thus, in the case where antigen-antibody bond strengths are exponentially distributed, the fitness of a random antibody library scales logarithmically with the size of the library.
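The logarithmic scaling can be checked numerically. The sketch below is only an illustration, under two assumptions that are mine rather than the text's: that $A$ is an integer (so that the digamma expression reduces to the harmonic number $H_A$), and that the integral can be read as the expected strongest bond among $A$ independent exponential draws. The function names are hypothetical.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler's constant, gamma


def exact_fitness(A, alpha=1.0):
    """(psi(A+1) + gamma) / alpha; for integer A this equals H_A / alpha."""
    return sum(1.0 / k for k in range(1, A + 1)) / alpha


def log_approx_fitness(A, alpha=1.0):
    """Large-A approximation (log(A) + gamma) / alpha."""
    return (math.log(A) + EULER_GAMMA) / alpha


def simulated_fitness(A, alpha=1.0, trials=20_000, seed=0):
    """Monte Carlo mean of the largest of A exponential bond strengths.

    Assumption: the library's fitness against one pathogen is its best bond.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += max(rng.expovariate(alpha) for _ in range(A))
    return total / trials


if __name__ == "__main__":
    for A in (10, 100, 1000):
        print(A, exact_fitness(A), log_approx_fitness(A))
```

The gap between the exact value and the logarithmic approximation shrinks roughly as $\frac{1}{2A}$, so the approximation is already close for libraries of a few tens of genes.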

We may also consider a long-tailed distribution, such as a power law $G(x) = 1 - x^{-\alpha}$ for $x \ge 1$, with $\alpha > 1$ constant so that the mean bond strength is finite. The inverse of this function is $G^{-1}(x) = (1 - x)^{\frac{-1}{\alpha}}$. With the same notation, $z = y^{\frac{1}{A}}$, the average fitness over the complete pathogen space is given by

\begin{displaymath}f = \int_0^1 A z^{A-1} (1 - z)^{\frac{-1}{\alpha}} dz =
\frac{\Gamma(A+1) \Gamma(1 - \frac{1}{\alpha})}{\Gamma(A+1-\frac{1}{\alpha})}.\end{displaymath}

Expanding $\frac{\Gamma(A+1)}{\Gamma(A+1 - \frac{1}{\alpha})},$ we obtain for the average fitness

\begin{displaymath}f = \Gamma\left(1 - \frac{1}{\alpha}\right) A^{\frac{1}{\alpha}} \left(1 + \frac{1}{\alpha}\left(1 -
\frac{1}{\alpha}\right) \frac{1}{2 A} + O\left(\frac{1}{A^2}\right)\right).\end{displaymath}
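As a sanity check on the power-law case, the Gamma-function expression can be compared with its large-$A$ expansion. This is a minimal sketch, assuming $\alpha > 1$ and using the standard asymptotic form $\frac{\Gamma(A+1)}{\Gamma(A+1-s)} \approx A^{s}\left(1 + \frac{s(1-s)}{2A}\right)$ with $s = \frac{1}{\alpha}$. The function names are hypothetical.

```python
import math


def power_law_exact(A, alpha):
    """Gamma(A+1) * Gamma(1 - 1/alpha) / Gamma(A + 1 - 1/alpha), via log-gamma."""
    s = 1.0 / alpha
    if not 0.0 < s < 1.0:
        raise ValueError("alpha must exceed 1 for the integral to converge")
    log_f = math.lgamma(A + 1) + math.lgamma(1.0 - s) - math.lgamma(A + 1 - s)
    return math.exp(log_f)


def power_law_leading(A, alpha):
    """Leading-order scaling: Gamma(1 - 1/alpha) * A**(1/alpha)."""
    s = 1.0 / alpha
    return math.gamma(1.0 - s) * A ** s


def power_law_first_order(A, alpha):
    """Leading order times the 1/(2A) correction factor."""
    s = 1.0 / alpha
    return power_law_leading(A, alpha) * (1.0 + s * (1.0 - s) / (2.0 * A))


if __name__ == "__main__":
    for A in (10, 100, 1000):
        print(A, power_law_exact(A, 2.0), power_law_first_order(A, 2.0))
```

Working with `math.lgamma` rather than `math.gamma` for the ratio avoids overflow when $A$ is large.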

Summarizing, when the bond strengths are exponentially distributed, fitness grows logarithmically with the antibody library size; when the distribution is Gaussian, with a faster-than-exponential tail, the fitness grows more slowly than logarithmically; and for a power law, the fitness is also a power law of the library size. The average fitness, as a function of the library size, thus has a functional form set by the inverse of the cumulative distribution function of the bond strength between an antibody and an antigen. We can use this framework to treat any distribution of antibody-pathogen bond strengths as more data on this type of molecular interaction become available. This is an important feature, as shape-space-based models (and the results that depend on them) have often been criticized for being too restricted, and possibly unrealistic, for analyzing biological data.
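The same integral can be evaluated numerically for any bond-strength distribution whose inverse cumulative distribution $G^{-1}$ is available. The sketch below is an illustration only: it uses a plain midpoint rule in the variable $y$, takes a standard normal as a stand-in for the Gaussian case (the mean and variance are my assumption), and all function names are hypothetical. Heavier tails, such as the power law, would need a more careful quadrature near $y = 1$.

```python
import math
from statistics import NormalDist


def expected_fitness(inv_cdf, A, n=100_000):
    """Midpoint-rule estimate of the integral of inv_cdf(y**(1/A)) over (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        y = (i + 0.5) * h  # midpoints stay strictly inside (0, 1)
        total += inv_cdf(y ** (1.0 / A))
    return total * h


def exponential_inv_cdf(x, alpha=1.0):
    """Inverse of G(x) = 1 - exp(-alpha * x)."""
    return -math.log(1.0 - x) / alpha


# Assumption: a standard normal stands in for the Gaussian case.
gaussian_inv_cdf = NormalDist().inv_cdf

if __name__ == "__main__":
    for A in (10, 100, 1000):
        print(A,
              expected_fitness(exponential_inv_cdf, A),
              expected_fitness(gaussian_inv_cdf, A))
```

Comparing the two columns of the printout across library sizes shows the Gaussian fitness growing more slowly than the exponential one.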

What may we conclude from this study? It is so far unclear what role the germline diversity plays in the generation of the immune repertoire. Based on the results that I presented here, I argue that adding more and more antibodies to the germline-encoded repertoire is unlikely to significantly improve the survival probability of the host in an unbiased, very large, pathogenic environment. Clearly, with a logarithmic increase in fitness as a function of the antibody library size, germline diversity is unlikely to make a crucial contribution to the immune repertoire of an individual. This may well be a reason why the V region libraries in various species do not seem to number more than approximately 100 genes.

But if the selection pressure for increasing library size is small, what would keep evolution from producing even smaller libraries than those that we observe? One possible explanation is that there is a recognition threshold in the matching between antibodies and pathogens, below which recognition does not occur. In this case, some minimal number of antibodies would be required to ensure that at least one has minimal affinity for any given pathogen. Alternatively, one may envisage the pathogen set as structured into a distribution of clusters, such that different genes in the library would reflect different clusters of pathogens. The fine-tuning of the affinity of antibodies is then realized through somatic hypermutation during the first encounter of the organism with a specific pathogen. This last process is known to be very efficient: the affinity of a pathogen-specific antibody may increase by as much as three orders of magnitude within a time span of approximately a month.

The hypothesis that the composition of the germline antibody library reflects the commonly encountered pathogens has been proposed, for different reasons, by Cohn and Langman (1990). It has so far been difficult to test. Extensive data on the V genes that are involved in immune responses to virulent pathogens are not yet available. However, in some well-studied cases, such as Hemophilus influenzae in humans (1992), or Streptococcus pneumoniae in mice (1974), preferential involvement of a small number of V region genes (and light-heavy chain combinations) has been reported, adding credence to the proposed hypothesis.
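The recognition-threshold explanation mentioned above can be made concrete with a small calculation. This is a sketch under assumptions that are mine: bond strengths are independent across antibodies, as in the random energy model, and a single antibody exceeds the threshold for a given pathogen with probability `p_single`. A library of size $A$ then contains at least one recognizing antibody with probability $1 - (1 - p)^A$. The function name and the numerical inputs are hypothetical and not taken from data.

```python
import math


def min_library_size(p_single, coverage):
    """Smallest A such that 1 - (1 - p_single)**A >= coverage.

    p_single: assumed chance that one antibody exceeds the threshold.
    coverage: desired chance that at least one antibody in the library does.
    """
    if not (0.0 < p_single < 1.0 and 0.0 < coverage < 1.0):
        raise ValueError("p_single and coverage must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - coverage) / math.log(1.0 - p_single))


if __name__ == "__main__":
    # Purely illustrative inputs.
    print(min_library_size(p_single=0.1, coverage=0.95))
```

Because the required size grows only with the logarithm of the tolerated failure probability, a threshold of this kind sets a floor on library size without demanding a very large library.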

Recently, Davis et al. (1998) proposed that the diversity of the repertoire for T cell as well as B cell receptors resides in the third complementarity determining region, CDR3. In contrast to CDR1 and CDR2, which are exclusively encoded by the V region, CDR3 gets contributions from the J (and D in the case of the heavy chain, or $TCR_\beta$) region, as well as from the non-templated nucleotide addition process. These authors proposed that CDR3 is sufficient for an initial binding of the immune receptor to the antigen, and that somatic mutation of CDR1 and CDR2 further improves the affinity and specificity of the interaction. This is an intriguing hypothesis, as it shifts the emphasis from germline and, to some extent, combinatorial diversity to processes that are largely responsible for creating random binding sites, namely end-processing of the gene fragments and non-templated nucleotide addition. On the other hand, there are indications that these mechanisms are considerably restricted in newborns. Preferential rearrangement of certain combinations of V-D and D-J gene fragments results in a much more restricted repertoire, which is essentially germline-encoded (). It is this repertoire that is crucial for the survival and reproduction of the individual. Thus, although the CDR3 diversity might be sufficient for a diverse antibody repertoire, the hypothesis that I favor stresses the role of the CDR1 and CDR2 antigen-binding regions in the survival of the organism, particularly in the neonatal stage. Moreover, it is now clear that not all organisms have a large repertoire of CDR3 regions. As mentioned before, in sharks, V-D-J gene fragments are sometimes already linked in the germline, without any possibility of CDR3 diversification. In this situation, we also expect the germline-encoded gene fragments to play the determinant role in covering the species-specific set of pathogens.


Mihaela Oprea
1999-04-11