I again assume the random energy model, in which every antibody-antigen
interaction is characterized by a bond strength distributed
according to a density function, g. The cumulative distribution of a
single bond strength will then be denoted by G. For example, assume
that the bond strength of an antigen-antibody interaction is
exponentially distributed, meaning that most interactions are of low
energy, with higher-energy interactions progressively rarer. Then
with
constant. Correspondingly,
Let us denote
by z. Then y = zA,
,
and the average fitness over the complete pathogen space will be given
by
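As a worked sketch of the exponential case, suppose that the fitness against a given pathogen is the strongest bond among A independently drawn antibodies. The rate symbol lambda, the bond-strength variable x, the reading of A as the library size, the reading of z as the upper tail 1 - G(x), and the identification of fitness with the strongest bond are all assumptions made for this illustration; only the substitution y = zA is taken from the text.

```latex
% Assumed exponential density and its cumulative distribution
g(x) = \lambda e^{-\lambda x}, \qquad G(x) = 1 - e^{-\lambda x}, \qquad x \ge 0 .

% Substitution: z is taken to be the upper tail, and y = zA
z = 1 - G(x) = e^{-\lambda x}, \qquad y = zA
\;\Longrightarrow\; x = \frac{1}{\lambda}\ln\frac{A}{y} .

% Distribution of the strongest of A independent bonds, for large A
G(x)^{A} = \left(1 - \frac{y}{A}\right)^{A} \approx e^{-y} .

% Average fitness (the upper limit y = A is replaced by infinity for large A)
\langle f \rangle \approx \int_{0}^{\infty} \frac{1}{\lambda}\ln\frac{A}{y}\, e^{-y}\, dy
  = \frac{\ln A + \gamma}{\lambda}, \qquad \gamma \approx 0.5772 .
```

Under these assumptions the exact mean of the maximum is H_A / lambda, with H_A the A-th harmonic number, which approaches the logarithmic form above as A grows.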
We may also consider a long-tailed distribution, such as a power law
,
with
constant. The inverse of
this function is
.
With the
same notation,
,
the
average fitness over the complete pathogen space is given by
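A comparable sketch for a long-tailed case is given here, assuming a Pareto form for the power law. The exponent alpha > 1, the lower cutoff x_0, and the Pareto parametrization itself are assumptions chosen for illustration and need not match the form intended in the text; z and y are read as in the substitution y = zA.

```latex
% Assumed Pareto density and its cumulative distribution
g(x) = \frac{\alpha\, x_0^{\alpha}}{x^{\alpha+1}}, \qquad
G(x) = 1 - \left(\frac{x_0}{x}\right)^{\alpha}, \qquad x \ge x_0,\ \alpha > 1 .

% Inverse of the cumulative distribution
x = x_0 \left(1 - G\right)^{-1/\alpha} .

% Same substitution: z = 1 - G(x), y = zA
z = \left(\frac{x_0}{x}\right)^{\alpha}, \qquad y = zA
\;\Longrightarrow\; x = x_0 \left(\frac{A}{y}\right)^{1/\alpha}, \qquad
G(x)^{A} = \left(1 - \frac{y}{A}\right)^{A} \approx e^{-y} .

% Average fitness for large A
\langle f \rangle \approx x_0\, A^{1/\alpha} \int_{0}^{\infty} y^{-1/\alpha} e^{-y}\, dy
  = x_0\, \Gamma\!\left(1 - \frac{1}{\alpha}\right) A^{1/\alpha} .
```

The condition alpha > 1 is needed for the integral to converge; with it, the average fitness is itself a power law of the library size, with exponent 1/alpha.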
Summarizing, when the bond strengths are exponentially distributed, fitness grows logarithmically with the antibody library size; when the distribution is Gaussian, with a tail that decays faster than exponentially, the fitness grows more slowly than logarithmically; and for a power law, the fitness is also a power law of the library size. The average fitness, then, as a function of the library size, has a functional form that is the inverse of the density function for the bond strength between an antibody and an antigen. We can use this framework to treat any distribution of antibody-pathogen bond strengths as more data on this type of molecular interaction become available. This is an important feature, as the shape-space-based models (and the results that depend on them) have often been criticized for being too restricted, and possibly unrealistic, for analyzing biological data.
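The three scaling regimes can be checked numerically with a short script. This is a minimal sketch, assuming that fitness is the expected strongest bond among A independent draws; the parameter values (RATE, ALPHA, X_MIN), the function names, and the Pareto form of the power law are illustrative assumptions rather than quantities from the text. The maximum of A draws has cumulative distribution G(x)^A, so it can be written as Q(U^(1/A)), with U uniform on (0, 1) and Q the inverse of G, and its mean is a one-dimensional integral over U.

```python
import math
from statistics import NormalDist

# Illustrative parameters (assumptions, not values from the text).
RATE = 1.0    # rate of the exponential density
ALPHA = 3.0   # Pareto exponent; must exceed 1 for a finite mean
X_MIN = 1.0   # lower cutoff of the Pareto density

_STD_NORMAL = NormalDist()


def exponential_upper_quantile(tail):
    """Bond strength x with P(X > x) = tail for an exponential density."""
    return -math.log(tail) / RATE


def gaussian_upper_quantile(tail):
    """Bond strength x with P(X > x) = tail for a standard Gaussian."""
    # By symmetry, the upper-tail quantile is minus the lower-tail quantile.
    return -_STD_NORMAL.inv_cdf(tail)


def power_law_upper_quantile(tail):
    """Bond strength x with P(X > x) = tail for a Pareto density."""
    return X_MIN * tail ** (-1.0 / ALPHA)


def expected_max(upper_quantile, library_size, n_grid=20000):
    """Midpoint-rule estimate of E[max of library_size i.i.d. bond strengths]."""
    total = 0.0
    for k in range(n_grid):
        u = (k + 0.5) / n_grid
        # tail = 1 - u**(1/A), computed with expm1 to avoid cancellation
        tail = -math.expm1(math.log(u) / library_size)
        total += upper_quantile(tail)
    return total / n_grid


if __name__ == "__main__":
    cases = [
        ("exponential", exponential_upper_quantile),
        ("gaussian", gaussian_upper_quantile),
        ("power law", power_law_upper_quantile),
    ]
    for name, quantile in cases:
        values = [expected_max(quantile, size) for size in (10, 100, 1000)]
        print(name, [round(v, 3) for v in values])
```

With these assumed parameters, each tenfold increase in library size should add a roughly constant amount to the exponential fitness, a shrinking amount to the Gaussian fitness, and a roughly constant factor to the power-law fitness.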
What may we conclude from this study? It is so far unclear what role germline diversity plays in the generation of the immune repertoire. Based on the results presented here, I argue that adding more and more antibodies to the germline-encoded repertoire is unlikely to improve significantly the survival probability of the host in an unbiased, very large pathogenic environment. Clearly, with a logarithmic increase in fitness as a function of the antibody library size, germline diversity is unlikely to make a crucial contribution to the immune repertoire of an individual. This may well be a reason why the V region libraries in various species do not seem to number more than approximately 100 genes. But if the selection pressure for increasing library size is small, what would keep evolution from producing even smaller libraries than those that we observe? One possible explanation is that there is a recognition threshold in the matching between antibodies and pathogens, below which recognition does not occur. In this case, some minimal number of antibodies would be required to ensure that at least one has minimal affinity for any given pathogen. Alternatively, one may envisage the pathogen set as structured into a distribution of clusters, such that different genes in the library would reflect different clusters of pathogens. The fine-tuning of antibody affinity is then realized through somatic hypermutation during the first encounter of the organism with a given pathogen. This last process is known to be very efficient: the affinity of a pathogen-specific antibody may increase by as much as three orders of magnitude within approximately a month. The hypothesis that the composition of the germline antibody library reflects the commonly encountered pathogens has been proposed, for different reasons, by Cohn and Langman (1990). It has so far been difficult to test.
Extensive data on the V genes involved in immune responses to virulent pathogens are not yet available. However, in some well-studied cases, such as Hemophilus influenzae in humans (1992) or Streptococcus pneumoniae in mice (1974), preferential involvement of a small number of V region genes (and light-heavy chain combinations) has been reported, lending credence to the proposed hypothesis.
Recently, Davis et al. (1998) proposed that the diversity of the
repertoire of T cell receptors, as well as of B cell receptors, resides in
the third complementarity-determining region, CDR3. In contrast to
CDR1 and CDR2, which are exclusively encoded by the V region, CDR3
gets contributions from the J (and D in the case of the heavy
chain, or
)
region, as well as from the non-templated
nucleotide addition process. These authors proposed that CDR3 is
sufficient for an initial binding of the immune receptor to the
antigen, and that somatic mutation of CDR1 and CDR2 further improves
the affinity/specificity of the interaction. This is an intriguing
hypothesis, as it shifts the emphasis from germline and, to some extent,
combinatorial diversity to the processes that are largely responsible for
creating random binding sites, namely end-processing of the gene
fragments and non-templated nucleotide addition. On the other hand,
there are indications that these mechanisms are considerably
restricted in newborns. Preferential rearrangement of certain
combinations of V-D and D-J gene fragments results in a much more
restricted repertoire, which is essentially germline-encoded.
It is this repertoire that is
crucial for the survival and reproduction of the individual. Thus,
although the CDR3 diversity might be sufficient for a diverse antibody
repertoire, the hypothesis that I favor stresses the role
of CDR1 and CDR2 antigen binding regions in the survival of the
organism, particularly in the neonatal stage. Moreover, it is now
clear that not all organisms have a large repertoire of CDR3
regions. As mentioned before, in sharks, V-D-J gene fragments are
sometimes already linked in the germline, without any possibility of
CDR3 diversification. In this situation, we also expect that the
germline-encoded gene fragments play the determining role in covering
the species-specific set of pathogens.