Punctuated Equilibrium Due to Epistasis in
Simulated Populations

  W. Daniel Hillis
    DRAFT, March 7, 1988

Abstract


Simulated populations evolving under steady selective pressure show sudden
unidirectional changes in the frequency of certain traits. Recent analysis has shown how
such "punctuated equilibrium" can be caused by transitions between multiple adaptive peaks,
but this explanation is insufficient to explain all such transitions that occur in simulations.
We show analytically that epistatic interactions among many loci can also produce such
discontinuous evolution, and that the transitions predicted by this analysis are consistent
with those observed. If such phenomena occur in biology they may explain some sudden
changes in the phenotypic average of a population.


Computer Simulated Evolution


Using parallel computers it is now practical to simulate the evolution of populations
of hundreds of thousands of individuals under selective pressure over tens of thousands of
generations. Such simulations serve two useful purposes. First, they can be used to solve large
combinatorial optimization problems [Bounds (1987), Wang (1987), Bremermann (1962),
Holland (1975), Rechenberg (1973)]. Second, to the degree that they are analogous to
biological systems, they may provide some insights into the workings of the far more
complicated biological world. In this latter role, such simulated systems can offer no evidence
one way or the other as to what actually happens in nature, but they can serve a role similar
to biological models in testing the consequences of a theory or in suggesting possible
mechanisms.


In the simulations described below, individuals are represented within the computer's
memory as pairs of number strings, which are analogous to the chromosome pairs of diploid
organisms. The population evolves in discrete generations. At the beginning of each
generation the computer begins by constructing a phenotype for each individual, using the
number strings (the "genome") as a specification. The function used for the interpretation is
dependent upon the experiment, but typically a fixed region within each of the chromosomes
is used to determine each phenotypic trait of the individual. Discrepancies between the two
bit strings in the pair are resolved according [...] each pair of number strings into a single string
by randomly choosing substrings from one or the other. The crossover rate is an
experimental parameter. At this point, randomized point mutations or transpositions may
also be introduced. The two haploids from each mating pair are combined to produce the
genetic specification for each individual in the next generation. Each mating pair is used to
produce several siblings. (A constant population size is maintained by normalizing the
average fecundity.) The entire process is repeated for each generation, using the gene pool
produced by one generation as a specification for the next.
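
The gamete-formation and mating steps described above can be sketched as follows. This is a minimal illustration rather than the Connection Machine implementation. It assumes a single chromosome pair per individual, bit-valued strings, a per-bit crossover probability (the text gives the crossover rate only as an experimental parameter), and point mutations that flip one bit. The names make_gamete and next_generation are hypothetical.

```python
import random

def make_gamete(chrom_a, chrom_b, crossover_rate, mutation_rate, rng):
    """Build one haploid string by copying runs of bits from one of the two
    parental strings, switching strings at random crossover points."""
    pair = (chrom_a, chrom_b)
    source = rng.choice((0, 1))          # which string we start copying from
    gamete = []
    for i in range(len(chrom_a)):
        # Assumption: crossover is modelled as a per-bit switch probability.
        if i > 0 and rng.random() < crossover_rate:
            source = 1 - source
        bit = pair[source][i]
        # Assumption: a point mutation flips the bit.
        if rng.random() < mutation_rate:
            bit ^= 1
        gamete.append(bit)
    return gamete

def next_generation(parents, pop_size, crossover_rate, mutation_rate, rng):
    """parents: list of (chrom_a, chrom_b) diploid individuals that survived
    selection. Returns pop_size diploid offspring from random mating pairs."""
    children = []
    while len(children) < pop_size:
        mother, father = rng.sample(parents, 2)
        children.append((
            make_gamete(mother[0], mother[1], crossover_rate, mutation_rate, rng),
            make_gamete(father[0], father[1], crossover_rate, mutation_rate, rng),
        ))
    return children
```

Holding pop_size fixed plays the role of the fecundity normalization mentioned in the text; transpositions and multiple chromosomes are omitted for brevity.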


The experiments that we have conducted have simulated populations ranging in size
from 512 to ~10^6 individuals, with between 1 and 40 chromosomes per individual.
Chromosome lengths ranged from 10 to 200 bits per chromosome, mutation rates from 0
to 25% probability of mutation per bit per generation, and crossover frequencies ranged
from 0 to an average of 4 per chromosome. On a parallel computer each of the operations
such as sorting, mating, etc. can take place on the entire population simultaneously.
Using a Connection Machine with 65,536 processors, a typical experiment progresses at
about 100 to 1000 generations per minute.


Figure 1 shows the course of one such experiment, using 16,384 individuals, 8
chromosomes per individual, 128 bits per chromosome, random mating, and no mutation.
In this example the fitness function was chosen to solve the following optimization problem:
Find the minimal fixed sequence of comparisons and optional exchanges that can be used to
sort any list of 16 distinct numbers into descending order. (This is known as a sorting
network problem. Although it has been extensively investigated [Knuth], the optimal
solution is unknown.) To generate a scorable phenotype, fixed subsequences (loci) of each
genome are used to determine 64 numerical traits for each individual. Each trait is
interpreted by the computer as an instruction to compare and exchange a particular item in
the list to be sorted. Each individual is given a random test list which it attempts to sort by
executing the sequence of instructions specified by its own particular traits. The individual is
scored according to how well it arranged the sequence into descending order; one point is
scored for each pair in the correct order. This score is used to determine the individual's
probability of survival. Figure 1 shows the average score of the population over the period of
500 generations. Starting from a random initial population, this simulation found a solution
using 61 exchanges. (The best known solution requires 60.) In effect, the process has
automatically written a computer program for sorting numbers.
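
The scoring step can be illustrated with a short sketch. It assumes that each trait decodes to a pair of list positions (i, j) with i < j, and that "one point for each pair in the correct order" refers to adjacent pairs; the text fixes neither detail. The function names are hypothetical, and a well-known 4-input network stands in for the 16-input case.

```python
from itertools import permutations

def run_network(network, values):
    """Apply compare-exchange steps (i, j), i < j, in order. After each step
    position i holds the larger of the two values (descending order)."""
    out = list(values)
    for i, j in network:
        if out[i] < out[j]:
            out[i], out[j] = out[j], out[i]
    return out

def score(network, test_list):
    """Assumed scoring rule: one point per adjacent pair in descending order."""
    out = run_network(network, test_list)
    return sum(1 for a, b in zip(out, out[1:]) if a > b)

# A standard 5-step network that sorts any 4 distinct numbers.
network4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

# Exhaustive check over all 24 input orders.
sorts_everything = all(
    run_network(network4, p) == [3, 2, 1, 0] for p in permutations(range(4))
)
```

In the experiment each individual sees only one random test list per generation, so its score is a noisy estimate of how close its network is to a correct sorter.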


One important detail is exactly how the scores are used to determine differential
survival during the selection stage. We have used three different methods. The simplest is
truncating selection of a fixed percentage of the population with the highest score. A second
method, which seems to produce better results, is based on pairwise competition. Pairs of
individuals are chosen by a randomized process similar to one of those used for choosing
mates. The individual with the higher score survives to reproduce, and the other is
eliminated. A third method of selection is to assign an a priori probability of survival to each
phenotype. This method of selection corresponds to the mathematical models of fitness most
often used by population biologists, and to the analysis below.
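
The three selection schemes can be sketched as follows. The function names and signatures are hypothetical, and the pairing rule in the second scheme is assumed to be a simple random shuffle, since the text says only that pairs are chosen by a randomized process.

```python
import random

def truncation_selection(population, scores, keep_fraction):
    """Keep the fixed fraction of individuals with the highest scores."""
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    n_keep = max(1, int(len(population) * keep_fraction))
    return [population[i] for i in ranked[:n_keep]]

def pairwise_selection(population, scores, rng):
    """Pair individuals at random; the higher scorer of each pair survives.
    Assumption: pairs come from a random shuffle, and ties go to the first."""
    order = list(range(len(population)))
    rng.shuffle(order)
    survivors = []
    for a, b in zip(order[0::2], order[1::2]):
        survivors.append(population[a] if scores[a] >= scores[b] else population[b])
    return survivors

def probabilistic_selection(population, survival_prob, rng):
    """Each individual survives independently with an a priori probability
    determined by its phenotype (survival_prob is a caller-supplied function)."""
    return [ind for ind in population if rng.random() < survival_prob(ind)]
```

Only the third scheme assigns a fitness that is independent of the rest of the population, which is why it is the one that matches the analytical treatment.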

Figure 1: The evolution of a simulated population of size 16,384 solving an artificial
optimization problem (see text). The periodic noise is an artifact of a cyclic pattern of
inbreeding.


Punctuated Equilibrium


One striking feature of such simulations is that the average fitness of a population
does not always increase steadily with time. Instead, progress often consists of long periods of
relative stasis, punctuated by short periods of rapid progress. Since biological evolution may
also be characterized by such "punctuated equilibrium" [Gould], the question arises whether
both phenomena can be attributed to similar causes. One plausible explanation of
punctuated equilibria in biology is the "drift" of populations between multiple adaptive
peaks [Newman, Lande, Lewin]. This explanation assumes that the fitness function in the
space of genotypes has multiple local optima. A population can make a transition from one
optimum to another by briefly passing through less adapted intermediate states. The shift
from one peak to another may be caused by random drift due to finite population effects
[Lande, Newman], by a time-varying adaptive landscape [Wright], or by some combination
of the two.


Figure 2: The evolution of a population of 65,536 individuals. Fitness is influenced
by three favorable polygenic traits, depending conjunctively on five, six, and seven unlinked
loci, respectively.


Many of the observed transitions in the simulated systems seem to be caused by such
effects. However, some observations require a different explanation. In particular, transitions
occur even in extremely large populations (~10^6 individuals) with time-invariant fitness
functions. We have also been able to contrive fitness functions with only a single adaptive
peak that exhibit this behavior. (See figure 2.)


Analysis


The systems with sudden transitions all have a high degree of epistatic interaction
between loci. This seems to be the primary explanation for the observed behavior. When a
favorable trait is dependent on a specific allele at several sites, say 5 or 10, there is a positive
feedback effect between the selective value of each individual allele and the frequencies of the
co-adaptive alleles at other loci. This leads to a bimodal occurrence of the trait in the
population; it is either almost always present, or almost always absent. The transition
between the two states is rapid and irreversible, as described analytically below.


For simplicity, we will consider an extreme type of epistasis where each of k favorable
alleles confers selective advantage only when it occurs in conjunction with all of the others.
(A more general model would allow partial advantage for certain subsets.) We will assume
that there are k unlinked, dominant genes and that a favorable trait occurs only when all k
combine in a single individual. Let the average number of descendants produced by an
individual without the favorable trait be F, and with the favorable trait (1 + s)F, where s is the
selective advantage. For a finite randomly mating population of size N, the evolution of the
population can be described by a k-dimensional set of Fokker-Planck equations:

For small values of s, this may be closely approximated by:

The solution to this equation resembles a step function; the limit points are zero and
one, and the limiting derivative is zero in both directions. For a sufficiently small initial p,
the proportion of the time spent in transition from zero to one will be arbitrarily small.
(More precisely, for arbitrarily small Delta_1 and Delta_2, there exists an epsilon such that the
ratio of the time required for p to go from Delta_1 to 1 - Delta_1, to the time from epsilon to
1 - epsilon, is less than Delta_2.)
The behavior may be understood in qualitative terms. Since an individual allele only has
selective advantage in the presence of the others, initially the selective value of each allele is
relatively small. The occasional chance recombination of favorable alleles slowly pushes up
the frequency of the individual alleles until they reach some critical value. At this point, any
small increase in the frequency of one allele will greatly enhance the selective values of the
others, and vice versa. This positive feedback rapidly forces the population through the
transition.
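
The positive feedback can be seen in a deliberately simplified deterministic sketch. It is not the dominant diploid model analyzed in the text: it assumes a haploid, infinite population in linkage equilibrium with the same favorable-allele frequency p at all k loci, so that the trait occurs with frequency p^k and a favorable allele has marginal fitness 1 + s p^(k-1). The function names and the starting value p0 = 0.4 are illustrative choices.

```python
def allele_trajectory(p0, k, s, generations):
    """Iterate p' = p * (1 + s*p^(k-1)) / (1 + s*p^k) under the simplified
    haploid, linkage-equilibrium, infinite-population assumption."""
    p = p0
    traj = [p]
    for _ in range(generations):
        others = p ** (k - 1)      # chance the other k-1 loci are all favorable
        trait = others * p         # frequency of the complete trait
        ratio = (1.0 + s * others) / (1.0 + s * trait)
        p = min(1.0, p * ratio)    # clamp guards against rounding above 1
        traj.append(p)
    return traj

def first_generation_at_or_above(values, threshold):
    """Index of the first entry that reaches the threshold, or None."""
    for t, v in enumerate(values):
        if v >= threshold:
            return t
    return None

k, s = 6, 0.02
traj = allele_trajectory(p0=0.4, k=k, s=s, generations=5000)
trait_freq = [p ** k for p in traj]

# Generations at which the trait frequency first reaches 10% and 90%.
t_start = first_generation_at_or_above(trait_freq, 0.1)
t_end = first_generation_at_or_above(trait_freq, 0.9)
```

In this sketch the waiting time before the trait reaches 10% is several times longer than the passage from 10% to 90%, and lowering p0 lengthens the wait without lengthening the passage.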
The value of p at which the frequency of the trait is changing most rapidly is:

A finite population will reach the transition point not only through a process of
selection but also through random drift. To analyze this we must consider the full
k-dimensional system of Fokker-Planck equations. Although these equations are difficult to
solve analytically, they can be solved numerically by conventional methods, especially for the
case of symmetric initial conditions [Hillis and Taylor (1988)]. Figure 3 shows an example
solution of such an equation over 4,000 generations for N = 3,000, k = 6, s = 0.02, starting
from a uniform distribution of initial conditions. Each line represents the distribution of
populations at a point in time. The bimodal structure of the solutions corresponds to the two
states of the system, and the low region in the middle corresponds to populations that are in
transition.
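
A rough feel for the combined effect of selection and drift can also be had by direct Monte Carlo simulation. This is a swapped-in technique, not the finite-difference solution of the Fokker-Planck equations cited above. It assumes a haploid Wright-Fisher population of N gene copies per locus, linkage equilibrium between loci, and initial allele frequencies drawn independently and uniformly; all function names are hypothetical.

```python
import random

def binomial(n, p, rng):
    """Number of successes in n Bernoulli(p) trials (plain-Python sampler)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def drift_step(freqs, s, n, rng):
    """One generation: selection shifts the expected frequency at each locus,
    then n gene copies are resampled binomially (random drift)."""
    trait = 1.0
    for p in freqs:
        trait *= p
    mean_fitness = 1.0 + s * trait
    new_freqs = []
    for p in freqs:
        others = trait / p if p > 0 else 0.0   # product over the other loci
        expected = p * (1.0 + s * others) / mean_fitness
        new_freqs.append(binomial(n, expected, rng) / n)
    return new_freqs

def simulate(k, s, n, generations, replicates, rng):
    """Final trait frequency (product of allele frequencies) per replicate."""
    finals = []
    for _ in range(replicates):
        freqs = [rng.random() for _ in range(k)]   # assumed uniform start
        for _ in range(generations):
            freqs = drift_step(freqs, s, n, rng)
        trait = 1.0
        for p in freqs:
            trait *= p
        finals.append(trait)
    return finals
```

A histogram of simulate(6, 0.02, 3000, 4000, replicates, rng) over many replicates is the Monte Carlo analogue of one line in figure 3, although the plain-Python sampler is slow at that scale.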

Figure 3: The evolution of the probability distribution of the occurrence of a 6-locus
trait in populations of size 3,000 over 4,000 generations. The two peaks in the distribution
correspond to the two possible states of the system.

Discussion


If such conjunctively determined traits occur in nature, they would offer an
alternative explanation for sudden evolutionary changes that might otherwise be attributed to
a change in the environment, or to transitions between adaptive peaks. How common are
such traits? Most known examples of synthetic genes, such as pr and kpr in Drosophila [ref],
involve only two or three sites. Chaeta-number in Drosophila is determined by combinations
at at least five loci [Thoday (1977)], and artificial selection for this trait [Thoday and Boam
(1961)] shows clear examples of punctuated progress (see figure 4). In this case, certain
subsets of the genes can, by themselves, produce increased chaeta-number.


Even in our simulated systems it is rare for a large group of genes to produce a
favorable trait without some subsets having at least partial advantage. In this case, the
behavior of the system is intermediate between the more commonly analyzed additive case
and the purely conjunctive case analyzed above. The relative importance of epistatic
interaction has been and will no doubt continue to be a subject of controversy [ref]. The
observation of sudden adaptive changes in large biological populations, or the absence
thereof, would lend evidence to this discussion.

Figure 4: The mean chaeta-number in lines of Drosophila (adapted from
Thoday and Boam [1961]).

Acknowledgements


I would like to thank those biologists and mathematicians who have aided me in this
adventure outside of my own field: James F. Crow, Lennart Johnsson, Eric Lander, Charles
Taylor, and Wati Taylor.


References:
Bounds, D.G. "New Optimization Methods from Physics and Biology," Nature, vol. 329,
September 17, 1987.


Bremermann, H.J. "Optimization Through Evolution and Recombination," from
Self-Organizing Systems, M.C. Yovits, G.D. Goldstein, and G.T. Jacobi (eds.), Spartan,
Washington, D.C., 1962, pp. 93-106.


Crow, J. and M. Kimura. An Introduction to Population Genetic Theory, Burgess
Publishing, 1970.


Gould, S. and N. Eldredge. "Punctuated Equilibria: The Tempo and Mode of Evolution
Reconsidered," Paleobiology, vol. 3, pp. 115-151, 1977.


Hillis, W.D. and W. Taylor IV. "Exploiting Symmetry in High-Dimensional Finite
Difference Calculations," Thinking Machines Corporation, 1988.


Holland, J.H. "Adaptation in Natural and Artificial Systems," University of Michigan, Ann
Arbor, 1975.


Knuth, D.E. Sorting and Searching, vol. 3 of The Art of Computer Programming,
Addison-Wesley, Reading, MA, 1973.


Lande, R. "Expected Time for Random Genetic Drift of a Population Between Stable
Phenotypic States," Evolution, Proceedings of the National Academy of Sciences,
vol. 82, pp. 7641-7645, November 1985.


Newman, C., J. Cohen, and C. Kipnis. "Neo-Darwinian Evolution Implies Punctuated
Equilibria," Nature, vol. 315, May 30, 1985.


Rechenberg, I. Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzipien der
Biologischen Evolution, Frommann-Holzboog, Stuttgart, 1973.

 

Tejeda, A.L. "Impact of the Use of Mixtures and Sequences of Insecticides in the Evolution
of Resistance in Culex Quinquefasciatus Say (Diptera: Culicidae)," Ph.D. Dissertation,
University of California, Riverside, June 1980.


Thoday, J.M. "Effects of Specific Genes," from Proceedings of the International
Conference on Quantitative Genetics, E. Pollack, O. Kempthorne, and T. Bailey, Jr.
(eds.), Iowa State University Press, Ames, 1977.


Thoday, J.M. and T.B. Boam. "Regular Responses to Selection," Genet. Res., 2,
pp. 161-176, 1961.


Wang, Q. "Optimization by Simulating Molecular Evolution," Biol. Cybern., 57,
pp. 95-101, 1987.


Wright, S. Experimental Results and Evolutionary Deductions, vol. 3 of Evolution and
the Genetics of Populations, University of Chicago Press, 1977.

 
