The Long Now Foundation: Press


	Punctuated Equilibrium Due to Epistasis in Simulated Populations
		Daniel Hillis DRAFT, March 7, 1988
Abstract

Simulated populations evolving under steady selective pressure show sudden unidirectional changes in the frequency of certain traits. Recent analysis has shown how such "punctuated equilibrium" can be caused by transitions between multiple adaptive peaks, but this explanation is insufficient to explain all such transitions that occur in simulations. We show analytically that epistatic interactions among many loci can also produce such discontinuous evolution, and that the transitions predicted by this analysis are consistent with those observed. If such phenomena occur in biology they may explain some sudden changes in the phenotypic average of a population.

Computer Simulated Evolution

Using parallel computers it is now practical to simulate the evolution of populations of hundreds of thousands of individuals under selective pressure over tens of thousands of generations. Such simulations serve two useful purposes. First, they can be used to solve large combinatorial optimization problems [Bounds (1987), Wang (1987), Bremermann (1962), Holland (1975), Rechenberg (1973)]. Second, to the degree that they are analogous to biological systems, they may provide some insights into the workings of the far more complicated biological world. In this latter role, such simulated systems can offer no evidence one way or the other as to what actually happens in nature, but they can serve a role similar to biological models in testing the consequences of a theory or in suggesting possible mechanisms.

In the simulations described below, individuals are represented within the computer's memory as pairs of number strings, which are analogous to the chromosome pairs of diploid organisms. The population evolves in discrete generations. At the beginning of each generation the computer begins by constructing a phenotype for each individual, using the number strings (the "genome") as a specification.
The function used for the interpretation is dependent upon the experiment, but typically a fixed region within each of the chromosomes is used to determine each phenotypic trait of the individual. Discrepancies between the two bit strings in the pair are resolved according [...] each pair of number strings into a single string by randomly choosing substrings from one or the other. The crossover rate is an experimental parameter. At this point, randomized point mutations or transpositions may
also be introduced. The two haploids from each mating pair are combined to produce the
genetic specification for each individual in the next generation. Each mating pair is used to
produce several siblings. (A constant population size is maintained by normalizing the
average fecundity.) The entire process is repeated for each generation, using the gene pool
produced by one generation as a specification for the next.
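
The reproduction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the Connection Machine implementation: it assumes one chromosome pair per individual, bits stored as lists of 0 and 1, and a shuffle-and-pair mating scheme. The names `make_gamete`, `mutate`, and `next_generation` are hypothetical, and transpositions are omitted.

```python
import random


def make_gamete(pair, crossovers_per_chromosome, rng):
    """Build one haploid string by copying substrings from one strand or the other."""
    first, second = pair
    length = len(first)
    # Assumption: switch strands at each position with a fixed probability,
    # giving roughly the requested average number of crossovers per chromosome.
    switch_prob = crossovers_per_chromosome / length
    source = rng.randrange(2)  # start on a randomly chosen strand
    gamete = []
    for i in range(length):
        if i > 0 and rng.random() < switch_prob:
            source = 1 - source
        gamete.append((first, second)[source][i])
    return gamete


def mutate(gamete, mutation_rate, rng):
    """Flip each bit independently with probability mutation_rate."""
    return [bit ^ 1 if rng.random() < mutation_rate else bit for bit in gamete]


def next_generation(parents, population_size, crossovers, mutation_rate, rng):
    """Pair up the surviving parents and let each pair produce siblings
    until the population is back to a constant size."""
    if len(parents) < 2:
        raise ValueError("need at least two parents")
    order = list(parents)
    rng.shuffle(order)
    pairs = list(zip(order[0::2], order[1::2]))
    children = []
    while len(children) < population_size:
        for mother, father in pairs:
            if len(children) == population_size:
                break
            # One gamete from each parent forms the child's chromosome pair.
            child = (
                mutate(make_gamete(mother, crossovers, rng), mutation_rate, rng),
                mutate(make_gamete(father, crossovers, rng), mutation_rate, rng),
            )
            children.append(child)
    return children
```

Cycling through the mating pairs until the population is refilled is one simple way to hold the population size constant; normalizing fecundity by score, as the text describes, would replace that loop.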

The experiments that we have conducted have simulated populations ranging in size
from 512 to ~10⁶ individuals, with between 1 and 40 chromosomes per individual.
Chromosome lengths ranged from 10 to 200 bits per chromosome, mutation rates from 0
to 25% probability of mutation per bit per generation, and crossover frequencies ranged
from 0 to an average of 4 per chromosome. On a parallel computer each of the operations
such as sorting, mating, etc. can take place on the entire population simultaneously.
Using a Connection Machine with 65,536 processors, a typical experiment progresses at
about 100 to 1000 generations per minute.

Figure 1 shows the course of one such experiment, using 16,384 individuals, 8
chromosomes per individual, 128 bits per chromosome, random mating, and no mutation.
In this example the fitness function was chosen to solve the following optimization problem:
Find the minimal fixed sequence of comparisons and optional exchanges that can be used to
sort any list of 16 distinct numbers into descending order. (This is known as a sorting
network problem. Although it has been extensively investigated [Knuth], the optimal
solution is unknown.) To generate a scorable phenotype, fixed subsequences (loci) of each
genome are used to determine 64 numerical traits for each individual. Each trait is
interpreted by the computer as an instruction to compare and exchange a particular item in
the list to be sorted. Each individual is given a random test list which it attempts to sort by
executing the sequence of instructions specified by its own particular traits. The individual is
scored according to how well it arranged the sequence into descending order; one point is
scored for each pair in the correct order. This score is used to determine the individual's
probability of survival. Figure 1 shows the average score of the population over the period of
500 generations. Starting from a random initial population, this simulation found a solution
using 61 exchanges. (The best known solution requires 60.) In effect, the process has
automatically written a computer program for sorting numbers.
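
The scoring of a sorting network can be sketched as follows. The encoding is an assumption made for illustration: each instruction is taken to be a pair of list positions `(i, j)` with `i < j`, and "one point for each pair in the correct order" is read as one point per adjacent pair. The text does not specify either detail, and the names `run_network` and `score` are hypothetical.

```python
import random


def run_network(network, values):
    """Apply a fixed sequence of compare-exchange instructions.

    Each instruction (i, j), with i < j, compares the items at positions
    i and j and swaps them if the larger one is not already at position i.
    """
    items = list(values)
    for i, j in network:
        if items[i] < items[j]:
            items[i], items[j] = items[j], items[i]
    return items


def score(network, values):
    """Count adjacent pairs left in descending order (assumed scoring rule)."""
    result = run_network(network, values)
    return sum(1 for a, b in zip(result, result[1:]) if a > b)


# A well-known five-comparator network for four inputs, used here
# only as a small test case.
FOUR_INPUT_NETWORK = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]

if __name__ == "__main__":
    test_list = random.sample(range(100), 4)  # a random list of distinct numbers
    print(test_list, run_network(FOUR_INPUT_NETWORK, test_list),
          score(FOUR_INPUT_NETWORK, test_list))
```

Under this scoring rule a network that fully sorts a list of n distinct numbers earns n - 1 points, so a 16-item list would have a maximum score of 15.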

One important detail is exactly how the scores are used to determine differential
survival during the selection stage. We have used three different methods. The simplest is
truncating selection of a fixed percentage of the population with the highest score. A second
method, which seems to produce better results, is based on pairwise competition. Pairs of
individuals are chosen by a randomized process similar to one of those used for choosing
mates. The individual with the higher score survives to reproduce, and the other is
eliminated. A third method of selection is to assign an a priori probability of survival to each phenotype. This method of selection corresponds to the mathematical models of fitness most often used by population biologists, and to the analysis below.
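
The three selection methods can be sketched side by side. This is an illustrative reading, not the code used in the experiments: the function names are hypothetical, pairs for the competition are formed by a simple shuffle, and ties go to the first member of the pair.

```python
import random


def truncation_selection(population, scores, fraction):
    """Keep the given fraction of the population with the highest scores."""
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    keep = max(1, int(len(population) * fraction))
    return [population[i] for i in ranked[:keep]]


def pairwise_selection(population, scores, rng):
    """Pair individuals at random; the higher scorer of each pair survives.

    Assumption: with an odd population size the unpaired individual is dropped.
    """
    order = list(range(len(population)))
    rng.shuffle(order)
    survivors = []
    for a, b in zip(order[0::2], order[1::2]):
        winner = a if scores[a] >= scores[b] else b
        survivors.append(population[winner])
    return survivors


def probabilistic_selection(population, survival_probability, rng):
    """Each individual survives with an a priori probability set by its phenotype."""
    return [ind for ind in population if rng.random() < survival_probability(ind)]
```

Pairwise competition depends only on the ranking of scores within each pair, not on their magnitudes, which is one way it differs from assigning a fixed survival probability to each phenotype.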

Figure 1: The evolution of a simulated population of size 16,384 solving an artificial optimization problem (see text). The periodic noise is an artifact of a cyclic pattern of inbreeding.

Punctuated Equilibrium

One striking feature of such simulations is that the average fitness of a population does not always increase steadily with time. Instead, progress often consists of long periods of relative stasis, punctuated by short periods of rapid progress. Since biological evolution may also be characterized by such 'punctuated equilibrium' [Gould], the question arises whether both phenomena can be attributed to similar causes. One plausible explanation of punctuated equilibria in biology is the "drift" of populations between multiple adaptive peaks [Newman, Lande, Lewin]. This explanation assumes that the fitness function in the space of genotypes has multiple local optima. A population can make a transition from one optimum to another by briefly passing through less adapted intermediate states. The shift from one peak to another may be caused by random drift due to finite population effects [Lande, Newman], by a time-varying adaptive landscape [Wright], or by some combination of the two.

Figure 2: The evolution of a population of 65,536 individuals. Fitness is influenced
by three favorable polygenic traits, depending conjunctively on five, six, and seven unlinked
loci, respectively.

Many of the observed transitions in the simulated systems seem to be caused by such
effects. However, some observations require a different explanation. In particular, transitions
occur even in extremely large populations (~10⁶ individuals) with time-invariant fitness
functions. We have also been able to contrive fitness functions with only a single adaptive
peak that exhibit this behavior. (See figure 2.)

Analysis

The systems with sudden transitions all have a high degree of epistatic interaction
between loci. This seems to be the primary explanation for the observed behavior. When a
favorable trait is dependent on a specific allele at several sites, say 5 or 10, there is a positive
feedback effect between the selective value of each individual allele and the frequencies of the
co-adaptive alleles at other loci. This leads to a bimodal occurrence of the trait in the
population; it is either almost always present, or almost always absent. The transition
between the two states is rapid and irreversible, as described analytically below.
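
The positive feedback can be illustrated with a deterministic one-generation recursion. This is a sketch under simplifying assumptions that are not taken from the text, and it is not presented as the equation used in the analysis: the favorable allele is dominant and has the same frequency p at each of k loci, mating is random, the loci are statistically independent, and the population is effectively infinite. Under those assumptions an individual carries the favorable allele at one locus with probability q = p(2 - p), the trait frequency is q^k, and p' = p(1 + s q^(k-1)) / (1 + s q^k). The function names are hypothetical.

```python
def trait_frequency(p, k):
    """Fraction of individuals with at least one favorable allele at all k loci."""
    return (p * (2.0 - p)) ** k


def next_allele_frequency(p, k, s):
    """One generation of selection on the allele frequency p.

    A carrier of the allele at one locus shows the trait only if it also
    carries the favorable allele at the other k - 1 loci, so the allele's
    marginal fitness is 1 + s * q**(k - 1) against a mean of 1 + s * q**k.
    """
    q = p * (2.0 - p)
    return p * (1.0 + s * q ** (k - 1)) / (1.0 + s * q ** k)


def trajectory(p0, k, s, generations):
    """Allele frequency over time, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_allele_frequency(freqs[-1], k, s))
    return freqs


if __name__ == "__main__":
    # Print the trait frequency every 500 generations for an illustrative case.
    path = trajectory(0.3, 6, 0.02, 4000)
    for t in range(0, 4001, 500):
        print(t, round(trait_frequency(path[t], 6), 4))
```

The per-generation change is p s q^(k-1) (1 - q) / (1 + s q^k), which is very small when p is small because of the factor q^(k-1), and grows as the co-adaptive alleles become common.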

For simplicity, we will consider an extreme type of epistasis where each of k favorable
alleles confers selective advantage only when it occurs in conjunction with all of the others.
(A more general model would allow partial advantage for certain subsets.) We will assume
that there are k unlinked, dominant genes and that a favorable trait occurs only when all k
combine in a single individual. Let the average number of descendants produced by an
individual without the favorable trait be F, and with the favorable trait (1 + s)F, where s is the
selective advantage. For a finite randomly mating population of size N, the evolution of the population can be described by a k-dimensional set of Fokker-Planck equations:



For small values of s, this may be closely approximated by:

The solution to this equation resembles a step function; the limit points are zero and one, and the limiting derivative is zero in both directions. For a sufficiently small initial p, the proportion of the time spent in transition from zero to one will be arbitrarily small. (More precisely, for arbitrarily small Δ₁ and Δ₂, there exists an ε such that the ratio of the time required for p to go from Δ₁ to 1 - Δ₁, to the time from ε to 1 - ε, is less than Δ₂.)

The behavior may be understood in qualitative terms. Since an individual allele has selective advantage only in the presence of the others, initially the selective value of each allele is relatively small. The occasional chance recombination of favorable alleles slowly pushes up the frequency of the individual alleles until they reach some critical value. At this point, any small increase in the frequency of one allele will greatly enhance the selective values of the others, and vice versa. This positive feedback rapidly forces the population through the transition. The value of p at which the frequency of the trait is changing most rapidly is:


A finite population will reach the transition point not only through a process of selection but also through random drift. To analyze this we must consider the full k-dimensional system of Fokker-Planck equations. Although these equations are difficult to solve analytically, they can be solved numerically by conventional methods, especially for the case of symmetric initial conditions [Hillis and Taylor (1988)]. Figure 3 shows an example solution of such an equation over 4,000 generations for N = 3,000, k = 6, s = 0.02, starting from a uniform distribution of initial conditions. Each line represents the distribution of populations at a point in time. The bimodal structure of the solutions corresponds to the two states of the system, and the low region in the middle corresponds to populations that are in transition.

Figure 3: The evolution of the probability distribution of the occurrence of a 6-locus trait in populations of size 3,000 over 4,000 generations. The two peaks in the distribution correspond to the two possible states of the system.
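
A distribution of this kind can also be approached by direct Monte Carlo simulation of many finite populations. The sketch below is not the numerical Fokker-Planck method of [Hillis and Taylor (1988)]; it is a Wright-Fisher style stand-in that assumes dominant favorable alleles, random mating, statistically independent loci, and binomial resampling of the 2N allele copies at each locus every generation. The names `wright_fisher_step` and `final_trait_frequency` are hypothetical.

```python
import random


def wright_fisher_step(freqs, n, s, rng):
    """Advance the allele frequencies at all loci by one generation.

    Selection is applied in expectation at each locus, then drift is added
    by drawing 2 * n allele copies at the selected frequency.
    """
    carriers = [p * (2.0 - p) for p in freqs]  # P(at least one favorable allele)
    trait = 1.0
    for q in carriers:
        trait *= q
    mean_fitness = 1.0 + s * trait
    new_freqs = []
    for i, p in enumerate(freqs):
        others = 1.0  # P(favorable allele present at every other locus)
        for j, q in enumerate(carriers):
            if j != i:
                others *= q
        expected = p * (1.0 + s * others) / mean_fitness
        copies = sum(1 for _ in range(2 * n) if rng.random() < expected)
        new_freqs.append(copies / (2 * n))
    return new_freqs


def final_trait_frequency(n, k, s, p0, generations, rng):
    """Run one population and return the trait frequency at the end."""
    freqs = [p0] * k
    for _ in range(generations):
        freqs = wright_fisher_step(freqs, n, s, rng)
    trait = 1.0
    for p in freqs:
        trait *= p * (2.0 - p)
    return trait


if __name__ == "__main__":
    # Small illustrative parameters; pure Python is slow for N = 3,000.
    rng = random.Random(1)
    results = [final_trait_frequency(100, 3, 0.05, 0.4, 200, rng) for _ in range(20)]
    print(sorted(round(t, 2) for t in results))
```

Collecting the final trait frequencies from many runs into a histogram gives an empirical distribution that can be compared with the one shown in Figure 3. The parameters in the example are chosen for speed and do not reproduce the figure's settings.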


Discussion

If such conjunctively determined traits occur in nature, they would offer an alternative explanation for sudden evolutionary changes that might otherwise be attributed to a change in the environment, or to transitions between adaptive peaks. How common are such traits? Most known examples of synthetic genes, such as pr and kpr in Drosophila [ref], involve only two or three sites. Chaeta-number in Drosophila is determined by combinations at five or more loci [Thoday (1977)], and artificial selection for this trait [Thoday and Boam (1961)] shows clear examples of punctuated progress (see figure 4). In this case, certain subsets of the genes can, by themselves, produce increased chaeta-number. Even in our simulated systems it is rare for a large group of genes to produce a favorable trait without some subsets having at least partial advantage. In this case, the behavior of the system is intermediate between the more commonly analyzed additive case and the purely conjunctive case analyzed above. The relative importance of epistatic interaction has been and will no doubt continue to be a subject of controversy [ref]. The observation of sudden adaptive changes in large biological populations, or the absence thereof, would lend evidence to this discussion.

Figure 4: The mean chaeta-number in lines of Drosophila (adapted from Thoday and Boam [1961]).

Acknowledgements

I would like to thank those biologists and mathematicians who have aided me in this
adventure outside of my own field: James F. Crow, Lennart Johnsson, Eric Lander, Charles
Taylor, and Wati Taylor.

References:
Bounds, D.G. "New Optimization Methods from Physics and Biology," Nature, vol. 329,
September 17, 1987.

Bremermann, H.J. "Optimization Through Evolution and Recombination," from
Self-Organizing Systems, M.C. Yovits, G.D. Goldstein, and G.T. Jacobi (eds),
Spartan, Washington, D.C., 1962, pp. 93-106.

Crow, J. and M. Kimura. An Introduction to Population Genetic Theory, Burgess
Publishing, 1970.

Gould, S. and N. Eldredge. "Punctuated Equilibria: The Tempo and Mode of Evolution
Reconsidered," Paleobiology, vol. 3, pp. 115-151, 1977.

Hillis, W.D. and W. Taylor IV. "Exploiting Symmetry in High-Dimensional Finite
Difference Calculations," Thinking Machines Corporation, 1988.

Holland, J.H. Adaptation in Natural and Artificial Systems, University of Michigan, Ann
Arbor, 1975.

Knuth, D.E. Sorting and Searching, vol. 3 of The Art of Computer Programming,
Addison-Wesley, Reading, MA, 1973.

Lande, R. "Expected Time for Random Genetic Drift of a Population Between Stable
Phenotypic States," Evolution, Proceedings of the National Academy of
Sciences, vol. 82, pp. 7641-7645, November 1985.

Newman, C., J. Cohen, and C. Kipnis. "Neo-Darwinian Evolution Implies Punctuated
Equilibria," Nature, vol. 315, May 30, 1985.

Rechenberg, I. Evolutionsstrategie: Optimierung Technischer Systeme nach Prinzipien der
Biologischen Evolution, Frommann-Holzboog, Stuttgart, 1973.

Tejeda, A.L. "Impact of the Use of Mixtures and Sequences of Insecticides in the Evolution
of Resistance in Culex Quinquefasciatus Say (Diptera: Culicidae)," Ph.D. Dissertation,
University of California, Riverside, June 1980.

Thoday, J.M. "Effects of Specific Genes," from Proceedings of the International
Conference on Quantitative Genetics, E. Pollack, O. Kempthorne, and T. Bailey, Jr.
(eds.), Iowa State University Press, Ames, 1977.

Thoday, J.M. and T.B. Boam. "Regular Responses to Selection," Genet. Res., 2,
pp. 161-176, 1961.

Wang, Q. "Optimization by Simulating Molecular Evolution," Biol. Cybern., 57,
pp. 95-101, 1987.

Wright, S. Evolution and the Genetics of Populations, vol. 3: Experimental
Results and Evolutionary Deductions, University of Chicago Press, 1977.
