Why physicists like models, and why biologists should

by W. Daniel Hillis

The late Richard Feynman, a physicist, once warned me never to compare physics to biology in front of a biologist: "It just makes them mad." He was joking, but he was also right. The source of friction is that there are few quantitative theories in biology that have the predictive power of physics. Even the great central theory of biology, the neo-Darwinian synthesis, has little of the precision of, say, general relativity or quantum chromodynamics. Biologists are annoyed when they sense that the physicists blame this on the biologists, rather than on the inherent difficulty of the subject matter. As all biologists (and even most physicists) understand, biological systems are complex, multi-causal, poorly partitionable, and, let's face it, messy. Biological systems have a beauty of their own, but often it is a beauty of complexity and richness, rather than the stark, simple, reductionist elegance of physics.

One consequence of this difference in the subject matter is a difference in the relationship between theory and experiment. To biologists, theory is a poor approximation of reality; to physicists, it seems almost the reverse. To physicists the theory, if correct, is what is real; it is the experimental data that are just a reflection.

This difference in attitude can be seen in Einstein's comment about the theory of general relativity: When asked what he would have done if Eddington's measurements of light bending around the sun had failed to match the predictions of the theory, Einstein said, "Then I would have felt sorry for the dear Lord. The theory is correct."1 Physicists have learned the lesson that a very simple theory of what is going on is often correct. Biologists have learned the opposite lesson: simple mathematical theories of biology are usually wrong.

Up to this point, the difference in attitude seems to reflect a realistic assessment of how much a simple theory in the respective fields is likely to explain. What is not so justifiable is how this difference in attitude is reflected in the difference in the role of computational and mathematical modeling in the two fields. Biologists seem to feel the only legitimate role of such a model is to predict the data. This narrow view of models leads biologists to miss a source of power and insight that has proven extremely useful in the physical sciences.

Like biologists, physicists often encounter situations that are so complex that no simple predictive model applies: the turbulent whirlpools in a flowing fluid, the collision of galaxies, or the crystallization of a snowflake. Not all physics is as simple as the motion of the planets around the sun. Even the simple motion of the planets becomes difficult to predict over long time periods: on a 40-million-year time scale the planets are chaotic. When physicists find themselves in situations that are too difficult to predict, they often resort to simplified computational or mathematical models as a source of insight. For example, the chaotic nature of planetary motion was studied by running large numbers of computer simulations of heavenly bodies using slightly different initial conditions on each run.2 No single one of these runs was expected to predict the actual motion over that time period, but rather the entire set of runs gave insights into the types of motions that can be expected. Fluid turbulence is another example. When fluid flow is studied by numerical integration of the Navier-Stokes equations, or by lattice gas models based on cellular automata, often the intent is not to predict the detailed flow of the fluid in any particular physical situation, but rather to gain insights into the nature of fluids and turbulence.
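The idea of learning from a whole set of runs, rather than from any single one, can be illustrated with a toy system. The sketch below is not the solar-system integration cited above; it assumes the logistic map as a stand-in chaotic system, and the function names `logistic_run` and `ensemble_spread` are hypothetical. It starts many runs from nearly identical initial conditions and reports how far apart they end up.

```python
import random

def logistic_run(x0, steps, r=4.0):
    """Iterate the logistic map x -> r*x*(1-x), a standard toy chaotic system."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
    return x

def ensemble_spread(x0, steps, runs=200, eps=1e-10, seed=0):
    """Spread (max - min) of the final states over many runs whose
    starting points differ from x0 by at most eps."""
    rng = random.Random(seed)
    finals = [logistic_run(x0 + rng.uniform(-eps, eps), steps)
              for _ in range(runs)]
    return max(finals) - min(finals)

if __name__ == "__main__":
    # Short runs stay together; long runs fan out across the whole range.
    for steps in (5, 20, 40, 60):
        print(steps, ensemble_spread(0.3, steps))
```

No individual trajectory here is a prediction. What the ensemble shows is the horizon beyond which prediction fails, which is the kind of insight the planetary simulations were after.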

Biologists are already familiar with this type of instructive model in cases where it involves another living form instead of a computation. For example, when an experimenter reports that "C. elegans serves as a model for axonogenesis," they probably do not believe that neural growth of the nematode makes any quantitative prediction about the neural system of a cat, but rather that the simpler system can serve as an instructive example.

For various reasons, biologists who are willing to accept a living model as a source of insight are unwilling to apply the same criterion to a computational model. Some of the reasons for this are legitimate; others are not. For example, some fields of biology are so swamped with information that new sources of ideas are unwelcome. A Nobel Prize-winning molecular biologist said to me recently, "There may be some good ideas there, but we don't really need any more good ideas right now." He might be right. A second reason for biologists' general lack of interest in computational models is that they are often expressed in mathematical terms. Because most mathematics is not very useful in biology, biologists have little reason to learn much beyond statistics and calculus. The result is that the time investment required for many biologists to understand what is going on in computational models is not worth the payoff. A third reason why biologists prefer living models is that all known life is related by common ancestry. Two living organisms may have many things in common that are beyond our ability to observe. Computational models are only similar by construction; life is similar by descent.

All three of these reasons are legitimate. But there are also emotional reasons that are less so. Biologists are biologists because they love living things. A computation is not alive. Most biologists can more easily empathize with someone who is feeding sugar to a cell culture than someone who is feeding programs to a computer. I once asked John Maddox, the editor of Nature, why he thought computational modeling results in biology received relatively little attention. He was sympathetic, but pointed out that most of the reviewers were experimental biologists: "You can't expect people who have spent years just getting their organism to grow to have much respect for someone who does something on a computer in a few hours." I agree with Maddox's observation on human psychology.

The problem predates computers. In his classic essay, "A Defense of Beanbag Genetics," J.B.S. Haldane described how his insights into various evolutionary processes came from working with models that were based on exchanging different colored beans from bag to bag.3 Haldane, too, had trouble getting his modeling results taken seriously. In reference to his model-based estimates of the intensity of the selection pressures favoring industrial melanism, he complains: "If biologists had had a little bit more respect for algebra and arithmetic, they would have accepted the existence of such intense selection thirty years before they actually did so."3 Most biologists are far more convinced by an analogy to another organism than by an analogy to a bag of beans.

So with all these reasons for not using non-biological models, why fight the consensus? The best reason is that computational models help in understanding how things might work. When a system is too complex to understand, it often helps to understand a simpler system with analogous behavior. Just as a physicist can get some insight into electric waves by watching waves of water, a neurophysiologist can get insight into real neurons by playing with a simple model of a neural network. The "beanbag" model used by Haldane is far simpler than actual biological genetics, yet working with the model led him to the concept of mutation load, the first accurate estimates of human mutation rates, understanding of the mechanisms of stable polymorphism, and a variety of other insights into real biological systems.
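To give a feel for how little algebra a beanbag argument needs, here is a minimal sketch. It assumes the simplest possible case, haploid selection in which the favored type has relative fitness 1 + s; this is not Haldane's own model, the function names are hypothetical, and the numbers in the usage lines are illustrative inputs rather than his data.

```python
def next_freq(p, s):
    """One generation of haploid selection: the favored type, at frequency p,
    has relative fitness 1 + s against 1 for the other type."""
    return p * (1.0 + s) / (1.0 + s * p)

def generations_to_reach(p0, p_target, s):
    """Generations needed for the favored type to rise from p0 to at least
    p_target. Assumes s > 0, p0 > 0 and p_target < 1, so the loop ends."""
    p, n = p0, 0
    while p < p_target:
        p = next_freq(p, s)
        n += 1
    return n

def implied_selection(p0, p1, generations):
    """Selection coefficient implied by a rise from p0 to p1 in a given number
    of generations. Uses the fact that under this model the odds p/(1-p)
    are multiplied by (1 + s) every generation."""
    odds_ratio = (p1 / (1.0 - p1)) / (p0 / (1.0 - p0))
    return odds_ratio ** (1.0 / generations) - 1.0

if __name__ == "__main__":
    # Illustrative only: a type rising from 1% to 99% in 50 generations.
    print(implied_selection(0.01, 0.99, 50))
    print(generations_to_reach(0.01, 0.99, 0.5))
```

The point of the exercise is the inversion: an observed change in frequency over a known number of generations pins down how strong selection must have been, even in a model this crude.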

Computational models complement real experimental data in several dimensions. The measurements are precise and exactly repeatable. The costs are low and the time scales are short. It is often possible to perform much larger scale experiments than are practical in a laboratory. For example, a simulation-based computational model of evolution may run for hundreds of thousands of generations in a few hours. The complete "gene pool" over all the generations is available for inspection and analysis. There is no need to worry about imperfections in the fossil record. The cost of all this largeness, of course, is that the model represents a tremendous simplification over real biological evolution. Only specific idealized aspects of real biological organisms are included in the model. The model cannot prove anything conclusive about real biological evolution, any more than the nervous system of a nematode can prove anything about the nervous system of a mammal. Models of this type can only suggest what might be true. It is still up to the experimenter to determine what actually is.
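Two of these properties, exact repeatability and a complete record of every generation, are easy to see in even a very small simulation. The following is a sketch only, assuming a Wright-Fisher style population with one gene and two alleles; the function `simulate` and its parameters are hypothetical and do not describe any particular published model.

```python
import random

def simulate(pop_size, generations, s=0.0, p0=0.5, seed=0):
    """Each generation, pop_size offspring are drawn from the parents, with
    the favored allele weighted by 1 + s. Returns the count of the favored
    allele in every generation, so the whole history can be inspected."""
    rng = random.Random(seed)          # fixed seed makes the run repeatable
    count = round(p0 * pop_size)
    history = [count]
    for _ in range(generations):
        p = count / pop_size
        p_sel = p * (1.0 + s) / (1.0 + s * p)   # frequency after selection
        count = sum(1 for _ in range(pop_size) if rng.random() < p_sel)
        history.append(count)
    return history

if __name__ == "__main__":
    run = simulate(pop_size=100, generations=200, s=0.05, seed=42)
    print(run[:10], "...", run[-1])
```

Running it twice with the same seed gives the same history, count for count, and nothing about the population's past is lost. What the sketch leaves out, which is nearly everything about a real organism, is exactly the simplification described above.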

Sometimes a computational model can suggest which experimental variables are most important to investigate. For example, Fry, Taylor, and Devgan's computer simulation of the mosquito population in Orange County, California, suggested that the use of larvicides for pest control has very little effect on the overall population of adult mosquitoes.4 A large-scale experiment is now being planned to test this result. Sometimes a model can suggest the existence of a particular interaction or structure. Poggio and Reichardt's computational models of the motion detection system in the lobula plate of the fly suggested the existence of specific neurons with a particular functionality.5 Experimental work by Egelhaaf and Hausen confirmed that such neurons actually exist in the fly.6

Computational models can also be used to synthesize experimental data for testing methods of data interpretation. For instance, in phylogenetics there is some controversy as to the most accurate methods of interpreting homologous sequence data to establish relationships of descent. Alternate methods can be evaluated by trying them out on a known evolutionary tree that has been generated artificially. This can be done either on a computer or in a laboratory. The two types of experiments complement one another; the laboratory experiment uses real DNA7 (in this case in viruses), but the computer experiment covers a longer time scale over a wider range of parameters. Recently Huelsenbeck used a parallel computer to simulate the evolution of three million phylogenetic trees.8 By investigating the entire parameter space, he was able to produce a kind of map showing which methods of data interpretation were most accurate in which situations.
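The logic of testing a method against an artificially generated tree can be sketched in a few lines. The code below is not the simulation cited above: it assumes a four-taxon tree with equal branch lengths, a crude substitution model in which a site changes to a different base with a fixed probability per branch, and parsimony as the one method under test. All function names are hypothetical.

```python
import random

BASES = "ACGT"

def evolve(seq, p_change, rng):
    """Copy seq along one branch; each site changes to a different base
    with probability p_change."""
    return [rng.choice([b for b in BASES if b != base])
            if rng.random() < p_change else base
            for base in seq]

def simulate_four_taxa(n_sites, p_change, rng):
    """Sequences at the tips of the known unrooted tree ((A,B),(C,D))."""
    left = [rng.choice(BASES) for _ in range(n_sites)]  # node joining A, B
    right = evolve(left, p_change, rng)                 # node joining C, D
    return [evolve(left, p_change, rng), evolve(left, p_change, rng),
            evolve(right, p_change, rng), evolve(right, p_change, rng)]

def parsimony_choice(a, b, c, d):
    """Return 0 for AB|CD (the true tree), 1 for AC|BD, 2 for AD|BC,
    whichever is supported by the most informative sites; None on a tie."""
    support = [0, 0, 0]
    for w, x, y, z in zip(a, b, c, d):
        if w == x and y == z and w != y:
            support[0] += 1
        elif w == y and x == z and w != x:
            support[1] += 1
        elif w == z and x == y and w != x:
            support[2] += 1
    best = max(support)
    winners = [i for i, n in enumerate(support) if n == best]
    return winners[0] if len(winners) == 1 else None

def accuracy(n_sites, p_change, trials=200, seed=0):
    """Fraction of simulated data sets for which the true tree is recovered."""
    rng = random.Random(seed)
    hits = sum(parsimony_choice(*simulate_four_taxa(n_sites, p_change, rng)) == 0
               for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    for p_change in (0.05, 0.2, 0.4, 0.75):
        print(p_change, accuracy(200, p_change))
```

Because the true tree is known by construction, every answer the method gives can be scored. Sweeping `p_change` (and, in a fuller version, unequal branch lengths and other methods) is a small-scale analogue of mapping out where each method can be trusted.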

A positive example of interaction between models and experimental biology has been in the study of the visual system. Simplified computational models in this area have a long tradition going back to Hebb and McCulloch. There has been a productive history of interaction between vision researchers who concentrate on building mathematical and computer models and those who concentrate on neurobiological or psychophysical experiments. This collaboration has led to substantial insights into the mechanisms for the perception of color, motion, and stereo disparity. There is even progress on understanding the visual recognition of objects. The proceedings of the recent Dahlem conference on "Models in Neurobiology" is an example of this productive interaction between modelers and experimental neurobiologists.9

Biologists will have an opportunity to put their prejudice to the test in the emerging field of "artificial life," which is about the construction and study of such models. A recent artificial life conference included computer models of embryogenesis, immune systems, evolution, behavior, chemical metabolism, and the origin of life.10 For the most part, these models bear the same relationship to real biology as beanbags do to genetics. They are not intended to predict the actual details of the specific situation, but rather to provide simple systems that are analogous to biological systems in a few specific ways. Most of these models will turn out to be useless, but a few will not. Those that are most useful will probably not predict any particular experimental data, but instead they will give some surprising ideas about how something might work.

Displayed on my computer screen is a simulated colony of simple life forms, breeding and evolving in their simulated petri dish. Each genotype is a different color, so I can watch the dance of adaptation as the generations flick by, a few each second. These creatures, co-evolving with a simulated parasite, show some of the same surprising evolutionary behavior as natural life, including speciation, punctuated equilibria, and selection in favor of sexual reproduction. Will models like this suggest some useful new ideas about how real organisms work? My guess is, yes. Next time someone wants to show you something "growing" on their computer screen, you might want to take a look. Even if it's not alive, it may be worth watching.

References

1. Clark, R.W. Einstein: The Life and Times, World Publishing Company, 1971.

2. Sussman, G. and J. Wisdom. "Chaotic Evolution of the Solar System," Science, vol. 257, July 3, 1992, pp. 56-62.

3. Haldane, J.B.S. "A Defense of Beanbag Genetics," Perspectives in Biology and Medicine, pp. 343-359, Spring 1964.

4. Fry, J., C.E. Taylor, and U. Devgan. "An Expert System for Mosquito Control in Orange County California," Bull. Soc. Vector Ecol., 14(2), pp. 237-246, December 1989.

5. Poggio, T., W. Reichardt, and K. Hausen. "A Neuronal Circuitry for Relative Movement Discrimination by the Visual System of the Fly," Naturwissenschaften, 68, pp. 443-466, 1980.

6. Egelhaaf, M., K. Hausen, W. Reichardt, and C. Wehrhahn. "Visual Course Control in Flies Relies on Neuronal Computation of Object and Background Motion," Trends in NeuroSciences, vol. 11, no. 8, August 1988.

7. Hillis, D.M., et al. "Experimental Phylogenetics: Generation of a Known Phylogeny," Science, vol. 255, pp. 589-592, 1992.

8. Huelsenbeck, J. and D.M. Hillis. "Success of Phylogenetic Methods in the Four-Taxon Case," Systematic Biology, in press.

9. "Exploring Brain Functions: Models in Neuroscience," Proceedings of the Dahlem Workshop, Don Glaser and Tomaso Poggio, eds., John Wiley & Sons, in press.

10. "Artificial Life II," Proceedings of the Workshop on Artificial Life, Christopher G. Langton, et al., eds., Addison-Wesley, 1992.

COVER PHOTOGRAPHS

The pictures on the cover were produced by a process of evolution in which a computer program plays the role of the genotype, and a picture on the screen is the phenotype. The "genes" are individual computer instructions such as the instruction to add two numbers. The population begins as individuals produced by random sequences of such instructions and evolves through processes of mutation and recombination between pairs of individuals. The selection process in this case is artificial: a human judges which individuals are most aesthetically pleasing; the most beautiful survive. The human may also pick specific pairs of individuals to mate, a process analogous to selective breeding in plants and animals.

The synthesized pictures of plants were produced by a simple model of plant morphogenesis, based on locally acting growth rules. The different plant forms vary in terms of parameters of segment growth rate, branching and budding parameters, etc. The pictures are an animated sequence showing the growth of the individual plants.

The pictures and programs were created by Karl Sims of Thinking Machines Corporation, using a Connection Machine supercomputer.
