Math-Blog

Frankenstein Functions
Published on Oct 12, 2010 02:03 pm
Introduction

In 1915, after several failed attempts, Albert Einstein promulgated the General Theory of Relativity, a mathematical theory of gravitation that reconciled gravity with his Special Theory of Relativity and explained the gravitational force as the warping of space and time by matter and energy. Amongst other things, the theory predicted a slightly different deflection of light by bodies such as the Sun than the prevailing Newtonian theory of gravitation. In 1919 Einstein became an international celebrity when the English astronomer Arthur Eddington announced results from the measurement of the deflection of starlight by the Sun during a solar eclipse confirming Einstein’s theory.

But the story of Einstein’s triumph is more complicated than that. In 1917 Einstein patched his theory to agree with the observations and prejudices of his time. He later discarded the patch when new observations appeared to confirm his original theory of 1915. Recently, in an astonishing about-face, physicists and astronomers have resurrected Einstein’s until then embarrassing patch to force agreement between new observations and the reigning Big Bang theory/General Theory of Relativity. This is a recent example of the construction of Frankenstein Functions, in which scientists and mathematicians construct arbitrary functions out of many mathematical pieces to match observational data or simply preconceived notions.

Einstein Field Equations (General Relativity)

The General Theory of Relativity is a set of equations relating the so-called metric of space-time $g_{\mu\nu}$, loosely a measure of the curvature of space-time, to the density of mass and energy, variously expressed in modern mathematical notation as:

Short version (using the Einstein tensor $G_{\mu\nu}$):

$$G_{\mu\nu} = \frac{8\pi G}{c^4} T_{\mu\nu}$$

Here $g_{\mu\nu}$ is the so-called metric of space and time, $G$ is Newton’s Gravitational Constant, $c$ is the speed of light, and $T_{\mu\nu}$ is the so-called stress-energy tensor, which loosely represents the density of mass and energy in space and time.
The indices $\mu$ and $\nu$ usually run from 0 to 3 (0,1,2,3) or 1 to 4 (1,2,3,4) or over the symbols $x$, $y$, $z$, and $t$ referring to the three spatial dimensions and the single time “dimension.” The author uses scare quotes for the time “dimension” because it differs radically from the three spatial dimensions in common experience, although it can be represented in a very similar symbolic mathematical way as the numerical calendar time indicated by a clock. Using standard cosmological units $G = c = 1$, the equations are written:

$$G_{\mu\nu} = 8\pi T_{\mu\nu}$$

The long version (using the Ricci curvature tensor $R_{\mu\nu}$ and the scalar curvature $R$) is:

$$R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu} = 8\pi T_{\mu\nu}$$

or

$$R_{\mu\nu} = 8\pi \left( T_{\mu\nu} - \frac{1}{2} T\, g_{\mu\nu} \right)$$

There is also the “symmetric” decomposition into the scalar part:

$$R = -8\pi T$$

and the traceless symmetric tensor part:

$$R_{\mu\nu} - \frac{1}{4} R\, g_{\mu\nu} = 8\pi \left( T_{\mu\nu} - \frac{1}{4} T\, g_{\mu\nu} \right)$$

The Greatest Blunder

There was however a problem. When astronomers, physicists, and mathematicians worked out the implications of the equations of General Relativity, they found that the universe as a whole must be either expanding or contracting. The universe was not stable in the equations of General Relativity. At the time, both the evidence of observational astronomy and the philosophical bias of most scientists held that the universe was neither expanding nor contracting. The universe was static, perhaps of infinite age.

Confronted with evidence apparently clearly falsifying his theory, Einstein did what scientists, philosophers, scholars, attorneys, and political activists have done since time immemorial. He patched his theory to fit the observations and prejudices of his time. Einstein added a mysterious extra term known as the “cosmological constant” that counterbalanced the predicted expansion or contraction of the universe:

$$R_{\mu\nu} - \frac{1}{2} R\, g_{\mu\nu} + \Lambda g_{\mu\nu} = 8\pi T_{\mu\nu}$$

where the extra term is $\Lambda g_{\mu\nu}$ and the constant $\Lambda$ is known as the cosmological constant. Physically the cosmological constant may correspond to a mysterious energy field filling the entire universe. With a proper choice of the cosmological constant $\Lambda$, the universe was static, neither expanding nor contracting.
Subsequently, the redshifts of extragalactic nebulae were reinterpreted as due to the motion of the nebulae. That is, the nebulae, recognized as galaxies outside our galaxy the Milky Way, were flying away from us, causing a redshift of light. Over time, the astronomer Edwin Hubble was able to show that the dimmer the galaxy and therefore presumably the farther away (at least on average) the galaxy, the larger the redshift and thus the faster the galaxy appeared to be running away from the Earth. This rather peculiar observation could be easily explained if the universe was expanding as predicted by the original General Theory of Relativity without the cosmological term (the cosmological constant was either zero or nearly zero).

As might be imagined, Einstein did what scientists, philosophers, scholars, attorneys, and political activists have done since time immemorial when confronted by evidence apparently clearly falsifying their current theory but confirming the original un-patched theory. He dropped the cosmological term like a hot potato. In his autobiography, the physicist George Gamow recounted an alleged conversation with Einstein in which Einstein described the cosmological constant as the “greatest blunder” of his life.

Whether true or not, this quotation was widely repeated in popular physics articles, textbooks, informal conversations by physicists, and so forth until the late 1990s, when the Hubble Space Telescope made observations that were apparently inconsistent with the prevailing Big Bang theory of cosmology. However, one could save the Big Bang theory by reintroducing the cosmological term with a non-zero cosmological constant, not strong enough to prevent the expansion of the universe but sufficient to cause an acceleration that would resolve the otherwise falsifying observations.
The non-zero cosmological constant was attributed to a mysterious “dark energy” filling the universe, perhaps due to an as yet undiscovered subatomic particle/field predicted by a unified field theory or theory of everything (TOE).

The predicted expansion of the universe in the General Theory of Relativity became a theory of the origin and evolution of the universe known as the Big Bang theory. In general, the different forms of the Big Bang theory envision the universe beginning as a point, a “singularity” in the General Theory of Relativity, and expanding, exploding into the universe that we see today. The Big Bang is thought to have happened 14-25 billion years ago. The exact time has varied over the last century with different observations and theoretical calculations.

Up until the Hubble Space Telescope observations, the story of Einstein’s “greatest blunder” was widely recounted as a morality tale of the superiority of rational scientific thought over fluffy philosophy or blind prejudice. If only Einstein had stuck to his original “rational” theory rather than being influenced by fluffy (not to mention wimpy) unscientific philosophical or sociological concerns (agreeing with everybody else), all would have been well. Thus, the “greatest blunder” became a tale about the primacy of science over other “ways of knowing,” to use an appropriately fluffy New Age cliche.

The Big Bang theory and modern cosmology have a curious history. There have been repeated observations that appear to falsify the theory or major components of the theory such as the theory of gravitation, whether Einstein’s or Newton’s theory of gravity. This has led to the introduction of several new concepts and components expressed in symbolic mathematics like the cosmological term.
These include “inflation” to account for the puzzling uniformity of the universe in certain measurements (for example, the cosmological microwave background radiation is extremely smooth which is difficult to explain in many versions of the Big Bang theory) and several types of as yet undetected “dark matter” to account for gross anomalies in the measured angular momentum distribution of galaxies, clusters of galaxies, and so forth. There is an entire industry seeking to detect the mysterious particles that comprise the as yet hypothetical “dark matter” and “dark energy.” The saga of the cosmological term and the other patches to the General Theory of Relativity and the Big Bang theory illustrates a deep mathematical problem that has bedeviled science and human society at least since the ancient Greeks (and probably Babylonians or Sumerians as well) constructed detailed mathematical models of the motion of the planets. It is a problem that appears both in symbolic mathematics, in the effort to construct predictive symbolic mathematical theories of the world, as well as in conceptual, verbal reasoning and discourse. It seems likely that the human mind, unlike present-day symbolic mathematics or computer programs, has a limited, imperfect ability to resolve this problem. What is the problem? Given a set of observations — the positions and motions of the planets, the waveform of speech, the time and intensity of earthquakes, the values of stock prices, blood pressure, any quantitative measurement — it is possible to choose many different sets of building block functions that can be combined (added, multiplied, etc.) to match (or “fit”) the observations as accurately as desired (simply add more building block functions). One can construct not one, but many, in fact an infinite number, of “Frankenstein Functions” that match the data. 
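The claim that quite different sets of building blocks can match the same observations can be illustrated with a minimal Octave sketch. Everything in it is an illustrative assumption rather than anything taken from a real data set: the "observations" are 25 samples of a sine wave, one basis is the powers of x up to degree 8, and the other is a set of 13 Gaussian bumps with arbitrarily chosen centers and width. Both are fitted by ordinary least squares.

```octave
% Stand-in "observations": 25 samples of one period of a sine wave.
x = linspace(0, 2*pi, 25)';
y = sin(x);

% Basis 1: powers 0..8 of a rescaled coordinate u in [-1, 1].
% (Rescaling only improves the numerical conditioning of the fit.)
u = (x - pi) / pi;
A_poly = u .^ (0:8);

% Basis 2: 13 Gaussian bumps; centers and width are arbitrary choices.
centers = linspace(-2, 2*pi + 2, 13);
width = 1.0;
A_gauss = exp(-((x - centers) .^ 2) / (2 * width^2));

% Least-squares coefficients for each set of building blocks.
c_poly = A_poly \ y;
c_gauss = A_gauss \ y;

% Largest disagreement between each fitted model and the observations.
err_poly = max(abs(A_poly * c_poly - y));
err_gauss = max(abs(A_gauss * c_gauss - y));
printf("max error, polynomial basis: %g\n", err_poly);
printf("max error, Gaussian basis:   %g\n", err_gauss);
```

Both models reproduce the samples closely even though neither contains a sine, and adding more powers or more bumps would shrink the errors further. Nothing in the fit itself says which set of building blocks, if either, resembles the process that produced the data.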
While these sets of building block functions can be chosen to match the observational data, the “training” set in artificial intelligence terminology, they often will fail to predict new observations. It is often necessary to add new building block functions to save or patch the theory as new observations are made. However, if the building block functions share some characteristics in common with the unknown “true” theory or mathematics, then the theory may give somewhat correct predictions but still be wrong or incomplete. At a conceptual, verbal level, it usually proves possible to devise a plausible, technically sophisticated concept to explain away the apparently falsifying observations and to justify the “patch” expressed in purely symbolic mathematical terms. For example, unified field theories or theories of everything (TOE) usually predict new particles that are otherwise unknown. These new, unknown particles might in turn provide the dark matter or dark energy needed to explain the contrary observations. Certain commonalities among known particles and forces suggest an underlying unity. The ancient Greeks constructed (or inherited from the even older Babylonian civilization) a mathematical theory of the universe, basically our modern solar system, in which the planets orbited the Earth, a sphere about eight-thousand miles in diameter. However, from the very beginning, this theory had a serious problem. Certain planets, notably Mars, actually backed up during their journey through the Zodiac. This was grossly inconsistent with a simple orbit. Hence, the Greeks (and possibly the Babylonians before them) introduced the now infamous epicycles. The planets, envisioned as the Gods themselves, executed a complex dance in which they performed a circular motion around the simple circular orbit around the Earth. This could produce a period when the outer planets — Mars, Jupiter, and Saturn — would appear to back up in the Zodiac, exactly as observed. 
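The way one circle riding on another produces the backing-up motion can be sketched in a few lines of Octave. This is a toy model under stated assumptions, not a reconstruction of any historical Ptolemaic table: the orbits are taken as perfectly circular and in one plane, the large circle has radius 1.524 (in units of the Earth-Sun distance) and a period of 687 days, roughly matching Mars, and the small circle has radius 1 and a period of one year. The variable names are hypothetical.

```octave
% Toy deferent-plus-epicycle model (circular, coplanar; an idealization).
R_deferent = 1.524;    % radius of the large circle (Earth-Sun distance = 1)
P_deferent = 687.0;    % period of the large circle, in days
R_epicycle = 1.0;      % radius of the small circle (the epicycle)
P_epicycle = 365.25;   % period of the epicycle, in days

t = (0:1:1500)';       % time in days

% Position of the planet as seen from Earth, written as a complex number.
% The minus sign is a half-turn phase offset, so the planet starts at its
% closest approach to Earth at t = 0.
z = R_deferent * exp(2i*pi*t/P_deferent) - R_epicycle * exp(2i*pi*t/P_epicycle);

% Apparent longitude along the Zodiac, unwrapped to be continuous.
lon = unwrap(angle(z));

% Days on which the planet appears to back up.
retrograde = diff(lon) < 0;
printf("retrograde on %d of %d days\n", sum(retrograde), numel(retrograde));
```

Plotting `lon` against `t` shows a steady advance through the Zodiac interrupted by short reversals, which is the qualitative behavior the epicycles were introduced to explain.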
Yet, the theory never quite worked. Over the centuries and ultimately almost two millennia, astronomers and astrologers (mostly the same thing) added more and more epicycles to create the “Ptolemaic” theory that existed at the time of Copernicus, Galileo, and Kepler. The Ptolemaic theory had hundreds of epicycles, epicycles on top of epicycles on top of epicycles. It was very complex and required extensive time and effort to make predictions using the pen, paper, and printed mathematical tables of the time. There were no computers. It could predict the motion of Mars to around one percent accuracy.

This was actually much better than the original heliocentric theory proposed by Copernicus. In fact, Copernicus also used epicycles. A hard-headed comparison of the geocentric and heliocentric theories based solely on quantitative goodness of fit measures would have selected the traditional geocentric Ptolemaic theory. It was not until at least 1609, when Kepler published his discovery of the elliptical orbits, and possibly even later (Kepler made mistakes) that the heliocentric theories clearly outperformed the Ptolemaic theory.

The orbits of the planets around the Sun are almost periodic. The motion of the planets as seen from Earth is quasi-periodic. Thus, if one uses periodic functions such as the uniform circular motion of the Ptolemaic epicycles, one can reproduce much of the observed motion of the planets. The Ptolemaic models had some predictive power.

The lesson of Copernicus, Galileo, and Kepler as well as subsequent successes in science seemed to be to prefer “simple” theories with few building block functions, few terms in the mathematical expressions, and so forth. This led Einstein to select his original General Theory of Relativity as the simplest or one of the simplest sets of differential equations consistent with the Special Theory of Relativity as well as known observations (the theory had to largely reproduce Newton’s theory of gravitation).
For many years, popular science and popular physics accounts such as the “greatest blunder” stories embraced this preference for “simplicity” under the banner of Occam’s Razor as the obviously scientific, rational way to do things. It is actually difficult to justify this preference. Today, however, the popular science orthodoxy has changed as otherwise falsifying observations have accumulated. For example,
Incidentally, Professor Weinberg’s article has the ironic subtitle “Science sets itself apart from other paths to truth by recognizing that even the greatest practitioners sometimes err.” This is probably a veiled jab at traditional religion. It is probably doubly ironic in that some forms of traditional religion clearly recognize the fallibility of their prophets. For example, in his letters to his fellow feuding Christians of the first century, the Apostle Paul makes a clear distinction between his personal opinions, which he considers fallible, and divine revelation.

With respect to Einstein’s aesthetic judgment, essentially any continuous function on a finite interval can be approximated to arbitrary accuracy by a polynomial of sufficiently high degree, or indeed by any of an infinite number of compositions of arbitrarily chosen building block functions. A polynomial is a sum of powers of $x$, each multiplied by a constant coefficient. In general,

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n$$

What does this mean? Let us consider, for example, an arbitrary function such as the trigonometric sine function $\sin(x)$. Here is the sine function plotted by Octave, a free Matlab compatible numerical programming environment:

    x = (0.0 : 0.1 : 20*pi)';
    y = sin(x);
    plot(x, y, '-');

Octave, like many similar tools such as Mathematica, has a built in function, polyfit in this case, to fit a polynomial to data:

    function [] = plot_sinfit(x, y, n, m)
      % plot_sinfit(x, y, n, m) fits a polynomial of degree n to the data (x, y)
      % in the range 0.0 to 2*pi and plots the fit over m periods (default 3)
      if nargin > 3
        span = m;
      else
        span = 3;
      end
      % training set: the first 63 samples, covering 0.0 to 6.2
      myx = x(1:63);
      myy = y(1:63);
      p = polyfit(myx, myy, n);   % fit only the training range
      % evaluate the data and the fitted polynomial over the display range
      x = (0 : 0.1 : span*2*pi)';
      y = sin(x);
      f = polyval(p, x);
      plot(x, y, 'o', x, f, '-')
      axis([0 span*2*pi -1 1])
    end

A sixth degree polynomial is fitted to the data in the range 0.0 to 6.28 (the “training” set), but the fitted function is displayed in the range 0.0 to 18.85. This gives:

[Figure: sixth-degree polynomial fit to the sine data]

One can see the agreement with six terms is poor. However, one can always add more terms:

[Figure: fit with additional terms]

The agreement is somewhat better but still poor.
One can still add more terms:

[Figure: fit with additional terms]

Now the fit is getting better, but there is still room for improvement. Both examination by eye and a rigorous goodness of fit test would show the mathematical model and the observational data disagree. One can still add more terms:

[Figure: fit with additional terms]

The agreement is even better, although not perfect. One can see the disagreement by looking at a larger range of data (recall the model is fitted to the range 0.0 to 6.28 only):

[Figure: the same fit displayed over a larger range]

As one moves farther away from the region used for the fit (0.0 to 6.28), the training set in the language of artificial intelligence, the agreement will generally worsen. However, one can make the agreement as good as one wants by adding more and more terms, more and more powers of $x$.

It is important to realize that a sequence of powers of $x$ can never really work. It will never predict the long term behavior of the data. The data in this illustrative example is periodic. In contrast, powers of $x$ grow without bound. Eventually, as $x$ grows without bound, the largest power of $x$ will dominate the mathematical model of the data and the model will blow up, growing without bound, and failing at some point to agree with the observations, the data.

If one used a mathematical model constructed from periodic functions other than the sine, one could patch together a “Frankenstein Function” that would have some predictive power and share some of the gross characteristics of the actual data. This is what happened with the Ptolemaic epicycles centuries ago. In fact, one can construct a Frankenstein Function out of randomly chosen functions, Gaussians, polynomials, trig functions, pieces of other functions, and so forth that will agree with observational data to any desired level of agreement.
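The contrast between agreement inside the fitted range and failure outside it can be put into numbers with a short Octave sketch. It reuses the same `polyfit` and `polyval` functions as the example above; the particular degrees (3, 6, and 9) and the choice of the next two periods as "new" data are illustrative assumptions.

```octave
% Training data: one period of the sine.
x_train = (0.0 : 0.1 : 2*pi)';
y_train = sin(x_train);

% New data: the next two periods, never seen by the fit.
x_new = (2*pi + 0.1 : 0.1 : 6*pi)';
y_new = sin(x_new);

degrees = [3 6 9];               % illustrative polynomial degrees
rms_train = zeros(size(degrees));
rms_new = zeros(size(degrees));

for k = 1:numel(degrees)
  p = polyfit(x_train, y_train, degrees(k));
  % root-mean-square disagreement inside and outside the fitted range
  rms_train(k) = sqrt(mean((polyval(p, x_train) - y_train) .^ 2));
  rms_new(k) = sqrt(mean((polyval(p, x_new) - y_new) .^ 2));
  printf("degree %2d: RMS error on fit range %.3g, beyond it %.3g\n", ...
         degrees(k), rms_train(k), rms_new(k));
end
```

The error on the fitted range shrinks as terms are added, while the error on the new data stays far larger than the amplitude of the sine itself, because the highest power of $x$ eventually dominates.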
Many techniques in pattern recognition and artificial intelligence, such as the Hidden Markov Model (HMM) speech recognition engines and artificial neural networks, are attempts to construct extremely complex mathematical models composed of, in some cases, hundreds of thousands of building block functions, to replicate the ability of human beings to classify sounds or images or other types of data. In these attempts, many of the same problems that have occurred in mathematical models such as the Ptolemaic model of the solar system have recurred.

In particular, it has been found that neural networks and similar models can often exactly agree with a training set. In fact, this seeming agreement is often bad. The training or fitting process is often intentionally stopped before completion because while the mathematical model of classification will agree with the training set, it will often fail to classify new data such as a so-called “validation” data set. Even extremely complex models such as those used in speech recognition today continue to fail to reach the human level of performance, possibly for many of the same reasons the epicycles of the Ptolemaic theory failed.

Falsifiability and Occam’s Razor

These difficulties with mathematical modeling lead one directly to two pillars of popular and sometimes scholarly science: the doctrine of falsifiability, usually attributed to the philosopher of science Karl Popper, and Occam’s Razor. The doctrine of falsifiability holds that science proceeds by the falsification of theories by new evidence. In mathematical terms, one can compare the mathematical theory’s predictions with experimental data, apply a goodness of fit test, and conclusively rule out the theory. This is supposed to differentiate science decisively from fluffy philosophical, religious, mystical, and political “knowledge.” Science may not be able to tell us what is true, but it can tell us conclusively what is false.
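As a minimal sketch of what such a goodness of fit test can look like, the Octave fragment below computes the reduced chi-square statistic, one common choice among several. It assumes synthetic data (a sine plus Gaussian noise of known size `sigma`, generated with a fixed seed) and compares a straight line with a seventh-degree polynomial; all of these choices are illustrative. A reduced chi-square near 1 means the model is consistent with the data at the stated uncertainty, while a value far above 1 counts against the model.

```octave
randn("state", 42);                     % fixed seed so the run is repeatable

sigma = 0.1;                            % assumed measurement uncertainty
x = (0.0 : 0.1 : 2*pi)';
y = sin(x) + sigma * randn(size(x));    % synthetic "observations"

degrees = [1 7];                        % a straight line and a 7th-degree polynomial
chi2_red = zeros(size(degrees));

for k = 1:numel(degrees)
  n = degrees(k);
  p = polyfit(x, y, n);
  resid = (y - polyval(p, x)) / sigma;  % residuals in units of sigma
  dof = numel(x) - (n + 1);             % data points minus fitted coefficients
  chi2_red(k) = sum(resid .^ 2) / dof;
  printf("degree %d: reduced chi-square = %.2f\n", n, chi2_red(k));
end
```

The straight line is ruled out decisively, while the seventh-degree polynomial passes. The test says nothing about whether the passing model will continue to agree with data outside the fitted range, which is why a patched model can survive it.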
The doctrine of falsifiability is often touted in discussions of so-called pseudoscience, especially in the context of debates about the theory of evolution and “creation science,” or, more recently, so-called “intelligent design.” Usually, the argument is that falsifiability allows us to distinguish between true science which should be taught in schools and generally accepted and dubious pseudoscientific “knowledge.” The problem with this, as the saga of the cosmological constant now clearly shows, is that theories can be patched and are frequently patched, sometimes obviously and sometimes more subtly. Adding more and more terms to a polynomial approximation is pretty obvious. The epicycles in the Ptolemaic theories were pretty obvious. Actually, add-ons such as the cosmological constant are pretty obvious. On the other hand, quantum field theory and the various unified field theories/theories of everything such as superstrings are so complex and have such a long learning curve for most people that it is often difficult to evaluate what might be a patch (e.g. the concept of renormalization) and what might not be a “patch.” At this point, Occam’s Razor is usually invoked. William of Occam was an English Franciscan friar and scholastic philosopher who lived from about 1288 to about 1348. He was involved in a range of theological and political conflicts during which he formulated his so-called Razor, quite possibly for political and theological reasons quite alien to the modern use (or misuse) of Occam’s Razor. In its modern form, Occam’s Razor is usually expressed as a need not to make ad hoc assumptions or to keep a theory as simple as possible while still agreeing with observations. Of course, it is hard to define an ad hoc assumption or simplicity in practice. 
In disputes about evolution and creation, Occam’s Razor is often used to attack creationist explanations of the radioactive (and other) dating of the Earth and fossils to millions or billions of years of age. This evidence of the great age of the Earth is by far the most difficult observational evidence for creationists to explain. In his criticism of the teaching of evolution, William Jennings Bryan, who was not a fundamentalist (biblical literalist) as many believe, simply accepted the great age of the Earth as did many religious leaders of his time. Here is Steven Weinberg again on the new revised Occam’s Razor:
Of course, in what might be called the strong AI theory of symbolic mathematics, the symbols in the equations must correspond either directly or in some indirect but precise, rigorous way to “principles.” We just don’t understand the correspondence yet, unless the strong AI theory of symbolic math is wrong and some concepts cannot be expressed in symbolic mathematical form.

Where does this leave us? We know rigorously that it is possible to construct many arbitrarily complex functions or differential equations that can be essentially forced to fit current observational data. How do we choose which ones are likely to be true? Human beings seem ultimately to apply some sort of judgment or intuition. Often they are wrong, but still they are right much more often than random chance would suggest. Historically, it seems that simplicity at both the level of verbal concepts and at the level of precise symbolic mathematics has been a good criterion. We can’t really justify this experience “scientifically,” at least as yet.

Frankenstein Functions in the Computer Age

With modern computers, mathematical tools such as Mathematica, Matlab, Octave, and so forth, and modern mathematics with its myriad special functions, differential equations, and other exotica, it is now possible to construct Frankenstein Functions on a scale that dwarfs the Ptolemaic epicycles. Artificial intelligence techniques such as Hidden Markov Model based speech recognition, genetic programming, artificial neural networks and other methods in fact explicitly or implicitly incorporate mathematical models with, in some cases, hundreds of thousands of tunable parameters. These models can match training sets of data exactly and yet they fail significantly, sometimes totally, when confronted with new data. In fundamental physics, the theories have grown increasingly complex.
Even the full Lagrangian for the reigning standard model of particle physics (for which Steven Weinberg shared the Nobel Prize in 1979) is quite complex and features such still unobserved ad hoc entities as the Higgs particle. Attempts at grand unified theories or theories of everything are generally more complex and elaborate. The reigning Big Bang theory has grown increasingly baroque with the introduction of inflation, numerous types of dark matter, and now dark energy, the rebirth of the cosmological constant. Computers, mathematical software, advanced modern mathematics, and legions of graduate students and postdoctoral research associates all combine to make it possible to construct extremely elaborate models far beyond the capacity of the Renaissance astronomers. The very complexity and long learning curve of the present day models may become a status symbol and protect the theories from meaningful criticism.

Conclusion

Aesop’s Fables include the humorous tale of The Astronomer who spends all his time gazing up at the heavens in deep contemplation. He is so mesmerized by his star gazing that he falls into a well at his feet. The moral of the tale is “My good man, while you are trying to pry into the mysteries of heaven, you overlook the common objects that are at your feet.” This may have a double meaning in the current age of Frankenstein Functions. On the one hand, scientists and engineers may well have become enamored of extremely complex models and forgotten the lesson of past experience that extreme complexity is often a warning sign of deep problems, the lesson that Einstein initially heeded. It also raises the question of whether ordinary people, business leaders, policy makers, and others in the “real world” need be concerned about these complex mathematical models, usually implemented in computer software, and the difficulties associated with them.
Extremely complex mathematical models, some apparently successful, some probably less so, are increasingly a part of life. Complex models incorporating General Relativity are used by the Global Positioning System to provide precise navigation information, to guide everything from hikers to ships to deadly missiles. Widely quoted economic figures such as the unemployment rate and inflation rate are actually the product of increasingly complex mathematical models that bear a less than clear relationship to common sense definitions of unemployment and inflation.

Is a “discouraged worker” really not unemployed? Are the models that extrapolate from the household surveys to the nationally reported “unemployment rate” really correct? What should one make of hedonic corrections to the inflation rate in which alleged improvements in quality are used to adjust the price of an item downward? Should the price of houses used in the consumer price index (CPI) be the actual price of purchase or the mysterious “owner equivalent rent”? What is the average person to make of the complex computer models said to demonstrate global warming beyond a reasonable doubt? Should we limit the production and consumption of coal, oil, or natural gas based on these models? How do oil companies and governments like Saudi Arabia calculate the “proven reserves” of oil that they report each year? Are we experiencing “Peak Oil” as some claim or is there more oil than commonly reported?

In the lead up to the present financial crisis and recession, the handful of economists and financial practitioners (Dean Baker, Nouriel Roubini, Robert Shiller, Paul Krugman, Peter Schiff, and some others) who clearly recognized and anticipated the housing bubble and associated problems used very simple, back of the envelope calculations and arguments to detect the bubble.
Notably, housing prices in regions with significant zoning restrictions on home construction rose far ahead of inflation, something rarely seen in the past and then usually during previous housing bubbles. Home prices in regions with significant zoning restrictions became much higher than would be expected based on apartment rental rates in the same areas. In other words, it was much cheaper to rent than to own a home of the same size, something with little historical precedent. In contrast, the large financial firms peddling mortgage backed securities used extremely complex mathematical models, not infrequently cooked up by former physicists and other scientists, that proved grossly inaccurate.

It is likely that simplicity and Occam’s Razor as commonly understood have some truth in them, even though we do not truly understand why this is the case. They are not perfect. Sometimes the complex theory or the ad hoc assumption wins. Nonetheless, Frankenstein Functions and extreme complexity, both in principles (verbal concepts) and in precise symbolic mathematics, should be viewed as a warning sign of trouble. By this criterion, the Big Bang theory, General Relativity, and quantum field theory may all be in need of significant revision.

Suggested Reading/References

“Einstein’s Mistakes,” Steven Weinberg, Physics Today, November 2005, pp. 31-35
Aesop’s Fables, Selected and Adapted by Jack Zipes, Penguin Books, New York, 1992
George Gamow, My World Line: An Informal Autobiography, Viking Press, New York, 1970

About the Author

John F. McGowan, Ph.D. is a software developer, research scientist, and consultant. He works primarily in the area of complex algorithms that embody advanced mathematical and logical concepts, including speech recognition and video compression technologies. He has extensive experience developing software in C, C++, Visual Basic, Mathematica, MATLAB, and many other programming languages.
He is probably best known for his AVI Overview, an Internet FAQ (Frequently Asked Questions) on the Microsoft AVI (Audio Video Interleave) file format. He has worked as a contractor at NASA Ames Research Center involved in the research and development of image and video processing algorithms and technology. He has published articles on the origin and evolution of life, the exploration of Mars (anticipating the discovery of methane on Mars), and cheap access to space. He has a Ph.D. in physics from the University of Illinois at Urbana-Champaign and a B.S. in physics from the California Institute of Technology (Caltech). He can be reached at jmcgowan11@earthlink.net.
Copyright (C) 2010 Math Blog. All rights reserved.