The Empire of Chance: How probability changed science & everyday life

Daniel McNichol
Towards Data Science
32 min read · Sep 24, 2018

Solo Book Club vol. 1 - skimmable notes

Fortuna & her wheel

*** There are a lot of words here, you don’t need to read them all. This short preface explains what follows ***

Preface

I’ve been reading the excellent, seminal history of probability, The Empire of Chance. Figured I’d make some public notes & comments, as is my wont.

By the authors’ own telling, the book is roughly divisible into 4 sections:

  1. We begin with two historical chapters that describe the origins and development of probability and statistics from the mid-seventeenth to the end of the nineteenth century. Here we introduce changing interpretations of the probability calculus, changing attitudes towards determinism, changing conceptions of averages and errors – all, again, in the context of changing applications.
  2. In each of the subsequent four chapters, we focus on one area of broad application: experimental methodology, biology, physics, and psychology.
  3. With chapter 7, we leave the sciences to assess the impact of probability and statistics on daily life, from weather reports to mammography.
  4. Finally, we survey, from something like the victorious general’s hilltop, the territory we have covered.

This first Solo Book Club installment focuses on the first ~half of the first section (Introduction + Chapter 1), which are mostly historical & conceptual accounts. Subsequent installments will treat the following chapters.

I’ve curated excerpts & added some limited commentary. I reshaped the text for concision & digestibility, so if any of this strikes your fancy, I highly recommend buying & reading the original!

Full italic passages & “quotes” are excerpts; my commentary is unformatted.

There’s a lot, so I try to bold essentials in long passages to facilitate skimming.

I recommend reading my (short) recap of the Intro, then skimming the Chapter 1 subsection headings (1.1, 1.2, etc) & my one-line summaries of them, then let your interest (or lack thereof) guide you.

Introduction

Fortuna, chance & science thru history

Fortuna (left) and Sapientia (right) are depicted here in traditional opposition. The slow breakdown of this opposition is the topic of this book. Source: Petrarch, Remède de l’un et l’autre fortune prospère et adverse (Paris, 1524); courtesy of the Bibliotheque Nationale, Paris. [caption reproduced from the book]

The book opens with a historical-philosophical allusion that typifies the scholarship to come: poignant, nuanced & erudite.

Fortuna, the fickle, wheel-toting goddess of chance, has never been a favorite of philosophy and the sciences. In that touchstone of medieval learning, Boethius’ Consolation of Philosophy, sober Dame Philosophy warns that only when Fortuna “shows herself unstable and changeable, is she truthful,” and preaches against the very existence of chance, conceived as “an event produced by random motion and without any sequence of causes.”

science was about causes, not chance.

Already, historical complexities confound my current notions of chance & causality. Probabilistic analysis & understanding are central to scientific practice as I conceive of it, yet here they are opposed. Science is equated with a “sequence of causes”, & this sequence is presumed to be deterministic. Yet Fortuna is also scorned as “unstable & changeable”, ostensibly an attack on chance & indeterminism. But attacking Fortuna as “changeable” only makes sense if Fortuna is conventionally conceived as a ‘fated’ predeterminism, which seems the opposite of chance, & which the Positivistic ‘science’ of the time ostensibly exalts.

Confuzzled.

But these historical-logical knots are unraveled in the passages to come, & ultimately exposited over the next few chapters.

Yet even as Bernard sought to banish chance and indeterminism from physiology, Fortuna already ruled a large and growing empire in the sciences. The laws of the realm were probability theory and statistics. By “taming chance,” in Ian Hacking’s evocative phrase, probability and statistics had reconciled Scientia to her arch-rival Fortuna.

From its beginnings in the mid-seventeenth century, probability theory spread in the eighteenth century from:

  • gambling problems
  • to jurisprudence,
  • data analysis,
  • inductive inference,
  • and insurance,
  • and from there to sociology,
  • to physics, to biology, and to psychology in the nineteenth,
  • and on to agronomy,
  • polling,
  • medical testing,
  • baseball,
  • and innumerable other practical (and not so practical) matters in the twentieth.

This succinctly sketches the narrative arc underlying the development & domains of probability theory, treated in depth in chapters to come. But the term ‘probability theory’ itself is somewhat problematized, as a major thesis of the book emphasizes repeatedly:

For much of its history, probability theory was its applications.

This means that probability theory was as much modified by its conquests as the disciplines it invaded.

  • When, for example, probability became a tool for evaluating compilations of numbers about births, deaths, crimes, barometric fluctuations, dead letters, and other kinds of statistics, the very meaning of probability changed, from a degree of certainty in the mind to a ratio of events counted in the world (see 2.2). [my note: from Bayesian or ‘inverse probability’ to frequentism]
  • When the British polymath Francis Galton invented a way of measuring how much offspring peas deviated from their parent stock, he launched the analysis of correlations (see 2.5; 4.4).
  • Factor analysis has its roots in educational psychology, analysis of variance in eugenics and agronomy, and so on. It was in fact the rule for probabilistic ideas and techniques to originate in highly specific contexts, and to advance on the strength of vivid analogies.
  • The normal or bell-shaped curve at first represented the probability of observational error in astronomy, then of nature’s “errors” from l’homme moyen in sociology, then of anarchic individual gas molecules exhibiting orderly collective properties (see 2.5; 5.6).
  • Eventually the normal curve came to represent the distribution of almost everything, from intelligence quotients to agricultural yields, and shed the particular interpretations derived from its early applications (see 8.1). But for almost a century such concrete analogies were the bridges over which it and other probabilistic notions passed from one discipline to another.

This book is about the applications of probability and statistics to science and life, where “application” is understood in this special sense: the mathematical tool shaped, but was also shaped by, its objects. The mathematical development of probability and statistics has been admirably treated in the work of such scholars as Isaac Todhunter, L. A. Maistrov, O. B. Sheynin, Stephen Stigler, and Ivo Schneider. Our primary concern, however, is not theirs. We analyze how probability and statistics transformed our ideas of nature, mind, and society.

This adequately sets the stage for the historical accounts to come: genealogical rather than metaphysical, more conceptual than technical.

Ch 1: Classical probabilities, 1660–1840

God . . . has afforded us only the twilight of probability; suitable, I presume, to that state of mediocrity and probationership he has been pleased to place us in here. . .
— John Locke (1690) [quoted in original]

1.1 Introduction

Ch 1 opens with the canonical genesis story of modern probability theory:

In July of 1654 Blaise Pascal wrote to Pierre Fermat about a gambling problem which came to be known as the Problem of Points: Two players are interrupted in the midst of a game of chance, with the score uneven at that point. How should the stake be divided? The ensuing correspondence between the two French mathematicians counts as the founding document in mathematical probability, even though it was not the first attempt to treat games of chance mathematically.

This event has become probability’s patient zero to such an extent that an explicit study of “Evidence & Probability before Pascal” had to be published to fill in the antecedent gaps.

The Empire of Chance authors emphasize how this, & Pascal’s later famous wager, “one mathematical and the other philosophical, reveal the double root of the mathematical theory of probability”:

It emerged at the crux of two important intellectual movements of the seventeenth century: a new pragmatic rationality that abandoned traditional ideals of certainty; and a sustained and remarkably fruitful attempt to apply mathematics to new domains of experience. Neither would have been sufficient without the other. Philosophical notions about what happens only most of the time, and about the varying degrees of certainty connected with this unreliable experience date from antiquity, as do games of chance. But before circa 1650, no one attempted to quantify any of these senses of probability. Nor would the spirit of mathematical enterprise have alone sufficed, for quantification requires a subject matter, an interpretation to flesh out the mathematical formalism. This was particularly true for the calculus of probabilities, which until this century had no mathematical existence independent of its applications.

Two motifs for this chapter emerge:

  • The intersection of philosophy & math in the particular milieu of Enlightenment Europe
  • The inseparability of probability calculus from its practical applications for its first several centuries of existence

1.2 The Beginnings

Photo by Sushobhan Badhai on Unsplash

In this subsection: a brief history of the antecedents of probability theory.

The prehistory of mathematical probability has attracted considerable scholarly attention, perhaps because it seems so long overdue. Chance is our constant companion, and the mathematics of the earliest formulations of probability theory was elementary. Suggestive fragments of probabilistic thinking do turn up almost everywhere in the classical and medieval learned corpus:

  • Around 85 B.C., Cicero connected that which usually happens with what is ordinarily believed in his rhetorical writings and called both probabile.
  • In a tenth-century manuscript, a monk enumerated all 36 possibilities for the toss of two dice.
  • Talmudists reasoned probabilistically about inheritances and paternity.

Yet none of these flowered into a mathematics of probability.

There’s some detailed exploration (& summary rejection) of various hypotheses to account for this delayed development, but none are ultimately affirmed.

What we find is that, in keeping with the primacy of application over theory, early probabilists cast questions more in terms of expectation or expected value than in terms of the underlying probabilities themselves:

If we return to the two Pascal musings, we discover that although they are recognizably part of what came to be called the calculus of probabilities, they are not cast in terms of probabilities. The fundamental concept was instead expectation, later defined as the product of the probability of an event e and its outcome value V:

P(e)V = E

So, for example, the expectation of someone holding one out of a thousand tickets for a fair lottery with a prize of $10,000 would be $10. As the definition implies, we now derive expectation from the probability, but for the early probabilists expectation was the prior and irreducible notion.
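
My own toy restatement of that definition (a Python one-liner, not from the book), using the lottery example above:

```python
def expectation(p_event: float, value: float) -> float:
    """Classical expectation: the probability of an event times its outcome value, P(e)V = E."""
    return p_event * value

# One ticket out of a thousand in a fair lottery with a $10,000 prize:
print(expectation(1 / 1000, 10_000))  # 10.0
```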

Expectation in turn was understood in terms of a fair exchange or contract.

…These intuitions drew upon a category of legal agreement that had become increasingly important in sixteenth- and seventeenth-century commercial law, the aleatory contract. Jurists defined such agreements as the exchange of a present and certain value for a future, uncertain one — staking a gamble, purchasing an annuity, taking out an insurance policy, bidding on next year’s wheat crop, or buying the next cast of a fisherman’s net.

“Aleatory contracts” mark the intersection of gambling & juridical reasoning, two pillars in the philosophical foundations of probability. They also provide one of the most striking historical asides in this portion of the book:

Aleatory contracts acquired prominence and a certain notoriety as the preferred way of exonerating merchants who made loans with interest from charges of usury. The element of risk, argued the canon lawyers, was the moral equivalent of labor, and therefore earned the merchant his interest as honestly as the sweat of his brow would have.

This recalls the recent work of a contemporary self-professed probability researcher (& self-evident psychopath), which I haven’t yet read.

Moving on, the authors describe how probability theory arose to negotiate a historical conflict between the certainty of faith & the bottomless doubt of extreme skepticism:

  • Pascal’s wager is an example of how reasoning by expectations had become almost synonymous with a new brand of rationality by the mid-seventeenth century.
  • In the sixteenth century, Reformation controversies between Protestants and Catholics on the one hand, and the revival of the sceptical philosophy of Sextus Empiricus and his school on the other, combined to undermine the ideal of certain knowledge that had guided intellectual inquiry since Aristotle.
  • In its place gradually emerged a more modest doctrine that accepted the inevitability of less than certain knowledge, but maintained nonetheless that it was still sufficient to guide the reasonable person in precept and in practice.
  • Aristotle’s dictum from the Nicomachean Ethics was much quoted: “it is the mark of an educated person to look for precision in each class of things just so far as the nature of the subject admits: it is evidently equally foolish to accept probable reasoning from a mathematician and to demand from a rhetorician demonstrative proofs.” [my note: lol]
  • The ultimate result of the Reformation and Counter-Reformation clashes over the fundamental principles of faith and their justification, and of the radical scepticism of Michel de Montaigne and other sixteenth-century thinkers was vastly to erode the domain of the demonstrative proof and to expand that of probable reasoning. Their immediate impact was more devastating, challenging all claims to any kind of knowledge whatsoever.

Thus all of the traditional sources of certainty, religious and philosophical, came simultaneously under attack. Confronted with a choice between fideist dogmatism on the one hand and the most corrosive scepticism on the other, an increasing number of seventeenth-century writers attempted to carve out an intermediate position that abandoned all hope of certainty except in mathematics and perhaps metaphysics, and yet still insisted that people could attain probable knowledge. Or rather, they insisted that probable knowledge was indeed knowledge.

The Empire of Chance was growing, & its foundation in practical applications of daily life only deepened:

In order to make their case for the respectability of the merely probable, these “mitigated sceptics” turned from rarified philosophical discourse to the conduct of daily life. The new criterion for rational belief was no longer a watertight demonstration, but rather that degree of conviction sufficient to impel a prudent person of affairs to action.

The emphasis upon action as the basis of belief, rather than the reverse, was key to the defense against scepticism, for as these writers were wont acidly to observe, even the most confirmed sceptic took their meals just as if the external world existed.

1.3 The Classical Interpretation

Photo by Dan Freeman on Unsplash

This subsection recounts the evolution of the so-called ‘classical’ interpretation of probability at the moment of its formal inception.

Thus the calculus of chance was in the first instance a calculus of expectations, and thereby an attempt to quantify the new, more modest doctrine of rationality that surfaces almost everywhere in seventeenth-century learned discourse.

The first published works on the subject, from Huygens’ little treatise of 1657 to Jakob Bernoulli’s definitive Ars conjectandi of 1713, covered a range of topics that cohere only against this background. Aleatory contracts like gambling (Huygens, Pierre de Montmort, Jakob Bernoulli) and annuities (Johann De Witt, Halley, Nicholas Bernoulli), and later evidentiary problems like the evaluation of historical or courtroom testimony (John Craig, George Hooper, Nicholas and Jakob Bernoulli) constituted the domain of applications for the new theory.

By the end of this period, probability had emerged as a distinct and primitive concept, although most of the applications continued to revolve around questions of expectation for some time thereafter.

Just what these probabilities measured was ambiguous from the outset, and remains a matter of controversy to this day. [allusion to Bayesians vs frequentists, etc]

Originally the word “probability” had meant an opinion warranted by authority; hence the Jesuit doctrine of probabilism, which casuists wielded to absolve almost every transgression on the grounds that one theologian or another had taken a mild view of the matter. However, the mitigated scepticism of the early seventeenth century modified even this qualitative sense of probability.

The proponents of reasonableness spoke not of certainty but of certainties, ranging from the highest grade of “mathematical” certainty attained by demonstration, through the “physical” certainty of sensory evidence, down to the “moral” certainty based on testimony and conjecture. The precise descriptions of these levels varied slightly from author to author, but the notion of such an ordered scale, and the emphasis that most things admit only of moral certainty, remained a staple of the literature from Hugo Grotius’ De veritate religionis christianae (1624) to John Locke’s Essay Concerning Human Understanding (1690) and thereafter. When Bishop Joseph Butler claimed in 1736 that “probabilities are the very guide of life,” he was by then repeating a cliché.

In the context of these discussions, the very meaning of the word “probability” changed from its medieval sense of any opinion warranted by authority to a degree of assent proportioned to the evidence at hand, both of things and of testimony.

The authors describe a gradual progression from qualitative to quantitative conceptions of probability:

These probabilities were qualitatively conceived, and owed much to the language and practice of legal evidence, as the numerous courtroom examples and analogies make clear.

However, mathematicians like Gottfried Wilhelm Leibniz and Jakob Bernoulli seized upon the new “analysis of hazards” as a means of quantifying these degrees of certainty, and in so doing, converting the three ordered points into a full continuum, ranging from total disbelief or doubt to greatest certainty.

Indeed, Leibniz described the fledgling calculus of probabilities as a mathematical translation of the legal reasoning that carefully proportioned degrees of assurance on the part of the judge to the kinds of evidence submitted. The fact that these legal probabilities were sometimes expressed in terms of fractions to create a kind of “arithmetic of proof” (for example, the testimony of a relative of the accused might count only ⅓ as much as that of an unimpeachable witness) may have made them seem mathematically tractable.

The mathematicians who set about trying to measure these probabilities in some non-arbitrary fashion came up with at least three methods:
  • equal possibilities based on physical symmetry;
  • observed frequencies of events;
  • and degrees of subjective certainty or belief.

  • The first was well suited to gambling devices like coins or dice but little else;
  • the second depended on the collection of statistics and assumptions of long-term stability;
  • the third echoed the legal practice of proportioning degrees of certainty to evidence.

Interesting that the authors (& presumably the original probabilists) acknowledge how poorly the probabilities of coin flips, dice rolls & pulls from an urn generalize beyond those settings, yet these are still the primary pedagogical devices & analogies used to convey probabilities. (Something a certain aforementioned psychopath has called the “Ludic Fallacy”)

The various senses emerged from different contexts, and suggested different applications for the mathematical theory.

  • Sets of equiprobable outcomes based on physical symmetry derived from gambling and were applied to gambling — very few other situations satisfy these conditions in an obvious way.
  • Statistical frequencies originally came from mortality and natality data gathered by parishes and cities from the sixteenth century onwards. In 1662 the English tradesman John Graunt used the London bills of mortality to approximate a mortality table by assuming that roughly the same fraction of the population died each decade after the age of six.
  • Eighteenth-century authors collected more detailed demographic data and enlisted probability theory in order to compute the price of annuities, and later life insurance, and to argue for divine providence in human affairs.
  • The epistemic sense of belief proportioned to evidence arose from legal theories about just how much and what kind of evidence was required to produce what degree of conviction in the mind of the judge, and inspired applications to the probabilities of testimony, both courtroom and historical, and of judgment.

The authors go on to describe a fluid transposition between objective & subjective modes among classical probabilists, as well as a growing creep of frequentism [cue ominous soundtrack]:

Latter-day probabilists view these three answers to the question, “What do probabilities measure?” as quite distinct, and much ink has been spilt arguing their relative merits and compatibility (Nagel, 1955). In particular, a bold line is now drawn between the first two “objective” meanings of probability, which correspond to states of the world, and the third “subjective” sense, which corresponds to states of mind.

Yet classical probabilists used “probability” to mean all three senses, shifting from one to another with an insouciance that bewilders their more nice-minded successors. Why were classical probabilists able to conflate these different notions of probability so easily, and often very fruitfully? In part, because the objective and subjective senses were not then separated by the chasm that yawns between them in current philosophy.

In marches frequentism:

Legal theorists of the sixteenth and seventeenth centuries found it plausible to assume that conviction formed in the mind of the judge in proportion to the weight of the evidence presented, and Locke repeated the assumption in a more general context, invoking the qualitative probabilities of evidence: the rational mind assents to a claim “proportionably to the preponderancy of the greater grounds of probability on one side or the other”.

John Locke, David Hartley, and David Hume created and refined a theory of the association of ideas that made the mind a kind of counting machine that automatically tallied frequencies of past events and scaled degrees of belief in their recurrence accordingly.

Hartley went so far as to provide a physiological mechanism for this mental record-keeping: each repeated sensation set up a cerebral vibration that etched an ever deeper groove in the brain, corresponding to an ever stronger belief that things would be as they had been. [seems legit]

Since the mind irresistibly conferred belief in proportion to the vivacity of an idea, the more frequent the conjunction of events in past experience, the firmer the conviction that they would occur again. Locke and Hartley contended that this matching of belief to frequencies was rational.

All however concurred that the normal mind, when uncorrupted by upbringing or prejudice, irresistibly linked the subjective probabilities of belief with the objective probabilities of frequencies. They also showed an increasing tendency to reduce all forms of evidence whatsoever to frequencies, in contrast to the legal doctrines that had originally been the prototype of degrees of belief proportioned to evidence.

For the judge, the probative weight of eye-witness testimony that the accused had been seen fleeing the scene of the murder with unsheathed bloody sword derived from the quality of the evidence, not its quantity. It mattered not how many times in the past similar evidence had led to successful convictions. Locke remained very close to this legal tradition in his discussion of the kinds of evidence that create probabilities: number of witnesses, their skill and integrity, contradictory testimony, internal consistency, etc. He told the cautionary tale of the King of Siam, who dismissed the Dutch ambassador as a liar because his tales of ice-skating on frozen canals ran counter to the accumulated experience of generations of Siamese that water was always fluid. The King erred in trusting the mere quantity of his experience, without evaluating its breadth and variety. Yet Locke also made a place for “the frequency and constancy of experience” and for the number, as well as the credibility of testimonies.

We come finally to this absurd extremity, the logical conclusion of the nascent frequentist paradigm:

Later philosophical writings on probabilities narrowed the sense of evidence to the countable still further. Hume represents the endpoint of this evolution, in which evidence has become the sum of repeated, identical events. According to Hume, the mind not only counted; it was exquisitely sensitive to small differences in the totals: “When the chances or experiments on one side amount to ten thousand, and on the other to ten thousand and one, the judgment gives the preference to the latter, upon account of that superiority”. [!!!]

Finally, this growing tension between the “objective” frequency of events & “subjective” degrees of belief cleaved a lasting bifurcation in probability theory & its practitioners:

The guarantee that subjective belief was willy-nilly proportioned to objective frequencies and also, according to some authors, to physical symmetries allowed classical probabilists to slide from one sense of probability to another with little or no explicit justification. Only when associationist psychology shifted its emphasis to the illusion and distortions that prejudice and passion introduced into this mental reckoning of probabilities did the gap between subjective and objective probabilities become clear enough to demand a choice between the two. It was not so much the development and triumph of a thoroughgoing frequentist version of probability theory that marked the end of the classical interpretation, as the realization that a choice must be made between (at least) two distinct senses of probability. The range of problems to which the classical probabilists applied their theory shows that their understanding of probability embraced objective as well as subjective elements: statistical actuarial probabilities happily co-existed with epistemic probabilities of testimony in the work of Jakob Bernoulli or Laplace.

1.4 Determinism

This subsection describes the strain of epistemological determinism that undergirded the subjective “degree of belief” school of classical probability.
It also “unknots” the “chance” paradox we encountered in the book’s Intro.

the writings of these two towering figures in the history of mathematical probability [Laplace & Bernoulli] also contained the manifestoes that, rightly or wrongly, led to the standard view of the classical interpretation as incorrigibly subjective.

Both maintained that probabilities measure human ignorance, not genuine chance; that God (or Laplace’s secularized super-intelligence) had no need of probabilities; that necessary causes, however hidden, governed all events. Therefore probabilities had to be states of mind rather than states of the world, the makeshift tools of intellects too feeble to penetrate immediately to the real nature of things.

Theirs was an epistemological determinism that maintained that all events were in principle predictable, and that probabilities were therefore relative to our knowledge. Bernoulli remarked that backward peoples still gambled on eclipses that European astronomers could now predict; some day gambling on coins and dice would seem equally primitive when the science of mechanics was perfected.

The very mathematicians who had carved out a place for chance in the natural and moral sciences insisted to a person that chance, in Abraham De Moivre’s words, “can neither be defined nor understood”. They did concede that certain statistical rates varied from year to year and from place to place, but they were confident enough in the underlying regularity of phenomena like mortality to simplify and adjust the unruly data accordingly. Variability, they believed, would prove just as illusory as chance when fully investigated.

In order to unknot the apparent paradox of the ardent determinism of the classical probabilists, we must look beyond probability theory to the panmathematical spirit of the period in which it emerged. Classical probability arose and flourished during a time of spectacular successes in fitting mathematics to whole new domains of experience, from rainbows to vibrating strings. Natural philosophers like Galileo assumed that if nature spoke the language of mathematics, this was because nature was fully determined, at least from God’s viewpoint: the glue that connected causes and effects must be as strong as that which connected premises and conclusions in a mathematical argument.

Determinism thus became a precondition for the mathematical description of nature.

This still bemuses me, though I grasp the historical conditions that fostered the mindset. It just doesn’t seem like an incredible leap of logic to imagine that our knowledge is imperfect AND there is true randomness in the world.

Anyway…moving on with these old dumb geniuses:

Classical probability theory arrived when luck was banished; it required a climate of determinism so thorough as to embrace even variable events as expressions of stable underlying probabilities, at least in the long run.

Determinism made a “geometry of chance” conceivable by anchoring variable events to constant probabilities, so that even fortuitous events met what were then the standards for applying mathematics to experience.

Those standards were not compatible with older notions of chance as real, or with what we might call genuine randomness in the world.

“Chance” and “fortune” had been part of the philosophical vocabulary since Aristotle, meaning variously coincidence (meeting someone who owes you money on the way to the market), absence of purpose (often identified with necessity, as in the “blind necessity” of Epicurean atoms), or an ample endowment of the “external” goods of good health, wealth, beauty, and children (Sorabji, 1980).

All of these meanings survived in ordinary usage, but only one played an important role in classical probability theory. This was the opposition of chance and purpose, particularly divine purpose, of which natural theologians and their probabilist allies like De Moivre made much.

for the classical probabilists “chance” and “luck” that stood outside the causal order were superstitions. If we could see the world as it really was, penetrating to the “hidden springs and principles” of things, we would discover only necessary causes. Probabilities were merely provisional, a figment of human ignorance and therefore subjective.

The classical interpretation of mathematical probability was thus characterized in precept by determinism and therefore by a subjective slant, and in practice by a fluid sense of probability that conflated subjective belief and objective frequencies with the help of associationist psychology.

As Laplace put it in a famous passage, mathematical probability was in essence “only good sense reduced to a calculus”. Its status was less that of a mathematical theory with applications than that of a mathematical model of a certain set of phenomena, like the part of celestial mechanics that described lunar motion. As such, it was held up to empirical test. If astronomical theory failed to predict lunar perturbations, so much the worse for the theory. When the results of classical probability theory did not square with the intuitions of reasonable people, it was the mathematicians who returned to the drawing board.

1.5 Reasonableness

Photo by Chris Tweten on Unsplash

This subsection details how classical probability, resolutely pragmatic, was measured against & ultimately subordinate to “reasonableness” & intuition.

The protracted controversy over the St. Petersburg problem was just such a clash between reasonableness and the dictates of probability theory, and illustrates how seriously mathematicians took their task of modeling “good sense.”

The problem was first proposed by Nicholas Bernoulli in a letter to Pierre de Montmort, and published in the second edition of the latter’s Essai d’analyse sur les jeux de hasard (1713). Pierre and Paul play a coin toss game with a fair coin. If the coin comes up heads on the first toss, Pierre agrees to pay Paul $1; if heads does not turn up until the second toss, Paul receives $2; if not until the third toss, $4, and so on. Reckoned according to the standard method, Paul’s expectation (and therefore the fair price of playing the game) would be:

E = (1/2 x $1) + (1/4 x $2) + (1/8 x $4) + ... + [(1/2)^n x $2^(n-1)] + ...

Since there is a small but finite chance that even a fair coin will produce an unbroken run of tails, and since the pay-offs increase in proportion to the decreasing probabilities of such an event, the expectation is infinite. However, as Nicholas Bernoulli and all subsequent commentators were quick to observe, no reasonable person would pay even a small sum to play the game. Although the mathematicians labeled this a paradox, it contained no contradiction between results derived from assumptions of equal validity.

The calculation of expectation is straightforward, and there is nothing in the mathematical definition of expectation that precludes an infinite answer. Rather, it struck them as paradoxical that the results of the mathematical theory could be so at odds with the manifest dictates of good sense. Applied mathematicians in the modern sense might simply have questioned the suitability of the mathematical theory for this class of problems, but that route was not open to the mixed mathematicians of the eighteenth century. In their eyes the clash between mathematical results and good sense threatened the very validity of mathematical probability.

This is why the St. Petersburg problem, trivial in itself, became a cause célèbre among classical probabilists.
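
A quick toy sketch of the ‘paradox’ (mine, in Python, not the book's): each successive term of Paul's expectation contributes another $0.50, so the sum grows without bound, yet simulated plays of the game pay out modestly, which is exactly the clash with good sense described above.

```python
import random

def partial_expectation(n_terms: int) -> float:
    """Sum the first n_terms of Paul's expectation: each term (1/2)^n * $2^(n-1) is worth $0.50."""
    return sum((0.5 ** n) * (2 ** (n - 1)) for n in range(1, n_terms + 1))

def play_once(rng: random.Random) -> int:
    """Toss a fair coin until heads appears; the payout is $2^(tosses - 1)."""
    tosses = 1
    while rng.random() < 0.5:  # tails: keep tossing, the payout keeps doubling
        tosses += 1
    return 2 ** (tosses - 1)

rng = random.Random(42)
print(partial_expectation(20))     # 10.0, and it keeps growing as terms are added
print(partial_expectation(1_000))  # 500.0
avg = sum(play_once(rng) for _ in range(100_000)) / 100_000
print(avg)                         # a modest average payout, despite the "infinite" expectation
```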

The authors recap a nuanced debate over the ‘paradox’, particularly through the lens of the Bernoulli cousins who personify the evolving domains of probability application:

What was at issue between the Bernoulli cousins was not whether probabilistic expectation should model reasonableness, but rather wherein such reasonableness consisted. Nicholas sided with the older sense of equity derived from aleatory contracts; Daniel with the increasingly important sense of economic prudence, derived from commerce.

The prototypical reasonable person was no longer an impartial judge but rather a canny merchant, and the mathematical theory of probability reflected that shift.

So the calculus of probability sought to codify the ‘reasonable’ intuitions of the time, but also reflected the elitism & methodological fervor of the age:

If their calculus yielded results that echoed what the enlightened had known all along — as preface after mathematical preface emphasized was the case — then all the elaborate machinery of equations and calculations did seem a belaboring of the obvious. The probabilists replied that, in Voltaire’s words, common sense was not that common.

Only a small elite of hommes éclairés could reason accurately enough by unaided intuition; the calculus of probabilities sought to codify these intuitions (which the probabilists believed to be actually subconscious calculations) for use by hoi polloi not so well endowed by nature.

The ideal of a calculus of reasoning, a set of formal rules independent of content, exerted a certain fatal attraction for many seventeenth- and eighteenth-century thinkers. The probabilists’ hope of turning the “art of conjecture” into such a calculus echoes the seventeenth-century fascination with method taken to an extreme.

This extreme optimism in mechanistic methods is aptly reflected in “Laplace’s Demon” & the idea of a “Clockwork Universe”. It also ends as most extreme optimism ends…

Classical probability theory was thus at once a description of and prescription for reasonableness.

Two ambiguities, neither ever clearly recognized, confused and ultimately undermined the classical program to render reasonableness mathematical.

  • The first surfaced early on in the debate over expectation sparked by the St. Petersburg problem: there were several distinct brands of reasonableness, and they sometimes led to very different solutions of the same problem. The fair judge and the shrewd merchant did not agree on the proper definition of expectation, but both belonged to the select company of hommes éclairés. However, the probabilists persisted in believing that reasonableness was monolithic, despite endless debate over just how to define it. It took an upheaval of the magnitude of the French Revolution to shatter their faith in the natural consensus of the enlightened few.
  • The second ambiguity concerned just where to draw the line between description and prescription.

Over the course of its long career, the emphasis within classical probability theory slowly shifted from the descriptive to the prescriptive, as a result both of disillusionment with the ideal of reasonableness and of the widening gap between objective and subjective probabilities.

The authors conclude this subsection with a tidy characterization of the probability theory of the time:

These, then, were the hallmarks of the classical interpretation of mathematical probability:

  • a fruitful conflation of subjective and objective senses of probability;
  • a thoroughgoing determinism that firmly denied the existence of real chance and that highlighted the subjective sense of probability in programmatic statements;
  • a commitment to the mixed mathematical goal of modeling phenomena;
  • and above all an identification of the theory with that form of practical rationality that came to be known as reasonableness.

To those schooled in twentieth-century distinctions, the mathematical theory is independent of both its innumerable possible interpretations and its applications, but for classical probabilists they were all of a piece.

1.6 Risk in Gambling & Insurance

The next two subsections pack in tons of narrative detail on developments in the domains of probability; these are the major points:

Gambling was the paradigmatic aleatory contract, and the very first problems solved by the mathematicians were of this sort.

Despite rapid progress, most probabilists were ill at ease with their stock-in-trade gambling problems and their disreputable associations.

…apparently most gamblers had little appetite for this sort of edification.

Feeling themselves thus at once neglected and despised for their interest in the mathematics of gambling, classical probabilists eagerly turned their attention to other, more respectable types of aleatory contracts:

  • wine futures
  • annuities
  • maritime insurance
  • the expectation of an inheritance
  • dowry funds
  • usufructs

Statistics during this period took the form of demographic data on births, marriages, and deaths because this was information that governments had been requiring parishes to register since the first half of the sixteenth century, for reasons having nothing to do with probability theory.

Mathematicians unhesitatingly read these statistical frequencies as probabilities, and saw in them the means of advancing from gambling to more reputable kinds of aleatory contracts.

But alas, again…

By and large, eighteenth-century buyers and sellers of annuities and life insurance were no more interested in probability theory than the gamblers.

The influence of the mathematical theory of risk on the practice of risk was thus effectively nil for most of the eighteenth century.

Practitioners’ resistance to the mathematical methods of the probabilists was deeply rooted, & is still recognizable in strains of anti-empiricism today:

There were other, still deeper reasons why the practitioners of risk resisted the mathematical theory of risk. Long before the advent of mathematical probability and statistics, parties to aleatory contracts like gambling, annuities, and maritime insurance had agreed upon the price of a future contingency on the basis of intuitions that ran directly counter to those of the probabilists.

Whereas the dealers in risk acted as if the world were a mosaic of individual cases, each to be appraised according to particular circumstances by an old hand in the business, the mathematicians proposed a world of simple, stable regularities that anyone equipped with the right data and formulae could exploit.

For the practitioners of risk, accepting the mathematical theory of risk required profound change in beliefs, and in the case of life insurance, also of values. They had to replace individual cases with rules that held only en masse, and to replace seasoned judgment with reckoning.

What the mathematicians dismissed as local perturbations that would cancel one another out in the long run, the practitioners viewed as the very stuff of their trade.

Only good judgment and a thorough versing in these minutiae could price the risk in question.

The practice of risk was not simply astatistical; it was positively antistatistical in its focus on the individual case to the neglect of large numbers and the long term.

The practitioners equated time with uncertainty, for time brought unforeseen changes in these crucial conditions; the probabilists equated time with certainty, the large numbers that revealed the regularities underlying the apparent flux.

Tensions between (often innumerate) domain experts & empirical quants, typified by a bespoke micro-focus on individual cases in the former & an aggregate macro-focus on broader statistical patterns in the latter, are alive & well today. We’ll also hear much more on these themes in chapters to come.

1.7 Evidence & Causes

Aleatory contracts were not the only area of application that the classical probabilists took over from the jurists. They also turned their attention early on to problems of evidence, particularly those of witness testimony.

Almost every probabilist from Jakob Bernoulli through Poisson tried his hand at the probability of testimony, and Montmort was exceptional in asking whether such matters were really legitimate applications of the mathematical theory.

The thorniness of this problem spurred the next great leap in frequentism, which was perhaps a leap too far:

Beginning with Jakob Bernoulli’s celebrated theorem, the probabilists addressed the problem of how much success generated what degree of certainty. This meant recasting ideas of cause and effect in terms tractable to probability theory, i.e. relating them to the ubiquitous urn model that became the hallmark of the classical interpretation.

Imagine an urn filled with colored balls in some fixed proportion, from which repeated drawings with replacement are made. Bernoulli’s theorem states that in the limit, as the number of drawings N approaches infinity, the probability P that the observed proportion of colored balls m/N corresponds to the actual proportion p within the urn approaches certainty:

P( |m/N - p| < ε ) → 1 as N → ∞, for any margin of error ε > 0

Bernoulli’s theorem amounted to a guarantee that in the long run observed frequencies would stabilize around the “true” underlying value, that regularity would ultimately triumph over variability, cause over chance.
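
A toy urn simulation (my Python sketch, not the book's): fix a true proportion p of white balls, draw with replacement, and watch the observed proportion m/N settle ever more tightly around p as N grows, which is all the theorem guarantees.

```python
import random

def observed_proportion(p: float, n_draws: int, rng: random.Random) -> float:
    """Draw n_draws balls with replacement from an urn whose true white-ball proportion is p,
    and return the observed proportion m/N of white balls."""
    m = sum(rng.random() < p for _ in range(n_draws))
    return m / n_draws

rng = random.Random(0)
true_p = 0.3
for n in (10, 100, 10_000, 1_000_000):
    print(n, observed_proportion(true_p, n, rng))  # the frequency hugs 0.3 more closely as N grows
```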

This result was a curious mixture of the banal and the revolutionary.

Banal, because as Bernoulli himself admitted in a letter to Leibniz, “even the stupidest person knows by some instinct of nature per se and by no previous instruction” that the greater the number of confirming observations, the surer the conjecture; revolutionary, because it linked the probabilities of degrees of certainty to the probabilities of frequencies, and because it created a model of causation that was essentially devoid of causes.

Moreover, the new model abandoned all search for mechanisms, for the hidden springs and principles that ran the clockwork of the world.

In Bernoulli’s urn model, numbers generated numbers; the physical processes by which they did so were wholly inscrutable.

The theorem was the cornerstone of the probability of causes, and yet it did not really provide a way of reasoning from known effects to unknown causes even in the restricted sense of frequencies and probabilities. For one was not justified in simply reading off the underlying probability from the observed frequency in any finite number of trials: without some additional simplifying assumption, the frequencies never converge unambiguously to a single value.

Given the probability, Bernoulli’s theorem revealed how likely it was that observed frequencies would approximate that probability to any desired degree of precision.

^ The very definition of frequentism.

What was required was the inverse: Given the observed frequency, how likely is it to approximate the unknown probability?

^ Enter ‘inverse’ / ‘Bayesian’ probability.

Or, as the problem was more often posed, given that an event has occurred so many times before, what is the probability that it will occur again on the next trial? In short, what is the probability that the future will be like the past? These so-called inverse probabilities became the core of the probability of causes. Thomas Bayes and Pierre Simon Laplace independently proved versions of the inverse of Bernoulli’s theorem (Bayes, 1763; Laplace, 1774), whose applications remain controversial to this day (see 3.4).
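
For a concrete taste of the ‘inverse’ direction (my illustration, not the book's derivation): assuming a uniform prior over the unknown underlying probability, the Bayes/Laplace-style answer to "given m occurrences in N past trials, how likely is the event on the next trial?" is Laplace's rule of succession, (m + 1) / (N + 2).

```python
def rule_of_succession(occurrences: int, trials: int) -> float:
    """Laplace's rule of succession: the posterior predictive probability that the event occurs
    on the next trial, assuming a uniform prior over its unknown underlying probability."""
    return (occurrences + 1) / (trials + 2)

# An event observed on 9 of 10 past occasions:
print(rule_of_succession(9, 10))                 # 0.8333..., not quite the naive 9/10
# With a very long unbroken run, the prediction approaches (but never reaches) certainty:
print(rule_of_succession(1_000_000, 1_000_000))  # 0.999999...
```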

Bernoulli’s theorem was a mathematical model of causation, particularly useful for detecting the existence of “weak” causes like animal magnetism, while the inverse theorem was a mathematical model of the scientific method itself, of evaluating the status of hypotheses like the preponderance of male to female births in light of new data.

Thus we reach a critical juncture (at least in hindsight) of the history of probability theory, but more on this in future chapters.

1.8 The Moral Sciences

This short subsection describes the classical probabilists’ engagement with Enlightenment social science & ethics, to which they sought to bring a sort of proto-quantitative sociology, under the assumptions of a ‘clockwork society’.

It was in the moral sciences of the Enlightenment that the reasonable person of classical probability theory was most in evidence. For this reason the probabilists tried long and hard to make theirs the calculus of the moral sciences, a “social mathematics,” in Condorcet’s phrase.

This project was justly rendered quaint in short order:

Their nineteenth-century successors ridiculed the program as an amalgam of the impracticable and the presumptuous, a slur upon the good name of mathematics.

But to the classical probabilists nothing seemed more obvious than that their calculus should be applied to jurisprudence, political economy, and other parts of the moral sciences. In order to understand their confidence, we must first understand the assumptions and aims of the Enlightenment moral sciences, and how these harmonized with those of classical probability theory.

The tension between individualist & structuralist dispositions reemerges, this time between Enlightenment & Modernist camps:

In contrast to the social sciences of the nineteenth century, the students of the moral sciences took the individual rather than society as their unit of analysis. Insofar as they dealt with society at large, they conceived of it as an aggregate of such individuals. Moreover, the regularities that the moral sciences sought to uncover were the result of rational decisions made by these individuals rather than of the overarching structures of culture and society.

Reasoning individuals were in this sense the cause of social regularities; social order flowed from orderly individuals.

Like classical probability theory, the moral sciences were both descriptive and prescriptive. On the one hand, they claimed to reveal the immutable order of human thought and action; on the other, they urged changes in existing social arrangements to better approximate this order.

Many details & historical reference points follow, but the crux of it is:

The probabilists entered the moral sciences through jurisprudence, for reasons having to do with the history of the calculus itself and with the political climate of the time.

It became an urgent political reality in the succession of French regimes between the outbreak of the Revolution in 1789 and the July Monarchy of 1830.

In the service of the moral sciences, mathematics itself took on a moral tinge.

1.9 Conclusion

Finally, a summary of this rich & pithy chapter, by way of a preview of the next.

By the time Poisson claimed to have demonstrated his conclusions with all the rigor of mathematics in 1837, the classical interpretation of probability was under attack on several fronts.

These critics also heaped scorn on the probabilities of testimony and of causes; the one for attempting to quantify imponderables like veracity, and the other for substituting armchair algebra for honest empirical investigation.

By 1840, the theory that had been touted as good sense reduced to a calculus struck many mathematicians and philosophers as an “aberration of the intellect.”

For the first time, mathematicians began to distinguish the theory of probability from its suspect applications.

The intellectual and social context which had made the classical interpretation and its characteristic applications conceivable dissolved in the early decades of the nineteenth century. The French Revolution and the social tensions that followed it shook the confidence of the probabilists in the existence of a single, shared standard of reasonableness in a way that decades of controversy over the proper definition of expectation had not.

The reasonable person fragmented and then disappeared altogether, along with the consensus of the intellectual and political elites they were supposed to embody. In the first flush of romanticism, reason itself ceased to be a matter of implicit calculation, and was instead identified with unanalyzable intuitions and sensibility.

The classical interpretation had lost its subject matter. It had also lost its justification for amalgamating objective and subjective probabilities.

Subjective belief and objective frequencies began as equivalents and ended as diametric opposites.

Once the psychological bonds dissolved between objective and subjective probabilities, and between the calculus of probabilities and good sense, the classical interpretation came to seem both dangerously subjective and distinctly unreasonable.

Bayes is sent into exile:

In the mouths of the frequentists “subjective” became an epithet, and they were unrelenting in their criticisms of applications that equated “equally undecided” with “equally possible,” as in many classical applications of Bayes’ theorem.

A handful of prominent mathematicians, most notably Augustus De Morgan and W. S. Jevons in England, upheld one or another variant of the classical interpretation of probability during the middle decades of the nineteenth century, but they were an embattled minority.

Probabilists turned from the rationality of the few to the irrationality of the many.

That’s all for Chapter 1.

Stay tuned for the next installment & if you enjoyed this, BUY THE BOOK!

Also follow me & check out my other posts :)


Follow on twitter: @dnlmc
LinkedIn: linkedin.com/in/dnlmc
Github: https://github.com/dnlmc

Founder & Chief Scientist @ Coεmeta (coemeta.xyz) | formerly Associate Director of Analytics & Decision Science @ the Philadelphia Inquirer