In the beginning, there was Simulmatics

A new book investigates the origins of data science

Ray Robinson
Towards Data Science

--

The IBM 704, workhorse of the early days of data science (Source: Computer History Museum)

I still remember the first time I heard the term “data science”. It was thrown out by an account manager for one of those pricey IT consultancies my employer was so fond of. “Data science?” said my boss, a battle-scarred veteran of the IT wars. “That sounds like a bunch of statisticians who just got a raise.”

In her new book, “If Then: How the Simulmatics Corporation Invented the Future”, Jill Lepore provides an origin story for data science, or, as it was known back then: “massive data.”

Whether Simulmatics, a tiny company with a short life, actually invented the future of data science is in doubt. The New York Times put a more skeptical headline on its review of Lepore’s book: “The Bumbling 1960s Data Scientists Who Anticipated Facebook and Google.”

I lean more toward the Times’ interpretation: Simulmatics anticipated the future, but it actually invented very little.

That doesn’t take anything away from Lepore’s book, which is a deeply researched, well-written journey through the early days of data science by one of the country’s best-known historians. Lepore is a professor of history at Harvard and a writer for the New Yorker whose previous book, “These Truths,” is a history of the United States.

“If Then” by Jill Lepore (Source: Liveright/W.W. Norton)

There are three stars of the Simulmatics story, described by Lepore as “the long-dead, white-whiskered grandfathers of Mark Zuckerberg and Sergey Brin and Jeff Bezos and Peter Thiel and Marc Andreessen and Elon Musk”:

· Ed Greenfield, a Madison Avenue ad man who falsely claimed to have attended Yale Law School and the University of Chicago (in truth, he dropped out of Wabash College in Indiana after a year).

· Bill McPhee, a brilliant and apparently quite mad FORTRAN programmer who was once involuntarily committed to a mental hospital by his wife.

· Ithiel de Sola Pool, an expert in quantitative social science and suspected Communist who only got the security clearance required to do government work after a young California congressman named Richard Nixon intervened on his behalf.

Greenfield’s big idea was to bring the power of quantitative social science and computing into his advertising agency, where it would be used to hawk consumer products and get politicians elected. Human behavior, whether voting or deciding which brand of soap to buy, was to be modeled, simulated and — in the end — manipulated using sophisticated algorithms applied to vast stores of data.

Sound familiar? This all actually got started in about 1952.

Greenfield was a passionate liberal and civil rights activist. He wanted to use analytics and technology to help Democrats of the 1950s and 1960s erase the technology advantage then enjoyed by Republicans.

His company, Edward L. Greenfield and Co., developed its own “Social Science Division.” That became the forerunner of Simulmatics, which was founded in 1959 by Greenfield and Pool. “They were confident, and they were cocky,” Lepore writes. “But they sometimes wondered if all of it wasn’t bullshit.”

In the end, Greenfield’s ambitions, and those of his cohorts Pool and McPhee, were simply more than the technology of the day would support. The state of the art at the time was the IBM 704 mainframe. To run programs on its mainframes, IBM had developed FORTRAN (which stood for “Formula Translation”).

And therein lay the problem. Lepore includes this quote from IBM’s FORTRAN documentation:

“The FORTRAN language is intended to be capable of expressing any problem of numerical computation… However, for problems in which machine words have a logical rather than numerical meaning it is less than satisfactory, and it may fail entirely to express some problems.” (Emphasis added.)

Human beings, you may have noted, generally don’t make decisions based on numerical calculations. For that matter, many aren’t terribly logical. Simulmatics was tackling a big problem with a less than ideal toolset.

Another problem was that data was hard to come by. Marketing and research companies tended to treat it as a proprietary asset. And the first version of the Freedom of Information Act, which began to open federal data stores, was signed in 1966 and didn’t take effect until 1967. When data was available, it often had to be input from media like punch cards and reels of tape.

Simulmatics vaulted to prominence in the 1960 presidential election, in which it sold its services to the John F. Kennedy campaign. Its work was based on the FORTRAN programming of McPhee, who described it in his Columbia Ph.D. dissertation as “A Fully Observable Electorate.”

Lepore notes, “It is not, at an elementary level, any different from what Cambridge Analytica sold as its services to the Trump and the Brexit ‘Leave’ campaigns in 2015 and 2016.”

Most innovations have many birthplaces. In the case of Simulmatics, one birthplace was the psychiatric ward at Bellevue Hospital in New York City. McPhee continued to work on the new program from inside the ward after his wife had him committed.

For the Kennedy campaign, Simulmatics ran an analysis based on historical voting patterns among 480 different voter groups. An example group: “Midwestern, rural, Protestant, lower income, female.”

The key recommendation: that Kennedy stop trying to avoid the issue of his Catholicism (and it was very much an issue in 1960) and meet it head on as part of an overall stand against prejudice. That would appeal to two groups where Kennedy was weak: Blacks and Jews. It would also serve to further motivate Catholics.

How big a role Simulmatics played in Kennedy’s election is open to question. But what isn’t in question is that Black voters in the north were a major factor in his narrow victory. Simulmatics immediately began a campaign of its own, publicizing its role in the Kennedy campaign. That set off a controversy over how much Kennedy was, in effect, a computer-controlled president.

And that’s one of the amazing parts of this book. Today, such a revelation wouldn’t create a ripple. Most voters would simply assume that all politicians use technology to segment the electorate and tailor messages to different groups.

In 1960, though, it was a scandal. And it effectively ended Simulmatics’ work on political campaigns.

But other opportunities appeared. In 1962, Simulmatics was hired to produce real-time analyses of election returns for The New York Times. But “real time” turned out to mean as fast as returns could be received by telephone and teletype, then encoded onto punch cards and submitted via a modem to an IBM data center. The Times quickly lost interest.

The logical follow-on to political campaigns was helping the Pentagon wage psychological warfare. And Vietnam provided the opportunity. “Vietnam,” said Pool in 1966, “is the greatest social science laboratory we have ever had.”

Many social scientists had already split with the Pentagon over Vietnam and stopped competing for Pentagon contracts and grants. For Simulmatics, which wasn’t affiliated with a university, that wasn’t an issue. It had its own team, including some Vietnamese staff, on the ground in South Vietnam conducting surveys in villages throughout the country.

The Pentagon’s Advanced Research Projects Agency (ARPA), which financed the Simulmatics work in South Vietnam, soon became dissatisfied with the results, finding them “not the work of responsible researchers.” ARPA was even more critical when it terminated the contract: “Simulmatics reflects discredit not only upon itself as an organization — it appears more a sham — but upon behavioral research in general.”

With its Pentagon work shut down, Simulmatics moved on to the urban unrest gripping the United States in the late 1960s as Vietnam and the civil rights movement smoldered. It was hired to assist the Kerner Commission, which President Lyndon Johnson had set up in 1967 to study the causes and effects of riots. It ran simulations on the causes of and solutions to inner city poverty. It concluded, not surprisingly, that the solution was to get out of poverty.

And then business began to dry up. By 1969, Simulmatics was a company that existed only on paper. In 1970, it filed for bankruptcy. By 1974, with Simulmatics a dim memory, the whole concept of computer simulation of human behavior was being called into question. Watergate and Vietnam had increased suspicions around government data collection. And the Privacy Act of 1974 made it more difficult for the government to collect and aggregate personal data.

Greenfield died in 1983 and Pool a year later. The final sad chapter of the Simulmatics story was written in 1998. Bill McPhee, the programmer who wrote the original Simulmatics code in a psychiatric ward, shot himself to death while sitting in front of his computer. He was 77.

Every wave of innovation begins with a long series of things that don’t work.

Simulmatics was a failed corporation built on the Cold War idea that human behavior could be manipulated to the ends of governments, politicians and corporations. But the idea lives on even though the technology of the 1950s and 1960s couldn’t support it. After all, are the algorithms that run on the social media platforms of today anything more than a 21st century version of the Simulmatics idea?
