"The future of data analysis can involve great progress, the overcoming of real difficulties, and the provision of a great service to all fields of science and technology. Will it?"
John Tukey, from ‘The Future of Data Analysis’

In his 1962 paper The Future of Data Analysis, mathematician John Tukey wrote, "For a long time I thought I was a statistician… But as I have watched mathematical statistics evolve… I have come to feel that my central interest is in data analysis…". He then describes data analysis as "a science, one defined by a ubiquitous problem rather than by a concrete subject". In these words, Tukey draws a distinction: while mathematics is a set of defined, a priori truths, data analysis is an empirical science in which knowledge is gained through constant experimentation. He appears to treat statistics as a field that draws on both. According to Tukey, "Individual parts of mathematical statistics must look for their justification toward either data analysis or pure mathematics".¹
Tukey’s sentiments echo today, as the term ‘Data Science’ becomes everyday jargon in the tech world. Many established universities have created ‘Data Science’ degree programs only in the past half-decade, and many of these programs serve as extensions of existing statistics curricula. It’s safe to say that the field has very quickly become its own discipline. Tukey closes his paper with a prescient understanding of the importance data would take on as computing power and storage grew: "…there are situation[s] where the computer makes feasible what would have been wholly unfeasible… where speed and economy of delivery of answer make the computer essential for large data sets and very valuable for small sets".¹
Fast-forward to 1994. BusinessWeek publishes an article titled Database Marketing. The article goes into detail describing the rise of new checkout scanner technologies that were being used in stores during the eighties. The vision was that each scanner would create a transaction record that stored data on each item purchased. This could then, in turn, give retailers insights on what to advertise based on customer demand and purchase history.² In the end, however, this checkout scanner craze didn’t live up to its promise. Companies simply didn’t have the technological infrastructure or the computational power to handle these volumes of data, let alone get insights from them.³
However, it’s important to note that this experiment, even though it failed, represented an early version of a very important concept. It represented a vision of utilizing massive-scale analytics to predict customer desires. And this was back in the eighties, when the internet wasn’t even mainstream yet. In fact, by the nineties, some companies were able to use this sort of tech while it was in its infancy. Blockbuster Video, for example, used its database of memberships and transactions to test a computerized system to recommend movies based on prior rentals.² It’s wild to think that Blockbuster, of all companies, was an early researcher into a content recommendation algorithm!
The following quote from the article almost sounds like it could have been written in an article from this year: "Companies are collecting mountains of information about you, crunching it to predict how likely you are to buy a product, and using that knowledge to craft a marketing message precisely calibrated to get you to do so".²
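To make the idea of recommending movies "based on prior rentals" concrete, here is a minimal sketch of one simple approach: score unseen titles by how much the members who rented them overlap with the target member. This is not a reconstruction of Blockbuster’s system, which the article does not describe in detail. The function name `recommend`, the member IDs, and the rental data are all made up for illustration.

```python
from collections import Counter

def recommend(rental_history, member, top_n=2):
    """Suggest titles rented by members whose history overlaps with `member`'s."""
    seen = rental_history[member]
    scores = Counter()
    for other, titles in rental_history.items():
        if other == member:
            continue
        overlap = len(seen & titles)  # number of titles both members rented
        if overlap == 0:
            continue  # no shared taste, so ignore this member
        for title in titles - seen:
            scores[title] += overlap  # weight unseen titles by shared history
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, _ in ranked[:top_n]]

# Hypothetical rental records, invented purely for this example.
rentals = {
    "member_a": {"Alien", "Blade Runner"},
    "member_b": {"Alien", "Blade Runner", "The Terminator"},
    "member_c": {"Alien", "Predator"},
    "member_d": {"Casablanca"},
}
print(recommend(rentals, "member_a"))  # ['The Terminator', 'Predator']
```

Even a toy like this hints at why the eighties-era vision stalled: the scoring loop touches every member’s full history, which is trivial for four members and costly for millions without the right infrastructure.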
So what happened in the last two decades that has caused data and analytics to play an essential role in almost every technology organization?

Well, to put it simply, technological capabilities grew dramatically in a relatively short period of time. The internet became a mainstream tool used by businesses and individuals alike. And remember the checkout scanner that collected data? In this era, there is basically no barrier to storing massive volumes of data. Today, vast amounts of data are collected and available for analysis at a moment’s notice, and this data stems from countless domains: healthcare data, social media data, customer data; the list goes on. Businesses are eager to use their computational capacity to analyze this data because it serves as the key to their success.
And what about the computational power to store and process this data? Well, today GPUs are able to process data and execute data-intensive algorithms at speeds far beyond what was once thought possible.⁴ On top of that, data centers provide warehouses of off-site data storage, which cloud-computing vendors can offer as needed to businesses and individuals. Completely new paradigms of big data analysis have been created as well. One of the most notable projects is Apache Spark, an open source project that began at UC Berkeley’s AMPLab. Spark transformed the landscape of big data computing by refining the paradigm of multi-machine processing: data is distributed across the machines of a cluster and processed in parallel to reduce run time.⁵ Additionally, the Spark project improved on the MapReduce programming paradigm by introducing Resilient Distributed Datasets (or RDDs) as its fundamental data structure.⁶
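The split-process-combine idea behind MapReduce can be sketched in a few lines of plain Python. This is only an illustration of the paradigm under simplifying assumptions: it runs on one machine, does not use Spark’s API, and the helper names (`split_into_partitions`, `map_partition`, `merge_counts`, `word_count`) and sample sentences are invented for the example.

```python
from collections import Counter
from functools import reduce

def split_into_partitions(records, num_partitions):
    """Deal records round-robin into partitions, much as a cluster spreads data across machines."""
    return [records[i::num_partitions] for i in range(num_partitions)]

def map_partition(lines):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts

def merge_counts(left, right):
    """Reduce step: combine two partial results into one."""
    return left + right

def word_count(lines, num_partitions=3):
    partitions = split_into_partitions(lines, num_partitions)
    # On a real cluster each partition would be processed on a different machine, in parallel.
    partial_counts = [map_partition(partition) for partition in partitions]
    return reduce(merge_counts, partial_counts, Counter())

# Made-up input, standing in for a much larger data set.
sample_lines = [
    "data drives decisions",
    "big data needs big clusters",
    "clusters process data in parallel",
]
totals = word_count(sample_lines)
print(totals["data"], totals["big"], totals["clusters"])  # 3 2 2
```

Because each partition is mapped independently and the merge step is order-insensitive, the result is the same however the data is split. That property is what lets the work be spread across many machines.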
What are these businesses’ goals exactly? And how is data the ‘key to their success’?

Well, the possibilities really are endless. Data Science seeps into practically every domain.⁴ And while the buzzword ‘Machine Learning’ is often associated with the exciting world of artificial intelligence, in the real world it is mostly just a means of giving insights to stakeholders. And as mentioned, the use cases are plentiful.
First of all, data can have an immensely positive effect on a company’s advertising strategy. The rigor of data analytics can save companies advertising dollars, because less money is wasted on strategies that haven’t been verified against data.⁷ In the past, companies didn’t have the storage capacity to perform analysis on such large data sets. Now, by running analytics on historical customer and transaction records, companies can be far more confident about which marketing strategies to prioritize.
Next, consider the medical field. Perhaps a pharmaceutical company wants to predict the likelihood of a new drug being adopted. It can mine historical claims data and build a predictor based on diagnosis patterns across various demographic attributes.
Data can even make strides in airline safety. For example, Southwest Airlines and NASA have teamed up on a text-mining project to identify potential hazards by studying air traffic control transcripts and data content generated by airplanes.⁸
I’ll stop listing use cases for now, but the point is that I could keep going forever if I wanted to. You could write an encyclopedia on each business domain and use case that Data Science influences. Its effect on organizational goals has truly been that profound. Whether a business’s goals involve increasing ROI or promoting public well-being, data will play a role in some way, shape, or form.
Aside: AI versus Automation

The above use cases are more in line with AI, machine learning, and classification. But before we move on to use cases for automation, it’s important to note that automation is not to be confused with AI. While AI (and its subset, machine learning) is meant to mimic what a human can identify, automation is meant to continuously mimic tasks that a human can do. In other words, AI algorithms are about classifying data and drawing insights from it, while automation algorithms are about carrying out repetitive tasks over and over.
However, the two are not mutually exclusive, and they often work hand in hand. The best use case to illustrate this idea is self-driving technology. Here, you are automating the task of driving uninterrupted for long periods of time. However, the task is not as menial as a simple copy-and-paste. There are many external factors to account for, such as traffic lights, signs, and other vehicles. This is where AI enters the mix. The car needs classifiers to watch out for these factors and must learn how to react to them.
Later on, when we get to the topics of data policy and strategy, I’ll probably use the two terms interchangeably. The two are indeed different, but they’re closely related. Now that we’ve cleared up this confusion, let’s move on to more business goals.
What are some outcomes businesses seek to achieve with automation?

The key goals of automation are efficiency and productivity. According to McKinsey, automation alone could raise global productivity growth by 0.8 to 1.4 percent annually. And several labor sectors are already using automation. For example, the mining company Rio Tinto has rolled out automated haul trucks and drilling machines in its Australian operations, which increased productivity.⁴
Long-haul trucking could well be the next labor sector to be heavily disrupted by automation. Consider the fact that 70% of America’s goods are transported via long-haul trucks.⁹ If you could automate all trucks with self-driving technology and get them to run continuously, you would have a supply chain with a level of efficiency once thought improbable. Industries like these, which rely on labor-intensive tasks, stand to see large efficiency gains from automation technologies.
But automation can even apply to industries that require more interpersonal communication. Take the domain of customer service, for example. It is estimated that a majority of customer service interactions are now automated.¹⁰ Amazon and Citibank are just a couple of the major corporations whose customer service infrastructures rely on virtual assistants to some extent. Customer service automation is also being heavily implemented in the food service industry. McDonald’s, for instance, made a plan in 2018 to add self-service kiosks to one thousand stores each quarter into 2020.¹¹ Today we see the result, as kiosks are commonplace in its restaurants. It doesn’t matter whether a job has a social aspect or not. Automation is set to disrupt it in some way.
We went over business goals. But what are some possible negative implications of the ‘Fourth Industrial Revolution’?

Oftentimes, the advent of AI is associated with a bleak outlook. We often hear that many essential, labor-intensive jobs such as factory work and long-haul truck driving will soon be replaced by AI. This is surely an important ethical implication to consider. While past industrial revolutions displaced old jobs and created new ones, the AI revolution appears set to eliminate certain sectors completely. The jobs that are set to replace them are predicted to be heavy in math, computation, and critical analysis. These white-collar jobs are a world away from the labor-intensive blue-collar ones they will soon replace.
And retraining the existing workforce will be difficult, not only because workers would be adapting to an entirely new skill set, but also because they may have no interest in learning these new skills at all. Consider the fact that the average truck driver in the United States is a middle-aged man nearing retirement, and probably without a college degree.¹² At this age, these people probably have no desire to learn to program. On top of that, they’re at the point in their lives where this job is a very important part of their identity. These are all concerns that need to be addressed as the AI revolution unfolds. I would go so far as to say that governments need to develop an AI strategy in response to these phenomena.
How can governments and companies work together to implement an AI strategy?

We’ve been extensively discussing AI and automation through the lens of their main advantage: productivity. But now we must reconcile this with potential ethical implications. According to McKinsey, policymakers actually have a great incentive to embrace these technologies for the well-being of both their economy and their constituents: "This [productivity growth] will help ensure future prosperity, and create the surpluses that can be used to assist workers and society adapt to these rapid changes".⁴
In other words, productivity and efficiency can cause surplus and prosperity. And as speedy output of goods and services increases, economic surplus will be generated in both the private and public sectors. McKinsey brings up this idea of "public-private" partnerships that can lift developing countries out of poverty through digitization.⁴ But I would go a step further, and say that a partnership of this sort could aid developed countries just as much.
After all, more private revenue from these projects could mean more tax revenue for the government, which can then be pumped back into the population through various government-run initiatives. Remember the concern about certain blue-collar workers being permanently displaced? The government may be able to use this money to provide those displaced workers with social safety nets or a universal basic income.⁴ Perhaps some of these displaced workers can be given the option to participate in a government-sponsored STEM training program. Better yet, these programs can be offered to young students as well, to prepare them for a growing workforce that will be in desperate need of new talent.
Countries all over the world are already implementing AI strategies. In 2018, South Korea pledged roughly $2 billion (2.2 trillion won) to AI research, jobs, talent, and government partnerships with "start-ups and corporations in the field [of AI]".¹³ And in the same year, Google opened Africa’s first AI research facility in Ghana, committing to "collaborating with local universities and research centers, as well as working with policy makers on the potential uses of AI in Africa".¹⁴ While the AI revolution does have some dreary implications, if implemented correctly it can provide a new cycle of prosperity in which businesses, the state, and citizens all exchange ideas and revenue.
What is ‘The Future of Data Analysis’, as John Tukey stated?

Data, AI, and automation are poised to be the greatest disruptors in technology since computers and the internet. They won’t only disrupt the technological sphere, but will also shape how policy is made for years to come. We explored the idea that the quick rise in technological capabilities jump-started the age of data. But this raises the question: how were these data-driven technologies able to rise so uniformly across all industries and businesses?
Well, every company is trying to be at the forefront of what’s new in technology. Amazon made e-commerce mainstream. As a result, brick-and-mortar companies began to invest heavily in their own e-commerce operations in order to keep up. Similarly, tech giants such as Google, Facebook, and LinkedIn have extremely robust data infrastructures, and this pushed companies like Walmart, the king of brick-and-mortar outlets, to develop their own data strategies. It’s no wonder that the percentage of job starters in analytics and data science increased ten-fold from 1990 to 2010.¹⁵ And this is only going to increase.
I’ll end with a prophetic quote by Google Chief Economist Hal Varian from 2009: "I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s? The ability to take data – to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it – that’s going to be a hugely important skill in the next decades…".¹⁶ And it’s true. Currently, there are many students studying to prepare for jobs that don’t even exist yet. The mainstream status of data, AI, and automation is relatively new. And once these jobs do come to fruition, they will be plentiful and essential to the operation of our entire society.
Citations & Sources:
[1]: Tukey, John W. "The Future of Data Analysis." The Annals of Mathematical Statistics, vol. 33, no. 1, 1962, pp. 1–67, doi:10.1214/aoms/1177704711.
[2]: Berry, Jonathan. "Database Marketing." Bloomberg.com, Bloomberg, 5 Sept. 1994, www.bloomberg.com/news/articles/1994-09-04/database-marketing.
[3]: Press, Gil. "A Very Short History Of Data Science." Forbes, Forbes Magazine, 15 Oct. 2014, www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/.
[4]: "What’s Now and next in Analytics, AI, and Automation." McKinsey.com, McKinsey & Company, 11 May 2019, www.mckinsey.com/featured-insights/digital-disruption/whats-now-and-next-in-analytics-ai-and-automation.
[5]: Chambers, Bill, and Matei Zaharia. Spark: The Definitive Guide. O’Reilly, 2018.
[6]: "Resilient Distributed Dataset (RDD)." Databricks.com, Databricks, 15 May 2020, databricks.com/glossary/what-is-rdd.
[7]: Agrawal, AJ. "Why Data Is Important for Companies and Why Innovation Is On the Way." Inc.com, Inc., 24 Mar. 2016, www.inc.com/aj-agrawal/why-data-is-important-for-companies-and-why-innovation-is-on-the-way.html.
[8]: "Data Mining Tools Make Flights Safer, More Efficient." Nasa.gov, NASA, 2013, spinoff.nasa.gov/Spinoff2013/t_3.html.
[9]: Wertheim, Jon. "Automated Trucking, a Technical Milestone That Could Disrupt Hundreds of Thousands of Jobs, Hits the Road." Cbsnews.com, CBS News, 15 Mar. 2020, www.cbsnews.com/news/driverless-trucks-could-disrupt-the-trucking-industry-as-soon-as-2021-60-minutes-2020-03-15/.
[10]: Schneider, Christie. "10 Reasons Why AI-Powered, Automated Customer Service Is the Future." Watson Blog, IBM, 16 Oct. 2017, www.ibm.com/blogs/watson/2017/10/10-reasons-ai-powered-automated-customer-service-future/.
[11]: Hafner, Josh. "McDonald’s: You Buy More from Touch-Screen Kiosks than a Person. So Expect More Kiosks." Usatoday.com, USA Today, 7 June 2018, www.usatoday.com/story/money/nation-now/2018/06/07/mcdonalds-add-kiosks-citing-better-sales-over-face-face-orders/681196002/.
[12]: Kilcarr, Sean. "Demographics Are Changing Truck Driver Management." Fleetowner.com, FleetOwner, 20 Sept. 2017, www.fleetowner.com/resource-center/driver-management/article/21701029/demographics-are-changing-truck-driver-management.
[13]: "Gov’t to Spend 2.2 Trillion Won on National AI Program." Korea JoongAng Daily, 15 May 2018, koreajoongangdaily.joins.com/news/article/article.aspx?aid=3048152.
[14]: Crabtree, Justina. "Google’s next A.I. Research Center Will Be Its First on the African Continent." Cnbc.com, CNBC News, 14 June 2018, www.cnbc.com/2018/06/14/google-ai-research-center-to-open-in-ghana-africa.html.
[15]: Patil, DJ. "Building Data Science Teams." Radar.oreilly.com, O’Reilly, 16 Sept. 2011, radar.oreilly.com/2011/09/building-data-science-teams.html?utm_source=feedburner.
[16]: "Hal Varian on How the Web Challenges Managers." McKinsey.com, McKinsey & Company, 1 Jan. 2009, www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/hal-varian-on-how-the-web-challenges-managers.