“AI”, “ML”, or “Data Science”: A Glossary of Terms

Ryan Case
Towards Data Science
11 min read · Oct 18, 2019


As a largely self-taught programmer, I have always appreciated resources that don’t make many assumptions about prior knowledge. So when I started writing a series of blog posts on machine learning, it seemed natural to start with the basics. This post explains the meaning of the term “machine learning,” along with definitions of the many similar and overlapping terms.

As we go through these you will probably ask yourself why there are so many technical-sounding terms for what is essentially the same thing, or the same set of things. That’s a question I can answer up front: machine learning and data science combine many different fields of study, and each of those fields has its own preferred terms. Combine that with the fact that the field has been around for decades, during which different nomenclature has fallen in and out of favor, and you have a recipe for confusion. For the sake of people new to the field, we’ll cover the whole range of terms that can be applied to machine learning.

Computer Science

It might seem like “computer science” is simply “anything to do with computers”, but it’s a little more nuanced than that. Any computer can be thought of as a combination of hardware and software. The hardware is the circuitry and components that make up the physical computer; the software is the code that runs on that hardware and uses that clever circuitry to generate meaningful output.

The design of hardware is the purview of electrical engineering. The design of software is the realm of computer science. A simple definition of computer science could be “The field involving the creation of software and algorithms, and the programming of computers.” So what about the difference between “programmer” and “computer scientist”? To some extent they are interchangeable — after all, programming is an integral part of computer science — and some people will use them that way.

For others, however, “computer scientist” refers more to someone who deals with things that require a good knowledge of computer science theory. Things like algorithm analysis and design, or software/systems architecture. For our purposes, computer science is one of the main fields that feed into modern data science. Most data scientists can accurately call themselves “programmers” and many have a background in computer science theory. There’s not a total overlap between the fields — not all data scientists are also computer scientists or vice versa — but there is a lot of overlap.

Statistics

Statistics is a branch of mathematics that takes sets of numbers and reduces them into summaries using various mathematical approaches. Most people are familiar with basic statistics like the average (mean), but there are many different statistical approaches that all amount to reducing a lot of data into some kind of summary value(s).
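
As a tiny illustration using Python’s built-in statistics module (the numbers here are invented), this is what “reducing a lot of data into summary values” looks like in practice:

```python
# Reducing a set of numbers to a handful of summary values.
import statistics

home_prices = [210_000, 250_000, 265_000, 300_000, 525_000]

print(statistics.mean(home_prices))    # average: 310000
print(statistics.median(home_prices)) # middle value: 265000
print(statistics.stdev(home_prices))  # spread around the average
```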

You may not think of statistics when you hear the terms “AI” or “machine learning”, but statistics is the primary contributing field to both. From here on out, every term we use will have its basis in statistics. Some people go even further and consider “statistics” an all-encompassing term for almost everything listed below. This is less common than using the more specific terms, but it helps to point out just how fundamental statistics is to everything that follows.

Data Science

If we were trying to strictly define what we could call “statistics”, we would probably choose only the formulas and approaches that are used to generate the summary outputs. In a theoretical sense that’s fine, but in a practical sense it leaves a lot of knowledge and techniques in a gray area. Collecting data, representing and formatting it, and cleaning and modifying it are all essential skills for carrying out data analysis. Casting an even wider net, we could say that databases, data pipelines, and even distributed computing are all essential to modern-day data analysis.

Which raises a question: If statistics is just a set of specific formulas and algorithms then what do we call the combination of all these skills and knowledge that are needed to do practical data analysis? Generally, the answer to that is “data science”, a term that encompasses all the skills used in data storage, manipulation, and analysis.

Artificial Intelligence

The term “artificial intelligence” is all the rage these days, but it has a long and somewhat confusing history, so let’s talk about that first. In the earliest days of computers, a single machine would run a single program. So you could have a machine that calculated the trajectory of artillery shells, and a machine that processed loan applications, but those could not be the same machine. In those days there was less separation between hardware and software; many machines were hardwired to solve a specific problem and either could not be reprogrammed at all or only with immense effort.

In other words, most computers were purpose-built to solve a single problem. Naturally, early computer scientists wanted ways for computers to do more than that, and there were several approaches. The most ambitious was the idea that there existed a “general algorithm”: an approach that could be used to solve any problem. The inspiration was that humans are capable of tackling basically anything, and we don’t seem to have different “algorithms” in our brains for different problems, so we must use one general approach to problem-solving. If such an algorithm could be discovered for computers, then a single machine would be capable of solving a huge array of problems without the need for reprogramming (or programming at all). It would exhibit humanlike intelligence while running on circuitry, hence the name “artificial intelligence.” In those early days the term “AI” referred specifically to approaches that were trying to find that general algorithm for machines.

To date, no one has solved that problem. Computers were able to reach a point where they could solve a range of problems on the same machine, but that was accomplished by using an operating system to load and switch between a number of different, problem-specific programs. This is the basic architecture of nearly every computer in existence today.

Rather than die out, the term “AI” evolved. Now it can refer to algorithms that are problem-specific (i.e., not general), and the algorithms don’t have to approach the problem in any way resembling how a human might. The distinguishing feature of modern AI is that it solves problems that are considered to require a humanlike level of intelligence. That is a vague standard with no technical definition, but such problems generally involve one or more of the following:

  1. They have a huge or potentially infinite number of possible solutions, and/or a huge or potentially infinite number of possible inputs.
  2. They require extensive planning ahead to reach an optimal solution.
  3. There is no “correct” answer, just better and worse ones, with the goal being to find the best possible one.
  4. The “goodness” or “badness” of a decision depends on a wide range of factors that may be constantly changing.

There are other factors, but these are the most common. An example of an AI problem would be coming up with the order in which a car needs to be built so that no step blocks the execution of a later step. For example, putting the door panels on before the windows have been installed would either prevent the installation of the windows or require you to take the door panels off again. This involves a huge number of potential inputs and extensive planning, the goodness or badness of a choice depends on other factors, and there are likely a number of different valid solutions.
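
To make the ordering idea concrete, here is a minimal sketch of one classic way a computer can solve a toy version of this problem: a topological sort over steps and their prerequisites. The step names and dependencies are invented purely for illustration.

```python
# A toy version of the assembly-ordering problem: each step lists the
# steps that must be finished before it can start, and a topological
# sort produces an order in which no step blocks a later one.
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependencies, purely for illustration.
steps = {
    "assemble door frame": set(),
    "install windows": {"assemble door frame"},
    "attach door panels": {"install windows"},
    "paint door": {"attach door panels"},
}

print(list(TopologicalSorter(steps).static_order()))
# ['assemble door frame', 'install windows', 'attach door panels', 'paint door']
```

Real planning problems are vastly larger and messier than this, which is what pushes them into “AI” territory, but the flavor is the same.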

Normally these problems require human specialists to solve, but with the right approach, a computer can solve them as well. When a problem reaches a certain level of complexity it can be considered “AI.” Again, there’s no hard and fast rule for what qualifies but there are a range of problems that have become generally accepted as being AI problems.

Now back to why the term “AI” has shown up everywhere recently: it has to do with machine learning. Machine learning can be applied to a huge range of problems because, in its simplest form, machine learning is fairly basic statistics. Machine learning is also widely accepted as being a subfield of AI, so all machine learning solutions are also AI solutions. Thus, if you use a machine learning approach you can call it “AI” and, bluntly put, “AI” sounds much cooler than “machine learning” or “uses statistics.”

“AI” conjures up a sci-fi image of impossibly advanced technology, “machine learning” confuses many people who don’t know why a machine needs to learn, and “statistics” makes people think of high school. From a marketing perspective, it’s obvious which one you’d want to use. So while the huge number of things calling themselves “AI-powered” are not technically wrong, it’s important to remember that the term could mean little more than applying statistics to data.

Machine Learning

As we described above, the term “artificial intelligence” initially referred to the effort to find a general algorithm that could replicate humanlike intelligence. Now it refers to any algorithm that solves a sufficiently complicated problem, one that would otherwise require a human to solve. These algorithms use many different approaches to do this.

“Machine learning” describes one common subfamily of AI algorithms: ones that “learn” their logic through statistical analysis of data. This has proven to be one of the most general-purpose and powerful subfields of AI. There are a number of different ways to accomplish this, and the details occupy entire books, so we won’t go in depth here, but we can cover the basics.

The most common machine learning algorithms take a dataset that acts as a knowledge base about the “correct” answers to the given problem. For example: given a set of housing data, the “correct” answer is probably an accurate prediction of the price of each house based on things like its size, the number of bedrooms and bathrooms, etc.

From there, the algorithm has “teacher” and “learner” elements. The learner generates predictions using the input data, and the teacher provides feedback on the error of those predictions: whether they were too high or too low, and by how much. The learner can then adjust its predictions based on that feedback. It’s generally impossible to predict every example perfectly, so most algorithms stop when the learner can no longer improve its overall error. There are a number of common subproblems within machine learning; this one is called “supervised learning” because of the teacher element.
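
Here is a bare-bones sketch of that learner/teacher loop, using the housing example from above. The data, learning rate, and single-weight model are invented for illustration; real implementations use many more features and more sophisticated optimization.

```python
# A bare-bones supervised learning loop: predict house price from size.
# The "learner" is a single weight; the "teacher" is the error feedback.

# Invented training data: (size in square feet, price in dollars).
data = [(1000, 200_000), (1500, 300_000), (2000, 400_000)]

weight = 0.0          # learner's current rule: price = weight * size
learning_rate = 1e-7  # how strongly to react to the teacher's feedback

for epoch in range(1000):
    for size, actual_price in data:
        predicted = weight * size               # learner makes a prediction
        error = predicted - actual_price        # teacher: too high or too low, and by how much
        weight -= learning_rate * error * size  # learner adjusts based on feedback

print(round(weight, 2))  # converges toward 200.0, i.e. about $200 per square foot
```

The loop here stops after a fixed number of passes; a real algorithm would stop once the overall error stops improving.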

As a subfield of AI, anything that qualifies as “machine learning” can also reasonably be called “AI/artificial intelligence” but not the other way around. There are a number of AI problems which are not solved using this learning approach. Having said that, given the huge range of problems to which machine learning can be applied it has become almost synonymous with “artificial intelligence.” While that’s not technically accurate, it’s worth noting that frequently when people say AI they are referring to a machine learning approach.

Statistical Learning

Interchangeable with “machine learning.” With “machine learning” the focus is on what is doing the learning, i.e., the computer or “machine.” In “statistical learning” the focus is on how the learning is accomplished, which is through statistics. For the most part, I see “statistical learning” used only in academic circles, but it’s worth knowing because searches for “statistical learning” turn up excellent resources that searches for “machine learning” might miss.

Neural Networks

This term shows up enough to merit its own section even though it is technically a form of machine learning. Neural networks are a specific family of machine learning algorithms that has been extremely popular in the last few years and is likely to continue that trend. Modern neural networks, also called “deep neural networks” or “deep learning networks,” are among the most powerful and capable learning algorithms in use today. You will rarely see something like a decision tree called out by name, but neural networks frequently are.

Part of the reason for this is that, like “artificial intelligence”, the term “neural network” has a very futuristic sound to it, so when neural networks are used it’s more likely that you’ll hear the algorithm mentioned by name for marketing reasons. Another popular characterization is that neural networks are “biologically inspired”, “inspired by the human brain”, or “designed to function like neurons in the brain”, etc. The implication is that, unlike other algorithms which use abstruse math and statistics, neural networks have this deep basis in human thinking and function better because they “think” like humans do.

To put it bluntly, this is a very misleading characterization that stems from a much earlier era in which some computer scientists truly thought the algorithm would be the beginning of a digital brain. That optimism was very misguided. The original algorithm proved so overhyped and ultimately inadequate that it was nearly forgotten entirely. The modern algorithm has been changed significantly to reach the point it has and is, like most others, inspired by math and statistics more than anything else. Calling it “based on the human brain” is about as accurate as calling a chia pet “inspired by human anatomy.” While those claims are not technically wrong, they both seriously overstate the strength of that relationship.
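
To make the “math and statistics” point concrete, here is what a single neural-network layer actually computes: a weighted sum of its inputs followed by a simple nonlinearity. The numbers are made up; a “deep” network just stacks many such layers and tunes the weights with the kind of error feedback described under machine learning.

```python
# One neural-network layer is just weighted sums plus a simple
# nonlinearity: matrix arithmetic, not a model of biology.
import numpy as np

inputs = np.array([0.5, -1.2, 3.0])    # three input features (made up)
weights = np.array([[0.1, 0.4, -0.2],  # 2 "neurons", 3 inputs each
                    [0.7, -0.3, 0.5]])
biases = np.array([0.1, -0.1])

z = weights @ inputs + biases   # weighted sums
activations = np.maximum(0, z)  # ReLU: keep positives, zero out negatives
print(activations)              # [0.   2.11]
```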

Big Data

The first time I heard the term “big data” the author was trying to make a play on “big oil” by describing data as a commodity in itself rather than a byproduct of other business. It’s a good term for that but I’ve rarely heard that specific usage since then. When it first came into usage, ‘big data’ was as multipurpose as ‘AI’ is today, but since then it has mostly fallen into a more specific usage focusing on the ‘big’ part.

So now ‘big data’ typically refers to truly colossal amounts of data: data that requires a whole architecture of servers and storage devices just to handle it. This separates it from the other terms we have defined already. Statistics/AI/machine learning can all be performed on data of basically any size, not just “big data.” Similarly, “big data” the way we have defined it can be used for purposes other than data science. Thus they are closely related, but still separate, terms.

Anything With “Analytics”

‘Data analytics’, ‘business analytics’, ‘predictive analytics’, ‘business predictive data analytics’, etc. etc. etc. All of these are really just interchangeable with “data science.” For the most part, terms with “analytics” in them come from corporate environments where statistics and statistical analysis have been used for decades, but the terms describing whoever is doing that analysis tend to change.

Data-Driven

This can refer to anything that uses data science to reach conclusions or make decisions. Typically this term shows up when there are other valid options that are not “data-driven.” For example, a thermostat that learns what temperature to use based on your prior usage can be called “data-driven” because most thermostats are not.
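
As a deliberately simplistic sketch of what such a thermostat might do (the usage data is invented), it could just average what you chose at each hour in the past and use that as the target:

```python
# A deliberately simple "data-driven" thermostat: pick a target
# temperature by averaging past manual settings for that hour.
from statistics import mean

# Invented history: (hour of day, temperature the user set).
history = [(7, 68), (7, 70), (7, 69), (22, 64), (22, 62)]

def target_temp(hour):
    past = [temp for h, temp in history if h == hour]
    return mean(past) if past else 68  # fixed default when no data

print(target_temp(7))   # 69
print(target_temp(22))  # 63
```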

Conclusion

I’ll close this off with a disclaimer. All of the descriptions above are good, nuanced definitions and, in my experience, are the kinds of definitions you’d get from a majority of people experienced in the field. Having said that, there are certainly experienced people with different (sometimes strong) opinions on what these terms mean.

More so than many fields, computer science tends to have a lot of different terms for the same thing, as well as the same terms for different things, and this is a common source of confusion. You will undoubtedly run into different interpretations of these terms, and that’s fine. The explanations above should give you enough background to figure out the nuances, and if it’s ever essential that everyone is working off of the same definition, then an informed discussion is in order.
