Monthly Edition

We are an incredible species: we are extremely curious about the world around us, love to learn, and often find new ways and tools for doing just that. One such advance in the last couple of decades has been computation. By improving the architecture of compute engines, we have gotten better at simulating complex dynamics, enumerating large numbers of potential outcomes, and performing quick calculations over those vast sets of possibilities. All of these advances have propelled our understanding of the world around us. Machine Learning (ML), in particular, has equipped us with a highly flexible and powerful set of tools to predict, understand, and reason about the future. While the current state-of-the-art methods excel at the first goal, progress has been slow on the last two. One reason for this is perspective: the world is inherently complex.
Furthermore, events or outcomes are often the result of non-linear interactions between a multitude of factors. Often, we do not know the complete list of influential factors or how they interact to bring about an observed outcome. These attributes make predicting real-world events extremely difficult. The ML community has approached this challenge using two fundamentally different model-building paradigms:
- Discriminative
- Generative
The majority of models we learn about in school and hear about in the news fall under the discriminative category. The primary focus of this framework is prediction accuracy. Rather than impose strict assumptions on how factors interact, the observed data guides the relationship-discovery process. Deep neural networks trained under this paradigm have surpassed human-level performance on a wide variety of tasks, primarily due to this emphasis on predictive accuracy. While this is extremely impressive, these models have their limitations. In particular, highly accurate networks are incredibly complex and require a lot of training data to achieve human-level performance. Often, these models trade interpretability for superior performance. This trade-off is both good and bad.
On the one hand, we want very accurate predictions of the future. Precision is what enables us to plan well and thrive in the face of uncertain outcomes. On the other hand, hard-to-explain models invite a lot of skepticism and alienate people. This lack of trust narrows the scope of use of these powerful algorithms. Complexity also makes it difficult for us to reason about the future. What would happen if I took path A instead of path B? Is path B shorter than path A? Statisticians have developed an extensive toolkit to answer questions like these. Doing so under the discriminative paradigm is tricky because this mode of thinking does not attempt to model the inherent uncertainty explicitly.
The generative paradigm addresses this precise predicament. This school of thought embraces making assumptions about the world rather than shying away from them. Anyone who has taken an intro to stats class knows that statistical models allow us to give uncertainty a functional form. These models let us express confidence in our predictions. They let us hypothesize over potential outcomes. Activities like these enable us to reason about the future and make decisions based on new information. Every stats 101 student also knows that these models often make stringent assumptions. Sometimes these assumptions are violated, and sometimes they are just plain wrong.
To make matters worse, some of these models are poor approximations of real-world phenomena. This shortcoming is primarily due to their inability to accommodate more than a handful of the variables that could influence an outcome of interest. While you, the reader, might want to throw in the towel right about now, I urge you not to. Generative modellers are also wary of constructing poor approximations of the world. Unlike their discriminative counterparts, however, they have brought together concepts from computer science, statistics, and software engineering to address the problems above. The result is a class of models called probabilistic graphical models (PGMs).
These graphs allow us to use as many variables as we see fit to model the highly complex world around us. They let us connect some, many, or all of these variables to one another. They also provide us with the flexibility to determine how information should flow through the network. Should information travel from node A to node B, or perhaps from node B to node A? Maybe we leave that unspecified because we don't have strong beliefs about how signals should flow through our approximation of the world. PGMs' strong assumptions allow us to build sparse models, and those same assumptions enable these networks to learn from far fewer data points than discriminative models. These features are akin to a human forming a mental image of how the world works and querying it to decide how to behave in unfamiliar settings.
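To make this concrete, here is a minimal sketch of the idea. All of the variable names and numbers below are invented for illustration: a small directed graph (Rain → Sprinkler, Rain → Wet grass, Sprinkler → Wet grass) lets us store a few small conditional tables instead of one giant joint table, and then answer a query by enumerating the graph's factorization.

```python
# Hypothetical three-node directed graph: Rain -> Sprinkler -> WetGrass.
# The graph's assumptions mean we only store small conditional tables.

p_rain = {True: 0.2, False: 0.8}
p_sprinkler_given_rain = {True: {True: 0.01, False: 0.99},
                          False: {True: 0.40, False: 0.60}}
p_wet_given = {(True, True): {True: 0.99, False: 0.01},
               (True, False): {True: 0.80, False: 0.20},
               (False, True): {True: 0.90, False: 0.10},
               (False, False): {True: 0.00, False: 1.00}}

def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet) via the factorization P(R) * P(S|R) * P(W|R,S)."""
    return (p_rain[rain]
            * p_sprinkler_given_rain[rain][sprinkler]
            * p_wet_given[(rain, sprinkler)][wet])

# Query by brute-force enumeration: P(Rain = True | WetGrass = True).
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(f"P(rain | wet grass) = {num / den:.3f}")
```

With only three binary variables, brute-force enumeration is fine; the articles listed below cover the inference algorithms that keep this tractable as the graph grows.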
Aside from building an accurate model, another challenge that has hindered the widespread adoption of PGMs is computation. Storing joint distributions over numerous random variables and updating them as new data arrives is computationally prohibitive. These limitations tend to be artifacts of how PGMs are encoded using conventional programming languages and data structures. As such, practitioners have developed a new programming paradigm, called probabilistic programming, to circumvent these issues. These languages treat random variables and probability distributions as first-class citizens. As a result of these developments, and a whole host of related ones in the field (e.g. variational Bayes), increasingly complex PGMs have become easier to build. These innovations have also reduced the time it takes to query these models and reason about potential courses of action.
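As an illustration of what "random variables as first-class citizens" looks like in practice, here is a minimal sketch using PyMC, one of several such libraries (the articles below use Pyro and TensorFlow Probability instead). The data and priors are invented, and the API assumes a recent PyMC release.

```python
import numpy as np
import pymc as pm  # one of several probabilistic programming libraries

# Made-up observations of some quantity of interest.
data = np.random.normal(loc=1.0, scale=2.0, size=200)

with pm.Model() as model:
    # Random variables are declared directly, as first-class objects.
    mu = pm.Normal("mu", mu=0.0, sigma=10.0)      # prior over the mean
    sigma = pm.HalfNormal("sigma", sigma=5.0)     # prior over the spread
    obs = pm.Normal("obs", mu=mu, sigma=sigma, observed=data)

    # The library handles the hard part: updating beliefs given the data.
    idata = pm.sample(1000, tune=1000)

# Posterior samples let us express confidence, not just a point prediction.
print(idata.posterior["mu"].mean().item())
```

The model is declared rather than hand-coded: the same few lines describe the graph, and the sampler takes care of updating the distributions.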
Generative models have not yet enjoyed the attention discriminative models have received in recent times. I hope this intro encourages all of you to take a gander at this alternate frame of thought. If you would like to explore this topic in more depth, I highly encourage you to check out some of the fantastic articles written on it below.
Abdullah Farouk, Volunteer Editorial Associate at TDS
Introduction to Probabilistic Graphical Models
Directed Graphical Models and Undirected Graphical Models
By Branislav Holländer – 11 min
Making Your Neural Network Say "I Don’t Know" – Bayesian NNs using Pyro and PyTorch
A big part of intelligence is not acting when one is uncertain
By Paras Chopra – 17 min
Generative vs. Discriminative Probabilistic Graphical Models
A Comparison of Naive Bayes and Logistic Regression
By Siwei Causevic – 5 min
Variational Bayes: the intuition behind Variational Auto-Encoders (VAEs)
Understanding the powerful idea that drives state-of-the-art models
By Anwesh Marwade – 7 min
Intro to probabilistic programming
A use case using Tensorflow-Probability (TFP)
By Fabiana Clemente – 6 min
Fundamental Problems of Probabilistic Inference
Why should you care about sampling if you are a machine learning practitioner?
By Marin Vlastelica Pogančić – 6 min
We also thank all the great new writers who joined us recently: Oshin Dutta, Renato Fillinich, Omri Kaduri, Lily Chen, Ben Williams, Ian Gabel, Jonathan Laserson, PhD, Joanna Lenczuk, ChiaChong, Humam Abo Alraja, Jindu Kwentua, Pietro Barbiero, Diana Ford, Pedro Brito, Dennis Feng, Yang Zhou, Dave DeCaprio, Archit Yadav, Jia Yi Chan, Arpana Mehta, Rahul Sangole, Ananya Bhattacharyya, Rohan Sukumaran, Wei Hao Khoong, Arnav Bandekar, Qike (Max) Li, Shayantan Banerjee, Lijo Abraham, Jake Mitchell, Z. Maria Wang, PhD, Denyse, Nam Nguyen, Kie Ichikawa, Lindsay Montanari, Davis Treybig, Vanessa Wong, Pooya Amini, Damian Ejlli, Ph.D, Vedant Bedi, jonloyens, Ibrahim Kovan, Sandipayan, Masaki Adachi, Chia Wei Lim, Reshma Shaji, Vianney Taquet, and many others. We invite you to take a look at their profiles and check out their work.