
Rage Against The Machine: Bayes Theorem and The Nature of Protest


Photo by Colin Lloyd on Unsplash

Laws curtailing the right to peacefully protest are sweeping legislatures across the United States. Traditionally, constitutional scholars have framed these types of policies as a balancing test between free speech and public safety. Is there an empirical basis for suggesting that the former currently threatens the latter? That’s what this post is about.

This spring, after Florida passed one of the most restrictive protest laws in the country, its governor proudly celebrated it as the "strongest anti-rioting, pro-law enforcement measure in the country."

"If you riot, if you loot, if you harm others, particularly if you harm a law enforcement officer during one of these violent assemblies, you’re going to jail." – Governor Ron DeSantis

I don’t think that is controversial. However, the implied premise – that violent protest is widespread enough to necessitate a forceful denunciation – does not seem to be reflected in the data.

Dr. Erica Chenoweth is one of the nation’s leading scholars on nonviolent protest movements. Her research team maintains a publicly available dataset on protests in the US, the Crowd Counting Consortium, which I will use here. The team acknowledges that the data is imperfect and incomplete, but I think it works well for our purposes. We will use it to form our own sense of proportion using Bayes Theorem.

But why Bayes Theorem? I believe Bayes provides a great tool for this question because it preserves uncertainty in a way that frequentist models don’t. Unlike a mechanistic process that can only produce a narrow range of outputs, Bayes provides the flexibility to accommodate not having the ‘full picture’, much as we naturally approach problems.

If I went to a protest tomorrow, I would have no idea what was going to happen. I accept that uncertainty, but that doesn’t stop me from subconsciously thinking about what could go wrong. Tempers could flare. The person next to me could punch a police officer. The one behind me could bash the window of an office building. We could be shot at, or we could all lend our voice to the cause of our concern, peacefully and without incident. It’s all plausible.

Those of us who have been to large demonstrations, though, intuitively assume that, although a few people might get out of hand, most will adhere to socially acceptable behavior and show some restraint. What makes Bayes useful is that it’s designed to incorporate that assumption into the analysis: from a starting hunch, we gather new information, integrate it with what we know, and converge on a probability closer to reality. We may start with wildly different assumptions, but given enough data we’ll end up pretty close together.
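To make the claim about converging beliefs concrete, here is a minimal sketch of Bayesian updating for the probability that an event involves injuries. It assumes a Beta prior, a common conjugate choice for a proportion that is not part of the analysis below, and the two starting hunches and the observed counts are hypothetical numbers chosen for illustration, not values from the protest dataset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) belief about an unknown probability p."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected value of p under the current belief
        return self.alpha / (self.alpha + self.beta)

    def update(self, successes: int, failures: int) -> "BetaBelief":
        # Conjugate update: add the observed counts to the prior pseudo-counts
        return BetaBelief(self.alpha + successes, self.beta + failures)


# Two hypothetical observers with very different starting hunches
optimist = BetaBelief(alpha=1, beta=99)    # prior mean 0.01
pessimist = BetaBelief(alpha=30, beta=70)  # prior mean 0.30

# Hypothetical evidence: 20 of 2,000 observed events involved injuries
injured_events, total_events = 20, 2000
calm_events = total_events - injured_events

for name, prior in [("optimist", optimist), ("pessimist", pessimist)]:
    posterior = prior.update(injured_events, calm_events)
    print(f"{name}: prior mean {prior.mean:.3f} -> posterior mean {posterior.mean:.3f}")
```

Although the priors start at 1 and 30 percent, both posterior means land between 1 and 2.5 percent; as the number of observed events grows, the 100 prior pseudo-counts each observer started with matter less and less.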

That’s the intuition. Now let’s define it more precisely with a Python implementation.


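```python
import pandas as pd

# Load the compiled Crowd Counting Consortium dataset
df = pd.read_csv(
    'https://raw.githubusercontent.com/nonviolent-action-lab/crowd-counting-consortium/master/ccc_compiled.csv',
    encoding='latin'
)

# Inspect the available columns, the (rows, columns) shape, and the first few rows
print(df.columns.tolist())
print(df.shape)
print(df.head())
```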

The columns attribute and the head method give us a feel for what’s available, and the shape attribute shows that there are nearly 72,000 observations (at the time of writing). The columns also show that we could calculate probabilities for a wide range of scenarios. For simplicity, I will pick one.

My guess is that, at a typical protest, property damage is the likeliest type of incident: it’s the lowest-cost form of ‘acting out’. At the other end of the spectrum, given the severity of the consequences, demonstrations where the police are targets of violence are probably quite rare. As such, I will calculate the probability of what I guess falls between these two extremes: protestors violently injuring other protestors. In notation form:

P(Injury|Protest) = P(Protest|Injury) * P(Injury) / P(Protest)
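Because the numerator P(Protest|Injury) * P(Injury) is just the joint probability that an event is a protest and involves injuries, the formula can be rewritten in a form that makes the result easier to read:

```latex
P(\text{Injury} \mid \text{Protest})
  = \frac{P(\text{Protest} \mid \text{Injury})\, P(\text{Injury})}{P(\text{Protest})}
  = \frac{P(\text{Protest} \cap \text{Injury})}{P(\text{Protest})}
```

When all three terms are estimated as frequencies over the same set of events, the posterior is therefore the share of protest events that report injuries.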

Next, I compute each building block of Bayes Theorem from the data.

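```python
# P(Injury): share of events with any reported injuries among the crowd
injury = df['injuries_crowd_any'].mean()

# P(Protest): share of the most common event type, assumed to be "protest"
# (as opposed to march, walkout, boycott, etc.)
type_shares = df['type'].value_counts(normalize=True)
protest_label = type_shares.idxmax()
protest = type_shares[protest_label]

# P(Protest|Injury): look the label up explicitly, since positional [0]
# returns whichever type happens to be most common among injured events
injured = df[df['injuries_crowd_any'] == 1]
protest_given_injury = injured['type'].value_counts(normalize=True).get(protest_label, 0.0)

# Posterior probability, P(Injury|Protest)
injury_at_protest = protest_given_injury * injury / protest
print(protest_given_injury, injury_at_protest)
```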

My result is 0.00975. This value has a very specific meaning: among the recorded events classified as protests (as opposed to the other forms of public expression in the data), the probability that injuries were reported among the crowd is slightly less than one percent. In terms of proportion, it would be interesting to compare this with other policy issues, such as domestic violence or gun violence.
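As a sanity check, the Bayes Theorem route should agree with simply taking the injury rate among protest events, provided every probability is estimated over the same rows. The minimal sketch below assumes a tiny synthetic table that reuses the `type` and `injuries_crowd_any` column names; its values are made up for illustration and are not CCC data.

```python
import pandas as pd

# Hypothetical toy data reusing the CCC column names (not real observations)
toy = pd.DataFrame({
    "type": ["protest", "protest", "protest", "protest", "march",
             "march", "vigil", "protest", "walkout", "protest"],
    "injuries_crowd_any": [0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
})
label = "protest"

# Bayes Theorem, estimating each term as a frequency over the same rows
p_injury = toy["injuries_crowd_any"].mean()
p_protest = (toy["type"] == label).mean()
injured = toy[toy["injuries_crowd_any"] == 1]
p_protest_given_injury = (injured["type"] == label).mean()
via_bayes = p_protest_given_injury * p_injury / p_protest

# Direct route: share of protest events with reported crowd injuries
direct = toy.loc[toy["type"] == label, "injuries_crowd_any"].mean()

print(via_bayes, direct)
```

On the full dataset, rows with a missing `type` or `injuries_crowd_any` value can be dropped by different steps (for instance, `value_counts(normalize=True)` ignores missing values by default, as does `mean()`), so the two routes may not match exactly; comparing them is a quick way to spot such differences.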

I will leave it to the reader to judge whether this likelihood justifies the policy response we are seeing. Yet grounding the debate in empiricism will hopefully help people form their own opinions supported by data.

Thanks for reading!

Photo by Will Reyes on Unsplash
