What Drives People to Protest, and What Leads to the Most Success?

A data-driven multidisciplinary approach to key questions surrounding protests by Malika Mohan & Maria Olmos. Accompanying code can be viewed on GitHub.

Photo by Teemu Paananen on Unsplash

Between the nationwide farmer protests in India and ongoing Black Lives Matter demonstrations, the past months have shed light on a range of citizens’ grievances with state and national governments, driving them to protest in order to have their demands met.

These current events prompted us to examine the act of protesting further: what protesting looks like around the world, what drives people to protest, and what factors may be significant in determining whether or not government actors accommodate protesters’ requests. We took a multidisciplinary approach to these questions, conducting a literature review on protests and political power, drawing insights from data visualizations in Tableau, and testing variables for statistical significance and constructing machine learning models in R.

Data Understanding

In addition to our literature review, we used a dataset from the Mass Mobilization Project, which covers protest events that targeted the government and involved at least 50 people between 1990 and 2019. Mass Mobilization collected the data by reviewing newspaper sources such as the New York Times, Washington Post and Times of London, and by establishing a baseline of articles that needed to be returned and examined for a particular protest event. The resulting dataset includes variables for the start and end date of the protest, location, protester identity, number of protesters, whether or not the protest was violent, protesters’ demands, state responses, and notes providing textual context for the protest event. More information about the data collection process and dataset can be found in the Mass Mobilization Data Project Codebook and User’s Manual. One distinction worth noting is that while protest data was available for several regions and countries, including the North American countries of Canada and Cuba, there was no protest data available for the United States of America.

Data Cleaning

Making the dataset suitable for analysis was a lengthy process involving feature engineering, column deletion and checks for null and error values. The steps included: replacing all blank cells and cells containing only ‘.’ with "NA"; deleting irrelevant fields such as ID, CCode and Sources; creating a new Start Date field by concatenating Start Day/Start Month/Start Year and a new End Date field by concatenating End Day/End Month/End Year; creating a new protest length column as the difference between Start Date and End Date; deleting rows in which the majority of cells were empty; deleting entirely null columns; converting numbers recorded as text (e.g., "dozens") into numeric values; and creating a new text-based csv with the context information from the ‘Notes’ column. Lastly, we created a State Response dummy variable to serve as our target variable, using if/else logic that coded negative high-level state responses (e.g., killings, shootings, ignoring) as "0" and state accommodation of the protesters’ demands as "1".
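A few of the cleaning steps above can be sketched in code. The authors worked in R; the following is only a minimal Python illustration using the standard library, and the field names (`startday`, `stateresponse`, and so on) and the response label `"accommodation"` are assumptions made for the example rather than the dataset's confirmed column names or values.

```python
from datetime import date

# Blank cells and cells holding only '.' are treated as missing (the article's "NA").
MISSING_MARKERS = {"", "."}


def clean_cell(value):
    """Return None for a missing marker, otherwise the stripped string."""
    text = value.strip()
    return None if text in MISSING_MARKERS else text


def build_date(day, month, year):
    """Combine day/month/year strings into a date, or None if any part is missing."""
    if day is None or month is None or year is None:
        return None
    return date(int(year), int(month), int(day))


def accommodation_dummy(response):
    """1 if the state accommodated the demands, 0 for any other recorded response."""
    if response is None:
        return None
    return 1 if response == "accommodation" else 0  # label is an assumed value


def clean_row(raw):
    """Apply the missing-value, date, length and dummy-variable steps to one row."""
    row = {key: clean_cell(value) for key, value in raw.items()}
    start = build_date(row["startday"], row["startmonth"], row["startyear"])
    end = build_date(row["endday"], row["endmonth"], row["endyear"])
    has_both = start is not None and end is not None
    return {
        "start_date": start,
        "end_date": end,
        "length": (end - start).days if has_both else None,
        "accommodated": accommodation_dummy(row["stateresponse"]),
    }


# Hypothetical rows, invented purely for illustration.
example = {
    "startday": "15", "startmonth": "1", "startyear": "1990",
    "endday": "18", "endmonth": "1", "endyear": "1990",
    "stateresponse": "ignore",
}
incomplete = dict(example, endday=".", stateresponse="accommodation")

cleaned = clean_row(example)
cleaned_incomplete = clean_row(incomplete)
```

Computing the length only when both dates are present keeps missing values explicit instead of silently turning them into zero-day protests.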

Exploratory Data Analysis

To begin deriving insights, we conducted exploratory data analysis in Tableau and R, assessing the distributions and counts of variables and answering key questions about the features of protests around the world, as can be seen below.

Visualizations

Image by Authors

European countries such as France and the United Kingdom have the highest number of protests in the world. In Asia, South Korea and China are the leading countries for protest rates. Furthermore, Venezuela and Brazil are the leading countries in South America for the number of protests.

Image by Authors

Over the years, the number of protests has slowly increased, with some fluctuations in between. The largest spike since 1990 came in 2011, followed by an even higher peak in 2015, with a total of 857 protests.

Image by Authors

We observed the highest levels of protest between January and May. Within this range, March had the highest number of protests. Additionally, the number of protests seems to be lower in the second half of any given year.

Image by Authors

Protester demands span a variety of social, economic and political issues. The most common demands relate to political behavior and process. This category includes any demands relating to political reform and the protection of democratic procedures.

Image by Authors

The most common response from the governments is to ignore the protest, followed by crowd dispersal. These types of government responses are expected as most of the protests were classified as non-violent.

Image by Authors

The most frequent driver of protests is issues with political behavior and the political process, followed by labor wage disputes. The same two types of demand also account for the longest protests, covering the highest number of days. Similarly, protests involving social restrictions demands have both the lowest count and the lowest average length.

Image by Authors

This chart breaks down how states respond to violent vs. non-violent protests, with similar proportions of each type, but less frequency for non-violent ones.

Image by Authors

The majority of protests do not result in state accommodation, which makes it interesting to explore the factors that contribute to the ones that are met with accommodation and collaboration by the government. Additionally, protests typically last under one day, with a median length of 0 days and a mean of 1.664 days. The maximum, however, is a protest that spans several years, with a length of 938 days (not depicted in the visualization for scale purposes).

Image by Authors

Despite the way in which the media often depicts protests (as discussed in the Literature Review section), the majority of protests do not involve violence by the protesters.

Image by Authors

The participants category with the highest frequency is the range of 100–999, followed by 50–99, indicating that protests have typically been on the smaller side. However, the number of participants is frequently not reported, as indicated by the significant number of NAs.

You can continue your own exploration on the count of protest data utilizing our interactive dashboard linked here (preview below), and filter by Country, Time Frame, Protester Demands and State Responses!

Testing

In addition to visualizations, our exploratory analysis involved testing for independence and statistical significance among our variables. The most interesting findings included a statistically significant positive correlation between state response and length (longer protests are more strongly associated with accommodation by the government). Chi-squared tests also indicated that two pairs of variables were not independent: the State Response dummy variable and protest length, and protest length and protester violence (present or not).
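For readers unfamiliar with the chi-squared test of independence mentioned above, here is a minimal Python sketch for a 2x2 contingency table. The counts are invented for illustration and are not the Mass Mobilization figures; the example also assumes that protest length has been binned into two categories (short vs. long), which is an assumption rather than the authors' documented procedure. No continuity correction is applied, so results can differ slightly from R's `chisq.test`, which applies one to 2x2 tables by default.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = short / long protests, columns = not accommodated / accommodated.
observed_counts = [[30, 70], [60, 40]]
statistic, p_value = chi_squared_2x2(observed_counts)

# A table whose rows have identical proportions shows no evidence against independence.
null_statistic, null_p_value = chi_squared_2x2([[10, 20], [20, 40]])
```

A small p-value (conventionally below 0.05) is what the article means by the variables being "not independent".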

Modelling

After iterating on and comparing various models, and converting categorical variables into factors so that they could be used within the models, we found that the two that best suited our data and provided the most insight were a simple generalized linear model (glm) and a more nuanced Random Forest model. These models can be used to predict whether certain factors surrounding a protest are likely to result in state accommodation of requests. They also helped confirm the variables identified in our exploratory data analysis as significant for state accommodation of protester demands.

For reference, generalized linear models are "a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution" (Wikipedia) and Random Forest models wield ensemble learning by "constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes or mean/average prediction of the individual trees." (Wikipedia)

After fitting a linear model with all of the explanatory variables against the State Response dummy target, the statistically significant variables (p-value < 0.05) included length, protester violence, participants category, and protester demand. At the level of individual factor levels, the output showed statistically significant positive relationships for the year 2008, the region of Asia, length, protester violence, a participants range of 2000–4999, and the protester demand of price increases/tax policy. Negative ones included the years 1992, 1993, 2003, 2011, 2016 and 2019, the regions Central America, South America and Europe, startmonth, and the protester demand of police brutality. The model had an adjusted R-squared value of 0.6073, suggesting that about 61% of the variance in the target variable is explained by the model. More insights into the outputs of this linear model can be found by running the code in the accompanying GitHub.
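As a reminder of how the adjusted R-squared reported above relates to the plain R-squared, the standard adjustment penalizes the number of predictors. The short Python sketch below uses made-up inputs; the actual number of observations and predictors in the authors' model is not stated in the article, so none of these numbers reproduce their result.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_observations - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1 - (1 - r_squared) * penalty


# Hypothetical values chosen only to show the direction of the adjustment.
plain = 0.5
adjusted = adjusted_r_squared(plain, n_observations=101, n_predictors=10)
```

Because factor variables such as year or region expand into many dummy columns, the gap between the plain and adjusted values can be noticeable in models like this one.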

The trained and tested random forest model had a lower R-squared of 0.1195, but still provided valuable information based on node purity and on the decrease in error as more trees were added (500 trees in total), as can be seen below:

The variables our random forest model included (listed in order of node purity) were: year, start day, start month, participants_category, protesterdemand, length, data region and protester violence (a dummy variable for whether or not violence was used in the protest). Higher node purity means that more of the data in a node belongs to a single class, so splits on these variables separate the data more definitively, which marks them as more important in determining whether or not a state is likely to accommodate protesters’ demands.
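To make the idea of node purity more concrete, the sketch below computes Gini impurity and the impurity decrease produced by a single split, which is the quantity a classification forest accumulates per variable. This is an illustration under assumptions: the labels are invented, and it assumes a classification setting. In R's randomForest package, a forest fit in regression mode measures node impurity with the residual sum of squares instead.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 0 when pure, larger when classes are mixed."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two child nodes."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical accommodation labels (1 = accommodated) before and after a split.
parent_labels = [1, 1, 1, 0, 0, 0, 0, 0]
left_labels = [1, 1, 1, 0]
right_labels = [0, 0, 0, 0]

decrease = impurity_decrease(parent_labels, left_labels, right_labels)
```

A variable whose splits repeatedly yield large decreases like this ends up near the top of the node purity ranking.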

Finally, we conducted textual modeling on the context notes provided in the dataset, in order to see which terms were used most frequently to describe the protests. The highest-frequency terms or word stems were (unsurprisingly) protest (20,222), polic (10,252), govern (7,984), and demonstr (7,597). More can be seen in the visualization below, with the larger and more central terms being the ones with higher frequency in the dataset:
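The stem counts above (for example "polic" covering both "police" and "policing") come from reducing words to a common stem before counting. The Python sketch below shows the general idea with a deliberately crude suffix stripper. It is a stand-in for illustration, not the stemmer the authors used, and the sample sentence and stopword list are invented.

```python
import re
from collections import Counter

# Illustrative stopword list and suffix rules; real stemmers are far more careful.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "against"}
SUFFIXES = ("ations", "ation", "ated", "ment", "ers", "ing", "ed", "e", "s")
MIN_STEM_LENGTH = 4


def crude_stem(word):
    """Strip the first matching suffix, provided the remaining stem is long enough."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)]
    return word


def stem_frequencies(text):
    """Lowercase, tokenize on letters, drop stopwords, stem, and count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(token) for token in tokens if token not in STOPWORDS)


# Hypothetical note text, not taken from the dataset.
sample_note = "Protesters demonstrated against the government. Police dispersed the protest."
frequencies = stem_frequencies(sample_note)
top_stem, top_count = frequencies.most_common(1)[0]
```

Counts like these are what drive the relative word sizes in a word cloud.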

Literature Review Findings

There was a breadth of protest-related literature available on the Internet, covering a scope of topics including whether or not protests are effective, what mass protests around the world have in common and the way in which the media portrays them.

While the count of state accommodations versus negative or ignoring responses shared in the Exploratory Data Analysis section of this article may have been disheartening, our research revealed that in the right environments and under the right conditions, protests do in fact work and hold merit.

As an article in The Atlantic put it, "In the short term, protests can work to the degree that they can scare authorities into changing their behavior. Protests are signals: ‘We are unhappy, and we won’t put up with things the way they are.’ But for that to work, the ‘We won’t put up with it’ part has to be credible. Nowadays, large protests sometimes lack such credibility, especially because digital technologies have made them so much easier to organize." (Source)

Additionally, the most persuasive protest factors to prompt success have been found to be the size/scale of a protest event and whether the protesters agree among themselves and are unified in their message (Source).

As for the way in which protests are now portrayed in the media, it is often through a negative lens, depicting violence and hostility. This may be counter-productive and minimize the protesters’ movements, because it contributes to rhetoric that protesters are just angry and inciting violence or harm to communities, rather than effectively demonstrating how these feelings stem from the systemic problems they are fighting against.

The framework used to describe this is called the protest paradigm, which is defined as a "routinized pattern or implicit template for the coverage of social protest". The Oxford Handbook contextualizes this as how "When movements start to grow bigger or disruptive enough to engage media attention, the coverage they receive is often antagonistic – as a vast array of research spanning decades of media reporting of protest movements have established. Journalists have accorded hostile treatment to antiwar protests, labor protests, abortion law protests, antipolice demonstrations, antinuclear movements, and antiglobalization protests, among others – often by ridiculing them or portraying them as violent." (Source)

Conclusion & Implications

Some of the most significant takeaways regarding the factors that contribute to a protest’s success are that protest length and protester violence had positive, statistically significant relationships with state accommodation. As some literature points out, however, long and more violent protests may be symptoms of large-scale, higher-importance problems that prompt more response from the government. It is important to stress that this relationship, like the others we have referenced throughout, is not necessarily causal, but it is a relationship nonetheless.

Additionally, a particularly relevant statistical takeaway amid current sociopolitical conversations is that there is a negative, statistically significant relationship between accommodating demands and police brutality protests (and a positive one with price increases/tax policy issues, suggesting that governments are more likely to accommodate protests that deal with these topics). Moreover, from a global perspective, governments in Central America, South America and Europe are less likely to accommodate protester demands, while Asian governments are more likely to. Interesting next steps in this analysis would include an examination of regional policies on free speech, government institutional structures and the inclusion of protest data from the U.S.

Now that protests can be planned and conducted more quickly via social media and digitized community platforms, it will be interesting to see what the future of this form of social change holds. While some researchers contend that protests may lose credibility and the ability to effect change due to their increased frequency or their reduction to trendified/Instagrammable events, others maintain that they will continue to operate as a way to ignite political change from the outside when done right and in an environment that allows for it.

Given current events and the steady stream of protests woven into our media feeds and lives, we hope this multidisciplinary glimpse into protests across the world over nearly the last 30 years provided some interesting insights into where and why protests against governments occur and when they are more likely to succeed!

