
Last month, The Intercept published an article claiming that, "In recent presidential cycles, the velocity of edits made to a Wikipedia page have correlated with the choice of vice presidential running mate." The article focuses on Kamala Harris, and the increasing number of edits that took place on her Wikipedia page in June.
The article argues that the pace of edits can be interpreted as signaling the strength of her potential to be named as VP candidate. Interesting. But is it valid?
Well, now that a month has passed and a selection has still not been made, I decided to take a look at these changes for myself. I wanted to see how her rate of edits stacked up against other potential candidates. I was also curious to see if there were any other correlations we could draw with the results, to deepen our understanding of their meaning. So, to Python I turned.
There is no single definitive list of potential 2020 Democratic VP candidates, so since we’re working with Wikipedia, I’ll stay true to the source by collecting a list from the Wikipedia article, "2020 Democratic Party vice presidential candidate selection." Here are the nominees.

Getting Revision Timestamps
To achieve my goal, I’ll need to retrieve data from Wikipedia about each potential candidate. For this we can use the MediaWiki action API. Let’s get to it!
Take some names
I’ll start by preparing a list of names. We’ll use this list of names to look up timestamps of revisions for their respective Wikipedia articles:
nominees = ['Karen Bass', 'Keisha Lance Bottoms', 'Val Demings', 'Tammy Duckworth', 'Kamala Harris', 'Michelle Lujan Grisham', 'Susan Rice', 'Elizabeth Warren', 'Gretchen Whitmer']
Get some timestamps
Now I’ll define a function that makes an API call to Wikipedia and returns a list of revision timestamps for a given article. We’ll use the requests library for this:
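A minimal sketch of such a function could look like the following. It assumes the standard English Wikipedia endpoint, and the pagination uses the MediaWiki API’s documented "continue" tokens so we get the full revision history rather than just the first batch:

```python
import requests

API_URL = 'https://en.wikipedia.org/w/api.php'

def get_revision_timestamps(title):
    """Return all revision timestamps for a Wikipedia article, newest first."""
    params = {
        'action': 'query',
        'format': 'json',
        'prop': 'revisions',
        'titles': title,
        'rvprop': 'timestamp',
        'rvlimit': 'max',  # up to 500 revisions per request
    }
    timestamps = []
    while True:
        response = requests.get(API_URL, params=params).json()
        # The result nests pages by page ID; we queried one title, so take it
        page = next(iter(response['query']['pages'].values()))
        timestamps.extend(rev['timestamp'] for rev in page['revisions'])
        if 'continue' not in response:
            break
        # Feed the continuation token back in to fetch the next batch
        params.update(response['continue'])
    return timestamps
```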
So, if I run this function on the Wikipedia article for Tammy Duckworth, the call looks like this:
get_revision_timestamps('Tammy Duckworth')
And it would return a list that looks like this:
print(get_revision_timestamps('Tammy Duckworth'))
['2020-08-06T18:19:43Z', '2020-08-06T18:18:43Z', '2020-08-06T18:16:01Z', '2020-08-06T18:15:00Z', '2020-08-06T18:13:51Z', ...]
As you can see, the function returned a list of timestamps, ordered newest-to-oldest, stored as strings. This is only a partial view of the full list, which contains 2,484 timestamps at the time of writing. That’s a lot of revisions!
Graphing the Timestamps
Now that we know how to get timestamps, we can do this for our full list of nominees. But before we do, let’s figure out how to convert them into a graph. For this, we will turn to matplotlib’s pyplot module. While pyplot can handily manage dates, we first have to prepare our data in a way such that Python can interpret it correctly.
Reverse our list of timestamps
Because our list of revision timestamps was generated newest-to-oldest, we also should reverse it, in order to plot forward in time.
timestamps = get_revision_timestamps('Tammy Duckworth')
timestamps.reverse()
print(timestamps)
['2006-01-11T23:50:14Z', '2006-01-11T23:50:48Z', '2006-01-12T00:04:03Z', '2006-01-12T00:04:45Z', '2006-01-12T00:06:14Z', ...]
Now they are in chronological order. Great!
Convert list of timestamps from strings into datetime objects
Unfortunately, our timestamps are still just strings. To turn them into an interpretable date format, we must convert them into datetime objects. For this, we can use Python’s built-in datetime module. Note that we’re importing the datetime class from inside the datetime module.
from datetime import datetime
dates = []
for stamp in timestamps:
    d = datetime.strptime(stamp, '%Y-%m-%dT%H:%M:%SZ')
    dates.append(d)
Okay! We have successfully converted our timestamps into datetime objects:
print(dates)
[datetime.datetime(2006, 1, 11, 23, 50, 14), datetime.datetime(2006, 1, 11, 23, 50, 48), datetime.datetime(2006, 1, 12, 0, 4, 3), ...]
Plotting datetime objects
Now that our dates can be interpreted by Python, we can go ahead and plot them. For this step, we will use pyplot’s "plot_date" function. This function takes two arguments: x values and y values. For x values, we are using the list of datetime objects. For y values, I am using a range of numbers the same length as my list of datetime objects. This will allow me to increment the count (y-axis) by 1 for each date that is plotted (along the x-axis).
import matplotlib.pyplot as plt
plt.plot_date(dates, range(len(dates)))
The "plot_date" function looks at our list of dates and finds the max and min. Then, it creates an evenly spaced sequence of date ticks to use as our x-axis, and plots the dates accordingly. By default it treats the x values as dates; this behavior is controlled by its "xdate" and "ydate" arguments.
As with any matplotlib figure, we can adjust the labels and formatting. I’m going to keep it minimal here so that we can cut to some results. I’ve added a title, and put labels on the axes:
plt.title('Tammy Duckworth Wikipedia Revisions')
plt.xlabel('Time')
plt.ylabel('Revisions count')
plt.show()
And the result:

Voila! For fun, here’s the same graph, but for Joe Biden’s Wikipedia page. I think it’s a nice example of this plotting method’s narrative capacity:

Comparing Page Revisions
So, to the matter at hand. Now that we can get timestamps, convert them to datetime objects, and plot them, let’s do it for our full list of potential Democratic VP nominees. Notice that I start by importing "GetRevisionTimestamps," which is the module that contains my timestamp retrieval function, "get_revision_timestamps." If you want to avoid this import, just copy/paste the defined function somewhere above this block.
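A self-contained sketch of that loop might look like this. It re-declares the fetch function so the block runs on its own, and produces one figure per nominee; I use plt.plot here, which handles datetime x-values directly (plot_date does the same but has since been deprecated in newer matplotlib releases):

```python
from datetime import datetime
import requests
import matplotlib.pyplot as plt

API_URL = 'https://en.wikipedia.org/w/api.php'

def get_revision_timestamps(title):
    """Fetch all revision timestamps (newest first) via the MediaWiki API."""
    params = {'action': 'query', 'format': 'json', 'prop': 'revisions',
              'titles': title, 'rvprop': 'timestamp', 'rvlimit': 'max'}
    stamps = []
    while True:
        resp = requests.get(API_URL, params=params).json()
        page = next(iter(resp['query']['pages'].values()))
        stamps.extend(rev['timestamp'] for rev in page['revisions'])
        if 'continue' not in resp:
            return stamps
        params.update(resp['continue'])  # follow pagination tokens

nominees = ['Karen Bass', 'Keisha Lance Bottoms', 'Val Demings',
            'Tammy Duckworth', 'Kamala Harris', 'Michelle Lujan Grisham',
            'Susan Rice', 'Elizabeth Warren', 'Gretchen Whitmer']

for name in nominees:
    timestamps = get_revision_timestamps(name)
    timestamps.reverse()  # oldest-to-newest, so the plot moves forward in time
    dates = [datetime.strptime(t, '%Y-%m-%dT%H:%M:%SZ') for t in timestamps]
    plt.figure()
    plt.plot(dates, range(len(dates)), 'o')  # cumulative count vs. time
    plt.title(f'{name} Wikipedia Revisions')
    plt.xlabel('Time')
    plt.ylabel('Revisions count')
    plt.show()
```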
Results:









As we can see, not all curves are created equal. They vary in the amount of data represented, in scale, and in the span of time covered.
Still, we can see some interesting trends already. Some plots appear linear, some stepped, some nearly quadratic. Some have moments of each. Where we see breaks in some plots, we can infer that no revisions were submitted. And yet, we can still chart the upward trajectory through time.
Now, the image above represents all of the revisions for each article. But, what if I want to filter all of the plots to reflect just a specific date range, like in The Intercept’s piece? I could write a function that takes start and end dates as optional arguments, and only plots dates within said range:
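One way to sketch that filter: a small function that operates on an already-fetched, oldest-to-newest list of datetime objects (the name plot_revisions and the returned filtered list are my own choices here, for convenience):

```python
from datetime import datetime
import matplotlib.pyplot as plt

def plot_revisions(name, dates, start=None, end=None):
    """Plot cumulative revision counts for `name`, keeping only the
    datetimes in `dates` that fall within [start, end] when given."""
    if start is not None:
        dates = [d for d in dates if d >= start]
    if end is not None:
        dates = [d for d in dates if d <= end]
    # plt.plot handles datetime x-values directly (plot_date is deprecated
    # in newer matplotlib releases)
    plt.plot(dates, range(len(dates)), 'o')
    plt.title(f'{name} Wikipedia Revisions')
    plt.xlabel('Time')
    plt.ylabel('Revisions count')
    plt.show()
    return dates  # handy for checking how many revisions survived the filter
```

For example, plot_revisions('Kamala Harris', dates, start=datetime(2020, 7, 17)) would restrict the plot to a recent three-week window.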
Results, filtered and scaled:









Now some results are starting to show. We can see that the trend identified in The Intercept’s article does indeed still hold for Kamala Harris. While it’s not yet easy to say what this means, we can notice emerging increases for Karen Bass, Tammy Duckworth, and Susan Rice as well. Let’s zoom in a little more; I’ll look at the last three weeks only, just like The Intercept’s article did:









Okay. Now a pattern emerges. Let’s focus on the four candidates with the largest number of recent edits:




Well. This certainly complicates the story. If this measure is to be trusted, it looks like Kamala Harris has some new competition. But, now we should ask ourselves, how much can we trust this measure?
The Intercept article goes into detail about the nature of revisions made to Harris’ Wikipedia page, which certainly makes a convincing argument for a motivated attempt to control her public image on the platform. However, this criticism is fairly distinct from the initial claim made. The original claim states that the selection of a VP nominee correlates with the velocity of revisions to their Wikipedia page. While it may be worthwhile to consider the nature of the revisions made, for now I want to focus on this original claim. If the claim were true, it would suggest that measuring revisions could provide a useful indicator for understanding the Biden campaign’s strategy. But, at least for now, this measure provides an indecisive result.
Contrasting with Google Trends
For fun, and out of curiosity, I went ahead and grabbed some data from Google Trends, to see how they report interest in the same four candidates over the same three week period. Then I plotted the data with pyplot:

The results are similar to what we see from looking at Wikipedia revisions. Keeping in mind that the Wikipedia results are cumulative, and therefore display acceleration differently, we have to be careful about drawing immediate comparisons. Let’s go ahead and correct the Google results to reflect a cumulative measure:
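The correction itself is just a running total. A quick sketch (the interest values below are made up for illustration; real Google Trends exports are 0–100 interest scores per day):

```python
from itertools import accumulate

# Hypothetical daily Google Trends interest scores (0-100) -- illustrative only
interest = [55, 60, 48, 72, 100, 90, 81]

# Running total: each point now counts all interest accumulated so far,
# matching the cumulative shape of the Wikipedia revision curves
cumulative = list(accumulate(interest))
print(cumulative)  # [55, 115, 163, 235, 335, 425, 506]
```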

We begin to see a familiar picture again. It appears that, for these candidates at least, the number of Wikipedia revisions made seems to track with overall interest level. So, is it really an indicator of a likely selection?
Additional Notes
Room for Improvement
This plot comparison could certainly be improved. As an example, I’ll mention that while The Intercept piece’s claim dealt with the velocity of edits, the content of their analysis really only dealt with the raw number of edits. If we actually wanted to explore changes in edit velocity, one possible method would be to normalize the recent average number of edits per day against the average number of edits per day seen over the article’s lifetime.
I didn’t carry this analysis all the way through, but perhaps somebody else will! So here is a series of functions that will return the average number of edits for a given article over a provided unit of time:
For some quick results, I wrote a script that modifies the "avg_wiki_edits" function to generate average recent edits for a given time period and unit of time, then uses "avg_wiki_edits" to generate average lifetime edits for a given time unit. With these two results, it calculates a ratio, and then spits the results out in sentence form:
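A self-contained reconstruction of that idea might look like the following. The name avg_wiki_edits mirrors the article’s, but the implementation details (daily averages only, and the edit_ratio wrapper) are my own assumptions:

```python
from datetime import datetime
import requests

API_URL = 'https://en.wikipedia.org/w/api.php'

def get_revision_datetimes(title):
    """All revision times for an article, oldest first."""
    params = {'action': 'query', 'format': 'json', 'prop': 'revisions',
              'titles': title, 'rvprop': 'timestamp', 'rvlimit': 'max'}
    stamps = []
    while True:
        resp = requests.get(API_URL, params=params).json()
        page = next(iter(resp['query']['pages'].values()))
        stamps.extend(rev['timestamp'] for rev in page['revisions'])
        if 'continue' not in resp:
            break
        params.update(resp['continue'])  # follow pagination tokens
    return sorted(datetime.strptime(s, '%Y-%m-%dT%H:%M:%SZ') for s in stamps)

def avg_wiki_edits(dates, start=None, end=None):
    """Average edits per day between start and end (default: article lifetime)."""
    start = start or dates[0]
    end = end or dates[-1]
    n = sum(1 for d in dates if start <= d <= end)
    days = max((end - start).days, 1)  # avoid dividing by zero on young pages
    return n / days

def edit_ratio(title, start, end):
    """Compare recent edit rate to lifetime edit rate, in sentence form."""
    dates = get_revision_datetimes(title)
    lifetime = avg_wiki_edits(dates)
    recent = avg_wiki_edits(dates, start, end)
    print(f'Average number of edits per day over article lifetime: {lifetime:.2f}')
    print(f'Average number of edits per day between {start:%Y-%m-%d} '
          f'and {end:%Y-%m-%d}: {recent:.2f}')
    print(f"Between {start:%Y-%m-%d} and {end:%Y-%m-%d}, {title}'s Wikipedia "
          f'page has received {recent / lifetime} times more edits per day '
          f'than average.')
```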
Normalized Results:
Karen Bass:
Average number of edits per day over article lifetime: 1.94
Average number of edits per day between 2020-07-17 and 2020-08-07: 8.53
Between 2020-07-17 and 2020-08-07, Karen Bass's Wikipedia page has received 4.396907216494845 times more edits per day than average.
Tammy Duckworth:
Average number of edits per day over article lifetime: 2.96
Average number of edits per day between 2020-07-17 and 2020-08-07: 4.67
Between 2020-07-17 and 2020-08-07, Tammy Duckworth's Wikipedia page has received 1.5777027027027026 times more edits per day than average.
Kamala Harris:
Average number of edits per day over article lifetime: 3.81
Average number of edits per day between 2020-07-17 and 2020-08-07: 7.0
Between 2020-07-17 and 2020-08-07, Kamala Harris's Wikipedia page has received 1.837270341207349 times more edits per day than average.
Susan Rice:
Average number of edits per day over article lifetime: 2.79
Average number of edits per day between 2020-07-17 and 2020-08-07: 6.06
Between 2020-07-17 and 2020-08-07, Susan Rice's Wikipedia page has received 2.172043010752688 times more edits per day than average.
When we normalize the activity, Karen Bass moves to the top of the pack. Interesting. But will it translate into a VP pick? Only time will tell.
Do you have any ideas on how these graphs could be refined? Did you glean any insights from the results? Have ideas to make the process better, or push the analysis further? If so, please share them in the comments!