There’s no denying Data Science can be quite boring – especially if you don’t come from a Computer Science background and loathe the prospect of spending endless afternoons looking for wrong indentations or converting lists into strings and vice versa.
Mastering data science skills, however, is rapidly becoming an asset in many fields of knowledge, and data-driven methodologies such as Natural Language Processing or Social Network Analysis can provide exciting insights even to traditional research in the Humanities – just look at the brilliant work done at the Stanford Literary Lab.
Therefore, I’ve decided to devote my first post on Medium to showing how data science techniques can come in handy when analysing cultural products – the case study being one of my all-time favourite shows, CBS’s classic whodunit Murder, She Wrote (1984–1996).
![You already guessed what she writes about. This WordCloud contains every word from the series' episode titles, scraped from Wikipedia and plotted thanks to this wonderful library. [Credits: image by author]](https://towardsdatascience.com/wp-content/uploads/2020/11/1UtV5TnrhYgcn7pBP8ZRKSQ.png)
Although I do not expect you to have binge-watched all 264 episodes (+ 4 television films) of MSW, I reckon you’ll probably be familiar with the show’s general premise: retired English teacher Jessica Fletcher (Angela Lansbury) from fictional Cabot Cove, Maine, becomes a best-selling mystery writer and starts to bear witness to a ludicrous number of murders, which she regularly solves through her analytic skills (and a good deal of mind tricks). To be fair, the abnormal increase in mortality rate which seems to follow her wherever she goes has led conspiracy theorists to speculate she’s the actual murderer and the BBC to label the sleepy hamlet of Cabot Cove "the murder capital of the world" (not really, though) – in any case, the series offered viewers light-hearted and entertaining mysteries for more than a decade.
This article offers a data science-oriented perspective on MSW, briefly looking at two interesting aspects of its structure, namely the network of its characters and the distribution of its fictional locations. Both projects required extensive web scraping and data manipulation with Python, but I’m going to keep the code to a minimum: those interested will find links to the relevant Google Colab notebooks below. The results will not reveal shocking truths about the show you and your grandma loved to watch, but they will nonetheless offer a nice demonstration of the visualization power of data science tools.
Jessica’s Web
[The reference notebook for this section (MSW Interactions Parser) was provided by my good friend Michele Lacchia, who did the hard work long before I learned Python properly. You’ll see the difference between his terse script and my messy attempts later on; in the meantime, don’t forget to visit his outstanding blog.]
Even the casual watcher can easily recognise a fixed structure in MSW: most episodes begin with Jessica arriving in some place and getting involved in homicide investigations, and end with the culprit getting caught thanks to her deductions. This translates into a lot of episode networks where Jessica acts as the centre of gravity, linking together a number of background figures who will most often never come back (even though their actors certainly will, as the showrunners seemed to love casting the same people again and again in different roles).
Conversely, if the episodes are set in one of the series’ two main venues, Cabot Cove or New York, troubles come knocking at Ms. Fletcher’s door, usually in the form of a relative/old friend/former student/totally random recent acquaintance who needs help to avoid conviction for a crime they obviously didn’t commit. Here the episodic structure is slightly altered because of some returning characters, but the focus on Jessica remains sharp.
Accordingly, to avoid drawing a network crowded by hundreds of one-time extras, only characters who appeared at least three times during the series’ run were considered. To make things easier, the network was built on co-occurrence, i.e. a link (edge) was drawn between two characters (nodes) when they appeared in the same episode. This approach is based on the (empirically true) assumption that each character within an episode usually interacts with every other, forming what social network scholars call a clique.
On the technical side, character names and interactions were extracted from the ‘Cast’ tab of each MSW episode on IMDb (sample) and exported into a .csv file which I fed to Gephi, the visualisation suite I use. Never blindly trust your algorithm: I double-checked the results against these lists of MSW regulars and found six people whose names were not correctly recognised on IMDb. I made some adjustments, and even though three very marginal characters (Cabot Cove Gazette publisher Ben Devlin, beautician Corinne and NY porter Ahmed) remained out of the picture, the overall network representation could be accepted as reliable.
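As a minimal sketch of this filtering and linking step (not the code from the reference notebook), the snippet below assumes the cast lists have already been collected into a dictionary mapping each episode to its credited characters. The function name `build_cooccurrence_edges`, the `min_appearances` parameter and the toy episodes are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_edges(episode_casts, min_appearances=3):
    """Return a Counter mapping (character_a, character_b) to shared-episode counts."""
    # Count in how many episodes each character appears (once per episode).
    appearances = Counter(
        name for cast in episode_casts.values() for name in set(cast)
    )
    recurring = {name for name, n in appearances.items() if n >= min_appearances}

    edges = Counter()
    for cast in episode_casts.values():
        # Keep only recurring characters; sorting gives each pair one canonical order.
        present = sorted(set(cast) & recurring)
        # Every pair sharing an episode is linked, so each episode forms a clique.
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


# Toy data, invented for illustration only.
toy_casts = {
    "ep1": ["Jessica Fletcher", "Seth Hazlitt", "One-off Guest"],
    "ep2": ["Jessica Fletcher", "Seth Hazlitt"],
    "ep3": ["Jessica Fletcher", "Seth Hazlitt", "Mort Metzger"],
    "ep4": ["Jessica Fletcher", "Mort Metzger"],
}
edges = build_cooccurrence_edges(toy_casts)
```

With the default threshold only Jessica and Seth survive in this toy example, since Mort appears in just two episodes; the edge weight records how many episodes the pair shares.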
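The exact layout of the .csv file is not given in this post, so the following is only a sketch of one plausible export. It assumes a weighted edge list with `Source`, `Target` and `Weight` columns, the headers Gephi's spreadsheet importer conventionally expects for edge tables; the helper `edges_to_csv` and the toy weights are hypothetical.

```python
import csv
import io


def edges_to_csv(edges, handle):
    """Write weighted edges as a Source/Target/Weight table."""
    writer = csv.writer(handle)
    writer.writerow(["Source", "Target", "Weight"])
    # Sorting keeps the output stable between runs.
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])


# Toy edge weights, invented for illustration only.
toy_edges = {
    ("Jessica Fletcher", "Seth Hazlitt"): 3,
    ("Jessica Fletcher", "Mort Metzger"): 2,
}
buffer = io.StringIO(newline="")
edges_to_csv(toy_edges, buffer)
csv_text = buffer.getvalue()
```

To write a real file instead of an in-memory buffer, the same function can be passed a handle from `open("edges.csv", "w", newline="")`, where the filename is again just a placeholder.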
Here’s the visualization of the 988 interactions between the top 30 characters of Murder, She Wrote:
![The Fletcherverse. [Credits: image by author]](https://towardsdatascience.com/wp-content/uploads/2020/11/1RGdt7JphT_mzJw0WeVo7ew.png)
The graph above employs the Force Atlas layout, which basically "pulls strongly connected nodes together and pushes weakly connected nodes apart", while the colours reflect the characters’ primary residence (red for Cabot Cove, green for New York, violet for other cities).
At first glance, there’s little doubt the show gravitates around Jessica, with the nearest co-leading characters being Cabot Cove residents Dr. Hazlitt (William Windom), who is the second most-credited MSW character, and Sheriff Mort Metzger (Ron Masak), who overshadows his predecessor Sheriff Amos Tupper (Tom Bosley) by virtue of his longer tenure as Cabot Cove’s chief law enforcer.
More interesting, though, are the roles and relations of minor recurring characters, such as Eve Simpson (Julie Adams), Cabot Cove’s realtor, who appears to form a small cluster with the other ladies from Loretta Spiegel’s beauty parlor (upper right of the graph). Even more prominent is the clique on the left, centred around reformed jewelry thief and insurance investigator Dennis Stanton (Keith Michell), who shares a lot of screentime with his secretary Rhoda Markowitz, his boss Robert Butler and bulldog policeman Lt. Catalano.
Most non-Cove characters, however, are quite isolated: if we set aside Jessica’s favourite nephew and niece, Grady Fletcher (Michael Horton) and Victoria Griffin (Genie Francis), who are also linked to their spouses, a number of people display only a single tie with the protagonist. This broad category includes both cops regularly helped by the sleuth (such as Lieutenants Gelber and Caceras) and longstanding fan favourites such as charming MI6 operative Michael Haggarty (Len Cariou) or hard-boiled private eye Harry McGraw (Jerry Orbach), whose popularity was enough to grant him a spin-off of his own (the short-lived _The Law & Harry McGraw_, 1987–88).
Looking at the graph, it might have made more sense to focus a spin-off on Dennis Stanton: among all main characters not strictly related to Jessica, he is the only one to have a cluster of his own (as pointed out in the comments, however, Stanton was introduced after the McGraw spin-off, when the producers weren’t up for such experiments anymore). This ultimately depends on the six episodes, in Seasons 6 and 7, in which he features as the leading character: because of Angela Lansbury’s reduced commitment to the series, the showrunners had to insert "bookend" episodes where she acted only as narrator and the investigations of Stanton and his team featured prominently.
Fans, however, tended to dislike these "filler" episodes without Ms Fletcher, and Stanton never got a show of his own. The only spin-off the producers bet on remained the earlier McGraw series – an experiment that proved unsuccessful, after which Orbach returned to MSW as a recurring character.
The Atlas of Murder
[Here’s the reference Colab notebook: MSW Episode Localizer. I did it by myself, so expect low-quality code, but it works.]
Another interesting task data science could tackle was finding where each MSW episode is set and then plotting Jessica’s travels on a map: although many episodes are set in Cabot Cove and New York, where she moves in Season 8 to teach at Manhattan University, the action often unfolds in other cities and towns Ms. Fletcher visits for personal (family gatherings of all sorts, visits to old friends) or professional (book tours, conferences, awards) reasons.
To sum up the process behind the results: I started from this nice blog, which contains a detailed plot summary for all MSW episodes. I scraped all plot summaries with Beautiful Soup, cleaned them with regular expressions and eventually performed Named Entity Recognition (NER) on the text chunks with spaCy – I was looking specifically for words which fell in the GPE/LOC categories, indicating various types of locations. The results were put in a dictionary, with places as keys and frequencies (how many times an episode is set there) as values; the dictionary was plotted on a map with Basemap (I know I should have used Cartopy, but it does not get along well with Colab and, well, I just find Basemap handier).
After all this trouble I realised there was a simpler way – as teachers (should) remark in Data Science 101, insufficient exploratory data analysis (EDA) often results in doing a lot of work on tasks which could have been solved more easily. The blog posts were actually tagged from the start with the episode’s (approximate) location; therefore, one could have just scraped the tags with Beautiful Soup and plotted them straight away.
It turned out this latter method was more precise, despite my early assumption that extracting locations from the plot summaries would yield more geographical details – it did, in a couple of cases, but it also missed loads of important places where episodes were set.
Furthermore, I must confess I couldn’t, in all conscience, leave Cabot Cove out of the picture just because, well, it’s entirely fictional. Accordingly, I added as a stand-in the village which is most commonly cited as its real-life inspiration, Boothbay Harbor (by the way, external shots of Cabot Cove were actually taken in Mendocino, California).
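The counting step can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's code: it takes the NER output as ready-made `(text, label)` pairs per episode, and it treats any GPE/LOC mention in a summary as a candidate setting, which is a simplification. The names `count_locations`, `LOCATION_LABELS` and the toy entities are hypothetical.

```python
from collections import Counter

# Entity labels treated as locations.
LOCATION_LABELS = {"GPE", "LOC"}


def count_locations(episode_entities):
    """Count, for each place, the number of episodes whose summary mentions it."""
    frequencies = Counter()
    for entities in episode_entities.values():
        # A set makes each place count once per episode, however often it is named.
        places = {text.strip() for text, label in entities if label in LOCATION_LABELS}
        frequencies.update(places)
    return dict(frequencies)


# Hand-written (text, label) pairs standing in for NER output.
# With spaCy they could be obtained from a loaded pipeline `nlp` as:
#   [(ent.text, ent.label_) for ent in nlp(summary).ents]
toy_entities = {
    "ep1": [("Cabot Cove", "GPE"), ("Jessica", "PERSON"), ("Cabot Cove", "GPE")],
    "ep2": [("New York", "GPE"), ("Cabot Cove", "GPE")],
    "ep3": [("London", "GPE"), ("the Thames", "LOC")],
}
location_counts = count_locations(toy_entities)
```

Keeping the NER call outside the function means the counting logic can be checked on hand-made data before running a full spaCy pipeline over every summary.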
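To give an idea of how little code the tag-based route needs, here is a sketch that pulls tag names out of a post's HTML. It deliberately swaps Beautiful Soup for the standard library's `html.parser` so that it runs without extra installs, and it assumes the blog marks its tag links with `rel="tag"`, a common convention that has not been checked against the actual site. The class `TagLinkParser`, the helper `extract_tags` and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TagLinkParser(HTMLParser):
    """Collect the text of every <a> link whose rel attribute contains "tag"."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self._in_tag_link = False

    def handle_starttag(self, tag, attrs):
        # rel may hold several space-separated values, e.g. "category tag".
        rel = dict(attrs).get("rel") or ""
        if tag == "a" and "tag" in rel.split():
            self._in_tag_link = True
            self.tags.append("")

    def handle_data(self, data):
        # Accumulate text only while inside a tag link.
        if self._in_tag_link:
            self.tags[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_tag_link = False


def extract_tags(html):
    """Return the stripped text of all tag links found in an HTML string."""
    parser = TagLinkParser()
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.tags]


# Invented markup, for illustration only.
sample_html = (
    '<p>Plot summary goes here.</p>'
    '<a href="/tag/cabot-cove" rel="tag">Cabot Cove</a>, '
    '<a href="/tag/maine" rel="category tag">Maine</a> '
    '<a href="/about">About</a>'
)
post_tags = extract_tags(sample_html)
```

The resulting tag lists can then be tallied per place in the same way as the NER output, with no entity recognition needed.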
So, here’s the world map:
![Around the World in (many more than) 80 Murders. [Credits: image by author]](https://towardsdatascience.com/wp-content/uploads/2020/11/1uv0QQ5J8wkd5kK6H4kFOTw.png)
As expected, almost all episodes are set in North America, with a dozen in Europe (especially in London and Ireland, where Jessica’s roots lie) and some occasional trips to Asia (Japan and Hong Kong), Africa (Cairo) and Oceania (Australia). Let’s zoom in on the USA:
![United States of Fletcher (USF). [Credits: image by author]](https://towardsdatascience.com/wp-content/uploads/2020/11/1JRCkAYiCn_fFQuT7xuO8Kw.png)
It seems Jessica has travelled quite widely across her country; despite an apparent dislike for the Corn Belt, she has visited most U.S. states, with two clear clusters in California and on the Northeastern coast. Broadly speaking, all zones of the map are occupied, and the NER on tags even missed some more precise locations, such as the Arizona county in which "The Secret of Gila Junction" (S12E3) takes place.
This wraps it up for today – this example of data analysis was quite simple, but it yielded some interesting results. I hope you enjoyed the read, and please do comment with corrections and improvements. Until the next post!