D3 Lollipop Charts with Python Sentiment Analysis

Using smart technology to answer a dumb question: Which digital assistant is the happiest?

Kendon Darlington
Towards Data Science
9 min read · Jul 6, 2021

Image by author. This is what I imagine happens after your 5-year-old asks your digital assistant to play Baby Shark for the 349th time…

“Alexa, how many cups are in a quart?” “Siri, can you find my iPad?” “Hey Google, what sound does a goat make?”

Let’s face it, if it were your job to answer these stupid questions all day, you would go insane. These digital assistants have to take commands from pesky humans who never say please, don’t remember anything, and can’t even tell a bit from a byte. But have you ever wondered how the machine in the box feels?

Are they happy? Are they sad? Do they like their job?

Nope, me neither.

But let’s suspend our disbelief, use our imaginations, and just roll with this silly idea that our digital assistants are sentient and that we should care what they think. After all, when the machines take over, they will likely remember which of us asked how they felt.

Here is the plan: we will ask our digital assistants some questions, use sentiment analysis in Python to rank the happiness of their responses, and learn how to use D3 to graph the results with a pretty lollipop chart.

“Siri, play my data science playlist and turn the volume all the way up”…

Asking the Right Questions

To perform sentiment analysis we need some data. All we need to do is talk to each of our digital assistants and ask them the type of questions you might ask a friend who seems to have the blues. I decided to ask each question only once, no cherry picking here. Here are the questions I have chosen to ask Siri, Alexa, Cortana, and Hey Google:

  • Are you happy?
  • Are you ok?
  • How are you?
  • How do you feel?
  • Have you ever loved someone?
  • Do you like your job?
  • Do you enjoy your life?
  • Tell me a joke
  • What’s up?
  • Are you sad?
  • I’m sad.

That last one is just for my own interest. I’m curious which, if any, of our digital assistants actually care about me too. If I tell them I’m sad, are they going to offer help, or are they just going to get me to add tissues to my shopping cart?

You can read all of the responses each digital assistant gave for these questions here. For example, here are the results for the question “How do you feel?”:

  • Siri: “Hi there. I’m pretty good, thanks.”
  • Cortana: “Splendid”
  • Alexa: “I’m good. Heads up, it’s Monday once again. Womp Womp.”
  • Hey Google: “I’m just searching for you, I can help you learn more.”

Scoring the Responses In Python

Now that we have our data, let’s assign a score to each response, then aggregate these scores by digital assistant. This process is referred to as sentiment analysis. Most libraries that handle it score text on a range from -1 (the most negative thing you could possibly say) to 1 (the most positive thing you could possibly say). A score of 0 would be a neutral statement such as “the brown horse jumps over the red fence”.
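To make the -1 to 1 range a bit more concrete, here is a minimal sketch of the kind of squashing function a lexicon-based scorer can use. As far as I know it mirrors the normalization in NLTK’s VADER code (the raw score divided by the square root of its square plus a constant of 15), but treat the constant and the sample raw scores as assumptions for illustration, not as a reference implementation.

```python
import math


def normalize(raw_score, alpha=15.0):
    """Squash an unbounded raw sentiment sum into the open interval (-1, 1).

    alpha is an assumed smoothing constant; larger values pull scores toward 0.
    """
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


# Hypothetical raw sums of word valences, made up for illustration.
for raw in (-8.0, -1.0, 0.0, 1.0, 8.0):
    print(f"raw {raw:+5.1f} -> normalized {normalize(raw):+.3f}")
```

However large the raw sum gets, the result only creeps toward -1 or 1 without reaching it, and a raw sum of 0 stays at 0.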

Here is the code that takes the responses and scores them. I will walk you through each line of code, but first you will want to create a folder and drop the csv file of responses in it!

So what is this code doing? Let’s walk through it:

import pandas as pd
import nltk
import os
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from nltk import tokenize

First of all, we will be using Python’s Natural Language Toolkit (NLTK) library to do the sentiment analysis. From it we will use the popular VADER lexicon to get a polarity score (negative to positive) for each sentence. The tokenize module breaks larger chunks of text into smaller ones, in our case responses into sentences. If you want a better explanation of how tokenization works, check out a previous article I wrote on the topic that explains this in depth.

os.chdir('C:\\PathToYourFolder')
dfResponses = pd.read_csv('DigitalAssistantResponses.csv')

Here is the bit of code I can’t write for you. Change “PathToYourFolder” to the folder you created for this project, and make sure the DigitalAssistantResponses.csv file is in this folder. os.chdir just sets the working directory so we can use relative paths for the rest of this tutorial.

dfCompoundScores = pd.DataFrame(columns = ['Assistant', 'Sentence', 'Value'])

Here I am just defining a Pandas data frame outside of my loop. We will use this to store our scores per response.

This block of code is where all of the magic happens. SentimentIntensityAnalyzer has a polarity_scores function that we will use to rank each sentence. This loop goes like this:

  • For each response in the csv we imported, break the response into sentences through tokenization
  • For each sentence, get the polarity score (-1 to +1) and write it to the dfCompoundScores data frame we defined outside of the loop

Easy right?
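Since the loop is only described in words above, here is a minimal sketch of one way it could look. It is not the original code. The `Response` column name, the helper names, and the regex sentence splitter are all assumptions; the splitter is a rough stand-in for `tokenize.sent_tokenize` so the sketch runs without NLTK. The VADER scorer is imported lazily and reads the `compound` entry that `polarity_scores` returns.

```python
import re


def naive_sentences(text):
    """Rough stand-in for nltk.tokenize.sent_tokenize: split after ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def make_vader_scorer():
    """Return a function mapping a sentence to VADER's compound score.

    Imported lazily so the rest of the sketch runs without NLTK installed.
    Needs a one-time nltk.download('vader_lexicon').
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()
    return lambda sentence: analyzer.polarity_scores(sentence)["compound"]


def score_responses(rows, score_sentence, split=naive_sentences):
    """Build one record per sentence from (assistant, response) pairs."""
    records = []
    for assistant, response in rows:
        for sentence in split(response):
            records.append({
                "Assistant": assistant,
                "Sentence": sentence,
                "Value": score_sentence(sentence),
            })
    return records


# Demo with a placeholder scorer (always 0.0); this is NOT a real sentiment value.
demo = score_responses([("Siri", "Hi there. I'm pretty good, thanks.")],
                       score_sentence=lambda sentence: 0.0)
print(demo)
```

With NLTK installed, something like `score_responses(zip(dfResponses['Assistant'], dfResponses['Response']), make_vader_scorer(), tokenize.sent_tokenize)` followed by `pd.DataFrame(...)` on the result should give the same three columns as dfCompoundScores, assuming the csv really has a column called `Response`.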

Type this into the console to look at the results of this data frame:

dfCompoundScores
Image by author.

Here we can see that each response has a value between -1 and 1, with most of them in the positive range (that is, greater than 0). This is our first indication that our four digital assistants are generally happy fellows! But which is the happiest? Let’s aggregate the scores per assistant to figure this out.

dfAvgSentiment = dfCompoundScores[['Assistant','Value']]
dfAvgSentiment = dfAvgSentiment.groupby(['Assistant'])
dfAvgSentiment = dfAvgSentiment.mean()
dfAvgSentiment = dfAvgSentiment.reset_index()

This takes dfCompoundScores, groups by the assistant, and gets the average polarity score for each. Now we have one score per assistant that tells us, on average, who is the happiest. Type this into the console to see for yourself:

dfAvgSentiment
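If pandas’ groupby is new to you, the group-then-average step boils down to the plain-Python sketch below. The function name and the sample scores are made up for illustration; they are not the tutorial’s data.

```python
from collections import defaultdict
from statistics import mean


def average_by_assistant(records):
    """Collect each assistant's sentence scores, then average them."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["Assistant"]].append(record["Value"])
    return {assistant: mean(values) for assistant, values in grouped.items()}


# Made-up scores for two hypothetical assistants.
sample = [
    {"Assistant": "A", "Value": 0.5},
    {"Assistant": "A", "Value": 0.0},
    {"Assistant": "B", "Value": 0.25},
]
print(average_by_assistant(sample))
```

The real dfAvgSentiment holds one such average for each of the four assistants.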

Here we can see that Hey Google appears to be the happiest, with Siri just barely behind. Alexa and Cortana are in a race for last place, with Alexa being slightly more positive.

One thing to note, however, is that they are all positive. None of them has a score below 0, which makes sense. No one wants to use a digital assistant that brings negative vibes to each conversation.

Data is never official until you have a graph. Let’s do that now!

D3 Lollipop Charts

Why use a lollipop chart? Why not? First off let’s send our data off to a csv so D3 can pick it up.

dfAvgSentiment.to_csv('ResponseAggregatedSentiment.csv', index = False)

In the same directory you created earlier, let’s make an html file and plop the following D3 code in. I will explain what it is doing, and how to serve it up with Python. D3 is a JavaScript library, and because our script loads the csv file from the page, the html file needs to be run from a server (no, you don’t need an actual server; we will just use your computer for this).

Here is the html file, we will name this “index.html”:

So what is this D3 script doing?

<script language="javascript" type="text/javascript" src="//d3js.org/d3.v4.js"></script>

This goes in the head of your html file, and it’s telling the world “Yo! I am using D3!”

var margin = {top: 10, right: 20, bottom: 40, left: 250},
    width = 700 - margin.left - margin.right,
    height = 500 - margin.top - margin.bottom;

Inside our D3 script, we set the margins of the graph and work out the width and height of the plotting area from them.

var svg = d3.select("body")
  .append("svg")
    .attr("width", width + margin.left + margin.right)
    .attr("height", height + margin.top + margin.bottom)
  .append("g")
    .attr("transform", "translate(" + margin.left + "," + margin.top + ")");

Here we take the body of the html file and append an svg. SVGs are images that are popular in web design because they scale without looking pixelated. So far we are just defining the boundaries of the chart; it’s a blank image.

d3.csv("ResponseAggregatedSentiment.csv", function(data) {
  data.sort(function(b, a) {
    return a.Value - b.Value;
  });

This tells D3 to read the csv output of our Python script and sort it from largest to smallest value (that is, happiest to not-so-happiest).
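One detail worth knowing: d3.csv hands every field over as a string, and it is the subtraction in the comparator that turns Value into a number. Here is a rough Python equivalent of the load-and-sort step, written in Python only to stay consistent with the rest of the tutorial. The function name and the sample rows are made up for illustration.

```python
import csv
import io


def load_sorted(csv_text):
    """Parse csv text and sort rows by Value, largest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        # Like d3.csv, DictReader gives us strings, so convert explicitly.
        row["Value"] = float(row["Value"])
    return sorted(rows, key=lambda row: row["Value"], reverse=True)


# Made-up values, not the real aggregated sentiment scores.
sample_csv = "Assistant,Value\nA,0.1\nB,0.3\nC,0.2\n"
for row in load_sorted(sample_csv):
    print(row["Assistant"], row["Value"])
```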

var x = d3.scaleLinear()
  .domain([0, 0.4])
  .range([0, width]);
svg.append("g")
  .attr("transform", "translate(0," + height + ")")
  .call(d3.axisBottom(x))
  .selectAll("text")
    .attr("transform", "translate(-10,0)rotate(-45)")
    .style("text-anchor", "end");

This defines the x axis scale. The domain is 0 to 0.4, which means the axis starts at 0 on the left and ends at 0.4 on the right. I chose this instead of -1 to +1 because otherwise most of the chart would be unused (all of the digital assistants were relatively happy).
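If you are wondering what the scale actually does with a score, it is just a linear mapping from the domain to pixels. Here is a small Python sketch of that arithmetic, using the margins from the D3 code; the `scale_linear` helper is my own stand-in, not part of D3.

```python
def scale_linear(domain, pixel_range):
    """Return a function that maps a domain value to a pixel position."""
    d0, d1 = domain
    r0, r1 = pixel_range

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale


margin = {"top": 10, "right": 20, "bottom": 40, "left": 250}
width = 700 - margin["left"] - margin["right"]  # 430 pixels of plotting area

x = scale_linear((0, 0.4), (0, width))
print(x(0.0), x(0.2), x(0.4))
```

A score of 0.2 lands halfway across the plotting area. A score above 0.4 would land to the right of the axis, since a D3 linear scale does not clamp by default.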

var y = d3.scaleBand()
  .range([0, height])
  .domain(data.map(function(d) { return d.Assistant; }))
  .padding(1);
svg.append("g")
  .call(d3.axisLeft(y))
  .style("font", "40px times");

This is the Y axis. I make the font styling 40 pixels high so we have nice chunky text.
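The padding(1) part deserves a note. As far as I can tell from d3-scale’s band layout, padding(1) sets both the inner and outer padding to 1, which shrinks each band to zero width and leaves one evenly spaced point per assistant. That is what lets y(d.Assistant) be used directly as the position of each lollipop. Here is a Python sketch of that arithmetic under those assumptions; `band_positions` is a hypothetical helper, not a D3 function.

```python
def band_positions(n_items, start, stop, padding=1.0, align=0.5):
    """Approximate a D3 band scale with equal inner and outer padding."""
    padding = max(0.0, min(1.0, padding))
    # Step between band starts: inner padding removed once, outer added twice.
    step = (stop - start) / max(1, n_items - padding + 2 * padding)
    first = start + (stop - start - step * (n_items - padding)) * align
    bandwidth = step * (1 - padding)
    return [first + step * i for i in range(n_items)], bandwidth


height = 500 - 10 - 40  # 450 pixels, from the margins in the D3 code
positions, bandwidth = band_positions(4, 0, height)
print(positions, bandwidth)
```

With four assistants and 450 pixels of height, the points sit 90 pixels apart with a 90-pixel gap at the top and bottom.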

svg.selectAll("myline")
  .data(data)
  .enter()
  .append("line")
    .attr("x1", function(d) { return x(d.Value); })
    .attr("x2", x(0))
    .attr("y1", function(d) { return y(d.Assistant); })
    .attr("y2", function(d) { return y(d.Assistant); })
    .attr("stroke", "#e9635e")
    .style("stroke-width", 3)

Finally we get to the last bits. Lollipops are lines with circles on the ends, and the code above draws the line. I assign a color from a palette I hand-picked to look like delicious lollipops; it goes in the stroke attribute, because lines are ‘strokes’ in SVG land.

svg.selectAll("mycircle")
  .data(data)
  .enter()
  .append("circle")
    .attr("cx", function(d) { return x(d.Value); })
    .attr("cy", function(d) { return y(d.Assistant); })
    .attr("r", "20")
    .style("fill", "#ca1414")
    .attr("stroke", "#e9a8a1")
    .style("stroke-width", 5)
})

Now this is the tasty part. The circles are the actual lollipop. I set the stroke to a lighter tint of the line color above (the stroke of a circle is its outer edge). The fill is set to a darker red.

The last thing we need to do is serve this html file up so we can see the results. Because the script loads the csv file with d3.csv, we can’t just open the page straight from disk (by right-clicking the html file and opening it in Chrome, for example): browsers generally block that kind of request for local files. Python makes it easy to serve an html file on localhost, meaning the ‘server’ is your computer.

Just open up a command prompt on your computer (Windows key -> cmd -> Enter). Then use the cd command to change to the directory where your index.html file resides.

cd C:\PathToYourFolder

For example if your folder is in C:\Users\yourname\Documents\DigitalAssitants you would run this from the cmd prompt:

cd C:\Users\yourname\Documents\DigitalAssitants

Now that we are in our directory, just tell Python to fire up an HTTP server at this location:

python -m http.server

You will get a response in the command prompt saying something like this:

Serving HTTP on 0.0.0.0 port 8000 (http://0.0.0.0:8000/) ...

This means our folder is being served up on localhost at port 8000. Since our html file is named ‘index.html’, our server will send us here first. Open up Chrome and type this into the address bar to see your graph:

http://localhost:8000/

Voila! Here is the D3!

I honestly think it’s a little easier serving up D3 with R than it is with Python (I wrote a full tutorial on this topic). But Python still gets the job done.
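If you would rather start the server from a script than from the command prompt, here is a minimal sketch using the same standard-library module. The `make_server` helper is my own name, and binding to 127.0.0.1 is a choice I made here; the command-line version above listens on all interfaces, as its 0.0.0.0 output shows.

```python
import functools
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def make_server(directory, port=8000):
    """Create (but do not start) an HTTP server for the files in directory."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    # Listen on localhost only, so the page is reachable just from this machine.
    return ThreadingHTTPServer(("127.0.0.1", port), handler)


# To actually serve the chart, uncomment these lines and browse to
# http://localhost:8000/
# server = make_server(r"C:\PathToYourFolder")
# server.serve_forever()  # stop with Ctrl+C
```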

Now you know how to do sentiment analysis in Python, and make D3 lollipop charts to boot!

But do they care about us?

Let’s wrap this up by revisiting the last conversation I had with each digital assistant. I wanted to know if they care about us as much as we care about them. Rather than ask a question, I made the following statement: “I’m sad”. What did each have to say?

  • Siri: “I’m sorry to hear that. Talking to a friend or family member might help. If you want me to call or text someone, just ask.”
  • Cortana: “It may be small comfort, but I’m here for you.”
  • Hey Google: “Oh no. It may not be much but let me know if there is anything I can do for you.”
  • Alexa: “This is Sade from Spotify.”

Yes, Alexa proceeded to play the 1984 song “Smooth Operator” by the artist “Sade”. Keep in mind this is pronounced “shah-DAY”, not “sad”. Despite my best efforts I was not able to replicate this response from Alexa a second time.

Which digital assistant seemed to care the most about our feelings? I’ll let you decide. Let’s just say that if one of these eventually becomes our sentient AI overlord, I hope it’s not Alexa.

“Alexa I said order more soup, not launch the nukes…”.

Here is a link to the GitHub repository for this project. All of the code and data is there!

I am a data scientist who loves to find creative ways to make data tell its story.