
Smart home devices have become increasingly common in recent years. These devices anticipate user input, respond to it, and can automate control of home amenities. Companies such as Amazon and Google have released many of them: there are numerous models of Amazon’s Echo and Google’s Nest Hub, so deciding which one is best can be difficult.
In this article we will use NLP techniques to look at reviews of Amazon’s Echo devices found [here](https://towardsdatascience.com/cleaning-preprocessing-text-data-for-sentiment-analysis-382a41f150d6) on Kaggle. My previous article covered the data preprocessing steps, the necessary first steps of an NLP project, in more detail; here we will mention them only briefly.
So let’s get started!
First, we import the necessary libraries:
import pandas as pd
import numpy as np
import pickle
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score, roc_curve, auc, log_loss
import gensim
from gensim import corpora
from gensim.models import LdaModel, LdaMulticore
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
After preprocessing the data, we pickled the cleaned file, so here we open up our pickled file:
with open('alexa_reviews_clean.pkl','rb') as read_file:
    df = pickle.load(read_file)
Let’s take a look at the first few lines of the dataframe:

First we will perform topic modeling. To prepare our data, we must transform new_reviews into a corpus.
# CREATE DICTIONARY TO COUNT THE WORDS
count_dict_alex = {}
for doc in df['new_reviews']:
    for word in doc.split():
        if word in count_dict_alex:
            count_dict_alex[word] += 1
        else:
            count_dict_alex[word] = 1
# PRINT WORDS FROM LEAST TO MOST FREQUENT
for key, value in sorted(count_dict_alex.items(), key=lambda item: item[1]):
    print("%s: %s" % (key, value))

Our output shows that many words occur only a few times across all reviews, so we will remove the words that occur fewer than ten times and then create our corpus.
# REMOVE WORDS THAT OCCUR LESS THAN 10 TIMES
low_value = 10
# A SET MAKES THE MEMBERSHIP TEST BELOW FAST
bad_words = {key for key, count in count_dict_alex.items() if count < low_value}
# CREATE A LIST OF LISTS - EACH DOCUMENT IS A STRING BROKEN INTO A LIST OF WORDS
corpus = [doc.split() for doc in df['new_reviews']]
clean_list = []
for document in corpus:
    clean_list.append([word for word in document if word not in bad_words])
clean_list
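The counting and filtering steps above can also be sketched more compactly with collections.Counter. This is only an illustration on made-up data: toy_reviews stands in for df['new_reviews'], and min_count is a hypothetical threshold lowered to 2 so that the toy example actually filters something.

```python
from collections import Counter

# Toy stand-in for df['new_reviews']; these strings are invented for illustration.
toy_reviews = [
    "love echo love sound",
    "sound great easy setup",
    "love easy setup",
]

# Count every token across all documents in one pass.
word_counts = Counter(word for doc in toy_reviews for word in doc.split())

# Hypothetical threshold for the toy data; the article uses 10 on the full dataset.
min_count = 2
rare_words = {word for word, count in word_counts.items() if count < min_count}

# Drop the rare words from each tokenized document.
filtered_docs = [
    [word for word in doc.split() if word not in rare_words]
    for doc in toy_reviews
]

print(word_counts.most_common(3))
print(filtered_docs)
```

Keeping the rare words in a set rather than a list means each membership check takes roughly constant time, which matters once the vocabulary has thousands of entries.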

Next, we will use clean_list to perform topic modeling with LDA (Latent Dirichlet Allocation). LDA models each document as a distribution over topics and each topic as a distribution over words; both distributions are refined iteratively during training.
Let’s create the inputs of the LDA model using corpora:
corpora_dict = corpora.Dictionary(clean_list)
corpus = [corpora_dict.doc2bow(line) for line in clean_list]
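To make the bag-of-words input more concrete, here is a minimal pure-Python sketch of the idea: each document becomes a list of (token id, count) pairs. It is not gensim's implementation; token_to_id, to_bow, and toy_docs are names made up for this illustration, and the ids gensim assigns to tokens may differ.

```python
from collections import Counter

# Toy tokenized documents standing in for clean_list (invented for illustration).
toy_docs = [["love", "sound", "love"], ["easy", "setup", "sound"]]

# Assign each distinct token an integer id in order of first appearance.
token_to_id = {}
for doc in toy_docs:
    for token in doc:
        token_to_id.setdefault(token, len(token_to_id))


def to_bow(doc, token_to_id):
    """Return sorted (token_id, count) pairs for the tokens found in the mapping."""
    counts = Counter(token_to_id[t] for t in doc if t in token_to_id)
    return sorted(counts.items())


bow_corpus = [to_bow(doc, token_to_id) for doc in toy_docs]
print(token_to_id)
print(bow_corpus)
```

Word order is discarded in this representation, and only the per-document counts are passed on to the topic model.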
And finally train our LDA model:
# TRAIN THE LDA MODEL
lda_model = LdaModel(corpus=corpus,
                     id2word=corpora_dict,
                     random_state=100,
                     num_topics=3,
                     passes=5,
                     per_word_topics=True)
# See the topics
lda_model.print_topics(-1)

Based on the results above, my reading is that the three most common topics are users commenting on how much they love the product, ease of use, and sound quality. Again, this is only my opinion; you may draw your own conclusions from these results.
Sentiment Analysis
Next, to find out whether the sentiment of new_reviews matches the rating scores, I performed sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner) and took the average positive and negative scores. VADER is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed on social media. First we define the function sentimentScore, which passes each review string to VADER’s SentimentIntensityAnalyzer() and collects the returned dictionary of scores in four categories: negative, neutral, positive, and compound (the summed word valence scores, normalized to lie between -1 and 1). We then apply this function to new_reviews.
# DEFINE FUNCTION TO CALCULATE SENTIMENT SCORE
def sentimentScore(sentences):
    analyzer = SentimentIntensityAnalyzer()
    results = []
    for sentence in sentences:
        # polarity_scores RETURNS A DICT WITH neg, neu, pos AND compound
        vs = analyzer.polarity_scores(sentence)
        print(str(vs))
        results.append(vs)
    return results
sentiment = sentimentScore(df['new_reviews'])
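If a single label per review is more convenient than four scores, the compound value can be bucketed. The sketch below uses the ±0.05 cutoffs suggested in the VADER documentation; label_from_compound is a hypothetical helper, not part of the library, and the score dictionaries are invented values shaped like the output of polarity_scores.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Invented dictionaries in the same shape as polarity_scores output.
toy_scores = [
    {"neg": 0.0, "neu": 0.3, "pos": 0.7, "compound": 0.85},
    {"neg": 0.6, "neu": 0.4, "pos": 0.0, "compound": -0.6},
    {"neg": 0.0, "neu": 1.0, "pos": 0.0, "compound": 0.0},
]

labels = [label_from_compound(score["compound"]) for score in toy_scores]
print(labels)
```

The threshold is a parameter so that a stricter or looser cutoff can be tried without changing the function.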

As you can see, we have values for each review, but the reviews themselves are not present, so we need to join these scores with our current dataframe df. To do that, we turn sentiment into a new dataframe, sentiment_df, and then join it with df to create a new dataframe, echo_vader.
sentiment_df = pd.DataFrame(sentiment)
df.index = sentiment_df.index
sentiment_df['rating'] = df['rating']
echo_vader = pd.concat([df, sentiment_df], axis=1)
echo_vader.head()

Yes, it worked! We now have the sentiment scores for each review all in one dataframe. From here we can look at the sentiment ratings for different variations of Echo models.


Across the different variations of Amazon Echo models, the average positive sentiment score of the reviews is about 10 times higher than the average negative score, which suggests that the calculated sentiment scores are reliable.
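A per-model comparison like this can be sketched with a pandas groupby. The example runs on a small made-up dataframe: the column name variation, the model names, and all of the numbers are assumptions for illustration, while pos and neg mirror the keys VADER returns.

```python
import pandas as pd

# Made-up stand-in for echo_vader; values are invented for illustration only.
toy_vader = pd.DataFrame({
    "variation": ["Model A", "Model A", "Model B", "Model B"],
    "pos": [0.6, 0.4, 0.5, 0.3],
    "neg": [0.0, 0.1, 0.05, 0.05],
})

# Average positive and negative scores for each model variation.
avg_sentiment = toy_vader.groupby("variation")[["pos", "neg"]].mean()

# Ratio of average positive to average negative score per variation.
avg_sentiment["pos_to_neg"] = avg_sentiment["pos"] / avg_sentiment["neg"]

print(avg_sentiment)
```

The same two lines should apply to the real dataframe once the grouping column is swapped for whichever column holds the model name.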
Lastly, using a TF-IDF vectorizer on two-word phrases (bigrams) together with chi-squared feature scores, I looked at the words that contributed to positive and negative sentiments. First, I separated the negative and positive reviews and then plotted the top-scoring bigrams.
neg_alexa = df[df['sentiment']=='negative']
pos_alexa = df[df['sentiment']=='positive']
from sklearn.feature_selection import chi2
tfidf_n = TfidfVectorizer(ngram_range=(2, 2))
X_tfidf_n = tfidf_n.fit_transform(neg_alexa['new_reviews'])
y_n = neg_alexa['rating']
chi2score_n = chi2(X_tfidf_n, y_n)[0]
# get_feature_names() WAS REMOVED IN SCIKIT-LEARN 1.2; USE get_feature_names_out()
scores = list(zip(tfidf_n.get_feature_names_out(), chi2score_n))
chi2_n = sorted(scores, key=lambda x:x[1])
topchi2_n = list(zip(*chi2_n[-10:]))
x_n=range(len(topchi2_n[1]))
fig, ax = plt.subplots(figsize=(16,9))
ax.barh(x_n, topchi2_n[1], align='center', alpha=1, color='salmon')
plt.title('Alexa Negative Feedback', fontsize=24, weight='bold')
# x-axis
plt.xlabel("Feature Score", fontsize=22, weight='bold')
plt.xticks(fontsize=18)
#y-axis
labels = topchi2_n[0]
plt.yticks(x_n, labels, fontsize=18)
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)
ax.spines['bottom'].set_visible(True)
ax.spines['left'].set_visible(True)
plt.show()


From these graphs, we can see that some users thought the Echo was not worth the money and did not like the sound quality, while others thought it was easy to use and worked well. NLP projects such as this one are helpful for Amazon because the company can look into its users’ feedback and understand which areas need improvement. Next steps include separating the variations of Echo models to see how the positive and negative feedback differs between them and which model is the best.
Thank you for reading! Here is a link to the GitHub repo 🙂