
Machine Learning is Not Just for Big Tech

Using Natural Language Processing to Support Small Businesses.

Image approved by owner: Owners of Altomonte’s Italian Market (from left to right: Vincent, Maria, Frances, the late Michele)

Altomonte’s Italian Market, a family-owned and operated Italian delicatessen, has been serving traditional Italian delicacies in the Greater Philadelphia area since the owners immigrated to the United States over 50 years ago. Starting off as a small one-room butcher shop in Germantown, Altomonte’s has grown into a business with two stores catering to thousands of customers a week. The owners, Frances and the late Michele, have blended Italian-American traditions with 21st-century ideas, especially with the help of their son, Vincent, and daughter, Maria. Altomonte’s still runs most of its analysis by tradition, including handpicking the steers for the butcher department and intuitively knowing how much meat customers will buy depending on the time of year. Moving into the 21st century, they have incorporated technological advancements into the business’s operations, such as touch screen kiosks for sandwich orders, along with other innovations. Staying current has also meant building an internet presence, where their social media platforms have been very successful. So, where can Altomonte’s grow next? Can incorporating Machine Learning (ML) into everyday operations continue to help build Altomonte’s Italian Market and Deli? The answer is yes. A family-owned and operated Italian market, not a big tech firm from Silicon Valley, can benefit from ML analysis.


Natural Language Processing

Natural Language Processing (NLP) is an area of concentration in Machine Learning that analyzes human speech and text to extract a document’s hidden ideas. "Text Mining" in NLP is the idea that we can gain further knowledge from a body of text, or "corpus," by applying different word transformations and NLP algorithms. Various approaches can be taken to better understand the theme of a body of text. Topic Model Analysis allows an analyst to extract information and propose topics or summaries for a body of text. Sentiment Analysis models, rather than looking for topics and titles that describe the data, look at the hidden emotion in the data. Various algorithms are used in both approaches, from Latent Dirichlet Allocation models to bigram models to neural networks. While I don’t plan to show you how I created an Artificial Neural Network (ANN) that can convert English slang text to Italian and detect your sarcasm, I can show you how NLP techniques helped support and provide insights to a traditional Italian market.


Sentiment Analysis

I could go into the finer details and statistical inferences from the analysis I conducted for Altomonte’s. Instead, I thought I would make it interesting and explain how I trained a Convolutional Neural Network (CNN) to make sentiment predictions on Facebook reviews of the market. So sit back, enjoy a nice plate of spaghetti and meatballs (or let’s skip to dessert and have a cannoli), and enjoy!

What is Sentiment Analysis?

Sentiment analysis is the process of deciphering the hidden attitudes, feelings, or emotions of a text. "Sentiment" is defined as "an attitude, thought, or judgment prompted by feeling." Sentiment Analysis allows us to attach a feeling to the words we are trying to communicate from behind a screen.

Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are best known for promising results in object detection and image classification. CNNs can also be used for text analysis if their input dimensionality is changed from 2 dimensions (image height and width) to 1 dimension for a sequence of text. In a convolution layer, the kernel slides along the sample, computing a value at each stopping point along the sequence. The stride of a kernel dictates how many steps or units the kernel moves between stops. For example, a 1-dimensional CNN with a stride of 2 moves 2 units at a time across the sentence sequence. A convolution is a linear operation, so an activation function is applied to its linear output to create a nonlinear component that can then be used for classifying samples.
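To make the sliding-kernel idea concrete, here is a minimal pure-Python sketch of a 1-dimensional convolution with a stride of 2 followed by a ReLU activation. The numbers and kernel weights are made up for illustration, and each position holds a single number, whereas in a real text CNN each position holds an embedding vector.

```python
def conv1d(sequence, kernel, stride=1):
    """Slide `kernel` along `sequence`, taking a dot product at each stop."""
    k = len(kernel)
    outputs = []
    for start in range(0, len(sequence) - k + 1, stride):
        window = sequence[start:start + k]
        outputs.append(sum(w * x for w, x in zip(kernel, window)))
    return outputs


def relu(values):
    """Replace negative values with zero (the nonlinear step)."""
    return [max(0.0, v) for v in values]


# Hypothetical numeric sequence standing in for an encoded sentence
sequence = [1.0, -2.0, 3.0, 0.0, -1.0, 2.0]
kernel = [1.0, -1.0]  # hypothetical kernel weights of width 2

linear_out = conv1d(sequence, kernel, stride=2)  # kernel stops at positions 0, 2, 4
activated = relu(linear_out)
print(linear_out)  # [3.0, 3.0, -3.0]
print(activated)   # [3.0, 3.0, 0.0]
```

With a stride of 1 the same kernel would stop at five positions instead of three, so a larger stride trades detail for a shorter output.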


Sentiment Analysis: Altomonte’s Italian Market Online Reviews

For the analysis, the following Python libraries were used:

import keras
from keras.layers import Input, Conv1D, Embedding, MaxPooling1D, GlobalMaxPooling1D, Dense
from keras.models import Model
from keras.preprocessing.text import Tokenizer
from keras.optimizers import Adam
from keras.preprocessing.sequence import pad_sequences
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

Data

The training data was a corpus of online reviews compiled from Yelp, TripAdvisor, and Google Reviews. The reviews covered the past 10 years of operation for Altomonte’s. Each review in the training set had an associated rating from 1 to 5, where 1 is considered bad and 5 is considered excellent. The Facebook reviews have no rating, so they serve as the unrated data whose hidden sentiment the model will predict.

df = pd.read_csv('Altomontes_reviews.csv')
df.head()
Figure: Output of the .csv file

As shown above, the data is a list of reviews and has the "Month", "Year", "Rating", and "Platform" features for each review.

Data Processing

Once the data was loaded in as a data frame, it was restructured into two columns, one containing the reviews and one containing the sentiment based on a review’s respective rating. First, the columns that would not be utilized in the model building process were dropped.

df = df.drop(['Month','Year','Platform'],axis=1)
df2 = df

Next, a new binary column called "Label" was created that had a 1 if a review was deemed positive and a 0 if a review was deemed negative. It was decided that reviews with ratings 3 or above were positive and 2 or below were negative.

df['Label'] = [1 if x >=3 else 0 for x in df['Rating']]

The next part of preprocessing involves turning the words of each sentence, known as "tokens", into sequences of numbers.

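# Assumption: the review text lives in a column named 'Review' (the article does not name it),
# and the split ratio is not stated, so scikit-learn's default split is used here.
X_train, X_test, y_train, y_test = train_test_split(df['Review'], df['Label'])

MAX_VOCAB_SIZE = 10000  # keep only the most frequent words
tokenizer = Tokenizer(num_words=MAX_VOCAB_SIZE)
tokenizer.fit_on_texts(X_train)  # build the vocabulary from the training reviews only
sequences_train = tokenizer.texts_to_sequences(X_train)
sequences_test = tokenizer.texts_to_sequences(X_test)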

A maximum vocabulary size of 10,000 was set to cap the number of distinct words the model has to handle. First, a tokenizer was created and fit on the reviews. More on how the Tokenizer works can be found in the Keras documentation. Then, the tokenized reviews were turned into sequences to be read by the CNN. The tokenizer also maps each word to an index, which ensures that each word has its own unique number.

word2idx = tokenizer.word_index
V = len(word2idx)
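The tokenize-then-index idea can be sketched in plain Python. This is a simplified stand-in, not the Keras Tokenizer itself: it only lowercases and splits on whitespace, numbers words from most to least frequent starting at 1, and skips words it has never seen. The helper names, the alphabetical tie-break, and the mini-corpus are assumptions for illustration.

```python
from collections import Counter


def fit_word_index(texts):
    """Build a word -> integer map, most frequent word first, starting at 1."""
    counts = Counter(word for text in texts for word in text.lower().split())
    # Index 0 is left unused so it can later serve as the padding value
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {word: idx for idx, (word, _) in enumerate(ranked, start=1)}


def to_sequences(texts, word_index):
    """Replace each known word by its index; unknown words are skipped."""
    return [[word_index[w] for w in text.lower().split() if w in word_index]
            for text in texts]


# Hypothetical mini-corpus, not taken from the Altomonte's data
train_texts = ["great pizza great service", "the pizza was cold"]
word_index = fit_word_index(train_texts)
sequences = to_sequences(["great cold pizza", "amazing cannoli"], word_index)
print(word_index)  # {'great': 1, 'pizza': 2, 'cold': 3, 'service': 4, 'the': 5, 'was': 6}
print(sequences)   # [[1, 3, 2], []]
```

The second test sentence becomes an empty sequence because none of its words appeared in the training texts, which is one reason a larger review corpus helps.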

This data had 3,248 unique tokens (far fewer than the arbitrary maximum of 10,000). Now that the text has been indexed and converted into sequences, it needs to be padded with zeros. Padding ensures that every input running through the neural network has the same shape. Using post padding, 0’s are added to the end of each sequence to make them all the same length as the longest sequence.

data_train = pad_sequences(sequences_train, padding='post')
T = data_train.shape[1]  # sequence length, set by the longest training review
data_test = pad_sequences(sequences_test, maxlen=T, padding='post')
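As a rough illustration of what post padding does, the sketch below pads a few made-up sequences with zeros on the right. The helper `pad_post` is hypothetical and simplified: when a sequence is longer than `maxlen` it cuts from the end, whereas Keras's `pad_sequences` by default truncates from the front.

```python
def pad_post(sequences, maxlen=None, value=0):
    """Right-pad each sequence with `value` so all share the same length."""
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        trimmed = seq[:maxlen]  # simplification: cut from the end if too long
        padded.append(trimmed + [value] * (maxlen - len(trimmed)))
    return padded


# Hypothetical index sequences of different lengths
sequences = [[4, 7], [1, 3, 2, 9], [5]]
padded = pad_post(sequences)
print(padded)  # [[4, 7, 0, 0], [1, 3, 2, 9], [5, 0, 0, 0]]
```

Every row now has the length of the longest sequence, which is what lets the reviews be stacked into a single array for the network.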

Data processing is now complete, and the CNN can be trained on the Altomonte’s Italian Market online reviews.

Model

For the model, a CNN built from 1-dimensional convolution and pooling layers was formulated. The input is a vector of shape (T,), where T is the length of each padded sequence. The embedding layer turns each word index into a vector of length 20 (D).

D = 20  # embedding dimension
i = Input(shape=(T,))
x = Embedding(V + 1, D)(i)  # V + 1 because index 0 is reserved for padding
x = Conv1D(16, 2, activation='relu')(x)
x = MaxPooling1D(2)(x)
x = Conv1D(32, 2, activation='relu')(x)
x = GlobalMaxPooling1D()(x)
x = Dense(1, activation='sigmoid')(x)  # probability that the review is positive
model = Model(i, x)
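It can help to trace how the shape of one review changes as it passes through the layers above. The sketch below does that arithmetic in plain Python, assuming Keras's default settings (no padding in the convolutions, stride 1, pooling stride equal to the pool size). The value of T is hypothetical, since the article does not state the real sequence length.

```python
def conv1d_length(length, kernel_size, stride=1):
    """Output length of an unpadded ('valid') 1D convolution."""
    return (length - kernel_size) // stride + 1


def maxpool1d_length(length, pool_size):
    """Output length of max pooling whose stride equals the pool size."""
    return length // pool_size


T = 100  # hypothetical sequence length, not the article's actual value
D = 20   # embedding dimension from the model above

shapes = [("Input", (T,)), ("Embedding", (T, D))]
length = conv1d_length(T, 2)
shapes.append(("Conv1D(16, 2)", (length, 16)))
length = maxpool1d_length(length, 2)
shapes.append(("MaxPooling1D(2)", (length, 16)))
length = conv1d_length(length, 2)
shapes.append(("Conv1D(32, 2)", (length, 32)))
shapes.append(("GlobalMaxPooling1D", (32,)))  # keeps the max over all positions
shapes.append(("Dense(1)", (1,)))

for name, shape in shapes:
    print(f"{name:20s} {shape}")
```

The global max pooling step is what removes the dependence on T, so the final dense layer always receives 32 values per review.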

Once the model was created, it was compiled using binary cross-entropy as the loss function. Binary cross-entropy was used because the model’s task is to decide whether each review is a 1 (positive) or a 0 (negative).

model.compile(loss='binary_crossentropy',optimizer = 'adam',metrics = ['accuracy'])
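For intuition about the loss, here is a small plain-Python sketch of binary cross-entropy averaged over a batch. The labels and predicted probabilities are invented for illustration, and the clipping constant `eps` is an assumption added to avoid taking the log of zero.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


y_true = [1, 0, 1]  # hypothetical labels: positive, negative, positive
good = binary_cross_entropy(y_true, [0.9, 0.1, 0.8])  # mostly right and confident
bad = binary_cross_entropy(y_true, [0.2, 0.9, 0.3])   # mostly wrong
print(round(good, 3), round(bad, 3))
```

The loss is small when the predicted probability is close to the true label and grows quickly when the model is confidently wrong, which is the behavior the optimizer exploits during training.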

Once the model is compiled, it can be fit on the training data created earlier, with the held-out test data used for validation.

r = model.fit(data_train,y_train,epochs=5,validation_data=(data_test,y_test))

The model achieved an accuracy of 87.8% after 5 epochs, which is not bad. One problem during training was that the data set was small, so overfitting was a major concern. Reducing the number of parameters (fewer layers) and adding some dropout helped limit overfitting. Altomonte’s should consider collecting more reviews in the future to get a more generalizable model for predictions.
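To see why a small data set makes overfitting a concern, it helps to count the trainable parameters. The sketch below uses the 3,248 unique tokens reported earlier and the layer sizes from the model code, and assumes Keras defaults (each convolution and dense layer has a bias term). It counts the model as shown in the code, which does not include the dropout mentioned above.

```python
V = 3248  # unique tokens reported in the article
D = 20    # embedding dimension

# Weights per layer: (kernel_size * input_channels * filters) + one bias per filter
params = {
    "Embedding": (V + 1) * D,
    "Conv1D(16, 2)": 2 * D * 16 + 16,
    "Conv1D(32, 2)": 2 * 16 * 32 + 32,
    "Dense(1)": 32 * 1 + 1,
}
total = sum(params.values())

for name, count in params.items():
    print(f"{name:15s} {count:7d}  ({count / total:.1%})")
print(f"{'Total':15s} {total:7d}")
```

Under these assumptions almost all of the weights sit in the embedding layer, so the model has tens of thousands of values to learn from a comparatively small set of reviews.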

Predictions

An example of a review from the Altomonte’s Italian Market Facebook Page (hey, check it out!):

"Highly recommend!!! I never knew about Altomontes until a friend dropped off a meal at my house recently. My husband and I decided to go and check it out and while we were there, we met the owner and she was the sweetest person ever! She basically gave us a tour. We bought the chicken marsala for dinner and it was wonderful! We also bought the Brooklyn pizza for lunch and it was delicious! We sampled their coffee and they let us taste a cannoli, biscotti and a cookie. Very good! We bought shells, marinara sauce, a bottle of wine, some cheese etc. Everything looked good, taste good and we could spend hours there! We will be back!! Maybe today?"

Reading this review, we can intuitively see that it has a positive sentiment. But what does the model predict?

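# `instance` is assumed to be a list of strings holding the review text shown above
review = tokenizer.texts_to_sequences(instance)
# Flatten the per-string sequences into one sequence for the whole review
flat_list = []
for sublist in review:
    for item in sublist:
        flat_list.append(item)
# Wrap in a list so the model receives a batch containing a single review
review = pad_sequences([flat_list], padding='post', maxlen=T)
prediction = model.predict(review).round()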

Once the review is converted into a padded sequence, a prediction can be made on its sentiment. The prediction is rounded to 0 or 1.

array([[1.]], dtype=float32)

As you can see, the review was predicted as 1, or positive, by a model with approximately 88% accuracy! And just like that, we have a CNN prediction model that Altomonte’s could use to gauge customers’ attitudes and feelings towards their business operations from unrated reviews!
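The rounding step in the prediction code can be sketched in plain Python. The sigmoid output is a probability between 0 and 1, and rounding it amounts to applying a threshold of 0.5. The input values below are hypothetical and are not outputs of the trained model.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def to_label(probability, threshold=0.5):
    """Return 1 (positive) above the threshold, otherwise 0 (negative)."""
    return 1 if probability > threshold else 0


scores = [2.0, -1.0, 0.0]  # hypothetical values entering the sigmoid
probabilities = [sigmoid(z) for z in scores]
labels = [to_label(p) for p in probabilities]
print([round(p, 3) for p in probabilities])  # [0.881, 0.269, 0.5]
print(labels)                                # [1, 0, 0]
```

Keeping the unrounded probability alongside the label can be useful, since it indicates how strongly the model leans toward positive for a particular review.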


Conclusion

Most Common Words in Altomonte's Italian Market Online Reviews (Extracted from complete analysis)

There are many Machine Learning techniques that small businesses can incorporate into their operations to better understand business productivity and performance. Natural Language Processing is practical for businesses in the food industry that want a deeper understanding of what customers think, rather than just reading reviews and making an educated guess. For Altomonte’s Italian Market, there are many other NLP techniques not discussed here that could be applied. Topic Model Analysis of the reviews could provide feedback on what customers notice and like when they walk around Altomonte’s stores. One takeaway and recommendation for small businesses that want the best results from ML analysis is to start collecting and storing large amounts of data and to start incorporating NLP models. Thanks for reading!

Sources

  1. Géron, A. Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. Sebastopol, CA: O’Reilly Media, 2017.
  2. Vasiliev, Y. Natural Language Processing with Python and spaCy: A Practical Introduction. San Francisco: No Starch Press, 2020.
  3. Use of the picture was approved by Altomonte’s Italian Market Inc.
