
Customer Satisfaction Prediction Using Machine Learning

Predicting Customer Satisfaction for the purchase made from the Brazilian e-commerce site Olist.

This Article Includes:
1.Introduction
2.Business Problem
3.Problem Statement
4.Business objectives and constraints
5.Machine Learning Formulation
   i.Data Overview
   ii.Data Description
   iii.Machine Learning Problem
   iv.Performance Metrics
6.Exploratory Data Analysis(EDA)
      a.Data Cleaning and Deduplication
      b.High Level Statistics
      c.Univariate Analysis
      d.Bivariate Analysis
      e.Multivariate Analysis
      f.RFM Analysis
      g.Conclusion
7.Data Preprocessing and Feature Engineering
8.Model Selection 
9.Summary
10.Deployment
11.Improvements to Existing Approach
12.Future Work
13.Reference

1. Introduction

The e-commerce sector is evolving rapidly as internet access expands across the world. It is redefining commercial activity worldwide and plays a vital role in daily life. The categories of goods most frequently ordered by consumers include clothing, groceries, and home improvement materials, and the share of these products may increase significantly in the future.

In general, e-commerce is a medium powered by the internet in which customers access an online store to browse and place orders for products or services from their own devices (computers, tablets, or smartphones).

Examples of e-commerce transactions include purchases of books, groceries, music, and plane tickets, as well as financial services such as stock investing and online banking.

There are mainly four types of e-commerce; these are shown in figure 1.

Figure 1: Types of E-Commerce. Image by Paritosh Mahto

The main advantages of e-commerce are that it is available 24 hours a day, seven days a week, and that a wide array of products is available on a single platform. The disadvantages are limited customer service, since it is very difficult to demonstrate each product to the consumer online, and the time taken to deliver the product.

Machine learning can play a vital role in e-commerce, for example in sales prediction, prediction of a consumer's next order, review prediction, sentiment analysis, and product recommendations. It can also power e-commerce services such as voice search, image search, chatbots, and in-store experiences (augmented reality).

2. Business Problem

Olist is a Brazilian e-commerce site that connects merchants and their products to the main marketplaces of Brazil. Olist released this dataset on Kaggle in Nov 2018.

The dataset has information on 100k orders made at multiple marketplaces in Brazil from 2016 to 2018. Its features allow viewing orders from multiple dimensions: from order status, price, payment, and freight performance to customer location, product attributes, and finally reviews written by customers. A geolocation dataset that relates Brazilian zip codes to lat/long coordinates has also been released.

  • This business is based on the interaction between consumers, the Olist store, and the sellers.
  • First, the consumer places an order on the Olist site. The Olist store receives the order and, based on the order information (such as the product category, geolocation, and mode of payment), forwards a notification to the sellers.
  • The product is then received from the seller and delivered to the consumer within the estimated delivery time.
  • Once the customer receives the product, or once the estimated delivery date has passed, the customer gets a satisfaction survey by email in which they can rate the purchase experience and write comments.
Workflow diagram by Andre Sionek, Image Source[https://www.kaggle.com/andresionek/predicting-customer-satisfaction]

3. Problem Statement

  • Given a customer's historical data, predict the review score for the next order or purchase.
  • This problem statement can be further refined to predicting customer satisfaction (positive or negative) for a purchase made from the Brazilian e-commerce site Olist.

4. Business objectives and constraints

  • No low-latency requirement.
  • Interpretability of the model can be useful for understanding customer’s behaviour.

5. Machine Learning Formulation

Here, the objective is to predict the customer satisfaction score for a given order based on the given features like price, item description, on-time delivery, delivery status, etc.

The given problem can be solved as a multiclass classification problem (predict the score in [1,2,3,4,5]), a binary classification problem (0 as negative or 1 as positive), or a regression problem (predict the score as a number).

5.i Data Overview

Source:- https://www.kaggle.com/olistbr/brazilian-ecommerce
Uploaded In the Year : 2018
provided by : Olist Store

The data is divided into multiple datasets for better understanding and organization.

Data is available in 9 csv files:
1. olist_customers_dataset.csv (data)
2. olist_geolocation_dataset.csv(geo_data)
3. olist_order_items_dataset.csv(order_itemdata)
4. olist_order_payments_dataset.csv(pay_data)
5. olist_order_reviews_dataset.csv(rev_data)
6. olist_orders_dataset.csv(orders)
7. olist_products_dataset.csv(order_prddata)
8. olist_sellers_dataset.csv(order_selldata)
9. product_category_name_translation.csv(order_prd_catdata)
Data schema by Olist Store, Image source[https://www.kaggle.com/olistbr/brazilian-ecommerce]
  • The olist_orders_dataset has the order data for each purchase, connected with the other data using order_id and customer_id.
  • The olist_order_reviews_dataset has the labelled review data for each order in the order data table, labelled as [1,2,3,4,5], where 5 is the highest and 1 is the lowest.
  • We will treat review scores greater than or equal to 3 as positive and scores less than 3 as negative.
  • The data will be merged accordingly to get the final data needed for the analysis, feature selection, and model training.
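The merge and labelling steps above can be sketched in pandas as follows. This is a minimal illustration on made-up rows, not the author's code: the function names, the `review_label` column, and the choice of inner joins are assumptions, and the positive/negative cut-off is left as a parameter.

```python
import pandas as pd

def merge_order_tables(orders, reviews, items, customers):
    """Join the order-level tables on the keys shown in the data schema."""
    merged = orders.merge(reviews, on="order_id", how="inner")
    merged = merged.merge(items, on="order_id", how="inner")
    merged = merged.merge(customers, on="customer_id", how="inner")
    return merged

def add_binary_label(df, positive_from):
    """Label a review 1 if review_score >= positive_from, else 0."""
    labelled = df.copy()
    labelled["review_label"] = (labelled["review_score"] >= positive_from).astype(int)
    return labelled

# Tiny made-up frames that only mimic the real column names.
orders = pd.DataFrame({"order_id": ["o1", "o2"], "customer_id": ["c1", "c2"]})
reviews = pd.DataFrame({"order_id": ["o1", "o2"], "review_score": [5, 1]})
items = pd.DataFrame({"order_id": ["o1", "o1", "o2"], "price": [10.0, 20.0, 5.0]})
customers = pd.DataFrame({"customer_id": ["c1", "c2"], "customer_state": ["SP", "RJ"]})

merged = merge_order_tables(orders, reviews, items, customers)
labelled = add_binary_label(merged, positive_from=3)  # cut-off chosen for the demo
print(labelled[["order_id", "price", "review_score", "review_label"]])
```

Note that an order with several items yields several rows after the merge, which is one reason deduplication and row counts need a second look afterwards.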

5.ii. Data Description

The number of rows and columns, along with the column names of each .csv file, are shown in this data frame:

Descriptions of all columns/features are shown below:

5.iii. Machine Learning Problem

The above problem can be formulated as a binary classification problem, i.e. for the given order and purchase data of a consumer, predict whether the review will be positive or negative.

5.iv. Performance Metrics

  • Macro f1-score
  • Confusion Matrix
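As a small worked example of these two metrics, the sketch below uses scikit-learn on six made-up labels. Macro F1 averages the per-class F1 scores with equal weight, so the minority (negative) class counts as much as the majority class, which is why it suits an imbalanced target.

```python
from sklearn.metrics import confusion_matrix, f1_score

# Made-up labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

# Rows are actual classes, columns are predicted classes (order: 0, 1).
cm = confusion_matrix(y_true, y_pred)

# Class 1: precision 3/4, recall 3/4 -> F1 = 0.75
# Class 0: precision 1/2, recall 1/2 -> F1 = 0.50
# Macro F1 is the unweighted mean of the two: 0.625
macro_f1 = f1_score(y_true, y_pred, average="macro")

print(cm)
print(macro_f1)
```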

6. Exploratory Data Analysis (EDA)

So far we have understood the business problem and formulated the machine learning problem statement. We have also looked at the datasets and most of the features. Now we will do Exploratory Data Analysis on this dataset to get more insight into the features.

The first step I followed was to read all the .csv files and check the columns in each file and their datatypes. After this, all the data were merged according to the given data schema. I then cleaned the data and ran different analyses on the dataset.

6.a Data Cleaning

  • Handling missing values

The merged final data has many null values. The largest number of null values is in the column review_comment_message, which is of object dtype. Columns such as order_approved_at, order_delivered_carrier_date, and order_delivered_customer_date also have null values. These missing values are either replaced or dropped. The code is shown below.
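A minimal sketch of this step, assuming pandas. The placeholder 'nao_reveja' for empty review text is the one mentioned later in the article; which columns are dropped rather than filled is an assumption made here for illustration, and `handle_missing` is a hypothetical helper name.

```python
import pandas as pd

DATE_COLS = [
    "order_approved_at",
    "order_delivered_carrier_date",
    "order_delivered_customer_date",
]

def handle_missing(df, placeholder="nao_reveja"):
    """Fill empty review text with a placeholder and drop rows lacking dates."""
    cleaned = df.copy()
    cleaned["review_comment_message"] = cleaned["review_comment_message"].fillna(placeholder)
    # Assumption: rows with a missing date are dropped rather than imputed.
    return cleaned.dropna(subset=DATE_COLS).reset_index(drop=True)

# Made-up rows for illustration.
raw = pd.DataFrame({
    "review_comment_message": ["otimo produto", None, None],
    "order_approved_at": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "order_delivered_carrier_date": ["2018-01-02", "2018-01-03", "2018-01-04"],
    "order_delivered_customer_date": ["2018-01-05", "2018-01-06", None],
})
cleaned = handle_missing(raw)
print(cleaned)
```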

  • Data Deduplication

As you can observe from the duplicate rows, such as the rows with order_id 82bce245b1c9148f8d19a55b9ff70644, all the columns are the same. We can drop these rows, keeping the first.
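Dropping fully identical rows while keeping the first occurrence can be sketched with pandas as below. The rows are made up, and `drop_exact_duplicates` is a hypothetical helper name.

```python
import pandas as pd

def drop_exact_duplicates(df):
    """Remove rows that repeat an earlier row in every column."""
    return df.drop_duplicates(keep="first").reset_index(drop=True)

# Made-up rows: the first two are identical in all columns.
rows = pd.DataFrame({
    "order_id": ["a1", "a1", "b2"],
    "price": [59.9, 59.9, 12.0],
    "review_score": [4, 4, 5],
})
deduplicated = drop_exact_duplicates(rows)
print(deduplicated)
```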

6.b High Level Statistics

The final data after merging, cleaning, and deduplication has the following features –

The merged data has 32 columns, including categorical features such as order_status, payment_type, customer_state, and product_category_name_english. One column, review_comment_message, contains text data in Portuguese. There are a few numerical features as well. The description of the numerical features is shown below-

We can observe from the above table that-

  • For the price and freight value of an order: the maximum price of an order is 6735 while the maximum freight is around 410 Brazilian real. The average price of an order is around 125 Brazilian real and the average freight value is around 20 Brazilian real. The minimum price of an order is 0.85 Brazilian real.
  • For payment_value, the maximum payment value of an order is 13664 Brazilian real. We can also observe statistics such as the percentile values, mean, standard deviation, count, min, and max of the other numerical features.

Correlation Matrix-

Image by Paritosh Mahto

Now let us observe the target variable, i.e. the review score. Scores greater than or equal to 3 are considered 1 (positive) and all others 0 (negative). From the distribution of the target variable, we can observe that 85.5% of the total reviews are positive and 14.5% are negative. From this, we can conclude that the given dataset is skewed, or imbalanced.

Image by Paritosh Mahto

6.c Univariate Analysis

In this e-commerce dataset, four main payment methods are used: credit card, boleto, voucher, and debit card.

Note: Boleto Bancário, simply referred to as Boleto (English: Ticket), is a payment method in Brazil regulated by FEBRABAN, short for Brazilian Federation of Banks. It can be paid at ATMs, branch facilities, and internet banking of any bank, post office, lottery agent, and some supermarkets until its due date.

Image by Paritosh Mahto
  • From the above plots, we can observe that most of the orders are paid using a credit card and the second most used payment method is boleto.
  • The percentage of each mode of payment is shown in the pie chart: amongst all payments made, a credit card is used by 75.9% of the users, boleto by 19.9%, and voucher and debit card by 3.2%.
Image by Paritosh Mahto

We can also observe from the above Pareto plot that 96% of the customers used a credit card or boleto. Let us see how this feature is related to our target variable.

Image by Paritosh Mahto

We can observe from the above stacked plot that most of the customers who used credit cards have given positive reviews. The same holds for boleto, voucher, and debit card users. From this, we can conclude that this can be an important categorical feature for the problem.

Now let’s do a univariate analysis on the column customer_state. This column contains state codes for the corresponding customer_id. The name of the states and the state codes are shown below on the map of Brazil.

Map of Brazil, Image source[ https://st4.depositphotos.com/1374738/23094/v/950/depositphotos_230940566-stock-illustration-map-brazil-divisions-states.jpg ]

The three most populous states of Brazil are São Paulo, Minas Gerais, and Rio de Janeiro, and we can observe from the plot shown below that 66.6% of the orders are received from these states, which means most of the customers are from these states.

Image by Paritosh Mahto

Also, from the stacked plot of reviews per state shown below, we can conclude that most consumers from each state have given positive reviews. In SP, 35791 of the 40800 total reviews are positive, and in RJ, 9968 of the 12569 total reviews are positive. The customer_state column can be an important feature for the problem.

Image by Paritosh Mahto

Product category is one of the important features in this business. To find the top-selling product categories, I plotted the bar graph shown below-

Image by Paritosh Mahto

As we can observe, the most ordered products between 2016 and 2018 are from the bed_bath_table, health_beauty, and sports_leisure categories.

There are also a few timestamp features in this dataset, such as order_purchase_timestamp, order_approved_at, order_delivered_customer_date, and order_estimated_delivery_date. I did a univariate analysis on the timestamps after extracting attributes such as month, year, day, and day of week. The data covers 699 days, and the timestamps between which the data was collected are 2016-10-04 09:43:32 and 2018-09-03 17:40:06.
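Extracting such attributes from order_purchase_timestamp can be sketched with the pandas `.dt` accessor. The output column names follow the ones listed later in the article; the time-of-day bin edges and labels are assumptions made here for illustration.

```python
import pandas as pd

def add_purchase_time_features(df, col="order_purchase_timestamp"):
    """Derive calendar attributes from the purchase timestamp."""
    out = df.copy()
    ts = pd.to_datetime(out[col])
    out["order_purchase_year_month"] = ts.dt.strftime("%Y%m")
    out["order_purchase_dayofweek"] = ts.dt.dayofweek          # Monday = 0
    out["order_purchase_dayofweek_name"] = ts.dt.day_name()
    out["order_purchase_hour"] = ts.dt.hour
    # Assumed bins: 0-5 Dawn, 6-11 Morning, 12-17 Afternoon, 18-23 Night.
    out["order_purchase_time_day"] = pd.cut(
        ts.dt.hour,
        bins=[-1, 5, 11, 17, 23],
        labels=["Dawn", "Morning", "Afternoon", "Night"],
    ).astype(str)
    return out

# The first and last timestamps quoted in the text.
sample = pd.DataFrame(
    {"order_purchase_timestamp": ["2016-10-04 09:43:32", "2018-09-03 17:40:06"]}
)
features = add_purchase_time_features(sample)
stamps = pd.to_datetime(sample["order_purchase_timestamp"])
span_days = (stamps.max() - stamps.min()).days  # whole days between the two
print(features.T)
print(span_days)
```

The span between the two quoted timestamps comes out to 699 whole days, which matches the figure given above.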

Image by Paritosh Mahto

The evolution of the total orders received is shown above. The maximum number of orders was received in 201711. We can also observe the growth of Olist from 201609 to 201808. The analysis of the orders and reviews based on the attributes extracted from order_purchase_timestamp is summarized below.

Image by Paritosh Mahto
  • From the subplot titled Total Reviews by Month, we can observe that the highest % of positive reviews amongst the total reviews between 2016 and 2018 is given in Feb, i.e. 9.8%. In May and July, more than 9.0% of the total reviews are positive.
  • From the second subplot titled Total Reviews by Time of the day, we can conclude that the maximum number of orders is received in the afternoon and the highest % of positive reviews is given at that time, i.e. 32.8%.
  • From the third subplot titled Total Reviews by day of the week, we can conclude that the maximum number of orders is received on Monday and the highest % of positive reviews is given on that day and on Tuesday, i.e. 13.9%.

Univariate Analysis on numerical features-

  • Distribution of product price per class
Image by Paritosh Mahto
  • Distribution of freight_value per class
Image by Paritosh Mahto
  • Distribution of product_height per class
Image by Paritosh Mahto
  • Distribution of product_weight_g per class
Image by Paritosh Mahto
  • The above distribution plots show the distribution of each numerical feature for both the positive and negative classes. We can observe that the distributions for the positive and negative classes overlap almost completely, which suggests that it is not possible to classify them based on these features alone.

6.d Bivariate Analysis

There are more than 10 numerical features in this dataset, but from the correlation matrix shown above, we can observe that most of the features are not linearly related. For bivariate analysis, only four features are selected and plotted in scatter plots.

  • From the two scatter plots titled Distribution of price vs freight_value per class and Distribution of price vs product_weight_g per class, we can observe that it is very hard to say anything about the reviews, as the data points are completely mixed and not separable by review class.
  • Distribution of price vs freight_value per class
Image by Paritosh Mahto
  • Distribution of price vs product_weight_g per class
Image by Paritosh Mahto
  • Pair Plots

A pair plot is shown below for the features product_photos_qty, product_name_length, and product_description_length, as these have negative correlation values with the review_score column. In all the scatter plots between these features the points are completely mixed up and not separable by review class. We can say that none of these features is helpful for classification.

Image by Paritosh Mahto

6.e Multivariate Analysis

In the multivariate analysis, the evolution of sales and orders between 2016 and 2018 has been plotted. From the plot, we can observe that total sales and total orders per month follow the same pattern between 2016 and 2018.

Image by Paritosh Mahto

6.f RFM Analysis

I did an RFM analysis on the given customer data. RFM analysis is a data-driven customer behaviour segmentation technique.

RFM stands for-
Recency - number of days since the last purchase
Frequency - number of transactions made over a given period
Monetary - the amount spent over a given period of time

Python code for calculating recency, frequency, and monetary-
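A minimal sketch of such a calculation with pandas, on made-up rows. It is not the author's original snippet: the helper name `compute_rfm`, the snapshot date (one day after the latest purchase), and the use of payment_value for the monetary column are assumptions.

```python
import pandas as pd

def compute_rfm(df, snapshot_date=None):
    """Return one row per customer with Recency, Frequency, and Monetary."""
    ts = pd.to_datetime(df["order_purchase_timestamp"])
    # Assumption: recency is measured from the day after the latest purchase.
    snapshot = pd.Timestamp(snapshot_date) if snapshot_date else ts.max() + pd.Timedelta(days=1)
    work = df.assign(_ts=ts)
    return work.groupby("customer_unique_id").agg(
        Recency=("_ts", lambda s: (snapshot - s.max()).days),  # days since last order
        Frequency=("order_id", "nunique"),                     # distinct orders
        Monetary=("payment_value", "sum"),                     # total amount spent
    )

# Made-up purchases for two customers.
purchases = pd.DataFrame({
    "customer_unique_id": ["a", "a", "b"],
    "order_id": ["o1", "o2", "o3"],
    "order_purchase_timestamp": ["2018-01-01", "2018-01-11", "2018-01-05"],
    "payment_value": [100.0, 50.0, 20.0],
})
rfm = compute_rfm(purchases)
print(rfm)
```

On merged data with one row per order item, payment_value would be repeated per item, so the monetary sum should be computed on order-level rows.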

Output after creating RFM is shown below-

To know more about this behaviour segmentation technique you can visit here-

RFM Analysis Guide: 6 Key Segments for RFM Marketing

The distributions of recency, frequency, and monetary value for all the customers are shown below.

Image by Paritosh Mahto
  • From the first plot of recency, we can observe that most of the users stayed with Olist for a long duration, which is a positive thing, but the order frequency is low.
  • From the second plot of frequency, most customers have fewer than 5 transactions or orders. From the third plot of monetary value, the maximum amount spent over the given period seems to be less than approximately 1500.

The square plot of the behaviour segmentation of the customers is shown below.

Image by Paritosh Mahto
  • Based on the RFM_Score_s calculated for all the customers, I categorized the customers into 7 categories:
'Can't Loose Them' ==== RFM_Score_s ≥ 9
'Champions' ==== 8 ≤ RFM_Score_s < 9
'Loyal' ==== 7 ≤ RFM_Score_s < 8
'Needs Attention' ==== 6 ≤ RFM_Score_s < 7
'Potential' ==== 5 ≤ RFM_Score_s < 6
'Promising' ==== 4 ≤ RFM_Score_s < 5
'Require Activation' ==== RFM_Score_s < 4
  • From the above square plot, the highest percentage of customers lies within the area of the category Potential. There are also a few areas coloured on the blue scale, which show the percentage of consumers who require more attention so that they can be retained by Olist.
  • We can use either RFM_Score_s or RFM_Level as a feature to solve this problem.
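The seven score bands listed above translate directly into a small mapping function. The sketch below uses the category labels exactly as written in the list; the function name `rfm_level` is hypothetical.

```python
def rfm_level(score):
    """Map an RFM score to one of the seven segments listed above."""
    if score >= 9:
        return "Can't Loose Them"
    if score >= 8:
        return "Champions"
    if score >= 7:
        return "Loyal"
    if score >= 6:
        return "Needs Attention"
    if score >= 5:
        return "Potential"
    if score >= 4:
        return "Promising"
    return "Require Activation"

levels = [rfm_level(s) for s in (9.5, 8, 5.2, 3)]
print(levels)
```

With a pandas column of scores, the same function could be applied row by row via `Series.apply` to produce the categorical RFM_Level feature.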

After merging, cleaning, and analysing the data, we get the final data, which can be used further for preprocessing and feature extraction.

6.g Conclusion

* The target variable/class label is imbalanced. We should be careful while choosing the performance metric of the models.
* From the univariate analysis of payment_type, we observed that 96% of the users used a credit card or boleto and concluded that this can be an important feature.
* Also, from the univariate analysis of customer_state, we found that 42% of total consumers are from SP (São Paulo), 12.9% are from RJ (Rio de Janeiro), and 11.7% are from MG (Minas Gerais).
* After analyzing the product_category feature, we observed that the most ordered products between 2016 and 2018 are from the bed_bath_table, health_beauty, and sports_leisure categories. The least ordered products are from security_and_services.
* The different timestamps seem to be important features, as many new features can be derived from them. We observed that within 2016-18 the total number of orders received increases until 2017-11, after which there is a small decrease. From the month, day, and time, we observed that the largest number of orders is received in the month of Feb, on Monday, and in the afternoon.
* The numerical features like price, payment_value, freight_value, product_height_cm, and product_length_cm do not seem to be helpful for this classification problem, as observed from the univariate and bivariate analysis. We can also say that simple models like KNN and Naive Bayes might not work well.
* RFM analysis was also done to understand whether new features can be created from it, and we found that one numerical or categorical feature can be extracted from it.

7. Data Preprocessing and Feature Engineering

After the data analysis, we came to know about the different categorical and numerical features. All the categorical features already appear clean, so we need not preprocess them. However, there is also a column named review_comment_message which contains text data. We have to preprocess this text before featurizing it.

Preprocessing of Review Text

Since we have text data in the Portuguese language, we have to be careful while choosing the stemmer and stopwords, and while replacing or removing any special character or word. I selected the nltk library for this: I imported the stopwords using from nltk.corpus import stopwords and the RSLP stemmer using RSLPStemmer().

As we replaced the null values in the reviews data with 'nao_reveja', we have to remove words like 'não' & 'nem' from the stopwords. After this, we have to remove or replace links, currency symbols, dates, digits, extra spaces, tabs, etc. The preprocess function is shown below-
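A rough sketch of what such a preprocessing function could look like, using only the standard library so that it runs without downloading nltk corpora. The regular expressions and the tiny stopword set are assumptions for illustration; in the article's setup the stopword list would come from nltk's Portuguese stopwords (minus 'não' and 'nem') and `stem` would be the `stem` method of an `RSLPStemmer` instance.

```python
import re

def preprocess_review(text, stopwords, stem=lambda word: word):
    """Lower-case, strip noise, drop stopwords, and optionally stem."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)                 # links
    text = re.sub(r"r\$\s*|[$€£]", " ", text)                          # currency symbols
    text = re.sub(r"\d{1,2}[/-]\d{1,2}(?:[/-]\d{2,4})?", " ", text)    # dates
    text = re.sub(r"\d+", " ", text)                                   # remaining digits
    text = re.sub(r"[^\w\s]", " ", text)                               # punctuation
    # split() also collapses extra spaces, tabs, and newlines.
    tokens = [stem(word) for word in text.split() if word not in stopwords]
    return " ".join(tokens)

# Hypothetical mini stopword list; negations are kept out of it on purpose.
base_stopwords = {"o", "a", "de", "em", "que", "não", "nem"}
stopwords_pt = base_stopwords - {"não", "nem"}

cleaned_text = preprocess_review(
    "O produto NÃO chegou! Paguei R$ 50,00 em 10/05/2018 http://x.com",
    stopwords_pt,
)
print(cleaned_text)
```

Keeping 'não' and 'nem' matters because negation often carries the sentiment of a review.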

Review text column before and after preprocessing is shown below-

  • Before
  • After

Vectorization of text data

Now that we have preprocessed text, I used FastText from the gensim library to convert words into vectors, weighted with the values from a TF-IDF vectorizer. TF-IDF gives more weight to words that are frequent within a review but rare across the corpus.

The code snippet is shown below-

for loading FastText model (for 300 dim)

vectorizer function :
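The idea of TF-IDF-weighted word vectors can be sketched as follows. This is an illustration under stated assumptions, not the original vectorizer function: a plain dict of 2-dimensional vectors stands in for the 300-dimensional FastText model, and `tfidf_weighted_vectors` is a hypothetical name.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

def tfidf_weighted_vectors(texts, word_vectors, dim):
    """Average each text's word vectors, weighting every word by tf * idf."""
    tfidf = TfidfVectorizer()
    tfidf.fit(texts)
    idf = {word: tfidf.idf_[index] for word, index in tfidf.vocabulary_.items()}

    vectors = np.zeros((len(texts), dim))
    for row, text in enumerate(texts):
        words = text.split()
        weight_sum = 0.0
        for word in words:
            if word in word_vectors and word in idf:
                weight = idf[word] * words.count(word) / len(words)
                vectors[row] += weight * word_vectors[word]
                weight_sum += weight
        if weight_sum:
            vectors[row] /= weight_sum
    return vectors

# Stand-in embeddings; a FastText model would supply 300-dim vectors instead.
toy_vectors = {"bom": np.array([1.0, 0.0]), "ruim": np.array([0.0, 1.0])}
texts = ["bom bom", "ruim", "bom ruim"]
text_features = tfidf_weighted_vectors(texts, toy_vectors, dim=2)
print(text_features)
```

Words missing from either the TF-IDF vocabulary or the embedding lookup are skipped, so a text with no known words keeps an all-zero vector.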

After this, the features which are not useful are dropped from the final data. The columns which are dropped are shown below.

col = ['order_id',
 'customer_id',
 'order_purchase_timestamp',
 'order_approved_at',
 'order_delivered_customer_date',
 'order_estimated_delivery_date',
 'customer_unique_id',
 'order_item_id',
 'product_id',
 'seller_id',
 'shipping_limit_date',
 'order_purchase_month_name',
 'order_purchase_year_month',
 'order_purchase_date',
 'order_purchase_month_yr',
 'order_purchase_day',
 'order_purchase_dayofweek',
 'order_purchase_dayofweek_name',
 'order_purchase_hour',
 'order_purchase_time_day',
 'customer_city',
 'customer_zip_code_prefix',
 'product_category_name']

Now we have our final data with preprocessed text data, categorical features, and numerical features. We can split the data using from sklearn.model_selection import train_test_split with stratify=y, as we have imbalanced data. We also have categorical features which are not encoded yet. For encoding the categorical features I used the CountVectorizer(binary=True) function; the encoding of the order_status feature is shown below.

The numerical features are also scaled using from sklearn.preprocessing import Normalizer. All the vectorized features are stacked using from scipy.sparse import hstack to form X_train and X_test.
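Put together, the split, encoding, scaling, and stacking steps could look like the sketch below. The data frame is made up, and applying Normalizer to the numerical columns jointly is an assumption about how the scaling was done.

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import Normalizer

# Made-up data: 8 positive and 4 negative reviews.
df = pd.DataFrame({
    "order_status": ["delivered"] * 6 + ["shipped"] * 3 + ["canceled"] * 3,
    "price": [10.0, 25.0, 40.0, 55.0, 70.0, 85.0, 15.0, 30.0, 45.0, 20.0, 35.0, 50.0],
    "freight_value": [5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 5.5, 6.5, 7.5, 4.0, 4.5, 5.0],
    "label": [1] * 8 + [0] * 4,
})
num_cols = ["price", "freight_value"]

# stratify keeps the positive/negative ratio the same in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    df.drop(columns="label"), df["label"],
    test_size=0.25, stratify=df["label"], random_state=42,
)

# binary=True yields a one-hot style encoding of the category strings.
status_encoder = CountVectorizer(binary=True)
status_train = status_encoder.fit_transform(X_train["order_status"])
status_test = status_encoder.transform(X_test["order_status"])

# Normalizer rescales each row (sample) to unit L2 norm; it learns nothing
# from the data, so train and test are transformed independently.
normalizer = Normalizer()
num_train = normalizer.fit_transform(X_train[num_cols])
num_test = normalizer.transform(X_test[num_cols])

# Stack the encoded categorical block and the numerical block side by side.
X_tr = hstack([status_train, num_train]).tocsr()
X_te = hstack([status_test, num_test]).tocsr()
print(X_tr.shape, X_te.shape)
```

The encoder is fitted on the training split only and then reused on the test split, so both matrices share the same columns.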

Baseline Model Selection

Now that we have the vectorized features, we will build a few basic models and choose one of them as our baseline model. After this, we will try to improve on the baseline by adding new features, and to add new features I will again do some EDA. The models used are a random model, a Naive Bayes model, and a Logistic Regression model. The output scores and confusion matrices are shown below.

  • Random Model
Train f1-score 0.42757018716907186
Test f1-score 0.4261466361217526
Image by Paritosh Mahto
  • Naive Bayes
Train f1 score:  0.7424119484527338
Test f1 score:  0.7472580717762947
Image by Paritosh Mahto
  • Logistic Regression
Train f1 score:  0.8165306456031154
Test f1 score:  0.8062989397776499
Image by Paritosh Mahto

The Logistic Regression model performs better than the other models, so it is chosen as the baseline model. From both the train and test confusion matrices, we can observe that the False Positive and False Negative values are still very large. Let's try to reduce these values by adding some features and through feature selection methods.
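The three-way baseline comparison above can be sketched on synthetic data as follows. Everything here is a stand-in: `make_classification` replaces the Olist features, `DummyClassifier` with uniform guessing plays the role of the random model, and `GaussianNB` is an assumed Naive Bayes variant.

```python
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic imbalanced data (about 85% positive) standing in for the real features.
X, y = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.15, 0.85], class_sep=2.0, random_state=0,
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "random": DummyClassifier(strategy="uniform", random_state=0),
    "naive_bayes": GaussianNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Fit each candidate and record its macro F1 on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    scores[name] = f1_score(y_te, model.predict(X_te), average="macro")

for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

A uniform random guesser gives a useful floor for macro F1 on imbalanced data, against which the learned models can be judged.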

Feature Engineering

Along with the existing features, 16 new features are added to the data. The details about these features are shown below.

After adding the new features, we have a total of 29 numerical features, 5 categorical features, and the text data (300 dim). We will again run the selected baseline model, i.e. the Logistic Regression model, and check its output. The output is shown below.

Train f1 score:  0.8258929671475947
Test f1 score:  0.8167104393178897
Image by Paritosh Mahto

We got better train and test f1-scores, but we still have high FP and FN values. To reduce these values and increase the scores, I tried an autoencoder model for feature selection.

Feature Extraction/Selection using AutoEncoder Model

Autoencoder is a type of neural network that can be used to learn a compressed(reduced dimension) representation of raw data. An autoencoder is composed of an encoder and a decoder sub-models. The encoder compresses the input and the decoder attempts to recreate the input from the compressed version provided by the encoder. After training, the encoder model is saved and the decoder is discarded.

The encoder can then be used as a data preparation technique to perform feature extraction on raw data that can be used to train a different machine learning model. source-https://machinelearningmastery.com/autoencoder-for-classification/

The architecture of the model is shown below. I used Dense layers, BatchNormalization layers, and LeakyReLU as the activation layer.

Image by Paritosh Mahto

I ran this model for 10 epochs, and after running the model I saved the encoder part.

Loss Vs Epoch graph

Image by Paritosh Mahto

Then, important features are extracted from the encoder model.

code snippet:
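A minimal Keras sketch of the encoder/decoder idea described above, using Dense, BatchNormalization, and LeakyReLU layers. The layer sizes, bottleneck width, optimizer, loss, and random input data are all assumptions; the actual architecture is the one shown in the figure.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_autoencoder(n_features, bottleneck):
    """Return (autoencoder, encoder) models that share the same encoder layers."""
    inputs = keras.Input(shape=(n_features,))
    # Encoder: compress the input down to the bottleneck width.
    x = layers.Dense(n_features * 2)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    encoded = layers.Dense(bottleneck, name="bottleneck")(x)
    # Decoder: try to reconstruct the original input from the bottleneck.
    x = layers.Dense(n_features * 2)(encoded)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU()(x)
    outputs = layers.Dense(n_features, activation="linear")(x)

    autoencoder = keras.Model(inputs, outputs)
    encoder = keras.Model(inputs, encoded)
    autoencoder.compile(optimizer="adam", loss="mse")
    return autoencoder, encoder

# Random data standing in for the stacked feature matrix.
X = np.random.default_rng(0).normal(size=(64, 8)).astype("float32")
autoencoder, encoder = build_autoencoder(n_features=8, bottleneck=3)
autoencoder.fit(X, X, epochs=1, batch_size=16, verbose=0)  # input is its own target

# Only the encoder half is kept for feature extraction.
X_encoded = encoder.predict(X, verbose=0)
print(X_encoded.shape)
```

Because the encoder shares its layers with the trained autoencoder, no separate training of the encoder is needed before calling `predict`.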

The baseline model is run again with the extracted features. The output is shown below.

Train f1 score:  0.8457608492913875
Test f1 score:  0.8117963347711852
Image by Paritosh Mahto

The train f1-score increased while the test f1-score stayed at a similar level (0.812 versus 0.817 before), and the FP and FN values decreased. We will now run different machine learning and deep learning models and select the best model for our problem.


8. Model Selection

Model selection is the process of choosing one of the models as the final model that addresses the problem. Having seen the baseline model's performance, we will now try to improve the scores by running the model selection process both across different types of models (e.g. logistic regression, SVM, KNN, decision trees, ensembles, etc.) and across models of the same type configured with different hyperparameters (e.g. different kernels in an SVM).

Machine Learning Models

We start with classical classification models and then move to neural network models. The models experimented with in this case study are as follows:-

  1. Logistic Regression
  2. Linear Support Vector Machine
  3. Decision Tree Classifier
  4. Random Forest Classifier
  5. Boosting Classifier (XGBoost/LightGBM/AdaBoost/CATBoost)
  6. Stacking/Voting Ensemble techniques

For each model, I did hyperparameter tuning using RandomizedSearchCV. The summary of all the outputs with the best hyperparameters of all the models is shown below.
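As an illustration of this kind of tuning, the sketch below runs RandomizedSearchCV over the regularization strength of a logistic regression on synthetic data, scored by macro F1. The search space, number of iterations, and number of folds are assumptions, not the settings used in the case study.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic imbalanced data standing in for the real feature matrix.
X, y = make_classification(
    n_samples=300, n_features=10, weights=[0.2, 0.8], random_state=0
)

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e2)},  # sample C on a log scale
    n_iter=10,              # number of random configurations to try
    scoring="f1_macro",     # same metric family as used in the article
    cv=3,
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

The same pattern applies to the other model families by swapping the estimator and its parameter distributions.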

We can observe that the best model based on the test f1-score is vot_hard. The code snippet and the output are shown below.
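A hard-voting ensemble of this kind can be sketched with scikit-learn's VotingClassifier. The three base estimators, their settings, and the synthetic data are assumptions chosen for illustration; the original vot_hard may combine different models.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the real feature matrix.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# voting="hard" takes the majority class label across the base models.
vot_hard = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
        ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ],
    voting="hard",
)
vot_hard.fit(X, y)
preds = vot_hard.predict(X)
print(preds[:10])
```

Switching to `voting="soft"` would average predicted probabilities instead, which requires every base model to support `predict_proba`.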

Code snippet of all the models can be found here-

Paritosh/ml-models – Jovian

Deep Learning Models

As we have seen different machine learning models, now we will try different neural network models. I tried 5 different models and used these models to form deep learning stacking models.

Model-1

In Model-1 I built a simple model with 5 dense layers as hidden layers and 1 dense layer as the output layer. Sigmoid is used as the activation function at the output layer. The input is the features extracted from the encoder model. I used a custom f1-score metric to evaluate these models.

  • Model Architecture
Image by Paritosh Mahto
  • Epoch Vs Loss Plot
Image by Paritosh Mahto
  • Output
Train f1_score: 0.8632763852422476
Test f1_score: 0.8218909906703276
Image by Paritosh Mahto

Model-2

In Model-2 I used 4 Conv1D layers and 2 MaxPooling1D layers, with one dropout and one flatten layer, as hidden layers, and 1 dense layer as the output layer. Sigmoid is used as the activation function at the output layer. The input is the features extracted from the encoder model, passed in as a reshaped array.

  • Model Architecture
Image by Paritosh Mahto
  • Epoch Vs Loss
Image by Paritosh Mahto
  • Output
Train f1_score: 0.8610270746182962
Test f1_score: 0.8253479131333611
Image by Paritosh Mahto

Model-3

In Model-3 I used LSTM, embedding, BatchNormalization, dense, dropout, and flatten layers to build a model with multiple inputs and one output. The input data are explained below.

  • Input_seq_total_text_data- I used the Embedding layer to get word vectors for the text data columns. I also used predefined fastText word vectors to create an embedding matrix. After this, I used an LSTM layer and flattened its output.
  • Input_order_status- given order_status column as input to embedding layer and then trained the Keras Embedding layer.
  • payment_type- given payment_type column as input to embedding layer and trained the Keras Embedding layer.
  • Input_customer_state- given customer_state column as input to embedding layer and trained the Keras Embedding layer.
  • Input_product_category- given product_category column as input to embedding layer and trained the Keras Embedding layer.
  • Input_rfm – given RFM_Level column as input to embedding layer and trained the Keras Embedding layer.
  • Input_numerical- concatenate remaining columns i.e numerical features and added a Dense layer after that.

All these are concatenated at the end and passed through different dense layers.

  • Model Architecture
Image by Paritosh Mahto
  • Epoch Vs Loss
Image by Paritosh Mahto
  • Output
Train f1_score: 0.860330602322043
Test f1_score: 0.8327277576694923
Image by Paritosh Mahto

Model-4

Model-4 also used LSTM, Conv1D, embedding, BatchNormalization, dense, dropout, and flatten layers to build the model, but with two inputs and one output. The input data are explained below.

  • Input_seq_total_text_data- I used the Embedding layer to get word vectors for the text data columns. I also used predefined fastText word vectors to create an embedding matrix. After this, I used an LSTM layer and flattened its output.
  • Other_than_text_data- All categorical features are converted to one-hot encoded vectors and then concatenated with the numerical features using np.hstack
  • Model Architecture
Image by Paritosh Mahto
  • Epoch Vs Loss
Image by Paritosh Mahto
  • Output
Train f1_score: 0.8815188513597205
Test f1_score: 0.8218441232242646
Image by Paritosh Mahto

Model-5

Model -5 also used LSTM Layer, Conv1D, embedding layer, BatchNormalisation layer, dense layer, dropout, and flatten layer to build the model but with two inputs and one output. The input data are the same as model 4.

  • Model Architecture
Image by Paritosh Mahto
  • Epoch Vs Loss
Image by Paritosh Mahto
  • Output
Train f1_score: 0.8508463263785984
Test f1_score: 0.8287339894123904
Image by Paritosh Mahto

Deep Learning Stacking Model

Having seen 5 deep learning models, we will now use an ensemble method in which the above 5 models are used as sub-models, each contributing equally to the combined prediction, and XGBClassifier is used as the final model, i.e. the meta-model. We will build two stacking models, one with hard stacking and one with soft stacking, and compare their output with the above models' output. The code snippet for each step of building both the hard and soft stacking models is shown below.

step-1 Loading five sub-models

step-2 Predictions From each model

step-3 Stacking the Predictions and passed through the meta-classifier
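The three steps above can be sketched as follows. This is an illustration under several assumptions: simulated probabilities stand in for the outputs of the five saved Keras sub-models, "hard" stacking is read as feeding thresholded class labels to the meta-model and "soft" stacking as feeding raw probabilities, and scikit-learn's GradientBoostingClassifier is swapped in for XGBClassifier so the sketch runs without the xgboost package.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def stack_outputs(sub_model_probs, hard, threshold=0.5):
    """Place each sub-model's predictions in its own column.

    hard=True  -> columns hold 0/1 class labels (thresholded probabilities)
    hard=False -> columns hold the raw predicted probabilities
    """
    stacked = np.column_stack(sub_model_probs)
    if hard:
        return (stacked >= threshold).astype(int)
    return stacked

# Steps 1 and 2 (simulated): five noisy probability vectors for 200 samples.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
sub_model_probs = [
    np.clip(0.2 + 0.6 * y + rng.normal(0, 0.2, size=200), 0, 1)
    for _ in range(5)
]

# Step 3: stack the predictions and fit one meta-classifier per variant.
hard_features = stack_outputs(sub_model_probs, hard=True)
soft_features = stack_outputs(sub_model_probs, hard=False)
meta_hard = GradientBoostingClassifier(random_state=0).fit(hard_features, y)
meta_soft = GradientBoostingClassifier(random_state=0).fit(soft_features, y)

print(hard_features[:3])
print(soft_features[:3].round(2))
```

In practice the meta-model should be fitted on sub-model predictions for data the sub-models were not trained on, otherwise its scores will be optimistic.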

  • Hard Stacking

Output :

Image by Paritosh Mahto
  • Soft Stacking

Output:

Image by Paritosh Mahto

As we can observe, both stacking models perform better than the above deep learning models. The code snippets of all the deep learning models can be found here –

Paritosh/dl-models-2 – Jovian


9. Summary

  • From EDA we concluded that the given dataset is skewed, and from the correlation values we saw that most of the numerical features are not linearly related, which means that simple ML classification might not work well.
  • The review data are in text form, so we preprocessed them. The addition of new features also helped improve the scores. The autoencoder model helped further with feature selection and performance.
  • In the model selection part, we went through the performance of different ML and DL models. After comparing all the results, we can conclude that the best model is a deep learning stacking model, i.e. the soft stacking model. We will now use this model in the deployment process.

10. Deployment

The best stacked deep learning model is deployed using Streamlit and GitHub. Using Streamlit's uploader function I created a CSV file input section where you can provide raw data. After that, you choose the unique customer id and the corresponding order ids, and the prediction is shown as an image.

webpage link-

https://share.streamlit.io/paritoshmahto07/customer-satisfaction-prediction/main/app.py

Deployment Video-

11. Improvements to Existing Approaches

In the existing approach, two regression models are used with nine new features to predict review scores, achieving an RMSE of 0.58. The existing approach does not compare different models and uses limited features. Our approach achieved better results than the existing approach, and we also used a new text featurization method along with an autoencoder model for feature selection, which increased the performance of the models.

12. Future Work

Vectorizing the Portuguese text data with different methods may improve the result. Adding new features and tuning the parameters of the DL models may also achieve a better result.

My LinkedIn Profile

https://www.linkedin.com/in/paritosh07/

My Github Profile

paritoshMahto07 – Overview

13. References

i.Existing Solution-

Predicting Customer Satisfaction

ii. Data Analysis and Visualisation-

E-Commerce Sentiment Analysis: EDA + Viz + NLP ✍

iii. RFM Analysis-

Recency, Frequency, Monetary Model with Python – and how Sephora uses it to optimize their Google…

iv. Autoencoder Model

Autoencoder Feature Extraction for Classification – Machine Learning Mastery

v. Stacking ML Models

Stacking Ensemble Machine Learning With Python – Machine Learning Mastery

vi. Stacking DL Models

Stacking Ensemble for Deep Learning Neural Networks in Python – Machine Learning Mastery

vii. Mentorship

Applied Course

Thanks for reading, have a good day! 🙂

