Dynamic Pricing using Reinforcement Learning and Neural Networks

An intelligent system that can increase e-commerce sales and profits, recognized by Correlation One and SoftBank as one of the top three projects in their Data Science program.

Reslley Gabriel
Towards Data Science


Dynamic Pricing. Image by source. Reposted with permission.

The main goal of this project was to develop a dynamic pricing system to increase e-commerce profits by adapting to supply and demand levels.

The pricing system should be able to adjust a product’s final price in a robust and timely manner, reacting to supply and demand fluctuations in a scalable way.

First, a simulator environment was created to mimic the fluctuation of order levels based on a few variables. Then, this simulation environment was used to train a Deep Reinforcement Learning agent to choose the best pricing policy for maximizing profits.

Data Preparation and System Architecture

The dynamic pricing system architecture consists of three fundamental parts: the PostgreSQL database, hosted on Amazon RDS; the Flask API; and the Dash dashboard, the last two hosted on Amazon EC2.

The API is built with Flask, a Python web framework, and handles HTTP requests. It has two main uses: applying the reinforcement learning algorithm and providing access to the data. It runs the data through a trained PyTorch model, saving the results in the database, and also provides HTTP access to the data in JSON format.

System Architecture. Image by author.
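To make this concrete, here is a minimal sketch of what such an endpoint could look like. The route name, the model path, and the `fetch_state` helper are hypothetical; the article does not show the API’s actual interface.

```python
import torch
from flask import Flask, jsonify

app = Flask(__name__)
# Hypothetical path to the trained PyTorch agent described above.
model = torch.load("pricing_agent.pt")
model.eval()

def fetch_state(product_id):
    # Stand-in for a PostgreSQL (Amazon RDS) query returning the
    # product's current feature vector; the real schema is not shown.
    return [0.0] * 12

@app.route("/price/<int:product_id>", methods=["GET"])
def suggest_price(product_id):
    state = torch.tensor(fetch_state(product_id), dtype=torch.float32)
    with torch.no_grad():
        action = model(state).argmax().item()  # best pricing action for this state
    return jsonify({"product_id": product_id, "action": action})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```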

Before using the e-commerce data for analysis and modeling, an extensive cleaning process was carried out. The main objective was to merge the datasets into a time series format; a rough sketch of this merge step follows the list below. For any given product we gathered:

  • Competitor’s price (when available)
  • Average price
  • Average shipping value
  • Number of orders
  • Product’s type
  • Product’s group
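As an illustration, that merge step could look like the pandas sketch below. The table names and columns (`orders.csv`, `prices.csv`, `competitor_prices.csv`, `order_id`, `shipping_value`) are assumptions; the article does not show the actual schema.

```python
import pandas as pd

# Hypothetical daily tables; the real schema is not shown in the article.
orders = pd.read_csv("orders.csv", parse_dates=["date"])
prices = pd.read_csv("prices.csv", parse_dates=["date"])
competitors = pd.read_csv("competitor_prices.csv", parse_dates=["date"])

# One row per product per day, with the variables listed above.
ts = (
    orders.groupby(["product_id", "date"])
    .agg(n_orders=("order_id", "count"), avg_shipping=("shipping_value", "mean"))
    .reset_index()
    .merge(
        prices.groupby(["product_id", "date"], as_index=False)["price"]
        .mean()
        .rename(columns={"price": "avg_price"}),
        on=["product_id", "date"],
        how="left",
    )
    # Competitor's price, when available.
    .merge(competitors, on=["product_id", "date"], how="left")
)
```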

During the ETL process, a few adjustments had to be made to ensure data quality. For instance, when analyzing competitor’s prices, we noticed that the dataset contained some products being sold for R$0.00 for brief periods. Since it is highly unlikely that a product was being offered for free, such records were dropped from the dataset.

From the values that remained, outliers caused by manual errors (e.g., a product that sells for R$199.99 being advertised for R$19.99) were removed. This was achieved by excluding values that fell outside the range of a product’s mean price plus or minus three times its standard deviation.
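Both filters fit in a few lines of pandas. This is a minimal sketch, assuming `price` and `product_id` columns; the project’s actual column names are not shown in the article.

```python
import pandas as pd

def clean_prices(df: pd.DataFrame) -> pd.DataFrame:
    # Drop records advertised at R$0.00: a free product is implausible.
    df = df[df["price"] > 0].copy()

    # Per product, keep only prices within mean +/- 3 standard deviations.
    stats = df.groupby("product_id")["price"].agg(["mean", "std"])
    df = df.join(stats, on="product_id")
    within_range = (df["price"] - df["mean"]).abs() <= 3 * df["std"].fillna(0)
    return df[within_range].drop(columns=["mean", "std"])
```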

Clustering the products

Modeling each product individually has many obstacles. When a product is first created, there is not enough historical data to model it as a time series. Another disadvantage occurs whenever a product has its supply interrupted.

On the other hand, creating a single model for the whole portfolio would perform poorly, since this approach mixes products with very different behaviors.

We solved the one-model-fits-all vs. one-model-per-product trade-off with a mixed approach: we clustered the products by a combination of type, group, and price.

Clustering Representation. Image by source. Reposted with permission.

For each product group+type combination, a categorization of four possible price ranges (A, B, C, or D) was created, according to the quartile ranges.

By taking this approach, the sparsity of the data is reduced and new products with short time series can be priced based on similar items.
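One way to produce those labels is sketched below with pandas. The grouping keys (`product_group`, `product_type`) and the `avg_price` column are assumptions based on the description above, not the project’s actual code.

```python
import pandas as pd

def add_price_range(df: pd.DataFrame) -> pd.DataFrame:
    # Within each group+type combination, split average prices into
    # quartiles and label the four resulting ranges A, B, C, and D.
    df["price_range"] = (
        df.groupby(["product_group", "product_type"])["avg_price"]
        .transform(lambda p: pd.qcut(p, q=4, labels=["A", "B", "C", "D"]))
    )
    return df
```

New products can then be assigned to a cluster immediately from their group, type, and launch price, and priced using the history of similar items.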

Using Reinforcement Learning

Training a reinforcement learning solution in a real scenario often takes a lot of time, and since the agent has no experience at the beginning of the process, it may make bad decisions that end up causing undesired losses.

To avoid these problems we created an environment simulator, testing various models: Linear Regression, Decision Trees, Random Forests, Support Vector Machines, eXtreme Gradient Boosting (XGBoost), and Facebook’s Prophet.

We ended up choosing the Linear Regression environment simulator because of its higher interpretability. Then, we applied a Reinforcement Learning method called Deep Q-Learning.

Reinforcement Learning Cycle. Image from source.

In a nutshell, a software agent takes actions based on the current state of an environment. After taking an action, the agent receives a reward, scoring how good or bad the chosen action was. By experimenting with actions and evaluating the rewards over time, the agent learns to make the most appropriate decisions.

In the e-commerce dynamic pricing problem, we can map these concepts as follows (a minimal simulator sketch follows the list):

  • Environment: marketplace (Amazon, for example)
  • State: lowest price in the market, inventory levels, current date features (day of the week, current month and year, holidays, etc.), shipping values to key locations, and many others.
  • Agent: dynamic pricing algorithm
  • Action: increase or lower prices, or offer free shipping
  • Reward: total profit generated by the agent’s decisions
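Under this mapping, the simulator exposes the usual reset/step interface. The skeleton below is a gym-style sketch assuming the linear-regression demand model mentioned earlier; the state layout, horizon, and feature encoding are all illustrative.

```python
import numpy as np

class PricingEnv:
    """Gym-style skeleton of the environment simulator.
    State features and feature layout are illustrative assumptions."""

    def __init__(self, demand_model, unit_cost, horizon=30):
        self.demand_model = demand_model  # e.g., the fitted linear regression
        self.unit_cost = unit_cost
        self.horizon = horizon

    def reset(self):
        self.t = 0
        # Placeholder state vector (prices, inventory, date features, ...).
        self.state = np.zeros(5)
        return self.state

    def step(self, action):
        # Action k sets price = cost * (1 + 0.025 * (k + 1)),
        # so the price never falls below cost.
        price = self.unit_cost * (1 + 0.025 * (action + 1))
        features = np.append(self.state, price).reshape(1, -1)
        demand = max(0.0, float(self.demand_model.predict(features)[0]))
        reward = (price - self.unit_cost) * demand  # profit from this decision
        self.t += 1
        return self.state, reward, self.t >= self.horizon, {}
```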

A fully connected Neural Network with 4 hidden layers of 30 nodes each was used. The input layer receives the state information (e-commerce’s prices, date parameters, inventory, shipping values, competitor’s prices), while the output layer consists of 10 possible actions: setting the retail price by multiplying the item’s cost by a markup, in increments of 2.5 percentage points.

This way, the agent never sells a product at a loss, leaving it with the task of finding the balance between price and demand that maximizes profits.
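A PyTorch sketch of that Q-network, following the dimensions given above; the input size depends on the state encoding, so `state_dim=12` is a placeholder, not the project’s actual value.

```python
import torch
import torch.nn as nn

class PricingDQN(nn.Module):
    """Fully connected Q-network: 4 hidden layers of 30 nodes each,
    one output per pricing action (10 cost multipliers)."""

    def __init__(self, state_dim: int, n_actions: int = 10):
        super().__init__()
        layers, in_dim = [], state_dim
        for _ in range(4):
            layers += [nn.Linear(in_dim, 30), nn.ReLU()]
            in_dim = 30
        layers.append(nn.Linear(in_dim, n_actions))
        self.net = nn.Sequential(*layers)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # Q-values, one per candidate markup

# Greedy action selection: pick the markup with the highest Q-value.
model = PricingDQN(state_dim=12)
action = model(torch.randn(1, 12)).argmax(dim=1).item()
```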

Results

In order to compare the results, both the original e-commerce pricing policy and the trained agent pricing policy were used on the simulator environment.

Analyzing the financial results, the Reinforcement Learning agent outperformed the baseline pricing policy by 3.48%. This profit increase could improve merchants’ satisfaction with the e-commerce platform, which might raise engagement rates.

There may also be an improvement to the merchants’ pricing workflow, since manually adjusting prices is very time-consuming.

Dynamic Pricing Results. Image by author.

This approach could also be applied in many other industries, such as tourism, transportation, and agriculture.

We hope you enjoyed it! If you have any questions, feel free to reach out to us.

Thanks for reading!

Co-authors: Francisco Magioli, Henrique Nascimento, Leonardo Gomes Cardoso, and Renato Candido.
