Alternative Ways to Recommend Airbnb Listings Using Natural Language Processing

Aisulu Omar
Towards Data Science
6 min read · Sep 9, 2019

This project came from my love for Airbnb. If you put enough time into searching, you can find unique places at decent prices. The goal of this project is to make that search easier. I was planning a trip to Austin when I realized that I didn’t know anything about the neighborhoods there, that I wouldn’t have a car, and that I didn’t want to spend a lot of time on public transportation. I wanted to stay in a cool neighborhood where I could find places I like within walking distance.

Unfortunately for me, Airbnb currently doesn’t make it easy to search for accommodations based on their proximity to specific things and places. The Airbnb search interface lets you filter by dates, guests, work trip, and so on, but it doesn’t give you a sense of where to stay if you like hip areas near bars and coffee shops.

I decided to do a data science project to get the more specific search I wanted from Airbnb.

Data

Airbnb’s terms of use prohibit unauthorized scraping of data by automated agents, but you can find ready-to-use datasets for the biggest cities at insideairbnb.com.

I decided to work with the listings and reviews data for Seattle, since I’m most familiar with the city I live in. I used two different datasets for this project to develop two alternative ways of generating recommendations, each described in the sections below.

Part I

Recommendation system #1

My goal for this part is to create more personalized Airbnb recommendations that match a specific request. For example, what if I want recommendations for a studio apartment in a hip neighborhood, near bars and coffee shops?

For the first recommendation system, I used a dataset with detailed descriptions of listings in Seattle. This dataset included information about each listing’s neighborhood, host, amenities, review scores, availability, price, and location.

I used a simple trick with cosine similarity that let me find the listings closest to my request. It took one function and a few steps:

  • Tokenizing the text content of the data frame with a count vectorizer
  • Transforming the tokenized text column into a matrix
  • Transforming the personalized request, which is the function’s text input, with the same vectorizer
  • Computing the cosine similarity between the request vector and the listing matrix
  • Selecting the five rows with the highest cosine similarity

This is the function:
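
A minimal, dependency-free sketch of the steps above. It is not the original function: the listing descriptions are made up, and the names `tokenize`, `cosine_similarity`, and `recommend` are chosen here for illustration. In practice, scikit-learn's `CountVectorizer` and `cosine_similarity` would typically do the counting and scoring.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(request, descriptions, top_n=5):
    """Return (index, score) pairs for the top_n descriptions closest to the request."""
    request_vec = Counter(tokenize(request))
    scores = [
        (i, cosine_similarity(request_vec, Counter(tokenize(text))))
        for i, text in enumerate(descriptions)
    ]
    # Highest similarity first; ties keep their original order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:top_n]


# Hypothetical listing descriptions, made up for illustration.
listings = [
    "Cozy studio apartment near bars and coffee shops",
    "Large family house with a garden in a quiet suburb",
    "Modern loft downtown, walking distance to coffee shops",
]
top = recommend("studio apartment around bars and coffee shops", listings, top_n=2)
print(top)
```

Here the first listing ranks highest because it shares the most words with the request, and the suburban house scores zero because it shares none.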

This was great, but at this point the code lived only in my Jupyter notebook, where no one else could use it. Fortunately, I could wrap the code in a Flask app and host it as a website on Heroku, a free web-hosting service, so that others can use this search functionality.

Deploying the model: Flask app → Heroku

  • The first step is to save the objects the main function needs into a pickle file. 🥒
  • Next, I created a .py file with my main function (the web API).
  • Another .py file contained the backend of the Flask app.
  • Lastly, I wrote the HTML for a simple front-end interface.
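
The pickling step can be sketched as follows. The `artifacts` dictionary and the file name are hypothetical stand-ins for whatever the search function needs at request time (for example, a fitted vectorizer and the listing matrix).

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical objects the search function needs at request time.
artifacts = {
    "vocabulary": ["bars", "coffee", "studio"],
    "listing_ids": [101, 102, 103],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "recommender_artifacts.pkl"

    # Save once, after the notebook has built everything.
    with open(path, "wb") as f:
        pickle.dump(artifacts, f)

    # Load again, as the web app would at startup.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == artifacts)
```

Only unpickle files you created yourself, since loading a pickle can execute arbitrary code.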

Once I was able to run the working app from my terminal, I started deploying it through Heroku, following an easy guide.
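
For orientation, a Flask backend for this kind of search can be quite small. The sketch below is an assumption about the shape of such an app, not the original code: the `/recommend` route, the `q` parameter, and the helper names are hypothetical.

```python
def clean_query(raw):
    """Collapse whitespace and lowercase the user's search text."""
    return " ".join(raw.split()).lower()


def create_app(recommend_fn):
    """Build a Flask app that serves recommend_fn at /recommend?q=..."""
    # Imported here so the helper above can be used without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/recommend")
    def recommend_route():
        query = clean_query(request.args.get("q", ""))
        if not query:
            return jsonify(error="empty query"), 400
        return jsonify(results=recommend_fn(query))

    return app


# To run locally (requires Flask):
#   app = create_app(lambda q: ["listing 1", "listing 2"])
#   app.run(debug=True)
```

Passing the recommendation function into `create_app` keeps the web layer separate from the search logic, which mirrors the split into two .py files described above.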

Check out the result below! (And sorry for the primitive front-end interface.)

Part II

Recommendation system #2

The goal for the second part of this project is to create a collaborative filtering recommendation engine that gives users recommendations based on the listings that people with similar tastes and preferences liked in the past. This method predicts ratings on a per-user basis, using the similarities between users as a reference.

For the second recommendation system, I worked with an Airbnb dataset containing reviews of the listings from Part I, along with the reviewer ID and listing ID for each review.

Let’s go over my process:

My dataset didn’t include a rating or sentiment for each review, so I used NLTK’s SentimentIntensityAnalyzer (the VADER sentiment model) to calculate a sentiment score for every comment on each listing in the dataset from Part I. Below, you can see the sentiment score for each comment (the rows) in the polarity column at the far right.
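
A hedged sketch of the scoring step. It assumes the polarity column holds VADER's `compound` score, which the post does not confirm, and the function names are chosen here for illustration. The scoring function accepts any analyzer object so it can be tried without downloading the lexicon.

```python
def add_polarity(comments, analyzer):
    """Pair each comment with its compound sentiment score.

    `analyzer` is any object with a polarity_scores(text) method that
    returns a dict containing a "compound" key, as VADER does.
    """
    return [(text, analyzer.polarity_scores(text)["compound"]) for text in comments]


def build_vader():
    """Create NLTK's VADER analyzer (needs nltk and the vader_lexicon data)."""
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    return SentimentIntensityAnalyzer()


# Usage with the real analyzer:
#   scored = add_polarity(["Great location, lovely host!"], build_vader())
```

The compound score ranges from -1 (most negative) to +1 (most positive), which makes it a convenient stand-in for a rating.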

Below is the plot of the distribution of sentiment scores for all the Airbnb users’ reviews. Notice that the majority of comments have positive sentiment values.

Before fitting the sentiment scores to a model, I had to transform them so that they were approximately normally distributed. The graph below shows the transformed distribution.
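
The post doesn't say which transformation was used. As one possible option, and purely as an assumption, a rank-based inverse normal transform maps skewed scores onto standard-normal quantiles:

```python
from statistics import NormalDist


def rank_to_normal(scores):
    """Map scores to standard-normal quantiles based on their rank.

    Tied scores are ranked by position, so they receive slightly
    different values in this simple version.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    standard_normal = NormalDist()
    transformed = [0.0] * n
    for rank, i in enumerate(order, start=1):
        # (rank - 0.5) / n keeps the quantile strictly between 0 and 1.
        transformed[i] = standard_normal.inv_cdf((rank - 0.5) / n)
    return transformed


skewed = [0.95, 0.9, 0.99, 0.2, 0.97]  # made-up, mostly positive scores
print([round(z, 2) for z in rank_to_normal(skewed)])
```

The transform preserves the ordering of the scores while spreading them symmetrically around zero.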

To create a robust recommender model, I used the Surprise library, which Nicolas Hug presented at PyParis 2017. To be clear, Surprise isn’t a recommendation engine. It’s a library that let me build and benchmark a model using the following tools:

  • GridSearchCV to find the best parameters
  • SVD (Singular Value Decomposition) to predict a rating for a user-item pair based on the history of ratings

I used RMSE as my accuracy metric.
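
For reference, RMSE is the square root of the mean squared difference between the actual and predicted scores. A small sketch with made-up numbers:

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up sentiment scores and predictions, for illustration only.
print(rmse([0.8, 0.5, 0.9], [0.7, 0.5, 1.0]))
```

Lower is better, and the value is in the same units as the scores being predicted.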

GridSearchCV found a best RMSE of 0.0995 with the parameters shown below.
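
A sketch of how such a search can be set up with Surprise. The grid values, the function name `tune_svd`, and the number of folds are assumptions, since the post does not list the values that were tried.

```python
# Hypothetical search grid; the post does not list the values that were tried.
PARAM_GRID = {
    "n_epochs": [5, 10],
    "lr_all": [0.002, 0.005],
    "reg_all": [0.4, 0.6],
}


def tune_svd(ratings_df, rating_scale, param_grid=PARAM_GRID, cv=3):
    """Grid-search SVD on (user, item, rating) rows and return the best result.

    ratings_df: pandas DataFrame whose three columns are, in order,
    user id, item id, and rating (here, the transformed sentiment score).
    rating_scale: (min, max) tuple covering the range of the ratings.
    """
    # Imported here so the grid can be inspected without Surprise installed.
    from surprise import SVD, Dataset, Reader
    from surprise.model_selection import GridSearchCV

    reader = Reader(rating_scale=rating_scale)
    data = Dataset.load_from_df(ratings_df, reader)
    search = GridSearchCV(SVD, param_grid, measures=["rmse"], cv=cv)
    search.fit(data)
    return search.best_score["rmse"], search.best_params["rmse"]
```

The function returns the best cross-validated RMSE together with the parameter combination that produced it.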

Now that I had the best score, the next step was to write a function that returns three listing recommendations for each user, which you can see below.
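
A minimal sketch of such a function, not the original one. It assumes each prediction is a 5-tuple in the same field order as Surprise's `Prediction` (user id, item id, true rating, estimate, details); the sample predictions are made up.

```python
from collections import defaultdict


def top_n_per_user(predictions, n=3):
    """Return {user_id: [(listing_id, estimated_score), ...]} with up to n items each."""
    by_user = defaultdict(list)
    for uid, iid, _true_rating, est, _details in predictions:
        by_user[uid].append((iid, est))
    # Keep the n highest-scoring listings for every user.
    return {
        uid: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, items in by_user.items()
    }


# Made-up predictions for illustration.
preds = [
    ("u1", "a", None, 0.2, {}),
    ("u1", "b", None, 0.9, {}),
    ("u1", "c", None, 0.5, {}),
    ("u1", "d", None, 0.7, {}),
    ("u2", "a", None, 0.4, {}),
]
print(top_n_per_user(preds))
```

Users with fewer than three predicted listings, like `u2` here, simply get a shorter list.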

I created a new dataset and applied the above function to all the users. Here is what the final dataset with three recommendations looks like:

Summary

I ended up with two useful recommendation systems. I used the first one to plan my trip to Austin, since it could find the best Airbnb listings for a search query describing my specific needs. The second is a more standard recommendation system based on similar users’ preferences, an approach widely used by companies like Netflix, Airbnb, and Amazon.

References:

To understand the complexity of the Surprise library, I watched the following video: https://www.youtube.com/watch?v=z0dx-YckFko

I also referenced this post by the amazing data science blogger Susan Li: https://towardsdatascience.com/building-and-testing-recommender-systems-with-surprise-step-by-step-d4ba702ef80b

Github repository: https://github.com/AisOmar/Airbnb_recommendation

I am a data scientist who loves coming up with different theories and uncovering insights using data and algorithms.