Three Important Questions Before Booking an Airbnb: Location, Room Type, and Price

Boston is the capital and most populous city of the state of Massachusetts in the United States. Its economy, culture, history, and education attract hundreds of thousands of tourists each year. I have been eager to travel to this beautiful city for a long time and finally planned a trip for this March. However, an unexpected global pandemic locked me down in NYC and delayed my plan. While staying home, I have been planning my next trip to a Boston Airbnb with data science techniques. I think that infusing data science into a trip plan makes it more scientific and interesting. If you are interested, you can treat this blog as a fun and maybe insightful guide for your next trip to Boston.
The open dataset I use comes from here and was compiled on 10 June 2020. The original dataset consists of 3,440 listings with 16 features across 25 Boston neighborhoods. In this post, I provide data visualizations and machine learning solutions for three main questions that you might care about:
- Location: Which regions of Boston give you more Airbnb choices, and where are you more likely to stay?
- Room Type: What types of rooms are most popular for a stay?
- Price: Which features influence price the most? Can you predict the price of a Boston Airbnb?
Preliminary Data Visualization
First, I review the pairwise relationships using seaborn, which gives a general picture of the patterns among 9 useful numerical features.

Some insights (see Figure 1):
- Latitude: The number of Boston Airbnb listings increases from south (42.25) to north (42.40).
- Longitude: The number of Boston Airbnb listings increases from west (-71.15) to east (-71.00).
- The number of reviews and reviews per month are positively correlated.
Then I use a Spearman correlation heatmap (Figure 2) to review the correlations among the 9 features.

Some insightful points:
- Latitude is positively correlated with the price (r=0.31) and longitude (r=0.30).
- The number of reviews and reviews per month are positively correlated (r=0.44).
- Availability 365 and calculated host listings count are positively correlated (r=0.25).
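As a rough sketch of how such a Spearman correlation table can be computed, here is a small pandas example. The sample values are made up, and the column names are my assumption about how the dataset labels these features; it is not the original notebook.

```python
import pandas as pd

# Made-up sample rows; column names are assumed, not taken from the real file.
listings = pd.DataFrame({
    "latitude": [42.28, 42.31, 42.34, 42.35, 42.36, 42.38],
    "price": [60, 95, 150, 140, 210, 180],
    "number_of_reviews": [120, 40, 15, 60, 5, 30],
    "reviews_per_month": [3.1, 1.2, 0.4, 2.0, 0.2, 0.9],
})

# Spearman correlates the ranks of the values, so it measures
# monotonic relationships rather than strictly linear ones.
spearman = listings.corr(method="spearman")
print(spearman.round(2))
```

The resulting table is symmetric with 1.0 on the diagonal. Passing it to seaborn's `heatmap` function is one way to draw a figure in the style of Figure 2.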
Furthermore, for analytical purposes, I handle outliers by removing the rows where the price is above $500, convert room type into dummy variables, and exclude the minor room types (Figure 3): Hotel room and Shared room.
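The cleaning steps described here could look roughly like the following pandas sketch. The rows are invented, the column names `room_type` and `price` are assumptions, and the order of the steps is my own choice for illustration.

```python
import pandas as pd

# Invented rows for illustration only.
listings = pd.DataFrame({
    "room_type": ["Entire home/apt", "Private room", "Hotel room",
                  "Shared room", "Entire home/apt", "Private room"],
    "price": [180, 75, 220, 40, 950, 90],
})

# 1) Treat prices above $500 as outliers and drop those rows.
cleaned = listings[listings["price"] <= 500]

# 2) Keep only the two major room types.
major_types = ["Entire home/apt", "Private room"]
cleaned = cleaned[cleaned["room_type"].isin(major_types)]

# 3) Turn room type into 0/1 dummy columns, one per remaining type.
encoded = pd.get_dummies(cleaned, columns=["room_type"], dtype=int)
print(encoded)
```

With two dummy columns added to the 9 numerical features, the feature count comes to 11, which matches the size of the second heatmap.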

Therefore, the new Spearman correlation heatmap with 11 features (Figure 4) should be more accurate.

Lastly, the most important points:
1. Price and Home Type: Entire home/apt is positively correlated with price (r=0.67), and Private room is negatively correlated with price (r=-0.66). The average price of an Entire home/apt is higher than that of a Private room (shown later).
2. Latitude is positively correlated with price. As latitude increases from south to north, Airbnb prices tend to increase.
3. The number of reviews and reviews per month are positively correlated.
4. The number of reviews and reviews per month are negatively correlated with minimum nights (which are set by hosts).
Location: Which regions of Boston give you more Airbnb choices, and where are you more likely to stay?
Figure 5 shows the number of Airbnb listings across the 25 neighborhoods in Boston.
The top 5 neighborhoods with the most listings are Dorchester, Downtown, Jamaica Plain, Roxbury, and Back Bay.

Figure 6 shows the proportion of Airbnb listings in each Boston neighborhood.
Remarkably, Dorchester has a clearly higher share of listings than any other neighborhood, at 12%.
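Counts and shares like these can be derived with pandas `value_counts`. The sketch below uses a handful of made-up rows and an assumed column name, so the numbers do not match the figures.

```python
import pandas as pd

# Made-up neighborhood labels; the real data has 25 neighborhoods.
neighbourhoods = pd.Series(
    ["Dorchester", "Dorchester", "Dorchester", "Downtown", "Downtown",
     "Jamaica Plain", "Roxbury", "Back Bay", "Back Bay", "Allston"],
    name="neighbourhood",  # assumed column name
)

# Number of listings per neighborhood, largest first.
counts = neighbourhoods.value_counts()

# Same ranking expressed as a share of all listings.
shares = neighbourhoods.value_counts(normalize=True)

# The five neighborhoods with the most listings.
top5 = counts.head(5)
print(top5)
print(shares.round(2))
```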

Figure 7, a density plot, shows the distribution of Airbnb listings across Boston. The brightest area has the highest concentration of listings. You can also review the actual map of Boston Airbnb in Figure 8.
I find that Boston Airbnb listings are most concentrated between longitudes -71.08 (west) and -71.06 (east) and between latitudes 42.34 (south) and 42.36 (north).


Figure 9 gives a closer look: a scatterplot of Airbnb listings in the 25 neighborhoods across Boston, with longitude on the x-axis and latitude on the y-axis.

If you would like to know which neighborhoods give you a higher chance of finding your Airbnb, the top 5 neighborhoods and their locations, shown in Figure 10, provide useful information for your plan.

Room Type: What types of rooms are most popular for a stay?
Figure 11 shows that Entire home/apt and Private room are the most common room types by number of listings.

If you consider the minimum nights required by hosts, Figure 12 shows that minimums of 91, 1, and 2 nights probably give travelers the most choices.

If you look at Figure 13 and Figure 14, you will find that about 576 Airbnb listings are not available on any of the 365 days (they are either very popular or permanently closed).
Interestingly, about 452 Airbnb listings are available on all 365 days.
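A minimal sketch of how the two extremes can be counted, assuming the availability feature is a column of days available per year (0 to 365) and that "not available" means a value of 0. The values below are invented.

```python
import pandas as pd

# Invented availability values, in days per year.
availability = pd.Series(
    [0, 0, 365, 120, 365, 0, 89, 365, 200],
    name="availability_365",  # assumed column name
)

# Listings with no available day in the year.
never_available = int((availability == 0).sum())

# Listings available on every day of the year.
always_available = int((availability == 365).sum())

print(never_available, always_available)
```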


Next, let’s review some statistics for the different room types.
Figure 15 shows that travelers tend to stay longer in Shared rooms than in Private rooms and Entire homes/apts.

Figure 16 and Figure 17 both confirm that Private room and Entire home/apt have a higher average number of reviews than Shared room and Hotel room.
This suggests that Entire home/apt and Private room are the popular room types to choose from.


Lastly, Figure 18 shows the average days of availability in a year by room type.
Shared rooms have a much lower average availability, at 131.94 days, but there are only 16 such listings. The figure is therefore likely biased and should not be treated as decisive evidence of popularity.
Compared to Hotel room, Private room and Entire home/apt have fewer days of availability in a year. Entire home/apt in particular is available about 30 fewer days than Hotel room. Therefore, we can probably assume that Entire home/apt is more popular.
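Per-room-type averages like the ones in this section can be produced with a pandas `groupby`. This is only a sketch on invented rows with assumed column names; keeping the listing count next to each average makes small groups, such as the Shared room case above, easy to spot.

```python
import pandas as pd

# Invented rows; column names are assumptions about the dataset.
listings = pd.DataFrame({
    "room_type": ["Entire home/apt", "Entire home/apt",
                  "Private room", "Private room", "Shared room"],
    "minimum_nights": [2, 30, 1, 3, 91],
    "number_of_reviews": [50, 30, 80, 40, 2],
    "availability_365": [100, 200, 150, 250, 130],
})

# One row per room type: group size plus the mean of each statistic.
summary = listings.groupby("room_type").agg(
    listings=("minimum_nights", "size"),
    avg_minimum_nights=("minimum_nights", "mean"),
    avg_reviews=("number_of_reviews", "mean"),
    avg_availability=("availability_365", "mean"),
)
print(summary)
```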

Price: Which features influence price the most? Can you predict the price of a Boston Airbnb?
After handling outliers, dummy variables, and missing values, I use 3,357 observations and 11 variables to build three models: Linear Regression, Lasso Regression, and Random Forests. The response variable is the price.
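To give a feel for this setup, here is a hedged scikit-learn sketch that fits the same three model types on synthetic data. The price rule, the three features, the 80/20 split, and the Random Forest settings are all assumptions made for illustration; only the Lasso alpha of 0.1 comes from this post.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic listings: location plus a 0/1 "entire home" dummy.
rng = np.random.default_rng(0)
n = 400
latitude = rng.uniform(42.25, 42.40, n)
longitude = rng.uniform(-71.15, -71.00, n)
entire_home = rng.integers(0, 2, n)

# Made-up price rule with noise; not estimated from the real data.
price = 80 + 100 * entire_home + 300 * (latitude - 42.25) + rng.normal(0, 20, n)

X = np.column_stack([latitude, longitude, entire_home])
X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.2, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Lasso Regression": Lasso(alpha=0.1),
    "Random Forests": RandomForestRegressor(n_estimators=100, random_state=0),
}

# Fit each model and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    scores[name] = {
        "r2": r2_score(y_test, predicted),
        "mae": mean_absolute_error(y_test, predicted),
    }
    print(name, scores[name])
```

On this synthetic data the ranking of the models need not match the ranking reported for the real listings.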

I find that the average price of a Private room is about $81.22, while the average price of an Entire home/apt is much higher, at about $189.38.
On the test dataset, the actual average price for Boston Airbnb is about $147.85, and the average price predicted by the Random Forests model is about $149.53. You can also compare the distributions of actual and predicted prices for Boston Airbnb in Figure 20.

Model Performances

With an R-squared (R²) of 0.549, the Random Forests model explains more of the variability in the response than the other two models. Its Mean Absolute Error (MAE) of 40.423 is also lower, which means its predictions are, on average, closer to the actual prices. Random Forests is clearly the best of the three models.
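For readers who want to see what these two metrics measure, here is a small plain-Python sketch using the standard definitions. The four actual and predicted prices are invented.

```python
# Invented actual and predicted prices, in dollars.
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 190.0, 260.0]
n = len(actual)

# MAE: average absolute gap between prediction and actual value.
mae = sum(abs(a - p) for a, p in zip(actual, predicted)) / n

# R²: 1 minus (squared error of the model / squared error of
# always predicting the mean of the actual values).
mean_actual = sum(actual) / n
ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
ss_tot = sum((a - mean_actual) ** 2 for a in actual)
r2 = 1 - ss_res / ss_tot

print(mae, round(r2, 3))
```

A lower MAE and a higher R² both point to a better fit, which is how the three models are compared here.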
Feature Explanation: Coefficients of Regression Model, Tree-built Feature Importance Method, Shapley Value

Figure 22 shows the coefficients in Lasso Regression (alpha=0.1). The coefficients for latitude, room type Entire home/apt, and longitude are positive, so the predicted price rises as these variables increase. Conversely, the coefficient for room type Private room is negative, so a Private room lowers the predicted price. The coefficients also agree with my expectation that location and room type are important influences on price.

Figure 23 shows the feature importance ranking from the Random Forests model. Room type Entire home/apt, latitude, and longitude are still the most important features for predicting the price. Interestingly, calculated host listings count and host id are bigger influencers in Random Forests.

I also use Shapley values to analyze and explain the Random Forests predictions of Airbnb prices.
As Figure 24 indicates, the Shapley value plot can further show the positive and negative relationships of the predictors with the target variable price [1].
- Feature importance: Variables are ranked in descending order.
- Impact: The horizontal location shows whether the effect of that value is associated with a higher or lower prediction.
- Original value: Color shows whether that variable is high (in red) or low (in blue) for that observation.
- Correlation: A high value of "Room type Entire home/apt" has a high and positive impact on the price. The "high" comes from the red color, and the "positive" impact is shown on the x-axis. Similarly, "minimum nights" is negatively correlated with the target variable price.
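To make the idea behind these plots more concrete, here is a small plain-Python sketch that computes exact Shapley values for one prediction of a toy pricing rule. The toy model, its three binary features, and the all-zero baseline are hypothetical and stand in for the Random Forests model. The exact enumeration shown here is only practical for a few features, so it illustrates the definition rather than how a SHAP plot is produced.

```python
from itertools import combinations
from math import factorial


def toy_price_model(entire_home, latitude_north, long_minimum_stay):
    """Hypothetical price rule; not the model trained in this post."""
    price = 80.0
    price += 100.0 * entire_home
    price += 30.0 * latitude_north
    price -= 20.0 * long_minimum_stay
    price += 10.0 * entire_home * latitude_north  # small interaction term
    return price


feature_names = ["entire_home", "latitude_north", "long_minimum_stay"]
baseline = {name: 0 for name in feature_names}  # reference listing
instance = {name: 1 for name in feature_names}  # listing to explain


def value(coalition):
    """Model output when only features in `coalition` use the instance's values."""
    inputs = {
        name: instance[name] if name in coalition else baseline[name]
        for name in feature_names
    }
    return toy_price_model(**inputs)


def shapley_values():
    """Weighted average marginal contribution of each feature over all coalitions."""
    n = len(feature_names)
    result = {}
    for feature in feature_names:
        others = [name for name in feature_names if name != feature]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {feature})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        result[feature] = total
    return result


phi = shapley_values()
print(phi)
```

The contributions add up to the gap between the explained prediction and the baseline prediction, and the sign of each one tells you whether that feature pushed the price up or down.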

Figure 25 is a simpler version of the Shapley value plot. It shows the average impact of each variable on the model’s predicted price in descending order, ignoring whether the impact is positive or negative. Shapley values can also be used to explain more complex models such as deep learning. Next time, you can use this method to explain your black-box deep learning models to your audience.
Conclusion:
Data science not only helps with business decisions but can also make life more interesting and scientific. Based on this analysis of Boston Airbnb, I will apply the following guidance to my next trip to Boston:
- Location: Which regions of Boston give you more Airbnb choices, and where are you more likely to stay?
The top 5 neighborhoods to consider are Dorchester, Downtown, Jamaica Plain, Roxbury, and Back Bay. Geographically speaking, listings are most concentrated between longitudes -71.08 (west) and -71.06 (east) and between latitudes 42.34 (south) and 42.36 (north).
- Room Type: What types of rooms are most popular for a stay?
Generally speaking, you have a higher chance of finding your Airbnb among the Entire home/apt and Private room types. Comparing the two, Entire home/apt has more listings and a higher average number of reviews per month, while Private room has higher average minimum nights, a higher average number of reviews, and more average days of availability per year.
- Price: Which features influence price the most? Can you predict the price of a Boston Airbnb?
Lasso Regression and Random Forests agree that location (longitude and latitude) and room type (Entire home/apt and Private room) are important for predicting the prices of Boston Airbnb.
Interestingly, the feature importance function in Random Forests and the Shapley values both suggest that calculated host listings count is important, while its coefficient in the Lasso Regression is zero.
If you care about the price, you might choose a Private room for your next trip. Also keep in mind that the price of Boston Airbnb tends to increase from southwest to northeast.
This is my first post on Medium. I hope it helps! I welcome feedback and constructive criticism. You can contact me on LinkedIn: https://www.linkedin.com/in/lanxiao12.
Before you go, the code can be found on my GitHub here. Happy coding, happy life!
Special Thanks to Menoua Keshishian.
Reference:
[1] Dr. Dataman, Explain Your Model with the SHAP Values (2019), Towards Data Science