This will make you know how much you need to travel with Airbnb

Based on the Seattle and Boston Airbnb datasets

Sean Kim
Towards Data Science


Photo by Zhifei Zhou on Unsplash
Photo by Lance Anderson on Unsplash

Do these pictures urge you to catch a flight? If your answer is yes, you are a ‘Hodophile’ like me. A brief retreat from reality to be an observer is an awe-full feeling: it gets you to appreciate what you have, what you can do, your friends, and your family. Traveling lets you be yourself again. Hooray! Honestly, one of my favorite memories is sitting on a bench in Bulgaria, having a beer and watching people pass by (it was okay there, at least nobody said anything).

I don’t know how affluent you are, but as a student, my budget was always the problem that held me back from hopping on a flight. I have been using Airbnb as my travel partner for quite a long time now because it gives me more freedom than a hotel or other accommodation and, furthermore, I have often found home-like places at lower prices. However, visiting different cities, I’ve observed that Airbnb prices vary. It was high time I traveled into the data world.

The datasets contain information about Seattle and Boston Airbnb listings from Kaggle, with 3,818 and 3,585 rows respectively. My goal was clear: “understanding Airbnb price”. To this end, I had some questions on my mind, and I’d love to share my answers to them with you.

1. Is there ACTUALLY a price difference between the two cities?

To give you the answer up front: yes, there is a difference. Let’s see how I reached this answer. Since the price unit is $1,000 and there are some outliers (very expensive Airbnbs), it was very hard to distinguish the difference between the two cities on the raw scale.

Figure 1: price distributions of Seattle and Boston

After the log transformation of the price, I could see the distinction more clearly.

Figure 2: log-price distributions of Seattle and Boston

We observe that Boston Airbnb prices are in general higher and more spread out. The Seattle log-price distribution looks more symmetric, while the Boston log-price distribution seems slightly left-skewed, implying that Boston rental prices are concentrated at the higher end. Technically, the log-price means of Seattle and Boston were 4.68 and 4.94, with standard deviations of 0.57 and 0.65.
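For the curious, here is a minimal sketch of this kind of comparison. The price lists are made up for illustration (they are not taken from the Kaggle data), and I am assuming a natural-log transform, under which the reported means of 4.68 and 4.94 would correspond to typical prices of roughly $108 and $140.

```python
import math
import statistics

# Hypothetical nightly prices in dollars, each with one expensive outlier.
# These numbers are illustrative only, not values from the datasets.
seattle_prices = [65.0, 80.0, 95.0, 110.0, 150.0, 975.0]
boston_prices = [75.0, 120.0, 150.0, 180.0, 250.0, 1300.0]


def log_summary(prices):
    """Return (mean, sample standard deviation) of the natural-log prices."""
    logs = [math.log(p) for p in prices]
    return statistics.mean(logs), statistics.stdev(logs)


seattle_mean, seattle_std = log_summary(seattle_prices)
boston_mean, boston_std = log_summary(boston_prices)

print(f"Seattle log-price: mean={seattle_mean:.2f}, std={seattle_std:.2f}")
print(f"Boston  log-price: mean={boston_mean:.2f}, std={boston_std:.2f}")

# Back-transforming a log-price mean gives a geometric-mean price in dollars.
print(f"exp(4.68) = {math.exp(4.68):.0f}, exp(4.94) = {math.exp(4.94):.0f}")
```

The log scale compresses the outliers, which is why the two distributions become easier to tell apart.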

According to Trulia, this matches the median monthly rent. Boston’s median rent is currently around $2,900 per month, whereas Seattle’s is around $2,700. Therefore, this result should come as no surprise. However, what is interesting is that the median sales price shows the opposite trend: the median sales price of real estate is about $690,000 in Seattle and $610,000 in Boston. Using the price-to-rent ratio (median sales price divided by a year of median rent), we get 17.53 for Boston and 21.30 for Seattle. As a general rule of thumb, a ratio from 16 to 20 means it is risky to buy a property, and a ratio of 21 or more means it is much better to rent than to buy. Therefore, Boston seems to be the more promising place to buy a house.
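The arithmetic behind those two ratios can be checked with a few lines. This sketch assumes the usual definition of the price-to-rent ratio (sale price divided by twelve months of rent); the function name is mine.

```python
def price_to_rent_ratio(median_sale_price, median_monthly_rent):
    """Median sale price divided by one year of median rent."""
    return median_sale_price / (median_monthly_rent * 12)


# Figures quoted in the text above.
boston = price_to_rent_ratio(610_000, 2_900)
seattle = price_to_rent_ratio(690_000, 2_700)

print(f"Boston: {boston:.2f}, Seattle: {seattle:.2f}")
# prints "Boston: 17.53, Seattle: 21.30"
```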

2. How many Airbnb properties are owned by the same host?

Since I had confirmed that there is a price difference, I wanted to find more of the whys. My first interest was whether a few ‘super’ hosts determine the market price. Is that the case?

Figure 3: Average property numbers owned by a host

On average, Boston hosts own slightly more Airbnb properties. This might imply that fewer hosts determine the price of Airbnb properties there. But we can’t conclude anything from this alone, so let’s dig deeper.

Figure 4: Distribution of number of properties in Seattle
Figure 5: Distribution of number of properties in Boston

Looking at the individual distributions, we can confirm that there are more ‘super’ hosts in Boston. We can therefore suppose that those ‘super’ hosts set Airbnb prices higher in Boston, but I am not saying they actually do. The prediction model will tell us where this predictor ranks.
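Counting properties per host is simple to sketch. The listing and host ids below are hypothetical, and this is only one way to get the average and the distribution shown in the figures, not necessarily how I computed them in the notebook.

```python
from collections import Counter
import statistics

# Hypothetical (listing_id, host_id) pairs, for illustration only.
listings = [(1, "a"), (2, "a"), (3, "b"), (4, "c"), (5, "c"), (6, "c"), (7, "d")]


def listings_per_host(rows):
    """Map each host id to the number of listings that host owns."""
    return Counter(host_id for _, host_id in rows)


per_host = listings_per_host(listings)

# Average number of properties owned by a host.
average = statistics.mean(list(per_host.values()))

# Distribution: how many hosts own exactly n properties.
distribution = Counter(per_host.values())

print(f"average properties per host: {average}")
for n_properties, n_hosts in sorted(distribution.items()):
    print(f"{n_hosts} host(s) own {n_properties} propert{'y' if n_properties == 1 else 'ies'}")
```

The long right tail of this distribution is where the ‘super’ hosts live.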

3. How does the price spread based on location — evenly or unevenly?

My next question was whether location affects Airbnb price in the two cities, because it seems natural that a busy location costs more to stay in. I used the zip code as a proxy for location.

Figure 6: Seattle location distribution by zip code
Figure 7: Boston location distribution by zip code

In terms of the degree of spread, the location distributions of the two cities do not differ much according to their standard deviations. However, we do notice that listings are not evenly distributed. When we compare the distributions against Trulia’s rent maps for Seattle and Boston, the zip codes with a high proportion of listings correspond to high-rent areas and therefore have a higher impact on the price distribution.

Figure 8. Seattle and Boston rent heatmaps

After all, most Airbnb properties are clustered in the hotspots of the cities.
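As a rough sketch of the zip-code comparison, the share of listings per zip code and the spread of those shares can be computed as follows. The zip codes are placeholders, and using the population standard deviation of the shares is my assumption about what “degree of spread” means here.

```python
from collections import Counter
import statistics

# Hypothetical zip codes, one entry per listing (illustration only).
zip_codes = ["98101", "98101", "98101", "98102", "98102", "98103"]


def zip_shares(codes):
    """Return the fraction of all listings that falls in each zip code."""
    counts = Counter(codes)
    total = len(codes)
    return {zip_code: n / total for zip_code, n in counts.items()}


shares = zip_shares(zip_codes)

# Population standard deviation of the shares; 0 would mean a perfectly even spread.
spread = statistics.pstdev(list(shares.values()))

for zip_code, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{zip_code}: {share:.1%}")
print(f"spread of shares: {spread:.3f}")
```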

4. Does my model predict the price well?

Well, nothing is perfect, at least in the data science world. However, I am pretty happy with the performance of my prediction model. In Table 1, the predicted prices are not too far off from the actual ones.

Table 1: True price values and predicted price values

[I know this might be daunting. This is only for folks who are curious:

I employed the well-known Extreme Gradient Boosting (a.k.a. XGBoost) regressor together with an exhaustive grid search. Each parameter combination was evaluated with 5-fold cross-validation, and the search returned the parameters that gave the best performance.

The test RMSE (root mean square error) came out lower than the cross-validation RMSE, at around 78.0201 per $1,000, which is promising: the model performed better in the wild than in practice!]
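To make the idea of “grid search with 5-fold cross-validation, scored by RMSE” concrete, here is a toy sketch in plain Python. It is not the XGBoost pipeline: the model is a stand-in nearest-neighbour averager, the data is synthetic, and every name in it is my own invention for illustration.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def knn_predict(train, x, k):
    """Stand-in model: average the targets of the k nearest training points."""
    nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cv_rmse(data, k, n_folds=5):
    """Mean validation RMSE of the stand-in model across n_folds folds."""
    scores = []
    for fold in range(n_folds):
        # Every n_folds-th row (offset by fold) is held out for validation.
        valid = [row for i, row in enumerate(data) if i % n_folds == fold]
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        preds = [knn_predict(train, x, k) for x, _ in valid]
        scores.append(rmse([y for _, y in valid], preds))
    return sum(scores) / n_folds


# Synthetic data: price grows with the number of bedrooms, plus noise.
rng = random.Random(0)
data = []
for _ in range(100):
    bedrooms = rng.randint(1, 5)
    data.append((bedrooms, 50 + 40 * bedrooms + rng.gauss(0, 10)))

# Exhaustive search over a one-parameter grid.
param_grid = [1, 3, 5, 9, 15]
results = {k: cv_rmse(data, k) for k in param_grid}
best_k = min(results, key=results.get)

print(f"best k = {best_k}, cross-validation RMSE = {results[best_k]:.2f}")
```

With a real regressor the grid has more than one parameter, but the loop is the same: score every combination on held-out folds and keep the one with the lowest error.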

If you want more details, please visit my GitHub!

5. What are the most important predictors for the price?

Figure 9. Top 10 important features for Airbnb price

Now we have the list of the top 10 important features, which goes:

  • zip_has: number of Airbnb properties in the same zip code. An indicator of how busy the location is.
  • bathrooms: number of bathrooms.
  • host_days: number of days the host has been hosting Airbnb properties.
  • bedrooms: number of bedrooms.
  • minimum_nights: minimum number of nights you need to book.
  • extra_people: extra fee for guests beyond the number the host set.
  • beds: number of beds.
  • guests_included: number of guests an Airbnb property can hold.
  • availability_365: number of days available in a year.
  • number_of_reviews: number of reviews.

Looking at the list above, it seems natural that these features play an important role in predicting the price. First, if the place is busy, such as a tourist sight, there should be more demand for accommodation, which increases the price. The numbers of bathrooms, bedrooms, and beds reflect the level of comfort. No one would prefer to wait in line for the bathroom or sleep on the couch. If you have been hosting an Airbnb property for quite a long time, you should know how the market fluctuates. From a data science perspective, you are more predictable because you are more likely to lead the market trend than a fresh host. Minimum nights, extra guest fee, and maximum number of guests are rules set by the hosts and are factored directly into the price calculation. Therefore, it is convincing that these three factors are ranked high. Lastly, the number of days available in a year and the number of reviews reflect the popularity of an Airbnb property. Naturally, high demand leads to a higher price.
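Two of these features, zip_has and host_days, are derived rather than read straight from the raw data. A minimal sketch of how such features could be built is below. The rows, the `host_since` field name, and the snapshot date are assumptions for illustration, not a copy of my notebook.

```python
from collections import Counter
from datetime import date

# Hypothetical listings; field names and values are assumed for illustration.
listings = [
    {"id": 1, "zipcode": "02116", "host_since": date(2015, 1, 1)},
    {"id": 2, "zipcode": "02116", "host_since": date(2014, 1, 1)},
    {"id": 3, "zipcode": "02130", "host_since": date(2015, 1, 1)},
]

# Assumed reference date on which the data was collected.
snapshot = date(2016, 1, 1)

# zip_has: how many listings share the listing's zip code.
zip_counts = Counter(row["zipcode"] for row in listings)

for row in listings:
    row["zip_has"] = zip_counts[row["zipcode"]]
    # host_days: days between the host's start date and the snapshot.
    row["host_days"] = (snapshot - row["host_since"]).days

for row in listings:
    print(row["id"], row["zip_has"], row["host_days"])
```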

We don’t see ‘host_has’ (the number of properties a host owns) in the top 10 list (it actually ranked 13th). The takeaway here is that our instinct is not always the best tool, and the world needs data scientists! (haha)

Conclusion

In this project, we have delved into the Seattle and Boston Airbnb datasets and found some interesting patterns:

  1. Boston is a more expensive city to travel in with Airbnb than Seattle. This is consistent with the current median monthly rent in the two cities: Boston is the more expensive place to rent a property.
  2. Hosts in Boston own more Airbnb properties on average, and there are more ‘super’ hosts. This might imply that a few super hosts can sway the average price in the city.
  3. Locations with a high proportion of listings correspond to high-rent areas in both Seattle and Boston. Therefore, we may assume that rent, location, and Airbnb price are closely related.
  4. From the important features, we know that popularity, location, level of comfort, and property rules affect the Airbnb price.

This project set out with questions such as whether there is a price difference between the two cities, how prices spread based on location, and what the most important predictors are. Hopefully, this project answered them and provided you with insights into what can influence Airbnb price. I believe this prediction model can give customers information for planning their travel budget, and give hosts a reference for setting a price for their properties.

Still, there is plenty of room for improvement. I welcome any feedback and suggestions with open arms. At the end of the day, we data scientists share! Doing this project has given me an itch for a trip, and I might as well take one. Wherever you are traveling, whether around the world or the data world, I wish you the best of luck on your journey. Bon voyage!

Connect with the Author

You can connect with him on LinkedIn

If you want more details, visit his GitHub
