Using Python to Find Myself A Rental Home

Narrowing my search for homes by Web Scraping, Web Maps, and Data Visualisation

Reuben Lee
Towards Data Science


HDB flats in Singapore

‘Hello World!’

As my winter exchange programme in Europe was coming to an end, reality dawned on me: I needed to find a new apartment in Singapore.

This is simple! All I have to do is go on property websites and key in a few search parameters to find my optimal apartment. Having rented units before, I find this process fairly familiar and comfortable. However, after a crash course in Python, I thought this would be a great way to make the search process more efficient, and to test my skills on a real-world problem.

Before we delve into the project, do take some time to visit iProperty's robots.txt file. While they do allow bots to scrape their site, it is always important to exercise prudence and not to misuse this privilege.
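For anyone who wants to check this programmatically, Python's standard library ships urllib.robotparser. The minimal sketch below parses a made-up set of rules (they are not iProperty's actual robots.txt) and asks whether a given URL may be fetched; is_allowed is a hypothetical helper name:

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse rules from a list of lines, no network needed
    return parser.can_fetch(user_agent, url)


# Illustrative rules only; these are NOT the real iProperty robots.txt contents
example_rules = [
    "User-agent: *",
    "Disallow: /private/",
]
print(is_allowed(example_rules, "https://www.example.com/rent/district-20/"))  # True
print(is_allowed(example_rules, "https://www.example.com/private/page"))  # False
```

Against a live site, one would instead call parser.set_url() with the site's robots.txt URL followed by parser.read(), which needs network access.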

Context

Now, what do I look for in a rental apartment? Simply put, my optimal rental apartment can be described as a function of these two variables:

  • Near an MRT (metro) station, for an easy commute anywhere.
  • Reasonably priced (as I am a broke student).

Conventionally, these are the steps that I would take to find that optimal apartment:

  1. Search on a property website, find a unit with a good price.
  2. Look up the address on Google Maps to see if it is located near an MRT station.
  3. If the location is what I wanted, great! I will jot it down somewhere, somehow.
  4. If the location isn’t what I wanted, I repeat steps 1 to 3.

The steps above look easy if they only need to be repeated three or four times. However, as any prospective renter knows, that is rarely the case. We usually have to run through many iterations to build a pool of units large enough to compare and make a decision. Furthermore, after the process above comes talking to the agent, finding out if the unit is all prim and proper according to my liking, and so on. If something goes wrong at any point, it is back to steps 1 to 4 again.

There has to be a way to compile all my desired search results in one workspace, at the click of a button, for more efficient comparison.

I already have my sights set on a 3-bedroom apartment in Ang Mo Kio, which is located in District 20 of Singapore. As for the website, I will focus mainly on iProperty. Keep these parameters in mind, as they will come up throughout this post: District = 20, bedrooms = 3.

With that in mind, I thought of visual aids that might help me make such a decision. They are:

  • An interactive map showing the location of all the rental homes in an area.
  • A graph depicting the distribution of prices around the area so that I know what to expect.

Let me do just that with Python! The link to this project’s repo can be found here. Now let's get started.

Part 1: Data sourcing

The data I need can be sourced from the property website. The problem is, it does not come in the usual Excel or CSV files that I am used to. It comes in the form of HTML, which looks a little bit different…

To go about this, I can use Python’s Beautiful Soup 4 package, which lets me scrape the HTML data from the website. Before scraping, though, some formalities need to be added:
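The post shows these formalities as a screenshot. A minimal sketch of what they typically look like, assuming the requests library downloads the page and Beautiful Soup's built-in "html.parser" parses it; the header value and the fetch_soup helper are my own illustrative choices, not the author's code:

```python
import requests
from bs4 import BeautifulSoup

# Assumption: a descriptive User-Agent header; the post's exact headers are not shown.
HEADERS = {"User-Agent": "Mozilla/5.0 (personal rental-search script)"}


def fetch_soup(url, timeout=10):
    """Download one page and parse it into a BeautifulSoup tree."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    # "html.parser" is Python's built-in parser, so no extra parser package is needed
    return BeautifulSoup(response.text, "html.parser")
```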

What information do I need? The price, address, and size of the apartment, perhaps. Let me start by scraping the addresses. I am not an HTML expert, but I do know that different pieces of information are contained within particular element types, as Figure 1 demonstrates. By inspecting the source code, we can see that, for this particular unit, the address is contained in an <a> element under the main <div> of class ‘fsKEtj’.

Figure 1: Inspecting the source code

The code to extract this information and store it in the empty ‘address’ list is as follows:

Figure 2
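Figure 2 is likewise a screenshot. The sketch below reproduces the idea under stated assumptions: it takes an already-parsed page, finds every <div> with the class ‘fsKEtj’ from Figure 1, and keeps the text of the first <a> inside it. The helper name extract_texts and the sample HTML (with placeholder addresses) are hypothetical. Class names like ‘fsKEtj’ look auto-generated, so they may well have changed since this was written.

```python
from bs4 import BeautifulSoup


def extract_texts(soup, container_class="fsKEtj", child_tag="a"):
    """Return the text of the first child_tag inside each <div> of container_class."""
    texts = []
    for container in soup.find_all("div", class_=container_class):
        element = container.find(child_tag)
        if element is not None:  # skip containers without the expected child
            texts.append(element.get_text(strip=True))
    return texts


# Made-up HTML with placeholder addresses, shaped like the structure in Figure 1
sample_html = """
<div class="fsKEtj"><a href="/listing/1">Blk 1 Example Road</a></div>
<div class="fsKEtj"><a href="/listing/2">Blk 2 Example Road</a></div>
"""
address = extract_texts(BeautifulSoup(sample_html, "html.parser"))
print(address)  # ['Blk 1 Example Road', 'Blk 2 Example Road']
```

Passing a different container class and child tag is one way to reuse the same pattern for the Price and Size elements.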

Output for address?

Figure 3: First five rows of the addresses scraped from the website

Great! With that done, I just have to repeat the same process for Price and Size: find the elements that contain them, and extract them by adjusting the code in Figure 2.

That was page one of the iProperty search results, which showed me 10 apartments. I need a lot more than that to make a proper comparison, so in this post I extract data from 5 pages of search results to build a pool of apartments to choose from. How can that be done? Luckily for me, iProperty’s URL is fairly simple:

https://www.iproperty.com.sg/rent/district-20/hdb/?bedroom=3&page=1

As you can see, apart from the fixed ‘rent’ and ‘hdb’ segments, it only contains three parts that change: the district, the number of bedrooms, and the page number. That makes life a lot easier, as I can simply loop through pages 1 to 5. Between iterations, I also added a time.sleep call so that my web scraper behaves a little more like a human and I avoid being blocked by the website.
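A minimal sketch of that page loop, assuming the URL pattern above stays stable. fetch_page stands for whatever function downloads and parses one page; build_search_url, scrape_pages and the 2-second delay are illustrative choices, not values from the original post:

```python
import time

# URL pattern taken from the iProperty search URL shown above
BASE_URL = (
    "https://www.iproperty.com.sg/rent/district-{district}/hdb/"
    "?bedroom={bedrooms}&page={page}"
)


def build_search_url(district, bedrooms, page):
    """Fill the district, bedroom count and page number into the search URL."""
    return BASE_URL.format(district=district, bedrooms=bedrooms, page=page)


def scrape_pages(fetch_page, district=20, bedrooms=3, pages=range(1, 6), delay=2.0):
    """Call fetch_page(url) for each results page, pausing between requests.

    range(1, 6) covers pages 1 to 5, since Python ranges exclude the end value.
    The 2-second delay is an illustrative assumption, not the post's value.
    """
    results = []
    for i, page in enumerate(pages):
        if i > 0:
            time.sleep(delay)  # wait before every request after the first
        results.append(fetch_page(build_search_url(district, bedrooms, page)))
    return results
```

Passing the fetch function in keeps the loop easy to try out: scrape_pages(lambda url: url, delay=0) simply returns the five URLs it would request.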

With that, I now have a reasonably sized pool of apartments with three variables: Price, Address, and Size. I combined them into one data frame using pandas. No error was raised about the lists having different lengths, which suggests that each unit displayed has an address, a price, and a size. Next, I need to clean the data. Let us have a look at the first five rows of the dataset.

Figure 4: First five rows of the full data set
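One caveat worth making explicit: pandas does raise a ValueError when the lists have different lengths, but equal lengths only prove that the counts match. If the page held one extra matching element (an advert, say) while one listing lacked a size, the counts could still agree and the rows would be shifted. A small sketch, with the hypothetical helper name build_listing_frame, that makes the length check explicit:

```python
import pandas as pd


def build_listing_frame(addresses, prices, sizes):
    """Combine the scraped lists into one DataFrame, failing loudly on a mismatch."""
    if not len(addresses) == len(prices) == len(sizes):
        raise ValueError(
            f"Length mismatch: {len(addresses)} addresses, "
            f"{len(prices)} prices, {len(sizes)} sizes"
        )
    return pd.DataFrame({"Address": addresses, "Price": prices, "Size": sizes})
```

A sturdier alternative is to scrape each listing's card as a unit, so that a missing field shows up as a gap in that row rather than shifting every row after it.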

Right off the bat, you can see that various aspects of the data need adjusting. For one, I want the Price and Size columns to be numeric, so ‘SGD’, ‘Built-up :’ and ‘,’ have got to go. Also, on closer inspection of the Size column, several units are denoted in ‘sq. m.’ instead of ‘sq. ft.’. These have to be converted as well. All the data cleaning steps can be seen in the script in my project repo. Have a look at the cleaned data set in Figure 5!

Figure 5: The cleaned data set
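The actual cleaning script lives in the repo. As a hedged sketch of the steps described above, assuming raw strings shaped like ‘SGD 2,400’ and ‘Built-up : 1,001 sq. ft.’ (my guess at the format, based on the columns in Figure 4), the following strips the text, converts square metres to square feet (1 sq. m. ≈ 10.7639 sq. ft.) and returns numeric columns; clean_listings is a hypothetical name:

```python
import pandas as pd

SQM_TO_SQFT = 10.7639  # 1 square metre is about 10.7639 square feet


def clean_listings(df):
    """Turn 'SGD 2,400' into 2400.0 and sizes into square feet as floats."""
    out = df.copy()
    out["Price"] = (
        out["Price"]
        .str.replace("SGD", "", regex=False)
        .str.replace(",", "", regex=False)  # drop thousands separators
        .str.strip()
        .astype(float)
    )
    size = (
        out["Size"]
        .str.replace("Built-up :", "", regex=False)
        .str.replace(",", "", regex=False)
        .str.strip()
    )
    is_sqm = size.str.endswith("sq. m.")  # units listed in square metres
    number = size.str.extract(r"([\d.]+)", expand=False).astype(float)
    # Keep square-foot values as they are; convert square-metre values
    out["Size"] = number.where(~is_sqm, number * SQM_TO_SQFT).round()
    return out


raw = pd.DataFrame({
    "Price": ["SGD 2,400", "SGD 2,600"],
    "Size": ["Built-up : 1,001 sq. ft.", "Built-up : 92 sq. m."],
})
print(clean_listings(raw))  # Size becomes 1001.0 and 990.0
```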

Almost there! Notice anything wrong with the units? There seems to be one ‘rogue’ apartment (index 1) that is tiny (100 sq ft) yet extremely pricey ($2,400/month). As a precaution, I put in a check to remove such units.
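The post does not say what this check looks like. One simple possibility is a minimum floor area; the 500 sq ft threshold and the helper name below are purely assumptions for illustration:

```python
import pandas as pd


def drop_implausible_units(df, min_size_sqft=500):
    """Remove listings whose floor area is implausibly small for a 3-bedroom flat.

    min_size_sqft=500 is a hypothetical cut-off, not a value from the post.
    """
    return df[df["Size"] >= min_size_sqft].reset_index(drop=True)
```

A price-per-square-foot ceiling would be another reasonable rule, and it would also catch listings where the size was scraped from the wrong element.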

Now that I have my full data set for all apartments listed on iProperty with parameters district = 20 and bedrooms = 3, let's move on to creating an interactive map.

Part 2: Creating an Interactive Map

Admittedly, this part is the most challenging aspect of the project. To plot the apartments in my data set on a map, I need their coordinates. Here I use the Geopy package to retrieve the coordinates from the addresses in my data set, and Folium to create the map and its interactive features from those coordinates.

First, we loop through all the addresses in the data set and extract the latitude and longitude of each apartment. These two variables are then added to the main data set.

Figure 6
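The loop in Figure 6 is a screenshot. Below is a sketch of one way to do it with geopy’s Nominatim geocoder, wrapped in geopy’s RateLimiter because Nominatim’s public usage policy allows at most one request per second. The user_agent string, the ‘, Singapore’ suffix, the helper names and the Latitude/Longitude column names are assumptions. The lookup function is passed in so the loop can be tried without network access:

```python
import pandas as pd


def make_geocoder(user_agent="rental-search-demo"):
    """Build a rate-limited Nominatim geocoder (needs geopy and network access)."""
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)  # Nominatim requires a user_agent
    # Nominatim's usage policy allows at most one request per second
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def add_coordinates(df, geocode, suffix=", Singapore"):
    """Geocode every address and add Latitude/Longitude columns (None if not found)."""
    lats, lons = [], []
    for address in df["Address"]:
        location = geocode(address + suffix)  # suffix helps pin the right country
        lats.append(location.latitude if location is not None else None)
        lons.append(location.longitude if location is not None else None)
    out = df.copy()
    out["Latitude"] = lats
    out["Longitude"] = lons
    return out
```

With geopy installed, df = add_coordinates(df, make_geocoder()) would fill in the two columns; addresses the geocoder cannot resolve end up as missing values rather than stopping the loop.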
Figure 7: Full data set with new latitude and longitude columns

After obtaining the coordinates of each apartment, we are now ready to plot them all on the Singapore map! Let us create the base map:

import folium

# Centre the base map on Singapore (approximate lat, lon) at city-level zoom
location = [1.3521, 103.8198]
sgmap = folium.Map(location, zoom_start=12)
Figure 8: Base map of Singapore

Now time to add markers! For my markers, I included two features that would help me consolidate information more effectively.

  • Each marker is colour-coded by rental price. All the rental prices are ordered and their percentiles calculated: green markers indicate a rent in the 0th to 30th percentile, orange the 30th to 70th percentile, and red any rent above the 70th percentile.
  • Each marker has a pop-up showing the address of the apartment and its price.
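A minimal sketch of both marker features, assuming the data frame has Address, Price, Latitude and Longitude columns (the last two names are my assumption). Percentile cut-offs come from Series.quantile, and folium.Icon’s built-in colour names supply the green, orange and red markers; price_colours and add_markers are hypothetical helper names:

```python
import pandas as pd


def price_colours(prices: pd.Series) -> pd.Series:
    """Green up to the 30th percentile, orange up to the 70th, red above that."""
    low, high = prices.quantile([0.3, 0.7])  # percentile cut-offs of the rents

    def colour(price):
        if price <= low:
            return "green"
        if price <= high:
            return "orange"
        return "red"

    return prices.apply(colour)


def add_markers(fmap, df: pd.DataFrame):
    """Add a colour-coded marker with an address/price pop-up per located unit."""
    import folium

    colours = price_colours(df["Price"])  # computed on all units, aligned by index
    located = df.dropna(subset=["Latitude", "Longitude"])  # skip failed geocodes
    for idx, row in located.iterrows():
        popup_html = f"{row['Address']}<br>SGD {row['Price']:,.0f}/month"
        folium.Marker(
            location=[row["Latitude"], row["Longitude"]],
            popup=folium.Popup(popup_html, max_width=250),
            icon=folium.Icon(color=colours.loc[idx]),
        ).add_to(fmap)
    return fmap
```

Calling add_markers(sgmap, df) would then draw onto the base map created earlier. Percentiles adapt to whatever rents are in the pool, so the colours show relative price within the search rather than an absolute budget.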

The end product looks like this:

Figure 9: Interactive map with all rental units in the Ang Mo Kio area

From the map, I can see about 5 units close to the MRT (metro) station: two relatively expensive units (red), two moderately priced ones (orange) and one relatively cheap unit (green). Clicking on any of these markers brings up a pop-up like the one shown in the map.

Part 3: Displaying Rental Price Distributions

This part is fairly simple, since I already have all the data I need. All I have to do now is use Matplotlib to plot charts that help me better understand the distribution of prices in this area. I prefer to use a boxplot:
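The plotting code is not shown in the post. A sketch under the assumption that the cleaned rents sit in a pandas Series: price_summary returns the quartiles the box is drawn from, and plot_price_boxplot draws a simple boxplot. Matplotlib’s box marks the median, not the mean; showmeans=True adds a separate marker for the mean. The helper names and title are illustrative:

```python
import pandas as pd


def price_summary(prices: pd.Series) -> dict:
    """Quartiles and median of the rents: the values a boxplot's box is drawn from."""
    q1, median, q3 = prices.quantile([0.25, 0.5, 0.75])
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


def plot_price_boxplot(prices: pd.Series, title="3-bedroom rents, District 20"):
    """Draw a boxplot of monthly rents and return the Matplotlib figure."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(4, 5))
    # The box spans Q1 to Q3 and the line inside it is the median;
    # showmeans=True adds a separate marker for the mean.
    ax.boxplot(prices.dropna().to_numpy(), showmeans=True)
    ax.set_ylabel("Monthly rent (SGD)")
    ax.set_xticks([])  # a single box needs no category label
    ax.set_title(title)
    return fig
```

Reading the box edges from price_summary gives the range in which the middle half of the listed rents fall.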

The boxplot shows me the distribution of prices and the median price (the line inside the box). It suggests that most 3-bedroom units in Ang Mo Kio cost about $2,300 to $2,600 a month.

Conclusion and Future Improvements

There are numerous ways in which this project could be improved. One area I would very much love to explore in the future is a ranking system for displaying apartments. For instance, factors such as distance to the MRT (metro), to supermarkets and to schools could be used to assign each apartment a score, and apartments with higher scores would be displayed more prominently on the map. For now, my free time is up, and I will have to settle for the current model.
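To make the ranking idea a little more concrete, here is a hedged sketch of one possible scoring rule, not something implemented in the project: compute the great-circle (haversine) distance from a unit to the nearest amenity of each type and add up weight / (1 + distance in km), so nearer amenities contribute more. The amenity coordinates, weights and helper names are all hypothetical inputs you would choose:

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def score_unit(unit, amenities, weights):
    """Sum of weight / (1 + distance_km) over amenity types; higher is better.

    unit: dict with 'lat' and 'lon'; amenities: {type: (lat, lon)} of the nearest
    amenity of each type; weights: {type: importance}. All hypothetical inputs.
    """
    score = 0.0
    for kind, (lat, lon) in amenities.items():
        distance = haversine_km(unit["lat"], unit["lon"], lat, lon)
        score += weights.get(kind, 0.0) / (1.0 + distance)
    return score
```

The 1 / (1 + d) form keeps each term bounded by its weight even when a unit sits right next to an amenity; a price term could be added the same way to trade off cost against convenience.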

Another aspect is that all of this is done in a notebook IDE. For others to use this tool, they would first need access to a notebook environment. In the future, I would definitely like to turn this into a web-based application!

With this project, I created a simple tool that would help me find a future home more efficiently. It has been an enjoyable journey toying around with Python and its libraries, and I will definitely be back for more!

