Data Analysis: predicting the housing market using Python

W. Weldon
Towards Data Science
6 min read · May 31, 2019


Overlook of Seattle from the top floor of an apartment near Westlake. Photographed by W. Weldon.

After the housing bubble burst in 2008, the country entered a recession, and the housing market hit rock bottom in 2012. Prices have been rising again since then and have surpassed the 2008 peak. For years, Seattle has ranked among the top 3 fastest-growing housing markets, with an 85% year-to-year growth rate.

As housing prices push toward another new peak, I am sympathetic about affordability: I have moved 10 times in the past 9 years in Seattle, because housing prices directly affect the rental market as well. Everyone wants to live in a comfortable place, especially families with children. However, once you buy a house, you have another problem to worry about.

A house will probably become your largest asset once the mortgage is paid off. But a house is an investment like stocks and bonds, and when its price fluctuates, it scares people because the value could dip below the mortgage balance. In a perfect-storm scenario where you lose your job and fall behind on monthly payments, you could both lose the house and owe the bank. (If the bank auctions your house for less than the mortgage balance, the money you owe = mortgage balance - sold price.) Homeowners may seem passive and vulnerable, so how could you become aware of the next storm? I am going to analyze home sales records from the past two years and tell you what is going on in the housing market.

Can you afford a house?

Using Octoparse, I scraped records of around 6,000 homes sold in 2018 and 2019, with data including the number of bedrooms, sold price and zip code. Here are the steps:

A. Scrape the data

Step 1: Scrape a list of URLs from Trulia.
Step 2: Load the list into Octoparse.
Step 3: Select extracted data fields from Octoparse.
Step 4: Save and Run extraction.
Step 5: Export the CSV.
(Octoparse has more step-by-step tutorials on its blog.)

B. Use Python to analyze the data[4]

Step 1: Read the CSV file and split each row into sold price, number of bedrooms, square footage and sold date.
Step 2: Store all the data in a list of tuples, where each item in the list holds the values labeled price, bd, sqft, date.
Step 3: Count the records. A total of 5,365 homes were sold in Seattle in 2018 and 2019 up to today.
Step 4: Find out how many homes were sold in 2018 and in 2019 (up to today). It shows that:

houses sold in 2019: 2309
houses sold in 2018: 3056

Step 5: Filter the sold homes by the number of bedrooms. I want to see how the price varies with the number of bedrooms. In particular, what are the average, median and maximum prices for each number of bedrooms? Here is a table that visualizes the data.

Home sales in the second half of 2018 and the first half of 2019 by bedroom size
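
Steps 1 through 4 above can be sketched in Python roughly as follows. This is a minimal sketch, not the original script: it assumes the exported CSV has a header row with columns named price, bd, sqft and date, and that dates are written as YYYY-MM-DD. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class Sale(NamedTuple):
    """One sold home, using the labels from Step 2."""
    price: int
    bd: int
    sqft: int
    date: datetime


def load_sales(csv_file):
    """Parse CSV rows into a list of Sale tuples (Steps 1 and 2)."""
    sales = []
    for row in csv.DictReader(csv_file):
        sales.append(Sale(
            price=int(row["price"]),
            bd=int(row["bd"]),
            sqft=int(row["sqft"]),
            # Assumed date format; adjust to match the real export.
            date=datetime.strptime(row["date"], "%Y-%m-%d"),
        ))
    return sales


def count_by_year(sales):
    """Count how many homes were sold in each calendar year (Step 4)."""
    return Counter(sale.date.year for sale in sales)


# Made-up rows standing in for the exported CSV file.
SAMPLE_CSV = """price,bd,sqft,date
734000,3,1500,2018-07-14
600000,2,1000,2018-11-02
823000,4,2200,2019-03-21
"""

sales = load_sales(io.StringIO(SAMPLE_CSV))
print("total homes sold:", len(sales))  # Step 3
for year, count in sorted(count_by_year(sales).items(), reverse=True):
    print(f"houses sold in {year}: {count}")
```

With a real export, `load_sales(open("sales.csv", newline=""))` would replace the in-memory sample; the file name here is hypothetical.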

What do the sold homes tell us?

I use Python to group the sold prices by the number of bedrooms so as to observe any relationship between the two. I came up with three numbers for the sold price: mean, median and maximum.
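
One way to compute those three numbers per bedroom count is sketched below. The function name and the sample prices are assumptions for illustration only; they are not the scraped data.

```python
from collections import defaultdict
from statistics import mean, median


def price_stats_by_bedrooms(sales):
    """Group (price, bd) pairs by bedroom count and summarize the prices."""
    prices = defaultdict(list)
    for price, bd in sales:
        prices[bd].append(price)
    return {
        bd: {
            "count": len(p),
            "mean": mean(p),
            "median": median(p),
            "max": max(p),
        }
        for bd, p in sorted(prices.items())
    }


# Made-up (price, bedrooms) pairs.
sample_sales = [
    (700_000, 3), (734_000, 3), (900_000, 3),
    (550_000, 2), (650_000, 2),
]

stats = price_stats_by_bedrooms(sample_sales)
for bd, s in stats.items():
    print(f"{bd} bd: n={s['count']} mean={s['mean']:,.0f} "
          f"median={s['median']:,.0f} max={s['max']:,}")
```

In the made-up 3-bedroom group, a single expensive sale pulls the mean above the median, which is the kind of gap discussed for the real data.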

3-bedroom homes are the most popular among sales, with 1,937 sold. 3-bedroom homes have a median price of $734,000, and the mean is slightly higher than the median, which suggests that a number of 3-bedroom homes sold at much higher prices and pulled the average up (implying they are less affordable). The second most popular home type is the 2-bedroom home, with 1,577 sold at a median price of $600,000. Next comes the 4-bedroom home, with 892 sold at a median price of $823,000.

Home sales by ZIP codes

What does the household income look like in these areas?

In 2018 and 2019 (up to today), there were around 6,000 sold homes in Seattle. When the data is broken down by zip code, we can see that 98103 and 98115 have the most sold homes, 392 and 383 units respectively. How does that relate to household income in those areas? Those areas are close to the campuses of tech giants like Google, Amazon and Adobe, yet are not as expensive as downtown, so people with both a high income and a daily commute favor living north of downtown for convenient access to the I-5 freeway.
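
Counting sales per zip code takes only a few lines with the standard library. The sketch below uses a made-up list of zip codes and a hypothetical helper name; zip codes are kept as strings so that any leading zeros would survive.

```python
from collections import Counter


def top_zip_codes(zip_codes, n=2):
    """Return the n zip codes with the most sold homes, as (zip, count) pairs."""
    return Counter(zip_codes).most_common(n)


# Made-up zip codes, one entry per sold home.
sample_zips = ["98103", "98115", "98103", "98101", "98115", "98103"]

for zip_code, count in top_zip_codes(sample_zips):
    print(f"{zip_code}: {count} homes sold")
```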

What does it mean to own a 3-bedroom home priced at the median?

Let’s break down the numbers. 3-bedroom homes, with their $734k median price, account for 36% of the sold homes. It implies that a large share of buyers could afford this median price of $734k in 2018 and 2019. I calculated the mortgage to see how much you actually need to make in order to afford that median-priced home. The result is below:

Even with other monthly debts, if you want to live comfortably at a debt-to-income ratio of 36%, your family needs to make $190,000/year in order to afford a 3-bedroom home sold at $734k. How many people do you beat out with a $190k income? According to Pew Research[1], you are in the top 25%, which places you in the upper-income tier in the Greater Seattle area. The report also says the income held by the bottom 90% shrank by 12.8%, whereas the top 10% of earners held 50% of all of America’s income.
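
For readers who want to try this kind of calculation with their own numbers, here is a rough sketch. It uses the standard fixed-rate amortization formula and the 36% debt-to-income ratio mentioned above. The down payment, interest rate and loan term in the example are assumptions chosen for illustration, not the inputs behind the $190,000 figure, and property tax, insurance and other debts would have to be added on top of the loan payment.

```python
def monthly_payment(principal, annual_rate, years):
    """Fixed-rate mortgage payment (principal and interest only)."""
    n = years * 12            # number of monthly payments
    r = annual_rate / 12      # monthly interest rate
    if r == 0:
        return principal / n
    return principal * r / (1 - (1 + r) ** -n)


def required_annual_income(monthly_debt, dti=0.36):
    """Gross annual income needed so monthly_debt equals dti of monthly income."""
    return monthly_debt / dti * 12


def max_monthly_debt(annual_income, dti=0.36):
    """Largest total monthly debt payment allowed at the given ratio."""
    return annual_income / 12 * dti


price = 734_000
down_payment = 0.20 * price   # assumed 20% down
loan = price - down_payment
payment = monthly_payment(loan, annual_rate=0.045, years=30)  # assumed rate and term

print(f"principal and interest: ${payment:,.0f}/month")
print(f"income needed for that payment alone: ${required_annual_income(payment):,.0f}/year")
print(f"debt budget at $190k income: ${max_monthly_debt(190_000):,.0f}/month")
```

At a 36% ratio, a $190,000 income corresponds to about $5,700 per month for all debt payments combined.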

Home price over time by the number of bedrooms
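
A price-over-time series like this one can be built by grouping sales by bedroom count and month and then taking the median of each group. The sketch below assumes each record is a (price, bd, sold_date) tuple with a `datetime.date`; the monthly grouping, the function name and the sample records are illustrative choices, not a description of how the chart was actually produced.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def median_price_by_month(sales):
    """Map (bedrooms, 'YYYY-MM') to the median sold price in that month."""
    groups = defaultdict(list)
    for price, bd, sold_date in sales:
        groups[(bd, sold_date.strftime("%Y-%m"))].append(price)
    return {key: median(prices) for key, prices in sorted(groups.items())}


# Made-up (price, bedrooms, sold date) records.
sample_sales = [
    (700_000, 3, date(2018, 7, 3)),
    (740_000, 3, date(2018, 7, 20)),
    (720_000, 3, date(2018, 8, 5)),
    (600_000, 2, date(2018, 7, 9)),
]

series = median_price_by_month(sample_sales)
for (bd, month), price in series.items():
    print(f"{bd} bd, {month}: median ${price:,.0f}")
```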

Thoughts?

Housing prices in Seattle have been pretty flat over the past year. People who bought a home early are seeing their home value appreciate. In contrast, for people who bought within the past year, it is hard to say when prices will head north again. Although the number of sales each week is slightly decreasing, given the high median prices of 2-, 3- and 4-bedroom homes and how much you need to make to afford one, it is not surprising that the middle class is starting to lose ground[2][3] in the metropolitan areas. Income growth couldn’t keep up with the rise in house prices. I don’t see booming prosperity, but a widening gap between the rich and the rest. We have put so much effort into making society more equal and civil. Yet from the War on Drugs by Nixon and the War on Crime by Johnson to the War on Immigrants by Trump, I still don’t see the freedom and greater good of the American Dream. On the contrary, I only see people losing their jobs and homes throughout the years.

[1] https://www.pewresearch.org/fact-tank/2018/09/06/are-you-in-the-american-middle-class/

[2] https://www.pewresearch.org/fact-tank/2018/09/06/the-american-middle-class-is-stable-in-size-but-losing-ground-financially-to-upper-income-families/

[3] https://www.dailymail.co.uk/news/article-6744515/Middle-class-poor-Americans-continue-lose-ground-richest-rich-income-wealth-measures.html

[4]
