Creating a map of house sales

Nadine Amersi-Belton
Towards Data Science
5 min read · May 6, 2020


In this tutorial, I will guide you step by step through creating a map of house sales using Bokeh, with a colour mapping indicating sale price. The aim is for the viewer to be able to distinguish at a glance which neighbourhoods are the most expensive to live in. This visualisation is also useful for price prediction modelling, as it gives a sense of how important location is and whether new location-based features may be worth engineering.

We will be using a dataset of sale prices for the King County area, which can be found on Kaggle, but the principles can be applied to a dataset of your choice.

This is the end result:

First things first, let us ensure our dataset is fit for purpose. Note that Bokeh requires Mercator coordinates to plot data points on the map. Our dataset had latitude and longitude coordinates, so a function was required to transform these. Assuming all data cleaning steps (e.g. handling missing values) have been completed, this is how the data should look.

df.head() 
Pandas DataFrame with columns zipcode, mercator_x, mercator_y and price
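The transformation function itself is not shown above, so here is a minimal sketch of one way to do it, using the standard spherical Web Mercator formulas. The function name to_web_mercator is my own choice for illustration and not necessarily what the original project used.

```python
import math

# Earth radius in metres used by the Web Mercator projection (EPSG:3857).
EARTH_RADIUS_M = 6378137.0


def to_web_mercator(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to Web Mercator (x, y) in metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Approximate coordinates for Seattle, chosen purely for illustration.
x, y = to_web_mercator(47.6062, -122.3321)
print(x, y)
```

To fill the mercator_x and mercator_y columns, the function can be applied to each row's latitude and longitude. The names of those source columns depend on your dataset, so check them before applying it.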

Time to plot!

I will break the code down and explain what each element does.

Let us begin by importing the necessary tools from Bokeh.

from bokeh.io import output_notebook, show
from bokeh.plotting import figure, ColumnDataSource
from bokeh.tile_providers import get_provider, CARTODBPOSITRON
from bokeh.palettes import Turbo256
from bokeh.transform import linear_cmap
from bokeh.layouts import row, column
from bokeh.models import ColorBar, NumeralTickFormatter

One of Bokeh’s standout features is its built-in collection of map tiles. Whilst you could use Google Maps, this requires obtaining an API key. We will use CartoDB Positron instead. Let’s select this tile provider with the get_provider() function.

chosentile = get_provider(CARTODBPOSITRON)

Next, we will choose a palette for colour mapping. We want to ensure our visualisation is effective in distinguishing where the most expensive houses are located. Bokeh has a wide choice of pre-defined palettes and, depending on the granularity you want, you can select how many colours from the palette to use. There are also a few extended palettes with a larger spectrum, which is what we have gone for here.

palette = Turbo256

We will also define the source to be used, which is where the data is coming from. The most common type is ColumnDataSource, which takes in a data parameter. Once the ColumnDataSource has been created, it can be passed into the source parameter of plotting methods, which lets us pass a column’s name as a stand-in for the data values.

source = ColumnDataSource(data=df)

Next, let’s define our colour mapper. This is a linear colour map, applied to the price field. We set the low and high parameters to be the minimum and maximum price respectively.

color_mapper = linear_cmap(field_name = 'price', palette = palette, low = df['price'].min(), high = df['price'].max())
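To build some intuition for what a linear colour map does, here is a small plain-Python sketch. It is an illustration of the idea only, not Bokeh's actual implementation, and the pick_colour helper and toy palette are made up for this example.

```python
def pick_colour(value, palette, low, high):
    """Return the palette entry for value, scaled linearly between low and high."""
    if high <= low:
        # Degenerate range: fall back to the first colour.
        return palette[0]
    # Position of the value within [low, high], clamped to the range 0..1.
    fraction = (value - low) / (high - low)
    fraction = min(max(fraction, 0.0), 1.0)
    # Scale to a palette index, keeping the top of the range inside the list.
    index = min(int(fraction * len(palette)), len(palette) - 1)
    return palette[index]


# A toy four-colour palette standing in for the 256 colours of Turbo256.
toy_palette = ["blue", "green", "yellow", "red"]

for value in (0, 25, 50, 100):
    print(value, pick_colour(value, toy_palette, low=0, high=100))
```

The lowest value lands on the first colour and the highest on the last, with everything in between spread evenly across the palette. With a 256-colour palette, the steps between neighbouring colours become much finer.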

A neat feature is including additional information when a user hovers over a data point. This is defined using tooltips. We have chosen to display the price of the house and zipcode. The @ symbol means that the value is from the source.

tooltips = [('Price', '@price'), ('Zipcode', '@zipcode')]
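If you would like the hover text to be a little easier to read, a format can be added in curly braces after a field name. The variant below is a sketch rather than what was used for the map in this article: it shows the price with thousands separators and adds the cursor position using the special $x and $y variables.

```python
# Each tooltip is a (label, value) pair.
# "@name" reads a column from the source; "$x" and "$y" are the
# coordinates under the cursor, in the plot's data space.
tooltips = [
    ("Price", "@price{0,0}"),         # e.g. 1,250,000 rather than 1250000
    ("Zipcode", "@zipcode"),
    ("Mercator (x, y)", "($x, $y)"),  # optional extra row, for illustration
]
```

This list can be passed to figure() through the tooltips parameter in exactly the same way as the simpler version.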

Now we are ready to create the actual figure. Let us call it p. We will give it a title and set the x and y axis types to be Mercator so that when rendering it displays latitude and longitude ticks, which we are more accustomed to. We will also define our axes labels, for completeness.

p = figure(title = 'King County House Sales 2014–2015',
           x_axis_type = 'mercator', y_axis_type = 'mercator',
           x_axis_label = 'Longitude', y_axis_label = 'Latitude',
           tooltips = tooltips)

Let us add our chosen map tile using the add_tile() method.

p.add_tile(chosentile)

Each house will be plotted as a circle by specifying x and y as the Mercator coordinate columns mercator_x and mercator_y. The colour of the circle will be determined by the linear colour map we defined above.

p.circle(x = 'mercator_x', y = 'mercator_y', color = color_mapper, source = source)

Let us add a colour bar, which will serve as a visual aid in understanding our colour mapping. By default, Bokeh uses scientific notation where appropriate, but in this instance I thought it looked better to use plain notation for the price. The default spacing causes the price labels to overlap with the colour bar, so we need to set a value for label_standoff. Unfortunately, I did not find a straightforward way to add a title to the colour bar, so for now we make do without.

color_bar = ColorBar(color_mapper = color_mapper['transform'],
                     formatter = NumeralTickFormatter(format = '0,0'),
                     label_standoff = 13, width = 8, location = (0, 0))

Let us specify that the colour bar should sit on the right of the figure by adding it to the figure's layout.

p.add_layout(color_bar, 'right')

Finally, we would like the map to be rendered inline in the notebook and shown.

output_notebook()
show(p)

Ta da!

Map of King County with markers for house sales

From this visualisation, it is immediately obvious that the most expensive homes are located on the waterfront, in particular in Medina, Clyde Hill and Mercer Island. There is also a notable difference between North and South, with the Northern part commanding higher prices.

We can also zoom in to get insights on a particular neighbourhood.

Clyde Hill Neighbourhood House Sales

And there you have it! I hope you found this short tutorial useful. Should you wish to tweak things further, I would recommend diving into Bokeh’s documentation. It is easy to follow, even if you’re a beginner like me! I would also love any feedback you may have.

To see my full project, including data exploration and predictive modelling using linear regression, here is the GitHub link.
