Plotting a map of London Crime Data using R

David Morison
Towards Data Science
Apr 19, 2017


I recently decided to play around with, and get familiar with, the GIS (geographic information systems) packages available in R. I've always been interested in maps, and I'm one of those people who can spend as much time exploring Google Maps as the average American spends on Facebook (well, maybe not quite as much).

This write-up focuses on my experience of learning to use R for GIS purposes and the steps I took to reach my objective of plotting some points on a map.

Data

First of all, I needed some data to work with. Having read a number of blog posts based on the UK police crime data, I decided to do a bit of exploration with this data myself. It was perfect for this project, as each record includes the month the crime took place, its latitude and longitude coordinates, and the type of crime.

My area of interest was London, as this is where I live. The Greater London area is covered by two data sets: the Metropolitan Police Service and the City of London Police. I chose 2016 as the study period and downloaded the monthly data sets.

(I only realized later, when plotting the data, that the Metropolitan Police Service data included crimes from other cities in the UK. This was easily corrected by adding an extra filtering step during data preparation.)
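The exact filtering step isn't shown in this write-up. One minimal way to do it is to keep only points inside a rough bounding box around Greater London. The box limits below are an assumption (approximate WGS84 extents of Greater London), and `filter_to_bbox` is a hypothetical helper. A bounding box is coarse and will admit a few points just outside the boroughs; a point-in-polygon test against the borough shapefile would be stricter.

```r
# Assumed rough bounding box around Greater London (WGS84 degrees);
# these limits are NOT from the original write-up.
london_bbox <- list(lon = c(-0.52, 0.34), lat = c(51.28, 51.70))

# Hypothetical helper: keep rows whose coordinates fall inside the box
filter_to_bbox <- function(df, bbox) {
  in_box <- df$Longitude >= bbox$lon[1] & df$Longitude <= bbox$lon[2] &
            df$Latitude  >= bbox$lat[1] & df$Latitude  <= bbox$lat[2]
  df[!is.na(in_box) & in_box, ]  # NA coordinates are treated as "outside"
}

# Toy data: two points in London and one in Manchester
crimes <- data.frame(
  Longitude  = c(-0.1276, 0.0098, -2.2426),
  Latitude   = c(51.5072, 51.5300, 53.4808),
  Crime.type = c("Bicycle theft", "Burglary", "Bicycle theft")
)
london_only <- filter_to_bbox(crimes, london_bbox)
print(nrow(london_only))  # [1] 2
```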

The other data needed for this project were GIS shapefiles of the London borough boundaries, which I wanted to plot. These are available from the government's London Datastore website.

Data preparation

After unzipping, the downloaded data contained one CSV file per month, each in a separate folder.

I wrote a script to read all of these CSV files into R and, at the same time, carry out the necessary subsetting and binding to form one dataset.
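The original script isn't included, so here is a minimal sketch of the same idea in base R. It first builds a tiny stand-in for the unzipped download (one folder per month with one CSV each; the folder and file names are assumptions modelled on the monthly layout), then finds every CSV recursively, keeps the columns of interest and binds everything into one data frame. `read_month` is a hypothetical helper.

```r
# Build a tiny stand-in for the unzipped download (hypothetical names/values)
root <- file.path(tempdir(), "police-data")
for (m in c("2016-01", "2016-02")) {
  dir.create(file.path(root, m), recursive = TRUE, showWarnings = FALSE)
  write.csv(
    data.frame(Month = m, Longitude = -0.12, Latitude = 51.5,
               `Crime type` = "Burglary", check.names = FALSE),
    file.path(root, m, paste0(m, "-metropolitan-street.csv")),
    row.names = FALSE
  )
}

# Find every CSV in every month folder
files <- list.files(root, pattern = "\\.csv$", recursive = TRUE, full.names = TRUE)

# Read one file and keep only the columns of interest.
# read.csv() converts the "Crime type" header into the name Crime.type.
read_month <- function(path) {
  d <- read.csv(path, stringsAsFactors = FALSE)
  d[, c("Month", "Longitude", "Latitude", "Crime.type")]
}

# Read all months and bind them into a single data frame
df <- do.call(rbind, lapply(files, read_month))
print(nrow(df))  # [1] 2
```

Subsetting inside `read_month` keeps memory use down, since only four columns from each monthly file are ever held in the combined data frame.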

For the purpose of this project I retained only the variables Month, Longitude, Latitude and Crime.type. I excluded all records with missing data rather than using any estimates, since I was going to plot exact coordinates on a map for specific types of crime.
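Dropping incomplete records can be done with `complete.cases()`. A minimal sketch, assuming records without a location have empty Longitude/Latitude cells (which `read.csv()` reads as `NA`); the toy rows are hypothetical:

```r
# Toy rows standing in for the combined dataset (hypothetical values)
df <- data.frame(
  Month      = c("2016-12", "2016-12", "2016-12"),
  Longitude  = c(-0.1276, NA, -0.0877),
  Latitude   = c(51.5072, NA, 51.5155),
  Crime.type = c("Bicycle theft", "Burglary", "Bicycle theft")
)

# Keep only rows with no missing values, so no coordinates need estimating
df <- df[complete.cases(df), ]
print(nrow(df))  # [1] 2
```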

The final dataset consisted of 988,848 crime incidents covering 14 types of crime.

Creating the plots

Several R packages are needed to read in shapefiles and to transform either the coordinates to match the map's projection or vice versa. The ones I used were rgdal, rgeos, sp and ggplot2, although various other packages can achieve similar results.

First, the shapefile is imported into R using the readOGR function. Its projection then needs to be transformed from the British National Grid (EPSG:27700) to WGS84 latitude and longitude (EPSG:4326) so that it matches the coordinates in the crime dataset I had curated. This proved to be the trickiest part of the project and took a lot of Googling to work out.

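```r
library(rgdal)  # provides readOGR(); also attaches sp
ldn1 <- readOGR(dsn = "data", layer = "london_sport")  # "data" = folder holding the shapefile (assumed)
proj4string(ldn1) <- CRS("+init=epsg:27700")  # British National Grid
ldn1.wgs84 <- spTransform(ldn1, CRS("+init=epsg:4326"))  # WGS84 lat/long, as in the crime data
```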
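rgdal and rgeos were retired from CRAN in 2023, so the code above may not run on a current R setup. As a sketch only, here is the same reprojection with the sf package, which is swapped in here and is not what the original used. It transforms a single hypothetical British National Grid point near central London; the commented lines show the assumed equivalent for the borough shapefile.

```r
library(sf)

# Hypothetical point in British National Grid metres (EPSG:27700), central London
pt_bng <- st_sfc(st_point(c(530000, 180000)), crs = 27700)

# Reproject to WGS84 longitude/latitude (EPSG:4326), as used by the crime data
pt_wgs84 <- st_transform(pt_bng, 4326)
coords <- st_coordinates(pt_wgs84)  # matrix with X (longitude) and Y (latitude)
print(coords)

# Assumed equivalent for the borough boundaries:
# ldn <- st_read(dsn = "data", layer = "london_sport")
# st_crs(ldn) <- 27700            # only needed if the .prj file is missing
# ldn_wgs84 <- st_transform(ldn, 4326)
```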

Once this is done, it's easy going from here, especially if you are familiar with ggplot2. I took this in steps and first wanted to see how the map itself came out. This is done using ggplot2's geom_polygon function and the transformed shapefile data from above:

```r
library(ggplot2)
map1 <- ggplot(ldn1.wgs84) +  # the Spatial object is converted to a data frame (long, lat, group)
  geom_polygon(aes(x = long, y = lat, group = group), fill = "white", colour = "black")
map1 + labs(x = "Longitude", y = "Latitude",
            title = "Map of Greater London with the borough boundaries")
```
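The `group` aesthetic is what keeps each borough a separate shape: when ggplot2 is handed a `SpatialPolygonsDataFrame`, it fortifies it into a data frame with `long`, `lat` and `group` columns, and `geom_polygon` draws one polygon per group. A minimal sketch with two hypothetical square "boroughs" in that same layout:

```r
library(ggplot2)

# Two hypothetical unit-square "boroughs" in the fortified long/lat/group layout
polys <- data.frame(
  long  = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("A.1", "B.1"), each = 4)
)

p <- ggplot(polys) +
  geom_polygon(aes(x = long, y = lat, group = group), fill = "white", colour = "black")

# Inspect the layer data ggplot2 computes: one group per polygon
built <- ggplot_build(p)$data[[1]]
print(length(unique(built$group)))  # [1] 2
```

Without `group`, all eight vertices would be joined into a single polygon.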

The next step was to add a layer of all the crime locations and differentiate between the types of crime:

```r
map1 +
  geom_point(data = df, aes(x = Longitude, y = Latitude, colour = Crime.type)) +
  scale_colour_manual(values = rainbow(14)) +  # one colour per crime type (14 types)
  labs(x = "Longitude", y = "Latitude",
       title = "Map of Greater London with the borough boundaries")
```
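Hard-coding `rainbow(14)` works because there are exactly 14 crime types. A small variation, offered as a sketch rather than the original approach, is to size the palette from the data and name it, so each crime type keeps the same colour even when only a subset is plotted. The toy points are hypothetical.

```r
library(ggplot2)

# Toy points standing in for the full crime data frame (hypothetical values)
df <- data.frame(
  Longitude  = c(-0.13, -0.09, 0.01),
  Latitude   = c(51.51, 51.52, 51.54),
  Crime.type = c("Bicycle theft", "Burglary", "Shoplifting")
)

# One colour per crime type, named so the mapping is stable across subsets
types <- sort(unique(df$Crime.type))
pal <- setNames(rainbow(length(types)), types)

p <- ggplot(df) +
  geom_point(aes(x = Longitude, y = Latitude, colour = Crime.type)) +
  scale_colour_manual(values = pal) +
  labs(x = "Longitude", y = "Latitude")

built <- ggplot_build(p)$data[[1]]  # layer data, including the assigned colours
```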

In theory, this achieved the objective of the project. However, as the resulting plot makes clear, it isn't ideal: with nearly a million points it is far too busy and difficult to interpret.

I decided to look at only one type of crime for a particular month. As I enjoy cycling and would never want my mountain bike stolen, I wanted to know whether I should be extra vigilant when parking my bike in certain areas, so I trimmed the dataset to look at bicycle theft in December.

```r
dec <- df[df$Month == "2016-12", ]                    # December 2016 only
dec.bike <- dec[dec$Crime.type == "Bicycle theft", ]  # bicycle theft only
map1 + geom_point(data = dec.bike, aes(x = Longitude, y = Latitude), colour = "red") +
  labs(title = "Bicycle theft in Greater London - December 2016")
```

This produced a more visually appealing plot that could actually be interpreted. As expected, the incidents are concentrated towards the city center.

The downside is that the insight gained is limited to one type of crime in one month, while there are another 13 types of crime and 11 other months to look at.
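One way to make other crime types and months easier to explore is to wrap the December filter in a function, and to count incidents per month before choosing which months to map. `crime_subset` is a hypothetical helper and the toy rows are made up; this is a sketch, not part of the original analysis.

```r
# Hypothetical helper: the December/bicycle-theft filter for any month and type.
# which() drops rows where the comparison is NA.
crime_subset <- function(df, month, type) {
  df[which(df$Month == month & df$Crime.type == type), ]
}

# Toy data standing in for the full 2016 data frame
df <- data.frame(
  Month      = c("2016-11", "2016-12", "2016-12", "2016-12"),
  Longitude  = c(-0.12, -0.10, -0.08, -0.05),
  Latitude   = c(51.50, 51.51, 51.52, 51.53),
  Crime.type = c("Bicycle theft", "Bicycle theft", "Burglary", "Bicycle theft")
)

dec.bike <- crime_subset(df, "2016-12", "Bicycle theft")
print(nrow(dec.bike))  # [1] 2

# Incidents of one crime type per month
bike_by_month <- table(df$Month[df$Crime.type == "Bicycle theft"])
print(bike_by_month)
```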

Summary

The objective of this project was to explore the GIS packages available in R and use them to plot some points on a map. This was achieved using data from the UK police website on the types and locations of crimes in 2016. Trimming the data to one specific type of crime for one month (in this case, bicycle theft in December) made the plot much easier to interpret.

This project was a great learning experience and demonstrated the usefulness of R as a tool for GIS.

Given the amount of data in this project, and how hard large datasets are to interpret on a static plot, I thought it would be a great idea to turn this into an interactive Shiny app, which I have covered in a follow-up post.


Web developer interested in all things science, data and technology. Masters degree in atmospheric science. Endurance sports fanatic.