Breaking Down Geocoding in R: A Complete Guide

How to use APIs to find a place you are interested in and visualize its location on a map

Oleksandr Titorchuk
Towards Data Science
19 min read · Apr 25, 2020


If you have ever wondered how to build maps similar to the ones you constantly see in your apps, this is a good place to start. In this tutorial we will cover how to find a place based on its description or coordinates, and how to build a simple map based on that information.

Please note that this article assumes some prior knowledge of the R language: data structures, operators, conditional statements, functions, etc. All the code from this tutorial can be found on GitHub.

So, let’s get started!

Photo by Julentto Photography on Unsplash

What Is Geocoding?

Geocoding is the process of converting an address or the name of a place into its coordinates. Reverse geocoding performs the opposite task: it returns an address or a description of a place based on its coordinates.

It's as simple as that. So, using geocoding, you should be able to find out that the Eiffel Tower in Paris, France, is located at the latitude/longitude coordinates (48.858568, 2.294513). And by entering (41.403770, 2.174379) into your map app, you will end up at the Sagrada Familia, a Roman Catholic church in Barcelona, Spain. You can verify this yourself: just type these coordinates into Google Maps.

What Geocoding Tools Are Available?

When it comes to free geocoding tools available online, one option is specialized websites, for example the above-mentioned Google Maps. A quick search will turn up others.

All of these are perfectly fine tools if you need to find only a few addresses. But what if there are hundreds of them? Or thousands? The task rapidly becomes a real headache.

For bulk requests, an API is a much more suitable option. The most obvious choice here is probably the Google Maps API. To use Google services, you need to create an account on the Google Cloud Platform and get an API key. Google provides detailed instructions on how to do this on its website.

Another option is to use a public API from OpenStreetMap called Nominatim. OpenStreetMap is a collaborative project that aims to create a free map service for the public. As its website says:

OpenStreetMap is built by a community of mappers that contribute and maintain data about roads, trails, cafes, railway stations, and much more, all over the world.

Nominatim is essentially the tool that powers search on the OpenStreetMap website. Unlike Google Maps, Nominatim does not require you to register an account or get an API key. But if you want to use its services in an application, you need to provide an email address, so that your app's activity can be tracked and restricted if needed: the capacity of the OSM servers has its limits.

Legal Considerations

You might be wondering why I am telling you about the Nominatim API in the first place if Google offers similar capabilities. Your first guess is probably cost: unlike the OpenStreetMap Foundation, Google is a private company that charges for its services. That is true, but only partly.

Firstly, if you register on the Google Cloud Platform now, you get a 12-month free trial with $300 of credit on your account to learn about its features. Secondly, even after that, Google offers limited free access to some of its most commonly used services as part of the Always Free package. If your sole purpose is learning, the limits available under that package are more than enough. To learn more about Google Maps API pricing, visit Google's help page.

So what is the issue, you may ask? It is the Google Maps Platform Terms of Service, which, among other things, state that:

3.2.4 Restrictions Against Misusing the Services.
(a) No Scraping. Customer will not extract, export, or otherwise scrape Google Maps Content for use outside the Services.
(c) No Creating Content From Google Maps Content.
(e) No Use With Non-Google Maps.

I am not a lawyer and don't know how Google treats the use of its services for non-commercial purposes. But I haven't seen any clause in these Terms of Service stating that the above restrictions apply only to commercial use. So please be mindful of these limitations before you decide to use the Google Maps API in your app.

Unlike Google Maps content, OpenStreetMap data is licensed under the Open Data Commons Open Database License (ODbL). Below is, as the authors themselves put it, a human-readable summary of ODbL 1.0:

You are free:
* To Share: To copy, distribute and use the database.
* To Create: To produce works from the database.
* To Adapt: To modify, transform and build upon the database.

As long as you:
* Attribute: Give reference to the original database.
* Share-Alike: Distribute a database adapted from the original one under the same license.
* Keep open: Give access to the adapted database to the public.

A full-length license, in case you want to have a look, is available on the Open Data Commons website.

Having said all of that, let’s now move on to the coding!

Photo by Fotis Fotopoulos on Unsplash

Installing Packages

Let's first install and load all the packages used in this tutorial, so that we don't have to worry about them later. The purpose of each package will be described in the corresponding part of the article. Please also note that we are using R 3.6.2 for Windows.

# install packages
install.packages("ggmap")
install.packages("tmaptools")
install.packages("RCurl")
install.packages("jsonlite")
install.packages("tidyverse")
install.packages("leaflet")
# load packages
library(ggmap)
library(tmaptools)
library(RCurl)
library(jsonlite)
library(tidyverse)
library(leaflet)

Geocoding With R Packages

The R community has created a few packages that can be used to access the Google Maps and Nominatim APIs. Let's have a look at them.

Package ggmap

The first package is called ggmap and it allows you to connect to the Google Maps API. Before you can start using this package, you need to provide R with your API key.

# replace "YOUR_API_KEY" with your own API key
api_key <- "YOUR_API_KEY"
register_google(key = api_key)

Let's now use the geocode function from this package on a sample of twelve London pubs to demonstrate how it works. The function's output argument accepts one of the following:

  • latlon — latitude and longitude;
  • latlona — all of the above plus address;
  • more — all of the above plus place’s type and geographical boundaries;
  • all — all of the above plus some additional information.

Each option corresponds to the type of information generated. Generally, we don't need more information than the more option provides.

Running ggmap geocoding function
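The code for this step was embedded in the original article and is not reproduced here. As a minimal sketch, assuming you already have a valid API key, the call could be wrapped as follows. geocode_with_ggmap is a hypothetical helper name; register_google() and geocode() are the ggmap functions described above.

```r
# Hypothetical wrapper around ggmap's geocode(). The calls are namespaced so
# the function can be defined without attaching ggmap first.
geocode_with_ggmap <- function(places, api_key, output = "more") {
  # Register the Google Maps API key for this session
  ggmap::register_google(key = api_key)
  # output can be "latlon", "latlona", "more" or "all"
  ggmap::geocode(location = places, output = output, source = "google")
}
```

For example, geocode_with_ggmap("The Churchill Arms, Notting Hill", api_key) would look up a single pub. Passing a character vector geocodes several places in one call.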

Let's look at our results. As you can see, we have each pub's name, its coordinates, the type of place, the precision of the result (rooftop means that Google was able to find the place down to a specific building) and its address.

Results from running ggmap geocoding function

Now, let’s use the coordinates we just found to reverse geocode the places they belong to.

The revgeocode function allows us to do that. It requires two arguments: location, a numeric vector of longitude/latitude, and output, either address or all. The all option returns much more information than we need, so let's stick to address.

This time we will store our results in a list rather than a data frame. Each element of this list will contain another list with information about the pub’s name, coordinates and address.

Running ggmap reverse geocoding function
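The following is a minimal sketch of the reverse geocoding step, assuming register_google() has already been called. The helper name and its rev_fun argument are hypothetical. rev_fun exists only so that the network call can be swapped out, for example for testing. Each list element holds the pub's name, coordinates and address, as described above.

```r
# Hypothetical helper: reverse geocode each pub and keep its name, coordinates
# and address together in one list element. By default rev_fun calls ggmap's
# revgeocode(), which expects c(longitude, latitude).
reverse_geocode_with_ggmap <- function(pub_names, lat, lon,
                                       rev_fun = function(loc) ggmap::revgeocode(loc, output = "address")) {
  mapply(function(name, la, lo) {
    list(name = name, lat = la, lon = lo, address = rev_fun(c(lo, la)))
  }, pub_names, lat, lon, SIMPLIFY = FALSE, USE.NAMES = FALSE)
}
```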
Results from running ggmap reverse geocoding function

That's all for ggmap. Let's now move on to the next package.

Package tmaptools

tmaptools is a package that offers a set of tools for reading and processing spatial data. It supports another R package, tmap, which was built for visualizing thematic maps. Many of the tmaptools functions rely on the Nominatim API.

Now let's try to get the same sort of information from tmaptools that we extracted using ggmap. I had to slightly modify some search requests because Nominatim was not able to find the places based on the original queries. One of the pubs, The Glory, couldn't be located despite all my efforts. So be aware that data quality and completeness may vary among service providers.

Running tmaptools geocoding function
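As a rough sketch (the original code is not reproduced here), tmaptools' geocode_OSM() can be called on a character vector of search queries. The wrapper name is hypothetical, and the exact set of columns returned depends on the tmaptools version.

```r
# Hypothetical wrapper around tmaptools::geocode_OSM(), which queries Nominatim.
# details = TRUE asks for extra fields (such as the full address) and
# as.data.frame = TRUE returns the results as a data frame.
geocode_with_tmaptools <- function(places) {
  tmaptools::geocode_OSM(places, details = TRUE, as.data.frame = TRUE)
}
```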

We included only the coordinates and the full address in the final table. Here is what it looks like.

Results from running tmaptools geocoding function

Now it's time for reverse geocoding. In our output we will display the same information as in the reverse geocoding request from ggmap.

Running tmaptools reverse geocoding function
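Similarly, here is a hedged sketch of the reverse step with tmaptools' rev_geocode_OSM(). Keep in mind that its x argument is the longitude and y is the latitude. The wrapper name is hypothetical.

```r
# Hypothetical wrapper around tmaptools::rev_geocode_OSM().
# Note the argument order: x = longitude, y = latitude.
reverse_geocode_with_tmaptools <- function(lat, lon) {
  tmaptools::rev_geocode_OSM(x = lon, y = lat, as.data.frame = TRUE)
}
```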
Results from running tmaptools reverse geocoding function

So, that's the last piece of code in our discussion of R geocoding packages. You can finish reading this article here and practice some of the techniques described above on your own. Unless… unless you want to find out more! If that's the case, let's continue!

Photo by Martin Adams on Unsplash

Geocoding With API

Using packages is a very convenient and fast way to get things done, and for most tasks the functionality these packages provide is probably more than enough. However, if you need something extra, are interested in other API functions, or just want to learn how to work with APIs in general, you need to go to the Google/Nominatim help pages and do a bit of reading. Or search online for videos and tutorials, like this one, that offer short summaries. Or, even better, do both.

Google Maps API

Having looked at the ggmap package, let's now try to get the place's location and address, and as a bonus its phone number and website, using the Google Maps API directly. To accomplish this task, we need the Geocoding API and the Places API.

Geocoding API

The Geocoding API is a service that provides geocoding and reverse geocoding of addresses and places. You can access it by sending an HTTP request through your web browser and getting back a response in JSON or XML format. In our case, though, we will send these requests from R.

The Geocoding API requests take the following format.

# format
https://maps.googleapis.com/maps/api/geocode/outputFormat?parameters
# geocoding example
https://maps.googleapis.com/maps/api/geocode/json?address=The+Churchill+Arms,+Notting+Hill&key=YOUR_API_KEY
# reverse geocoding example
https://maps.googleapis.com/maps/api/geocode/json?latlng=51.5069117,-0.194801&key=YOUR_API_KEY

So, the web request you send consists of several parts: the API URL, followed by outputFormat (json or xml) and the list of parameters. outputFormat is separated from the parameters by a question mark (?), and the parameters themselves are separated from each other by an ampersand (&).
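To make this structure concrete, here is a small R sketch that assembles a request from its parts. build_request_url is a hypothetical helper, and the parameter values are assumed to be percent-encoded already. The result reproduces the reverse geocoding example above.

```r
# Hypothetical helper: join an endpoint, an output format and named parameters
# into a request URL. "?" separates the format from the parameters and "&"
# separates the parameters from each other. Values must already be encoded.
build_request_url <- function(endpoint, output_format, params) {
  query <- paste(names(params), unlist(params), sep = "=", collapse = "&")
  paste0(endpoint, output_format, "?", query)
}

reverse_url <- build_request_url(
  "https://maps.googleapis.com/maps/api/geocode/",
  "json",
  list(latlng = "51.5069117,-0.194801", key = "YOUR_API_KEY")
)
```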

The requests’ required parameters include:

  • for geocoding: address (search query in the form of an address or a place's name) and key (API key);
  • for reverse geocoding: latlng (latitude and longitude of the place you are searching for) and key (API key).

We will not use any optional parameters in our queries.

You can read more about how to construct the API requests here and here.

It's worth mentioning that if you are building your own application that needs real-time access to Google Maps services, you can check out Google's client-side (JavaScript) or server-side (Java, Python, Go, Node.js) APIs.

Places API

If you do not want to limit yourself to the place's address and coordinates, you can use the Places API. For example, to find the phone number and website of a place, we need to use the Place Search to get its Place ID, and then use that ID to retrieve this information from the Place Details.

When making an API call, make sure to provide the list of fields you want to extract. Otherwise, Google will return all of them and charge you accordingly. In our case it doesn't matter, as we will not exceed the free limits, but if you plan to use the API for a large volume of requests, you might be charged for it.

For the Place Search/Place Details API call you also need to provide the outputFormat (json or xml) followed by the list of parameters.

For the Place Search, the required parameters are: input (a name, an address or a phone number; coordinates will not work), inputtype (textquery or phonenumber) and key (API key).

For the Place Details, the required parameters are: place_id (which can be found using the Place Search) and key (API key).

For both the Place Search and Place Details we will use the optional parameter fields, a comma-separated list of additional information we want Google to return. You can read more about the possible options on the help pages provided earlier. In our case, we only need the place_id field from the Place Search, and formatted_phone_number plus website from the Place Details. Please remember to read the billing information!

The format of API calls is given below.

# PLACE SEARCH
# format
https://maps.googleapis.com/maps/api/place/findplacefromtext/outputFormat?parameters
# example
https://maps.googleapis.com/maps/api/place/findplacefromtext/json?input=The+Churchill+Arms,+Notting+Hill&inputtype=textquery&fields=photos,formatted_address,name,place_id&key=YOUR_API_KEY
# PLACE DETAILS
# format
https://maps.googleapis.com/maps/api/place/details/outputFormat?parameters
# example
https://maps.googleapis.com/maps/api/place/details/json?place_id=ChIJGTDVMfoPdkgROs9QO9Kgmjc&fields=formatted_phone_number,website&key=YOUR_API_KEY
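Here is a hedged sketch of the two URLs needed for the Places lookup, following the formats above. Both function names are hypothetical. Only the fields listed in the fields argument are requested, which, per the billing note above, avoids paying for data we do not use. The query text is percent-encoded with base R's utils::URLencode().

```r
# Hypothetical builder for the Place Search (Find Place) request.
find_place_url <- function(input, key, fields = "place_id") {
  paste0(
    "https://maps.googleapis.com/maps/api/place/findplacefromtext/json",
    "?input=", utils::URLencode(input, reserved = TRUE),  # encodes spaces and commas
    "&inputtype=textquery",
    "&fields=", paste(fields, collapse = ","),
    "&key=", key
  )
}

# Hypothetical builder for the Place Details request, using the place_id
# returned by the Place Search.
place_details_url <- function(place_id, key,
                              fields = c("formatted_phone_number", "website")) {
  paste0(
    "https://maps.googleapis.com/maps/api/place/details/json",
    "?place_id=", place_id,
    "&fields=", paste(fields, collapse = ","),
    "&key=", key
  )
}
```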

And again, if you are considering building an actual app, it is worth looking at the Java/Python/Go/Node.js clients for server-side applications, or the Places SDK for Android, the Places SDK for iOS and the Places Library of the Maps JavaScript API for client-side ones.

Geocoding Using Google Maps API

Having said all of that, let’s now write the code itself.

Our code consists of seven functions:

  • a main function;
  • three functions for generating API calls;
  • three functions to extract data from JSON output.

Our main function takes three arguments:

  • search_query — search request (address or place) ;
  • fields — information to extract (coordinates, address, contacts or all);
  • key — API key for Google Maps.

The first and last arguments are required, while the second one is optional, with coordinates as its default value.

These arguments can be of the following types:

  • search_query — string, character vector, list of characters, one-dimensional matrix or data frame with string data;
  • fields — string, character vector, list of characters;
  • key — string.

Depending on the value of fields the function returns a data frame with:

  • coordinates — latitude and longitude;
  • address — full address and city;
  • contacts — phone number and website;
  • all — all of the above.

Let’s now have a look at each of the function’s components in detail.

Generating API Call

The API call function is pretty straightforward once you get acquainted with the information I provided earlier.

It works in three steps:

1. Transforms the search query into a list.
2. Percent-encodes the search query.
3. Constructs the API call string.

The first step is needed because it's always easier to deal with a common data structure (a list in our case). For percent-encoding we use the URLencode function from base R's utils package. If you don't know what percent-encoding is, visit this page with a detailed explanation.

Google Maps API call functions
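The original implementation is not shown here. A minimal sketch of the three steps might look like this. It assumes the function name url_google_geocoding from the main-function description and a simple (search_query, key) signature. Note that utils::URLencode() encodes one string at a time, which is why lapply() is used.

```r
# Hypothetical sketch of url_google_geocoding(): one request URL per query.
url_google_geocoding <- function(search_query, key) {
  # 1. Normalise a string, character vector, list, one-column matrix or
  #    data frame into a list of character strings
  queries <- as.list(as.character(unlist(search_query, use.names = FALSE)))
  # 2. Percent-encode each query (URLencode handles a single string)
  encoded <- lapply(queries, utils::URLencode, reserved = TRUE)
  # 3. Construct the API call for each encoded query
  lapply(encoded, function(q) {
    paste0("https://maps.googleapis.com/maps/api/geocode/json?address=", q, "&key=", key)
  })
}
```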

Extracting Data From JSON

Google can return data in two formats: JSON and XML. In our examples we use JSON output. This output needs to be converted into R objects so that we can easily manipulate the data it contains. Once that is done, our task comes down to picking out only the elements we need from the resulting list.

Raw JSON output and its formatted version look like this.

Raw JSON output from Google Maps API
Formatted JSON output from Google Maps API

So, how does our function work? First, the fromJSON function from the jsonlite package transforms the JSON output into an R list. Then our function checks whether the API call was successful (status = "OK") and, if so, extracts from the list only the elements needed to construct the final data frame. Retrieving the city name is a bit tricky, as we first need to find out at which position it is stored inside address_components. For contacts, it is also important to replace every NULL (which appears if Google has no information about the phone number or website) with NA, so that no error occurs while generating the final data frame.

Functions to transform JSON output into R object
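The following is a sketch of the extraction logic described above. The function name matches the one mentioned in the text, but its body is an assumption. It parses a response with jsonlite::fromJSON(), checks the status, takes the first result, and searches address_components for the entry typed postal_town rather than relying on its position. The sample JSON is a made-up, heavily abridged response used only for illustration.

```r
library(jsonlite)

# Parse a Geocoding API response and return one row with coordinates, the
# formatted address and the city (the "postal_town" component).
get_geodata_from_json_google <- function(json) {
  res <- fromJSON(json, simplifyVector = FALSE)
  if (!identical(res$status, "OK")) {
    # Unsuccessful call (e.g. ZERO_RESULTS): return a row of NAs
    return(data.frame(lat = NA_real_, lng = NA_real_,
                      address = NA_character_, city = NA_character_,
                      stringsAsFactors = FALSE))
  }
  best <- res$results[[1]]  # the first result is Google's best match
  # Look the city up by its type instead of hard-coding its position
  city <- NA_character_
  for (component in best$address_components) {
    if ("postal_town" %in% unlist(component$types)) {
      city <- component$long_name
      break
    }
  }
  data.frame(lat = best$geometry$location$lat,
             lng = best$geometry$location$lng,
             address = best$formatted_address,
             city = city,
             stringsAsFactors = FALSE)
}

# Missing phone numbers or websites come back as NULL; convert them to NA
null_to_na <- function(x) if (is.null(x)) NA_character_ else x

# Made-up, heavily abridged response used only to exercise the parser
sample_json <- '{
  "status": "OK",
  "results": [{
    "formatted_address": "1 Example Street, London, UK",
    "address_components": [
      {"long_name": "Greater London", "types": ["administrative_area_level_2", "political"]},
      {"long_name": "London", "types": ["postal_town"]}
    ],
    "geometry": {"location": {"lat": 51.5, "lng": -0.19}}
  }]
}'
geo <- get_geodata_from_json_google(sample_json)
```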

Main Function

We have already provided the description of our main function. Let’s now just explain how it works.

First, the function gets the coordinates and address from Google Maps. It checks whether the user actually wants this information (i.e. coordinates and/or address are present in the fields argument) and, if so, calls the url_google_geocoding function to construct the API call and the getURL function from the RCurl package to actually make it.

After we receive a response from Google, we transform it from JSON into an R list using the get_geodata_from_json_google function. Once that is done, the result is stored in the geodata_df data frame.

Later, the very same procedure is repeated for contacts (i.e. phone number and website). And that’s all.

Here is the code.

Main function for geocoding using Google Maps API
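The following is a reduced, hypothetical version of the main function that returns only coordinates and the formatted address. The function name geocode_google_sketch and the fetch argument are assumptions. fetch defaults to RCurl::getURL(), as in the text, but can be replaced by any function that takes a URL and returns JSON text. This makes the control flow easy to try without an API key.

```r
library(jsonlite)

# Hypothetical, reduced main function: coordinates and address only.
# `fetch` performs the HTTP GET and returns the response body as text.
geocode_google_sketch <- function(search_query, key,
                                  fetch = function(u) RCurl::getURL(u)) {
  queries <- as.character(unlist(search_query, use.names = FALSE))
  rows <- lapply(queries, function(q) {
    request_url <- paste0("https://maps.googleapis.com/maps/api/geocode/json?address=",
                          utils::URLencode(q, reserved = TRUE), "&key=", key)
    res <- fromJSON(fetch(request_url), simplifyVector = FALSE)
    if (!identical(res$status, "OK")) {
      # Keep a row for failed lookups so the output lines up with the input
      return(data.frame(query = q, lat = NA_real_, lng = NA_real_,
                        address = NA_character_, stringsAsFactors = FALSE))
    }
    best <- res$results[[1]]  # first result = Google's best match
    data.frame(query = q,
               lat = best$geometry$location$lat,
               lng = best$geometry$location$lng,
               address = best$formatted_address,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}
```

With a real key, geocode_google_sketch(c("The Churchill Arms, Notting Hill"), api_key) would send one request per query.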

Now, we can finally call our function and check results.

# replace "YOUR_API_KEY" with your own API key
api_key <- "YOUR_API_KEY"
pubs_google <- geocode_google(pubs, "all", api_key)
# check results
pubs_google
Results from running geocoding function — Google Maps API [Columns 1–5]
Results from running geocoding function — Google Maps API [Columns 6–7]

Reverse Geocoding Using Google Maps API

Below is pretty much the same function, but for reverse geocoding. This time it returns only the address based on the place's coordinates. I won't explain it in detail here, since by now you are well-equipped to understand the code on your own.

Apart from the key argument, the main function needs coordinates as an input, which can be either:

  • a vector with latitude and longitude (a single request);
  • a list of latitude/longitude vectors;
  • a matrix with two columns — latitude and longitude;
  • a data frame with two columns — latitude and longitude.

The function accepts both numeric and string coordinate values.

Please note that Google might return several results for each call, but we use only the first one, results[[1]], which corresponds to the best match in Google's opinion.

Also be careful with hard-coding references to the elements you want to extract from the R list. For example, in our case the 5th element $address_components[[5]]$long_name might refer either to a city — London ($address_components$types = "postal_town"), level 2 administrative area — Greater London ($address_components$types = "administrative_area_level_2") or level 1 administrative area — England ($address_components$types = "administrative_area_level_1"). So, in this case we have to loop through the R list to find the types of information we need and extract the corresponding long_name.

Reverse geocoding function (Google Maps API)
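Because the reverse geocoding function accepts coordinates in several shapes, a useful first step is to normalize them. The sketch below uses two hypothetical helpers. It converts a single vector, a list of vectors, a two-column matrix or a two-column data frame into latlng parameters, and it accepts either numbers or strings.

```r
# Hypothetical helper: turn any accepted coordinate input into a list of
# numeric c(lat, lng) pairs.
normalize_coordinates <- function(coordinates) {
  if (is.data.frame(coordinates) || is.matrix(coordinates)) {
    m <- as.matrix(coordinates)
    coordinates <- lapply(seq_len(nrow(m)), function(i) m[i, ])
  } else if (!is.list(coordinates)) {
    coordinates <- list(coordinates)  # a single c(lat, lng) vector
  }
  # as.numeric() also accepts coordinates supplied as strings
  lapply(coordinates, function(x) as.numeric(unname(x)))
}

# Hypothetical helper: build one latlng query parameter per coordinate pair
latlng_params <- function(coordinates) {
  vapply(normalize_coordinates(coordinates),
         function(x) paste0("latlng=", x[1], ",", x[2]),
         character(1))
}
```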

Below are the results of running this function on the sample of London pubs whose coordinates we got earlier from the same API.

# extract coordinates from pubs_google
pubs_google_crd <- pubs_google[ , c("lat", "lng")]
# replace "YOUR_API_KEY" with your own API key
api_key <- "YOUR_API_KEY"
pubs_rev_google <- rev_geocode_google(pubs_google_crd, api_key)
# attach the pub data to the results and check them
pubs_rev_google <- cbind(pubs_df, pubs_rev_google)
pubs_rev_google
Results from running reverse geocoding function — Google Maps API

Nominatim API

Now let’s turn our attention to the OSM’s Nominatim API.

The Nominatim search API allows you to look for a specific location based on its description or address. It supports both structured and free-text requests. The search query may also contain special phrases, which correspond to specific OpenStreetMap tags. In our case, the special phrase is “pub”.

The reverse geocoding API generates an address from a place’s latitude and longitude.

The formats of API calls are presented below.

# geocoding format
https://nominatim.openstreetmap.org/search/<query>?<params>
# geocoding example
https://nominatim.openstreetmap.org/search/The%20Churchill%20Arms,%20Notting%20Hill?format=json&polygon=1&addressdetails=1
# reverse geocoding format
https://nominatim.openstreetmap.org/reverse?<query>
# reverse geocoding example
https://nominatim.openstreetmap.org/reverse?format=json&lat=51.5068722&lon=-0.1948221&zoom=18&addressdetails=1

Some parameters are common for both geocoding and reverse geocoding calls:

  • format = [html | xml | json | jsonv2 | geojson | geocodejson] — output format;
  • addressdetails = [0|1] — include breakdown of address into elements;
  • extratags = [0|1] — additional information (wiki page, opening hours etc.);
  • accept-language — in what language to display the search results (English = en);
  • email: unless you provide an email address, which allows your activity to be tracked, you will not be able to use the API in your app (an error message will appear).

Some parameters are specific to each API.

Search:

  • query — free text or address;
  • countrycodes — limit the search by ISO 3166-1 alpha-2 country codes (the UK = gb);
  • limit — limit the number of returned results.

Reverse:

  • query = lat, lon — in WGS 84 format;
  • namedetails = [0|1] — include a list of alternative names in the results;
  • zoom = [0–18] — level of detail required for the address (default is 18, i.e. a specific building).

The query, format and email are required parameters, while the rest are optional. We will not be using namedetails parameter in our function and will not change the default value of zoom parameter — I provided them just for your reference.
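Here is a hedged sketch of how these parameters could be assembled into request URLs. The helper names are hypothetical. The Nominatim search endpoint also accepts the free-text query as a q= parameter instead of a path segment, and that is the form this sketch uses. The optional country and language values map to countrycodes and accept-language.

```r
# Hypothetical builder for a Nominatim free-text search URL.
nominatim_search_url <- function(query, email, country = NULL,
                                 language = "en", limit = 1) {
  params <- c(
    q = utils::URLencode(query, reserved = TRUE),
    format = "json",
    addressdetails = "1",
    extratags = "1",
    "accept-language" = language,
    limit = as.character(limit),
    email = utils::URLencode(email, reserved = TRUE)
  )
  # countrycodes is optional, e.g. "gb" or "gb,de,fr"
  if (!is.null(country)) params <- c(params, countrycodes = country)
  paste0("https://nominatim.openstreetmap.org/search?",
         paste(names(params), params, sep = "=", collapse = "&"))
}

# Hypothetical builder for a Nominatim reverse geocoding URL (default zoom).
nominatim_reverse_url <- function(lat, lon, email, language = "en") {
  paste0("https://nominatim.openstreetmap.org/reverse?format=json",
         "&lat=", lat, "&lon=", lon,
         "&addressdetails=1&extratags=1",
         "&accept-language=", language,
         "&email=", utils::URLencode(email, reserved = TRUE))
}
```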

One important aspect here is the use of tags, which point to specific pieces of information provided by OpenStreetMap mappers. Some of these tags have duplicates (for example, email, phone and website versus the similar tags in the contact namespace, such as contact:phone). Different people might therefore label the same sort of information with different tags, and you need to account for that in your app.
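One way to handle duplicated tags is to check the equivalent keys in order and keep the first non-empty value. first_tag is a hypothetical helper, and the tag list is assumed to come from the extratags part of a parsed Nominatim response.

```r
# Hypothetical helper: return the first non-empty value among several
# equivalent OSM tag keys, e.g. "phone" and "contact:phone".
first_tag <- function(tags, keys) {
  for (k in keys) {
    v <- tags[[k]]
    if (!is.null(v) && nzchar(v[1])) return(v[1])
  }
  NA_character_
}

# Made-up tag list for illustration
tags <- list(`contact:phone` = "+44 20 0000 0000")
phone <- first_tag(tags, c("phone", "contact:phone"))
```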

There are also a few requirements you must respect to be able to use the Nominatim service:

  • a limit on the number of requests sent by the same website/app: one request per second per app;
  • bulk geocoding of large amounts of data is discouraged, but smaller one-time tasks (our case) are allowed;
  • search results must be cached, so that the same request is not sent more than once.
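The caching and rate-limit requirements can be met with a small wrapper around whatever function performs the HTTP request. The sketch below illustrates the idea and is not part of the original code. make_polite_fetch is a hypothetical name, the cache lives only in memory for the current R session, and min_interval defaults to one second to match the usage policy above.

```r
# Hypothetical wrapper: serve repeated URLs from an in-memory cache and space
# fresh requests at least `min_interval` seconds apart.
make_polite_fetch <- function(fetch, min_interval = 1) {
  cache <- new.env(parent = emptyenv())
  last_call <- -Inf
  function(url) {
    # Cached response: no request is sent
    if (exists(url, envir = cache, inherits = FALSE)) {
      return(get(url, envir = cache, inherits = FALSE))
    }
    # Wait until min_interval has passed since the previous request
    wait <- min_interval - (as.numeric(Sys.time()) - last_call)
    if (wait > 0) Sys.sleep(wait)
    result <- fetch(url)
    last_call <<- as.numeric(Sys.time())
    assign(url, result, envir = cache)
    result
  }
}
```

For example, polite_get <- make_polite_fetch(function(u) RCurl::getURL(u)) could then be used in place of getURL() for every Nominatim request.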

Geocoding Using Nominatim API

The function below replicates the one we have built for Google Maps API, so we will not be describing it in detail.

The only significant difference is that we added two optional arguments. country corresponds to the countrycodes parameter of the API call and restricts your search to the given countries (by default it is not used). language corresponds to the accept-language parameter and lets you choose the language in which results are displayed (default is English). Both arguments must be provided as strings: country as a comma-delimited list of codes (e.g. “gb,de,fr”) and language as a single value (e.g. “es”).

Geocoding function (Nominatim API)

Let’s see the results from running this function.

# replace "YOUR_EMAIL" with your own email address
email <- "YOUR_EMAIL"
pubs_nominatim <- geocode_nominatim(pubs_m, country = "gb", fields = "all", email = email)
# let's now see the results
pubs_nominatim[, c(1:4)]
pubs_nominatim[, c(1, 5:10)]
pubs_nominatim[, c(1, 11:13)]
pubs_nominatim[, c(1, 14:17)]
Results from running geocoding function — Nominatim API [Columns 1–4]
Results from running geocoding function — Nominatim API [Columns 5–10]
Results from running geocoding function — Nominatim API [Columns 11–13]
Results from running geocoding function — Nominatim API [Columns 14–17]

Reverse Geocoding Using Nominatim API

Similarly, the reverse geocoding function below largely resembles the one we built for the Google Maps API.

Reverse geocoding function (Nominatim API)

Here are the results from running this function on the sample of twelve London pubs.

# extract coordinates from geocoding results
pubs_nominatim_crd <- pubs_nominatim[, c("lat", "lng")]
# replace "email" with your email address
pubs_rev_nominatim <- rev_geocode_nominatim(pubs_nominatim_crd, email = email)
pubs_rev_nominatim <- cbind(pubs_m_df, pubs_rev_nominatim)
# let's now see the results
pubs_rev_nominatim[, 1:4]
pubs_rev_nominatim[, c(1, 5:11)]
Results from running reverse geocoding function (Nominatim API) [Columns 1–4]
Results from running reverse geocoding function (Nominatim API) [Columns 5–11]

Building a Map With Leaflet Library

It is commonly accepted that genuine interest in a subject develops only when training material is complemented with examples of practical application. I promised you that we would build an interactive map based on the information we got from the APIs, and I intend to deliver on that promise.

One easy way to build a map is to use the JavaScript Leaflet library. Leaflet is described on its website as “[…] the leading open-source JavaScript library for mobile-friendly interactive maps.” Many big tech companies, media outlets and even government bodies use it: GitHub, Facebook, Pinterest, the Financial Times, The Washington Post, Data.gov and the European Commission are just a few of them. In our case, we will rely on the leaflet package from RStudio, which makes it easy to integrate and control Leaflet maps in R. For the full documentation, check the package's description on CRAN.

I will not describe all the features of this great tool here, because that is a topic for a whole other article. Instead, let's concentrate on the essentials.

So, the process of creating a map in Leaflet involves three basic steps:

  1. Create a map widget.
  2. Add layers to your map.
  3. Display the map.

The widget is essentially the backbone, or container, of your map.

Layers allow you to add to the map such elements as:

  • tiles — essentially the “skin” of your map, which defines its appearance and a level of detail. More about tiled maps;
  • markers — can be used to show a particular location on a map;
  • pop-ups and labels — can be used to add labels to your map. For example, to show an address or a contact information associated with some location;
  • polygons — a specific region or area. For example, a district within a state;
  • legends etc.
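To make these steps concrete, here is a minimal sketch that maps a single marker at the Churchill Arms coordinates used earlier in the article. It uses only core leaflet functions: leaflet(), addTiles() and addMarkers().

```r
library(leaflet)

# 1. Create the map widget (an empty container)
m <- leaflet()
# 2. Add layers: the default OpenStreetMap tiles plus one marker with a pop-up
m <- addTiles(m)
m <- addMarkers(m, lng = -0.194801, lat = 51.5069117, popup = "The Churchill Arms")
# 3. Display the map (printing the widget renders it in the viewer or browser)
if (interactive()) print(m)
```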

For our map we will use tiles from OpenStreetMap (the default in Leaflet) and plot the pubs' locations based on the coordinates we extracted from Nominatim. Additionally, we will add pop-ups to the markers with the pub's name, address and contact details. As Nominatim did not return full details for every pub, I looked up this information myself. We are not using any polygons or legends in our visualization; I mentioned them just for your reference.

So, before we move on let’s do some data preparation.

Data preparation for building a map

Now we can proceed with building the map itself.

First, let's prepare the text to be displayed in the pop-up messages: the pub's name, address and phone number. The website will not be shown separately but added as a hyperlink to the pub's name. We will use some HTML to render the text in the format we want. Here is one hint: pop-up messages are displayed only when you click on the objects they are attached to. If you want to display some information when the cursor hovers over a marker, you need to use labels. However, unlike pop-ups, labels do not automatically interpret HTML, so you would need to wrap your message with the HTML function from the htmltools package first. Once that is done, we can “draw” our map.
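For the hover-label alternative mentioned above, a minimal sketch might look like this. The label text is illustrative. Each string is wrapped with htmltools::HTML() so that leaflet renders it as HTML rather than plain text.

```r
library(leaflet)
library(htmltools)

# Illustrative label text containing HTML markup
label_text <- c("<b>The Churchill Arms</b><br/>Notting Hill")
# Labels are not parsed as HTML by default, so wrap each string with HTML()
labels <- lapply(label_text, HTML)

m <- addMarkers(addTiles(leaflet()),
                lng = -0.194801, lat = 51.5069117,
                label = labels)
```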

Leaflet functions are quite self-explanatory. The only thing that might be unfamiliar is the pipe operator %>%, which comes from the magrittr package and is available through the tidyverse. It lets you chain function calls by passing the output of one function as the first argument of the next. More information on that here.

# text to be displayed in the pop-ups
website <- paste0("<a href='", pubs_map$website, "'>", pubs_map$pub_name, "</a>")
center <- "<div style='text-align:center'>"
name <- paste0(center, "<b>", website, "</b>", "</div>")
address <- paste0(center, pubs_map$address_display, "</div>")
phone <- paste0(center, pubs_map$phone, "</div>")
# building the map: widget, then tile layer, then markers with pop-ups
pubs_map %>%
  leaflet() %>%
  addTiles() %>%
  addMarkers(~lng, ~lat, popup = paste0(name, address, phone))

Let’s finally see the result.

Map built with leaflet package

Conclusion

In this tutorial we covered different methods of retrieving geocoding data using the Google Maps and Nominatim APIs, and showed how this data can be used to plot particular locations on a map with the JavaScript Leaflet library. I hope this guide serves as a starting point for exploring all the different kinds of APIs and mapping tools.


Working in Process Automation, interested in Programming, Data Science and Machine Learning