Plus Codes (Open Location Code) and Scripting in Google BigQuery

Brian Suk
Towards Data Science
11 min readOct 10, 2019

--

Don’t We Already Have Street Addresses?

A couple weeks ago I attended the Google’s Geo For Good Summit 2019. It was an amazing experience, and I can’t say enough good things about the people I’ve met, the work everyone’s doing, and the conference itself.

When I go to conferences, I pick at least one session at random where the topic is unfamiliar to me. We all spend so much time focusing on bolstering our areas of expertise and sometimes miss out on so many other great topics being covered. This is how I work on branching out.

I chose “Digital Addressing with Plus Codes,” and it ended up being the session that stuck with me the most.

So what is a Plus Code? It’s the name of the code generated by the Open Location Code geocoding system. This code gives you a four digit area code (only required if you don’t have the nearest town within 25 kilometers), and a six digit local code (quick note, even though they are alphanumeric, in Plus Codes the terminology calls each character a “digit”). There is a “+” symbol right before the last two digits, which doesn’t actually convey any data, but just makes it easier to remember (much like dashes and spaces in a phone number). The ten digit format looks like “8FVC9G8F+6W” (which is the location for Google’s office in Zurich). If you include the town’s name, you can also use the last six digits, so this would be “9G8F+6W Zurich, Switzerland.” If you enter either format in Google Maps, it will take you there.

Google Maps has included Plus Codes in search results for years. If you’ve ever seen a random string of characters in Maps and wondered what that was, well, this is it.

It’s been hiding there the whole time!

If you want a comprehensive look at Plus Codes (its motivations, pros, cons, etc.), I highly recommend reading the Open Location Code definition online.

In a nutshell, it’s essentially converting the entire Earth into a grid of 20 degree by 20 degree cells. The first pair of digits locates a cell in that grid. From there, each subsequent pair denotes a location in a grid within that cell, until you reach the last pair. Think of it as different levels of zoom on a map.

Needs more zoom!

The really cool thing is that a ten digit code, (or a city name with a six digit code) gives enough precision to identify a square that is 14x14 meters. That is pretty good considering how little information one has to remember to use it.

I wanted to try this out for the Google office I am based out of in New York City. The address is 111 8th Avenue, New York, NY 10011. The Plus Code for that coordinate is PXRX+86 New York.

PXRX+86 New York points to 16th St. and 8th Ave… I think.

The full code is 87G7PXRX+86. We mentioned the first two digits gives us the highest level of the grid. What does that look like?

Here’s 87. That’s a big grid.

If we look inside that grid, we can see that the additional G7 adds to it.

87G7. Enhance.

The more pairs of digits you add, the more detail you can get.

87G7PX. Enhance

Eventually, you can drill down to our happy little corner of the building. You can see this in the map below in the red box.

87G7PXRX+86, in red. Enhance.

This mapping tool is available at the Plus Codes site if you would like to join in on the Plus Code fun!

You can see how precise these codes can become. It’s a lot more accurate than regular street addresses, and makes more sense. A street address doesn’t even take you where you want to go, most of the time. For example, if you’re trying to reach the Johns Hopkins University, the street address is 3400 N Charles Street, Baltimore, MD 21218. Never mind the fact that 3400 N Charles Street isn’t even actually located on N Charles Street, but the rest of the campus shares that same street address, regardless of which building it is.

Plus Codes can provide enough precision to direct someone to an exact part of a building. Let’s look back at the earlier example of Google NYC.

So… where am I supposed to go?

There are a couple issues with this building’s address. The first being it doesn’t tell you exactly where to go. This building has three visitor’s entrances. One is on the northwest side of the building at 16th St. and 9th Avenue, one on the northeast side at 16th St. and 8th Avenue, and one on the southeast at 15th St. and 8th Avenue.

But before you play “what’s behind door number three” with the visitor entrances, you need to make sure you have the right address. It turns out this building has two valid street addresses; 111 8th Avenue, and 76 9th Avenue .

With Plus Codes, we can get granular. The NW entrance is PXRW+R7, the NE entrance is PXRX+C8, and the SE entrance is PXRX+75.

You might think this is all trivial. You may easily just give a street address to a friend and say “go to the northeast entrance” and it’s pretty easy to figure out. In a lot of areas around the world, though, this same issue of address precision has very real consequences.

In most industrialized countries, physical — and, of course, electronic addresses are part of everyday life. Just like roads, running water and health services, physical addresses are often taken for granted. Yet, in developing countries, physical addresses frequently exist only in major city centres. In such countries, many streets have no names and properties are not numbered. It is therefore difficult or impossible for public services and businesses to reach their target customers… Physical addresses are taken for granted in most industrialized countries. But for billions of people, addresses effectively do not exist. (Addressing the World — Universal Postal Union)

At the Plus Codes session at the summit, we heard from members of the Navajo Nation who shared the challenges they faced living in the Utah/New Mexico/Arizona region, and how Plus Codes helped address those challenges.

In areas such as this, you get a lot of locations far off from the roads the street address is on, you get addresses that are incorrect by miles, and generally have lots of problems locating things.

The officially listed address, and where the building actually is. When it’s this far off, it creates a lot of issues.

Here’s another example of someone recounting their experience having a package shipped to this region.

Around 2005, I wanted to purchase a new laptop online and tried having FedEx deliver the packages to my hogan in Dilkon, AZ since I lived approximately 0.5 miles from the main highway (Navajo Route 15). The address I submitted to FedEx looked like the following:

Name
1 mile North of Dilkon Chapter Office
Blue hogan with gray roof
Winslow, AZ 86047

In the further instructions part of the order, I put the following: Go north from the Chapter Office past the police station and a large church building. Past the church building, you will cross a large wash. The road curves a bit but keep heading north. You will begin to see a patch of large trees. Head toward those trees. They should be the only patch of trees in the area. My hogan will be near those trees, blue hogan with a grey roof. I also left my cell phone number in case there were issues.

I tracked the delivery process through the FedEx site and made sure I was available by phone on the day of delivery. I was secretly hoping that my instructions were enough for the delivery guy to find my hogan and deliver the packet. However, around 4:00 PM, I received a call from the FedEx delivery guy. He was lost. He had tried to follow the directions but ended up at a house where there was no one home. After asking him for a description of the area, I concluded that he had not gone far enough. He was actually only halfway to my house near the large church building and had not gotten to the wash yet. So while he was on the phone and driving, I led him to my home, giving turn by turn directions as he described what he saw.

Eventually he got to my house, but the experience was less than ideal. (The driver didn’t know what a hogan was.)

Hearing things like this resonated with me from my experiences growing up in South Korea. Until the mid-2000’s, we didn’t have building addresses in the traditional sense. There were defined tracts and neighborhoods, but buildings didn’t have street numbers. When giving directions, you just kind of had to know where things were. Telling someone something along the lines of, “take the sixth exit from the subway stop, turn left into the alley after the Family Mart, keep walking and it’s on the second floor above the LG cell phone store” was commonplace. Street addresses weren’t typically used, and still aren’t today.

Seoul is fairly developed so the impact to me was maybe being a little bit late to meet my friends. In other areas though, it’s a serious issue.

Finding addresses in rural Utah and Arizona.
Finding addresses in Kolkata, India.

Not having a clear location can impact not just the delivery of packages, but the efficacy of first responders, getting a birth certificate, being able to vote, and so much more. It’s happened in the Navajo Nation, in Kolkata, and undoubtedly countless more regions across the world. Plus Codes can help, and the opportunity for impact is meaningful.

BigQuery Scripts and Procedures

There are a ton of reasons why I love BigQuery. Having used a number of different legacy data technologies, BigQuery is the easiest to maintain, and is the best at immediately letting me do cool things with data instead of sitting here poking around infrastructure, wondering why whatever-this-is won’t work with whatever-that-is.

Having said that, nothing and no one is perfect. BigQuery lacked a couple things I was looking for. Scripting and stored procedures used to be a good example of that.

And then not too long ago, those features made a public launch.

And that made me very, very happy.

Plus Codes and BigQuery Scripting.

Open Location Code has a library of open source code in different languages that generate and read Plus Codes. When BigQuery scripting launched, I figured creating something that encodes Plus Codes could be a neat way to introduce this new feature. It allows for easily encoding coordinates in bulk, with BigQuery bringing along a number of benefits in processing at scale.

Let’s look at the encoding logic. According to the specification, it’s a pretty straightforward encoding routine:

Summary: Add 90 to latitude and 180 to longitude to force them into positive ranges. Encode both latitude and longitude into base 20, using the symbols above, for five digits each i.e. to a place value of 0.000125. Starting with latitude, interleave the digits.

The following provides an algorithm to encode the values from least significant digit to most significant digit:

1 — Add 90 to the latitude and add 180 to the longitude, multiply both by 8000 and take the integer parts as latitude and longitude respectively
2 — Prefix the existing code with the symbol that has the integer part of longitude modulus 20
3- — Prefix the existing code with the symbol that has the integer part of latitude modulus 20
4 — Divide both longitude and latitude by 20
5 — Repeat from step 2 four more times.

Seems like a simple loop, which we can do in a script.

Let’s use the longitude and latitude from different Google offices to see how it all works. The office locations are coming from a number of countries across six continents, so we can spot test both hemispheres. These locations will be fed in as an input table in the script.

# Create array of encoding values.
DECLARE OLC_DIGITS_ ARRAY<STRING> DEFAULT ['2','3','4','5','6',
'7','8','9','C','F','G','H','J','M','P','Q','R','V','W','X'];
# Control variable.
DECLARE loopControl INT64 DEFAULT 0;
# Create example source data.
CREATE OR REPLACE TEMP TABLE
pc_data (
latitude float64,
longitude float64,
expectedCode STRING);
INSERT
pc_data (
latitude,
longitude,
expectedCode)
VALUES
(40.740737, -74.002047,'87G7PXRX+75'),
(37.500172, 127.036404,'8Q99G22P+3H'),
(35.660797, 139.729442,'8Q7XMP6H+8Q'),
(-23.586436, -46.681907,'588MC879+C6'),
(12.993711, 77.660769,'7J4VXMV6+F8'),
(-26.073533, 28.032052,'5G5CW2GJ+HR'),
(-33.864971, 151.195854,'4RRH45PW+28'),
(59.333260, 18.054152,'9FFW83M3+8M');
# STEP 1 - Normalize coordinates.
CREATE temp table olc_table AS (
SELECT
*,
FLOOR((latitude + 90) * 8000) AS workingLatitude,
FLOOR((longitude + 180) * 8000) AS workingLongitude,
'' AS plusCode
FROM
pc_data);
SELECT
*
FROM
olc_table;

# STEP 2 - Loop through the encoding loops.
SET
loopControl = 0;
WHILE
loopControl < 5 DO CREATE OR REPLACE TEMP TABLE olc_table AS(
SELECT
* EXCEPT (
workingLatitude,
workingLongitude,
plusCode),
workingLongitude / 20 AS workingLongitude,
workingLatitude / 20 AS workingLatitude,
CONCAT(
OLC_DIGITS_[
OFFSET(MOD(CAST(FLOOR(workingLatitude) AS INT64), 20))],
OLC_DIGITS_[
OFFSET(MOD(CAST(FLOOR(workingLongitude) AS INT64), 20))],
plusCode) AS plusCode
FROM
olc_table );
SET
loopControl = loopControl + 1;
SELECT
*
FROM
olc_table;
END WHILE;
# STEP 3 - Add the Plus!
CREATE OR REPLACE temp table olc_table AS(
SELECT
* EXCEPT (plusCode,
workingLongitude,
workingLatitude),
CONCAT( SUBSTR(plusCode, 0, 8), '+', SUBSTR(plusCode, -2)) AS plusCode
FROM
olc_table );
# Look at the results!
SELECT
*
FROM
olc_table;

When we run that, the bottom panel where your result data normally goes will look something like this.

A script’s output.

This is the new output information showing the steps that BigQuery took to run the script that. This is handy for debugging, as you can see the intermediate data going into each executed statement. If you click the “view results” button on the last step, you’ll find the script results.

Much success!

I’m hoping the code is easily readable (please leave a comment with questions), but just a couple notes on some of the things that we did here.

  • In the loop for step 2, we are creating and/or replacing a temporary table. This is a new feature released alongside scripting, and we’ll use it to perform operations on entire columns.
  • In the script, we are manually creating a temp table acting as our source table. If you’re looking to run this on another table with coordinate data, you can get rid of that specific section and have the creation of olc_table point to your source table.
  • If you want to write the output to another table, simply put a CREATE TABLE [dataset.table] AS around the final SELECT statement at the end, and it should be all set.

From here you can take this script and work it into a stored procedure, and incorporate this into other workflows as well!

So there you have it, a quick glimpse into BigQuery scripting and encoding coordinates to Plus Codes in bulk!

To learn more about Plus Codes, be sure to check out the slide content for the Plus Codes session at Geo for Good Summit 2019 that are publicly available. This was the session that the folks from the Navajo Nation presented at. Also check out the agenda page for all the other sessions where the slide content as well as any applicable training materials have been released to! There’s great stuff on Google Earth, and Earth Engine out there!

Also be sure to check out the work that Addressing the Unaddressed is doing to help make locations be more accessible to the world.

--

--

Avid 2020 bed-to-couch traveler, cloud tech, big data, random trivia, Xoogler. My employer isn’t responsible for what’s here. NYC. linkedin.com/in/briansuk