Addresses and Barangays: Geotagging with Google Maps API + PhilGIS

Linking Google Maps and shapefiles to map addresses to their barangays (roughly the Filipino equivalent of a neighborhood or ward), to avoid the need for painful manual classification

Lj Flores
Towards Data Science

--

Dengue fever is a real issue in the Philippines. Its symptoms range from headaches, joint pain, and full-body rashes to death. The disease has caused 146,000 cases in the Philippines this year and claimed the lives of 622 people. Epidemics have been declared in 7 out of the 17 regions, and there remains much controversy around the use of the only licensed dengue vaccine.

As part of the efforts to prevent the spread of dengue, the Mosquito Realtime Census Project led by Dr. Wilson Chua aims to use satellite data to identify areas of stagnant water that could serve as potential dengue hotspots. The end goal is to place Ovitraps in these areas as well as sensors that will detect the presence of mosquitoes and alert officials regarding possible outbreaks.

I was tasked with mapping 6,100 addresses to their corresponding barangays in Quezon City. A barangay is the smallest unit of government in the Philippines, and it can cover anywhere from a 10-block grid to a 30-block grid (or even more). Quezon City (the largest city within Metro Manila) is composed of 142 barangays.

A bunch of barangays crammed like sardines in Manila
And Barangay Pasong Tamo casually sitting there just encompassing 3 villages, a park and a university

Initially, the task was to be done manually. To further complicate things, barangays have old names and new names, and a number of districts and area names on Google Maps look like barangays but actually aren’t. There are multiple streets with the same name, and addresses most often don’t include the barangay.

Luckily, Google Maps API and barangay shapefiles from PhilGIS (bless you) had all the information necessary to simplify the process by a lot.

Methods (Quick and Stats-Free)

Using the Python module requests, I queried the Google Maps API for each address and extracted its latitude and longitude from the response. Then, for each address, I checked whether that point lay inside the boundary of a barangay, with the boundary coordinates taken from the PhilGIS shapefiles.
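The two steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names, the sample response, and the two rectangular "barangays" are all made up for the example, and the containment test is a plain ray-casting check written by hand rather than a call into a GIS library. The request uses the public Geocoding endpoint of the Google Maps Platform with its `address` and `key` parameters, and you would need your own API key to run `fetch_geocode`.

```python
from typing import Dict, List, Optional, Tuple

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"
Polygon = List[Tuple[float, float]]  # list of (lat, lng) vertices


def fetch_geocode(address: str, api_key: str) -> dict:
    """Query the Geocoding API for one address and return the parsed JSON."""
    import requests  # imported here so the helpers below work without it

    response = requests.get(
        GEOCODE_URL, params={"address": address, "key": api_key}, timeout=10
    )
    response.raise_for_status()
    return response.json()


def extract_lat_lng(payload: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) of the first result, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def point_in_polygon(lat: float, lng: float, polygon: Polygon) -> bool:
    """Ray casting: count boundary crossings of a ray going east from the point."""
    inside = False
    n = len(polygon)
    for i in range(n):
        lat1, lng1 = polygon[i]
        lat2, lng2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (lat1 > lat) != (lat2 > lat):
            # Longitude at which this edge crosses the point's latitude.
            crossing = lng1 + (lat - lat1) * (lng2 - lng1) / (lat2 - lat1)
            if lng < crossing:
                inside = not inside
    return inside


def find_barangay(lat: float, lng: float, barangays: Dict[str, Polygon]) -> Optional[str]:
    """Return the first barangay whose polygon contains the point, else None."""
    for name, polygon in barangays.items():
        if point_in_polygon(lat, lng, polygon):
            return name
    return None


# Hypothetical data: a trimmed geocoding response and two made-up rectangles.
SAMPLE_PAYLOAD = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 14.61, "lng": 121.01}}}],
}
SAMPLE_BARANGAYS = {
    "Barangay A": [(14.60, 121.00), (14.60, 121.02), (14.62, 121.02), (14.62, 121.00)],
    "Barangay B": [(14.62, 121.00), (14.62, 121.02), (14.64, 121.02), (14.64, 121.00)],
}

coords = extract_lat_lng(SAMPLE_PAYLOAD)
if coords is not None:
    print(find_barangay(coords[0], coords[1], SAMPLE_BARANGAYS))  # Barangay A
```

With real data, the polygons would come from the PhilGIS shapefiles instead of the hand-written rectangles. An address that returns `None` at either step is one that has to be handled manually.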

Addresses that could not be matched using Google Maps and PhilGIS had to be sorted manually. These addresses could not be found because they were too vague or had conflicting information. Maybe the street did not match the city written in the address, or the named building was located on a different street. Before doing an all-out brute-force search, I tagged each address using keywords that mark important place identifiers, namely:

  • Barangay: BGY, BRGY
  • Village: VILLAGE, SUBD, HILLS, HOMES, HEIGHTS, HTS
  • Project: PROJ, PROJECT
  • Building: BLDG, CONDO, APT, RESIDENCE, RESIDENCES
  • Road: RD, ROAD, ST, STR, STREET, DRIVE, EXT, EXTENSION, EXTN, SCT, SCOUT
  • Avenue: AVE, AVENUE, ROAD

For Barangay, Project, and Scout (a group of roads in Manila), I took the word after the keyword, which gives the name of the place (e.g. Barangay Tatalon, Project 4, Scout Chuatoco). For the rest, I took the words before the keyword (Damar Village, Banawe Street, Grass Residences, Quezon Avenue).

The keywords were checked in the order shown, from most to least specific. If the barangay is written in the address, there is no need to classify further, since it’s already there! Villages are pretty specific, and so are projects (residential subdivisions) and buildings, since they have specific addresses. Roads and avenues are less specific: they don’t always come with street numbers, and barangay boundaries often run along roads and avenues, so it isn’t always clear which side an address falls on. The keywords made the manual pass faster, because instead of reading the entire address, I’d just look up the specific name of the village or building, or straight up copy the barangay.
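As a rough sketch of that tagging step, the function below walks the categories in the order listed and returns the first place identifier it finds. The function name and the sample addresses are made up for illustration, and it makes a simplifying assumption that a place name is a single word next to the keyword, so multi-word names would be cut short. Note that ROAD appears under both Road and Avenue in the list; because Road is checked first, it always lands there.

```python
import re
from typing import Optional, Tuple

# Categories in priority order, with keywords copied from the list above.
KEYWORDS = [
    ("Barangay", {"BGY", "BRGY"}),
    ("Village", {"VILLAGE", "SUBD", "HILLS", "HOMES", "HEIGHTS", "HTS"}),
    ("Project", {"PROJ", "PROJECT"}),
    ("Building", {"BLDG", "CONDO", "APT", "RESIDENCE", "RESIDENCES"}),
    ("Road", {"RD", "ROAD", "ST", "STR", "STREET", "DRIVE", "EXT",
              "EXTENSION", "EXTN", "SCT", "SCOUT"}),
    ("Avenue", {"AVE", "AVENUE", "ROAD"}),
]

# For these keywords the place name follows the keyword; otherwise it precedes it.
NAME_FOLLOWS = {"BGY", "BRGY", "PROJ", "PROJECT", "SCT", "SCOUT"}


def tag_address(address: str) -> Optional[Tuple[str, str]]:
    """Return (category, place identifier) for the highest-priority keyword found."""
    # Uppercase and split on anything that is not a letter or digit.
    tokens = re.findall(r"[A-Z0-9]+", address.upper())
    for category, keywords in KEYWORDS:
        for i, token in enumerate(tokens):
            if token not in keywords:
                continue
            if token in NAME_FOLLOWS:
                if i + 1 < len(tokens):
                    return category, f"{token} {tokens[i + 1]}"
            elif i > 0:
                return category, f"{tokens[i - 1]} {token}"
    return None


# Hypothetical addresses, for illustration only.
print(tag_address("12 Banawe St., Brgy. Tatalon, Quezon City"))  # ('Barangay', 'BRGY TATALON')
print(tag_address("Unit 5 Grass Residences, Quezon City"))       # ('Building', 'GRASS RESIDENCES')
print(tag_address("45 Sct. Chuatoco"))                           # ('Road', 'SCT CHUATOCO')
```

The first example shows why the order matters: the address contains both a street and a barangay, and the barangay wins because it is the more specific identifier.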

Results

Before any of the automation, I tried doing it manually. I averaged around a minute per address (it takes that long because some addresses are abbreviated oddly, match multiple locations on Google Maps, have obscure details, etc.). At that rate, all 6,100 addresses would have taken me about 102 hours.

On the other hand, running the entire code took about half an hour, and it classified 97% of all the addresses. It took me about 3 hours to classify the remaining 180 addresses.
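As a quick sanity check on these figures, the arithmetic can be reproduced in a few lines. The variable names are just for illustration; the inputs are the numbers quoted in this section.

```python
n_addresses = 6100        # total addresses to classify
minutes_per_address = 1   # average manual pace
leftover = 180            # addresses the code could not classify

# Fully manual: one minute per address, converted to hours.
manual_hours = n_addresses * minutes_per_address / 60

# Share of addresses handled automatically.
automated_share = (n_addresses - leftover) / n_addresses

# Automated run (half an hour) plus the manual cleanup (3 hours).
hybrid_hours = 0.5 + 3
speedup = manual_hours / hybrid_hours

print(f"manual: {manual_hours:.1f} h")           # manual: 101.7 h
print(f"automated: {automated_share:.1%}")       # automated: 97.0%
print(f"hybrid: {hybrid_hours} h, about {speedup:.0f}x faster")
```

So the combined approach takes about 3.5 hours instead of roughly 102, close to a 29-fold reduction.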

The code is available for anyone to use and can be extended to nearly all the other provinces in the Philippines! Details are in the Jupyter notebook.

Appendix

Check out the code here

--

Yale ’22, Statistics & Data Science | A data blog about the Philippines! Find the code here: http://www.github.com/ljyflores