Geocoding and Reverse Geocoding using Python

Finding the latitude and longitude when the address is known, or finding the address when the latitude and longitude are known, for pandas data frames using OpenCage’s geocoder and geopy.

Jaswanth Badvelu
Towards Data Science

Photo by Capturing the human heart. on Unsplash

Geocoding is the process of taking input text, such as an address or the name of a place, and returning a latitude/longitude location. Put simply, geocoding converts a physical address to a latitude and longitude.

There are many geocoding options available in Python. Some of the popular ones are geopy, the OpenCage geocoder, and the Google Geocoding API. Geopy is a client library for several geocoding services, and the Nominatim service used through it later in this article is one of the few that is free for non-commercial use without a daily request quota, although its public server asks for no more than one request per second. For the Google API and the OpenCage geocoder, there is a limit of 2,500 requests per day. With geopy, the coordinates returned for some addresses in my dataset fell in other countries instead of the US. With the OpenCage geocoder, surprisingly, all the addresses were resolved accurately, so I used the OpenCage geocoder.

Working with OpenCage geocoder and pandas

To use the OpenCage geocoder in Python, first install its Python library with pip install opencage. More information about this library can be found here: OpenCageGeocode on GitHub

Once the library is installed, you will need an OpenCage geocoder account to generate an API key. A free account can be created at opencagedata.com. Once you sign up for an account, you can find your API keys in the Dashboard, as shown in the image below.
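
Rather than pasting the key into the script, it can be read from an environment variable so that it never ends up in shared code. The following is a minimal sketch of that idea; the helper name load_api_key and the variable name OPENCAGE_API_KEY are my own choices for illustration and are not part of the OpenCage library.

```python
import os


def load_api_key(env_var="OPENCAGE_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical convention, not something
    OpenCage requires.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your OpenCage API key"
        )
    return key
```

With this in place, the geocoder could be created as OpenCageGeocode(load_api_key()) instead of hard-coding the key.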

Example

from opencage.geocoder import OpenCageGeocode
key = "Enter_your_Api_Key"
geocoder = OpenCageGeocode(key)
address='1108 ROSS CLARK CIRCLE,DOTHAN,HOUSTON,AL'
result = geocoder.geocode(address, no_annotations="1")
result[0]['geometry']

Output: {'lat': 31.2158271, 'lng': -85.3634326}

We got the latitude and longitude for one hospital, Southeast Alabama Medical Center. In most cases, as here, we will have many addresses that need to be plotted on a map. In that case, it is much easier to load them into a pandas data frame. The dataset I used contains the list of all hospitals in the US along with the total COVID-19 cases for the counties where the hospitals are located. The dataset can be downloaded from here.

import pandas as pd
data = pd.read_csv('Final.csv')
data.head(10)
Hospital location data frame

We have a data frame that contains the facility names of all hospitals in the US and their addresses, so we just need to find the location coordinates.

First, we convert the address column to a list, so that it is easier to loop over all the addresses.

Next, enter your API key from the OpenCage geocoder website and create empty lists to store the latitudes and longitudes. Then create a loop that looks up the latitude and longitude of each address.

addresses = data["Full_Address"].values.tolist()
key = "Enter-your-key-here"
geocoder = OpenCageGeocode(key)
latitudes = []
longitudes = []
for address in addresses:
    result = geocoder.geocode(address, no_annotations="1")

    # Take the first result, or mark the address as not found
    if result and len(result):
        longitude = result[0]["geometry"]["lng"]
        latitude = result[0]["geometry"]["lat"]
    else:
        longitude = "N/A"
        latitude = "N/A"

    latitudes.append(latitude)
    longitudes.append(longitude)
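
With a daily request limit, it helps not to send the same address twice and not to lose the whole run when one request fails. The sketch below wraps the loop above in a function under those assumptions. The name geocode_addresses, the geocode_fn parameter, and the pause_seconds default are hypothetical choices of mine, and the function stores None instead of "N/A" so that pandas treats unresolved addresses as missing values.

```python
import time


def geocode_addresses(addresses, geocode_fn, pause_seconds=1.0):
    """Geocode each distinct address once; return (latitudes, longitudes).

    geocode_fn is any callable that takes an address string and returns an
    OpenCage-style list of results (each with a "geometry" dict), or an
    empty list when nothing was found.
    """
    cache = {}  # address -> (lat, lng), so duplicates cost no extra request
    latitudes, longitudes = [], []
    for address in addresses:
        if address not in cache:
            try:
                results = geocode_fn(address)
            except Exception:
                # Record a failed request as missing instead of stopping the run
                results = None
            if results:
                geometry = results[0]["geometry"]
                cache[address] = (geometry["lat"], geometry["lng"])
            else:
                cache[address] = (None, None)
            time.sleep(pause_seconds)  # pause between real requests
        lat, lng = cache[address]
        latitudes.append(lat)
        longitudes.append(lng)
    return latitudes, longitudes
```

It could be called with the geocoder created earlier, for example geocode_addresses(addresses, lambda a: geocoder.geocode(a, no_annotations="1")).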

We now have the latitudes and longitudes for all the addresses in the data frame. We can add them to our existing data frame with these simple pandas commands.

data["latitudes"] = latitudes
data["longitudes"] = longitudes
data.head(10)

Finally, we have the latitude and longitude for all the hospital addresses. To better understand these location coordinates, let’s plot them as points on a map using folium.

import folium
from folium.plugins import FastMarkerCluster

folium_map = folium.Map(location=[33.798259, -84.327062],
                        zoom_start=4.4,
                        tiles='CartoDB dark_matter')
# Clustered markers for all hospital coordinates
FastMarkerCluster(data[['latitudes', 'longitudes']].values.tolist()).add_to(folium_map)
folium.LayerControl().add_to(folium_map)
# One circle per hospital, with the facility name as a popup
for _, row in data.iterrows():
    folium.CircleMarker(location=(row["latitudes"], row["longitudes"]),
                        radius=10,
                        color="#007849",
                        popup=row['Facility_Name'],
                        fill=False).add_to(folium_map)

folium_map

Now we can see the locations of all the hospitals in the USA. I used FastMarkerCluster together with CircleMarker to make it easier to see the regions with the largest number of hospitals.

A snapshot of the map visualization (clustered locations) created using Folium
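
The geocoding loop stores the placeholder "N/A" when an address cannot be resolved, and such entries cannot be placed on a map. The following is a small sketch of one way to keep only usable coordinate pairs before plotting; valid_points is a hypothetical helper written for this illustration, not a folium or pandas function.

```python
import math


def valid_points(latitudes, longitudes):
    """Return [lat, lng] pairs, skipping entries that are not finite numbers."""
    points = []
    for lat, lng in zip(latitudes, longitudes):
        try:
            lat_f, lng_f = float(lat), float(lng)
        except (TypeError, ValueError):
            continue  # e.g. the "N/A" placeholder or None
        if math.isfinite(lat_f) and math.isfinite(lng_f):
            points.append([lat_f, lng_f])
    return points
```

The resulting list has the same shape as the list of pairs passed to FastMarkerCluster, so valid_points(data["latitudes"], data["longitudes"]) could be used in its place.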

Reverse Geocoding

Reverse geocoding, on the other hand, converts geographic coordinates to a description of a location, usually the name of a place or an addressable location. Geocoding relies on a computer representation of address points, the street/road network, together with postal and administrative boundaries.

For reverse geocoding, I found the output of geopy more detailed than that of the OpenCage geocoder. Also, there is no daily request limit with geopy, so we will use geopy instead of the OpenCage geocoder.

OpenCage Reverse Geocoder Example

result = geocoder.reverse_geocode(31.2158271, -85.3634326)
result[0]['formatted']

Output: Southeast Health Medical Center, Alma Street, Dothan, AL 36302, United States of America

Geopy Reverse Geocoder Example

from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="test_app")
location = geolocator.reverse("31.2158271,-85.3634326")
location.raw['display_name']

Output: Southeast Health Campus, 1108, Ross Clark Circle, Morris Heights, Dothan, Houston County, Alabama, 36301, United States of America
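
Besides the formatted string, the geopy result also carries the individual address components in location.raw['address'], which is often more convenient for a data frame. The sketch below picks a few of them. The helper name summarize_address is hypothetical, and the sample dictionary is hand-written to resemble the output above rather than copied from a live response; it assumes the usual Nominatim keys such as city, town, village, state, postcode, and country.

```python
def summarize_address(raw_address):
    """Pick city, state, postcode and country from a Nominatim-style address dict."""
    # Nominatim labels the settlement differently depending on its size
    city = next(
        (raw_address[k] for k in ("city", "town", "village", "hamlet") if k in raw_address),
        None,
    )
    return {
        "city": city,
        "state": raw_address.get("state"),
        "postcode": raw_address.get("postcode"),
        "country": raw_address.get("country"),
    }


# Hand-written sample modelled on the output above, not a live API response
sample = {
    "road": "Ross Clark Circle",
    "city": "Dothan",
    "county": "Houston County",
    "state": "Alabama",
    "postcode": "36301",
    "country": "United States of America",
}
print(summarize_address(sample))
```

In practice the argument would be location.raw['address'] from the example above.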

Working with Geopy Geocoder and pandas

For reverse geocoding, as before, we first convert the latitude and longitude columns to lists and zip them together.

lats=data['latitudes'].to_list()
lons=data['longitudes'].to_list()
# Creating a zip with latitudes and longitudes
coords=list(zip(lats,lons))

Since we have already created the list, just as above we create a loop that finds the address for each coordinate pair and appends it to a list.

from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="test_app")
full_address = []
for coord in coords:
    location = geolocator.reverse(coord)
    # Full formatted address, as in the single-location example above
    address = location.raw['display_name']
    full_address.append(address)
# Creating dataframe with all the addresses
addres = pd.DataFrame(data=full_address, columns=['Address'])
addres
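
For a list as long as every hospital in the US, two practical issues come up: the public Nominatim server asks for no more than one request per second, and a lookup can fail or return nothing. The following is a rough sketch that handles both. The function name reverse_geocode_all and its parameters are hypothetical, and reverse_fn stands for any callable that behaves like geolocator.reverse, returning an object with an address attribute or None.

```python
import time


def reverse_geocode_all(coords, reverse_fn, min_delay_seconds=1.0):
    """Return one address string (or None) per (lat, lng) pair."""
    addresses = []
    for lat, lng in coords:
        try:
            location = reverse_fn((lat, lng))
        except Exception:
            location = None  # treat a failed lookup as "no address found"
        addresses.append(location.address if location is not None else None)
        time.sleep(min_delay_seconds)  # stay under the request rate limit
    return addresses
```

It could be called as reverse_geocode_all(coords, geolocator.reverse). Geopy also ships its own helper for spacing out requests, geopy.extra.rate_limiter.RateLimiter, which may be preferable to a hand-written delay.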

Finally, we have the address list of all hospitals in the US.

For interested readers, I put the code in my GitHub repo here. If you have any questions, contact me on LinkedIn.

I write articles about easy ways to implement data science and machine learning techniques in the real world.