Geocoding and Reverse Geocoding using Python
Finding latitude and longitudes when the address is known, or finding the address if the latitudes and longitudes are known for dataframes using OpenCage’s geocoder & geopy.
Geocoding is the process of taking input text, such as an address or the name of a place, and returning a latitude/longitude location. To put it simply, Geocoding is converting physical address to latitude and longitude.
There are many geocoding API options available in python. Some of the popular ones are GeoPy, OpenCage geocoder, google geocoding. Geopy is one of the few API services which provides unlimited access for non-commercial use. For Google API and OpenCage geocoders, there is a limit of 2500 requests per/day. Using geopy, the latitudes and longitudes for some addresses in my dataset, showed different countries instead of the US. With OpenCage geocoder, surprisingly all the addresses were accurate so I used OpenCage encoder.
Working with OpenCage geocoder and pandas
To use OpenCage Geocoder in python, the python library should be installed first using pip install opencage
.More info about this library can be found here: OpenCageGeocode on Github
Once the library is installed, you will need an OpenCage geocoder account to generate an API key. A free account can be created using opencagedata.com. Once you signup for an account you can find API keys in Dashboard as shown in the below image.
Example
from opencage.geocoder import OpenCageGeocode
key = "Enter_your_Api_Key"
geocoder = OpenCageGeocode(key)
address='1108 ROSS CLARK CIRCLE,DOTHAN,HOUSTON,AL'
result = geocoder.geocode(address, no_annotations="1")
result[0]['geometry']
Output: {‘lat’: 31.2158271, ‘lng’: -85.3634326}
We got the latitude and longitude for one hospital named Southeast Alabama Medical Center. In most cases, we will have multiple addresses that need to be plotted in maps as we do now. In this case, using pandas to create a data frame will be a lot easier. The dataset I used contains the list of all Hospitals in the US along with the COVID-19 total cases for the counties where hospitals are located. The dataset can be downloaded from here.
import pandas as pd
data=pd.read_csv(‘Final.csv’)
data.head(10)
We have a data frame that contains the list of Facility Name of all Hospitals in the US and their addresses, so we just need to find location coordinates.
First, we should convert the Address column to the list. So, it will be easier to loop all the addresses.
Next, enter your API key from OpenCage geocoder website and create empty lists to store latitudes and longitudes. After creating empty list, create a loop which gives latitude’s and longitude’s for all addresses
addresses = data["Full_Address"].values.tolist()
key = "Enter-your-key-here"
geocoder = OpenCageGeocode(key)
latitudes = []
longitudes = []
for address in addresses:
result = geocoder.geocode(address, no_annotations="1")
if result and len(result):
longitude = result[0]["geometry"]["lng"]
latitude = result[0]["geometry"]["lat"]
else:
longitude = "N/A"
latitude = "N/A"
latitudes.append(latitude)
longitudes.append(longitude)
We have latitudes and longitudes for the list of all the addresses in the data frame. we can add this latitudes and longitudes to our existing data frame using this simple pandas command.
data["latitudes"] = latitudes
data["longitudes"] = longitudes
data.head(10)
Finally, we got the latitude and longitudes for all the hospital addresses. To better understand this location coordinates let’s plot all this location coordinates as points in map using folium maps.
folium_map= folium.Map(location=[33.798259,-84.327062],zoom_start=4.4,tiles=’CartoDB dark_matter’)FastMarkerCluster(data[[‘latitudes’, ‘longitudes’]].values.tolist()).add_to(folium_map)folium.LayerControl().add_to(folium_map) for row in final.iterrows():
row=row[1]
folium.CircleMarker(location=(row["latitudes"],
row["longitudes"]),
radius= 10,
color="#007849",
popup=row[‘Facility_Name’],
fill=False).add_to(folium_map)
folium_map
Now, we can see the location points of all the hospitals in the USA. I used CircleMarker cluster to better help understand the regions with most number of hospitals.
Reverse Geocoding
Reverse geocoding, on the other hand, converts geographic coordinates to a description of a location, usually the name of a place or an addressable location. Geocoding relies on a computer representation of address points, the street/road network, together with postal and administrative boundaries.
For reverse geocoding, I found the output format of Geopy API more detailed when compared to OpenCage Geocoder. And also, there is no limit for Geopy API so we will Geopy instead of OpenCage Geocoder.
OpenCage Reverse Geocoder Example
result = geocoder.reverse_geocode(31.2158271,-85.3634326)
result[0][‘formatted’]
Output : Southeast Health Medical Center, Alma Street, Dothan, AL 36302, United States of America
Geopy Reverse Geocoder Example
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="test_app")
location = geolocator.reverse("31.2158271,-85.3634326")
location.raw[‘display_name’]
Output: ‘Southeast Health Campus, 1108, Ross Clark Circle, Morris Heights, Dothan, Houston County, Alabama, 36301, United States of America
Working with Geopy Geocoder and pandas
For reverse geocoding, as above first, we will convert latitude and longitude to list and zip them together.
lats=data['latitudes'].to_list()
lons=data['longitudes'].to_list()
# Creating a zip with latitudes and longitudes
coords=list(zip(lats,lons))
Since, we already created list, just like above we will create a loop to find address for each location coordinate and append them together.
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="test_app")
full_address=[]
for i in range(len(coords)):
location = geolocator.reverse(coords[i])
address=location.raw['address']['country']
full_address.append(address)
#Creating dataframe with all the addresses
addres=pd.DataFrame(data=full_address , columns=['Address'])
addres
Finally, we have the address list of all hospitals in the US.
For interested readers, I put the code in my GitHub Repo here. If you have any doubts, contact me using linkedin.