Photo of asphalt separated from grass with a white dividing line, by Will Francis on Unsplash.

How to Create a Geofence with Python

Mike Flanagan
Towards Data Science
5 min read · Aug 16, 2021


Geofencing is a useful technique with a wide range of applications whenever one is handling location data. It can be used to signal notifications or flag alerts based on proximity to an individual or landmark. In data science, it is often used in feature engineering and creating boundaries for visualizations.

A geofence is a virtual perimeter for a real-world geographic area. A geo-fence could be dynamically generated — as in a radius around a point location, or a geo-fence can be a predefined set of boundaries (such as school zones or neighborhood boundaries).
Wikipedia

While many of the applications are apparent, how geofences are constructed may not be immediately obvious. There are also many approaches to engineering geofences: some occasions require precisely drawn boundaries, while others require a variable radius around a moving user’s cell phone that changes in certain contexts. In this blog, let’s see how we can create some simple geofences in order to understand the principles behind them.
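
To make the "radius around a point" idea concrete before moving on to boxes, here is a minimal sketch of a circular geofence. It is not part of the original project: the function names (haversine_km, within_radius), the mean Earth radius constant, and the approximate downtown Seattle center point are all assumptions chosen for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(lat, lon, center_lat, center_lon, radius_km):
    """Return True if the point lies inside the circular geofence."""
    return haversine_km(lat, lon, center_lat, center_lon) <= radius_km


# Hypothetical fence: 5 km around an approximate point in downtown Seattle
CENTER_LAT, CENTER_LON = 47.6062, -122.3321

nearby = within_radius(47.61, -122.33, CENTER_LAT, CENTER_LON, radius_km=5)
far_away = within_radius(47.00, -121.00, CENTER_LAT, CENTER_LON, radius_km=5)
```

A "dynamic" fence in this sense simply means recomputing the check with a new center or radius whenever the user moves or the context changes.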

The example that I walk through below comes from my project exploring and modeling using a dataset containing property sale values in King County, WA. More information about the project and example dataset may be found at the following GitHub repository—regression analysis & modeling for property sale prices in King County, WA. Access to the source data is provided in the bibliography at the end of this article.

Classifying by Location

Imagine that we have a dataset which contains records for numerous locations, with the location for each row represented by latitude (df['lat']) and longitude (df['long']).

We may classify certain locations as falling within a certain “zone” or bounds. There are several techniques that we may employ, ranging from straightforward to complex. Let’s look at a handful of elementary examples to see how such techniques may be employed.

Simply Dividing a Location in Half

A Google Maps image of King County, Washington.
(Google Maps, [King County, WA])

Above is an image of King County, WA. Judging from Google Maps’ rendition alone, a rough reading is that most of King County is less developed: the urban and suburban areas appear grey, while the more rural areas appear green.

The most rudimentary partition we could make of this land would be a straight line dividing the rural from the more developed, as sketched below.

The same Google Maps image as above of King County, Washington, overlaid with a color mapping associated with the longitude where we split our properties into two classifications: rural and non-rural.
Properties that fall within green bounds will be classified as ‘rural’, while those under red will be ‘non-rural’. (Google Maps, [King County, WA]). Overlay created by author with Adobe Photoshop.

We can do this in Python with NumPy and Pandas:

import numpy as np

# Create a dummy classifier: 1 if the property is rural, 0 otherwise
df['rural'] = np.where(df['long'] > -121.961527, 1, 0)

The above code creates a dummy variable representing “rural” King County. All properties that fall east of the divider (i.e., any property with a longitude greater than -121.96) will be classified as rural by being assigned a 1 in a newly created column named rural in our DataFrame df. Any property with a 0 in that column is, by default, not considered rural.
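
To see the split in action, here is a minimal, self-contained sketch. The three sample rows are hypothetical coordinates invented for illustration (they are not taken from the King County dataset), and the constant name RURAL_LONGITUDE is likewise an assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; coordinates are illustrative, not from the dataset
df = pd.DataFrame({
    'lat': [47.61, 47.53, 47.40],
    'long': [-122.33, -121.80, -122.20],
})

RURAL_LONGITUDE = -121.961527  # the dividing line used above

# Western-hemisphere longitudes are negative, so "east of" means "greater than"
df['rural'] = np.where(df['long'] > RURAL_LONGITUDE, 1, 0)

print(df)
```

Only the second row lies east of the divider, so it is the only one flagged as rural.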

While this is imprecise, it captures most of the properties correctly, and is very simple to perform. What if we would like to add some other conditions that consider latitude?

Simple Bounding Box Using North-South / East-West Constraints

A Google Maps image of Seattle, which is located within King County, Washington.
(Google Maps, [Seattle, WA])

The non-rural area of King County includes the city of Seattle. For this exercise, let’s consider Seattle to be “city,” rural King County to be “rural,” and the non-rural area of King County excluding Seattle to be “suburban.” Were we to draw as precise a box as possible around the city of Seattle, we could do so as shown below.

Properties that fall within green bounds will be classified as ‘city’, while those under red will be ‘non-city’. (Google Maps, [Seattle, WA]). Overlay created by author with Adobe Photoshop.

Again, we can use an np.where() statement to create a new class for our data. We’ll call this one “within_seattle_city_limits” and only properties sold in Seattle will be assigned a 1 in this column.

# Create a dummy classifier: 1 if the property is within Seattle city limits, 0 otherwise
df['within_seattle_city_limits'] = np.where(
    (df['long'] < -122.251569)    # establishes the EAST box border
    & (df['long'] > -122.438230)  # establishes the WEST box border
    & (df['lat'] < 47.734178)     # establishes the NORTH box border
    & (df['lat'] > 47.495479),    # establishes the SOUTH box border
    1,  # assigns 1 to all properties within the bounding box
    0   # assigns 0 to all properties outside the bounding box
)

Above, we create the green box that encompasses Seattle with another np.where() statement. The four coordinate comparisons, joined by the ampersand &, set the edges of the box: a property must satisfy all four conditions to be assigned a 1.

Note that no column is created at any point for “suburban,” “non-city,” or “non-rural.” Any property in our dataset that is classified as neither within Seattle nor rural (i.e., receives a 0 in both columns 'within_seattle_city_limits' and 'rural') will be considered “suburban” by default.
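
If an explicit label is more convenient than two dummy columns, the implied three-way classification can be materialized with np.select. This is only a sketch: the 'zone' column name and the four sample rows are assumptions made for illustration and do not appear in the original project.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; coordinates are illustrative, not from the dataset
df = pd.DataFrame({
    'lat': [47.61, 47.53, 47.40, 47.80],
    'long': [-122.33, -121.80, -122.20, -122.30],
})

# The two dummy columns, built with the same thresholds as above
df['rural'] = np.where(df['long'] > -121.961527, 1, 0)
df['within_seattle_city_limits'] = np.where(
    (df['long'] < -122.251569) & (df['long'] > -122.438230)
    & (df['lat'] < 47.734178) & (df['lat'] > 47.495479),
    1, 0)

# Conditions are checked in order; rows matching neither fall through to the default
df['zone'] = np.select(
    [df['within_seattle_city_limits'] == 1, df['rural'] == 1],
    ['city', 'rural'],
    default='suburban',
)

print(df[['lat', 'long', 'zone']])
```

Keeping the dummy columns is usually preferable for regression modeling, while a single label column like this can be handier for grouping and plotting.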

That said, this approach is again a bit imprecise. We can see in the north-east corner that our “city” box includes a slice of what should be considered “suburban,” as it does with Mercer Island east of Seattle, and we even cut out a very small segment of what should be considered the “city” of Seattle in the south-east, classifying it as “suburban.”

While this may cause problems, we may individually check properties that would be given an inappropriate classification and update them manually. We can also address this by layering additional boxes, but that will be covered in my next blog on this subject.
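
One way such a manual correction might look is sketched below. Everything here is hypothetical: the 'id' column, the id values, and the two override lists are invented for illustration, and in practice the lists would come from inspecting the properties near the box edges.

```python
import pandas as pd

# Hypothetical frame with an identifier column and the box-based label
df = pd.DataFrame({
    'id': [101, 102, 103],
    'within_seattle_city_limits': [1, 1, 0],
})

# Hypothetical ids found by manual inspection to be mislabeled by the box
not_actually_in_city = [102]  # e.g. caught inside the box but outside the city
actually_in_city = [103]      # e.g. inside the city but cut off by the box

# Overwrite only the rows whose ids appear in each override list
df.loc[df['id'].isin(not_actually_in_city), 'within_seattle_city_limits'] = 0
df.loc[df['id'].isin(actually_in_city), 'within_seattle_city_limits'] = 1

print(df)
```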

Still, I must emphasize the importance of catching potential issues as you create them. While I did scrub the dataset and ensure that every property was correctly labeled, what would happen were we to add exogenous data and apply the above classification? We would dirty our data and be nursing our headaches with icepacks.

And what if we would like to create a zone with complex boundaries? This may require a little trigonometry, exclusion conditionals and the incorporation of pipes (|) for “or” conditions, or the help of other Python libraries. Many tools and techniques may be used, but the same principle applies: creating rules for classification based on location contexts and conditions.
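
As a rough sketch of the "or" and exclusion conditions mentioned above, boolean masks can be combined with | and negated with ~. The in_box helper, the 'in_zone' column, the three boxes, and the sample rows are all hypothetical and were chosen only to illustrate the logic, not to trace any real boundary.

```python
import pandas as pd


def in_box(frame, west, east, south, north):
    """Boolean mask of rows whose coordinates fall strictly inside the box."""
    return (
        (frame['long'] > west) & (frame['long'] < east)
        & (frame['lat'] > south) & (frame['lat'] < north)
    )


# Hypothetical sample rows; coordinates are illustrative only
df = pd.DataFrame({
    'lat': [47.60, 47.70, 47.60, 47.90, 47.55],
    'long': [-122.35, -122.25, -122.22, -122.35, -122.40],
})

# Hypothetical boxes: a main area, an extension, and a region to exclude
main_box = in_box(df, west=-122.44, east=-122.30, south=47.50, north=47.73)
extra_box = in_box(df, west=-122.30, east=-122.20, south=47.65, north=47.73)
carve_out = in_box(df, west=-122.36, east=-122.34, south=47.59, north=47.61)

# Inside either box (|), but not inside the carved-out region (~)
df['in_zone'] = ((main_box | extra_box) & ~carve_out).astype(int)

print(df)
```

Each comparison is wrapped in parentheses because & and | bind more tightly than comparison operators in Python.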

Bibliography

Dataset:
House Sales in King County, USA (2016),
Provided by harlfoxem on Kaggle via CC0: Public Domain License

Images:
Header photo by Will Francis on Unsplash

Google Maps. [King County, WA]. Map image. 2021, King County - Google Maps. Accessed 16 August 2021. Map data © 2021 Google.

Google Maps. [Seattle, WA]. Map image. 2021, Seattle - Google Maps. Accessed 16 August 2021. Map data © 2021 Google.
