How to Use US ZIP Code Data in Modeling and Forecasting?

Samma Hejazi
Towards Data Science
5 min read · Jan 21, 2022


Introduction

Five-digit ZIP code data is one of the most popular levels of model granularity in US Supply Chain and Transportation models. Analysts and scientists sometimes complain about “missed ZIPs” or inconsistent ZIP code boundaries. Many models suffer from missed ZIP codes, and that hurts the accuracy of their outputs. In this article, we provide a guideline for using ZIP code data in planning and forecasting.

Map 1: Spatial representation of three types of data for a ZIP ID: boundaries (purple), centroids (purple), and ZIP points (dark blue) for all of the ZIP IDs in downtown Boise, Idaho. “Image by author”

As you can see in Map 1, there are multiple ZIP code IDs (points and polygons) in an area, and not all of the points have the same ID as the polygon that contains them. This can affect model outputs if we don’t distinguish between ZIP IDs shown as points and ZIP IDs shown as polygons. You may wonder: what is a ZIP code, and what is a ZIP code boundary? Let’s start with a clear definition of a ZIP code.

ZIP code definition:

ZIP codes were first introduced in the 1960s to help the Postal Service improve nationwide mail distribution. Although ZIP codes were enumerated based on regional sorting facilities, geographic boundaries do not technically exist. ZIPs are actually designations identifying the point of delivery (i.e., a street address or Post Office) rather than any defined bounding region. The best example of this “placeless” designation is the US Navy, which has its own ZIP code but no permanent location. ZIP code boundaries can therefore be non-contiguous, undefined, or non-existent. In other words, it is fairly difficult to create a truly representative map, and the maps of ZIP codes that do exist are not comprehensive. Even more problematically, ZIP codes change, but more on that later.

In other words, ZIP code areas (also called boundaries or polygons) are approximate area representations of U.S. Postal Service (USPS) ZIP codes. The USPS makes periodic changes to ZIP codes to support more efficient mail delivery. These changes are published on the Census TIGER website [1], but unlike block boundaries, which physically exist, ZIP code boundaries don’t exist formally! Yes, you read that correctly! ZIP boundaries can cross roads, blocks, rivers, or any other feature. This is the nature of ZIP code boundaries, since new developments keep appearing in space. In the US, a ZIP boundary may sometimes cross a large parcel so that a house ends up with two ZIP codes; cities and counties correct these issues. In some countries, ZIP boundaries are represented as a buffer around a road or as geometric shapes within a block.

This is why network planners in Supply Chain and Transportation analysts sometimes report “missed ZIPs”. These are not actually missed ZIP codes; they are point ZIP codes, which fall under the primary ZIP code types described below. The US has ~10K point ZIP codes out of ~42K ZIP codes in total, and only ~32K ZIP codes have a physical boundary. We can therefore list ~42K ZIP code IDs in tabular format, but only ~32K of those also have a spatial format, with boundaries that come from USPS, the Census, or the county. And don’t forget that every country has its own ZIP code system. It is not possible to run a model that consumes ZIP code IDs for multiple countries without knowing each country’s system. For instance, the UK and the UAE have ZIP code systems that are very different from the US one.
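As a rough illustration of the counts above, the “missed ZIPs” are simply the ZIP IDs that appear in the tabular data but have no polygon. The sketch below uses plain Python sets; the ZIP IDs and the function name `split_by_boundary` are placeholders invented for this example, not real records or an existing API.

```python
# Illustrative only: these ZIP IDs are placeholders, not verified records.
tabular_zip_ids = {"83701", "83702", "83706", "83707"}  # IDs seen in tabular data
polygon_zip_ids = {"83702", "83706"}                    # IDs that have a boundary


def split_by_boundary(tabular_ids, polygon_ids):
    """Split tabular ZIP IDs into those with a polygon and those without."""
    with_boundary = sorted(tabular_ids & polygon_ids)
    point_only = sorted(tabular_ids - polygon_ids)
    return with_boundary, point_only


with_boundary, point_only = split_by_boundary(tabular_zip_ids, polygon_zip_ids)
print("Have a boundary:", with_boundary)
print("Point ZIPs (the so-called missed ZIPs):", point_only)
```

ZIP IDs are kept as strings here so that the comparison between the two datasets is an exact text match.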

ZIP code types:

There are four primary types of ZIP codes: PO Box, Unique, Military, and Standard. There are also additional types, such as Non-Unique and Unique Organization. The definitions follow from the type names: PO Box codes are located at the post office itself; Unique codes refer to individual addresses; Military codes give US military bases overseas a domestic mailing address; and Standard codes designate everything else (i.e., the “normal” ones). Therefore, PO Box, Unique, and Military ZIP codes can each be a point located within another ZIP code.

Problem Solved!

The “missed ZIP codes” are located within ZIP codes that have spatial boundaries. Don’t forget about new developments, where a real estate developer has obtained a ZIP ID from the city or county but the ID is not yet formally part of the addressing system. Currently, ~10K point ZIPs exist in the US, each located within a ZIP polygon. Every ZIP code ID, regardless of type, has an “end ZIP code” that shows where that ZIP code is located. We can also locate point ZIPs with a spatial join between ZIP code boundaries and ZIP code points. When we compute ZIP-to-ZIP distance, we should use the end ZIP code.
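The spatial join mentioned above can be sketched in plain Python as a point-in-polygon test. This is a minimal illustration, assuming each boundary is a single simple ring of (lon, lat) vertices; the rectangles, the IDs, and the helper names `point_in_polygon` and `find_end_zip` are all made up for the example. In practice a GIS tool or library would do this join on the real boundary files.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: True if (lon, lat) lies inside the ring.

    `ring` is a list of (lon, lat) vertices; the last vertex is implicitly
    connected back to the first.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def find_end_zip(lon, lat, zip_polygons):
    """Return the ID of the first polygon containing the point, else None."""
    for zip_id, ring in zip_polygons.items():
        if point_in_polygon(lon, lat, ring):
            return zip_id
    return None


# Made-up rectangles standing in for real ZIP boundaries.
zip_polygons = {
    "A0001": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "A0002": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}
# Made-up point ZIPs (for example PO Box or Unique ZIPs) with coordinates.
zip_points = {
    "P0001": (0.5, 1.0),
    "P0002": (3.0, 0.5),
    "P0003": (9.0, 9.0),  # falls outside every polygon
}

# Derive an end ZIP for each point ZIP from the polygon that contains it.
end_zip = {pid: find_end_zip(lon, lat, zip_polygons)
           for pid, (lon, lat) in zip_points.items()}
print(end_zip)
```

A point that returns `None` is worth checking by hand, since it falls outside every boundary in the polygon dataset.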

How to use the data?

As explained above, we have two datasets for ZIP codes:

  1. Point ZIP codes (~42K records). This dataset has the fields ZIP code ID, ZIP code type, and end ZIP code. The ZIP code ID helps you look up data and information in tabular format, for example volume. The end ZIP code locates a ZIP point within a ZIP code polygon that has a physical boundary, for routing purposes.
  2. Polygon ZIP codes (~32K records). This dataset has the spatial boundaries of ZIP codes and includes all end ZIP code IDs. It must be the input for any model or search that needs distance to a ZIP code. This is the same dataset that Tableau provides for ZIP mapping.
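One way the two datasets could work together is sketched below: tabular volume is rolled up from ZIP ID to end ZIP, and ZIP-to-ZIP distance is measured between end ZIP polygons. All rows, IDs, and coordinates are hypothetical. The distance is a straight-line haversine distance between polygon centroids, which is an assumption made for this example; a routing model would normally use network distance.

```python
import math
from collections import defaultdict

# Dataset 1 (hypothetical rows): ZIP ID -> (ZIP type, end ZIP).
point_zips = {
    "P0001": ("PO Box", "A0001"),
    "A0001": ("Standard", "A0001"),
    "P0002": ("Unique", "A0002"),
    "A0002": ("Standard", "A0002"),
}
# Dataset 2 (hypothetical): end ZIP -> polygon centroid as (lat, lon) in degrees.
polygon_centroids = {
    "A0001": (43.60, -116.20),
    "A0002": (43.70, -116.30),
}
# Tabular business data keyed by ZIP ID, for example shipment volume.
volume_by_zip = {"P0001": 10, "A0001": 40, "P0002": 5, "A0002": 25}


def rollup_to_end_zip(volumes, zip_table):
    """Sum volumes per end ZIP so every record lands on a polygon."""
    totals = defaultdict(int)
    for zip_id, volume in volumes.items():
        _zip_type, end_zip = zip_table[zip_id]
        totals[end_zip] += volume
    return dict(totals)


def haversine_km(a, b, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(h))


def zip_to_zip_km(zip_a, zip_b):
    """Distance between two ZIP IDs, measured between their end ZIP centroids."""
    end_a = point_zips[zip_a][1]
    end_b = point_zips[zip_b][1]
    return haversine_km(polygon_centroids[end_a], polygon_centroids[end_b])


print(rollup_to_end_zip(volume_by_zip, point_zips))
print(round(zip_to_zip_km("P0001", "P0002"), 1), "km")
```

Because every ZIP ID is resolved to its end ZIP first, a PO Box ZIP and the Standard ZIP that contains it report a distance of zero, and no volume is dropped for lack of a boundary.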

Note: Point ZIP code data may include customer data such as customer names and addresses. Talk to your IT security team before any production application uses and maps these point ZIP codes. Once a point ZIP code is mapped into a ZIP polygon, it is no longer categorized as red data!

Lastly, don’t use just any ZIP code data that you find on the internet! Google Maps, Zillow, Redfin, etc. are NOT sources of truth for ZIP code data. You should download ZIP code data from city, county, or other local government websites, or from federal agency websites (those ending in .gov). Tableau and ESRI, however, do have these data available on their servers. The references below list sources where you can find reliable ZIP code data. The next time a client comes to you with a different ZIP code boundary that they found on Google Maps, just show them the formal ZIP boundaries from the resources described below. There is another wonderful article on Medium explaining how to download the data; see https://medium.com/@sahilkashyap64/usa-zipcode-boundary-ccbdcfd0af8

References
