
Enrich your location data without leaving your notebook

Introducing the AskIggy Python API

Photo by Luiz Cent on Unsplash

Geographic information plays a major role in pricing for industries like real estate. The characteristics of a home’s neighborhood – its (dis)amenities, aesthetics, and walkability – all impact its value.

Imagine you’re a new data scientist at a real estate company. Your first assignment is to improve your group’s pricing model. You have a hunch that adding model features to describe the home’s vicinity could lead to more accurate price predictions, but you’re not sure. How will you begin your experiments?

How can you test your hypothesis without investing lots of time in gathering the location data, and without having to read up on projections, routing algorithms, or spatial joins?

At Iggy, we’ve been working on tools that make it easy for any company to include this type of neighborhood intelligence in its analyses. With our API and its new Python wrapper, experimenting with location information in your project is as easy as a pip install and a few method calls.

In this first of several posts around our real estate scenario, we’ll walk through a quick demonstration of how the Iggy API can be used to enrich property data with location information in Python. With just a few API calls, we’ll be able to quickly add features to a home sales dataset that describe neighborhood characteristics like proximity to necessities, tree cover, and population density.

Bring your own dataset

The dataset we’ll use for this demo is a publicly available sample of 1,000 sales of detached, single-family homes in Pinellas County, FL since 2019. It includes the sale dates and prices, as well as some important information about the properties themselves.

Let’s take a look at what we have to start with.

The resulting data dictionary looks like this:

The data describe each property in several ways: its physical characteristics (square_footage, floor_system), its situation (frontage), and even some assessment data (log_land_value). Of course, there is also sales information, including sale_date and log_price.

Now, let’s say the existing pricing model predicts the sale price (log_price) for each home, using the rest of the features as input. While the available information provides plenty of detail on the characteristics of each home, important factors are missing. Specifically, this data tells us nothing about the neighborhood of each property: Is it close to grocery stores or coffee shops? Is it in a leafy neighborhood? How is the surrounding air quality?

The data we have here is certainly useful, but we can do better without much effort.

Iggy makes it easy to enrich this dataset with neighborhood intelligence via a set of API calls:

  • Iggy’s /lookup API endpoint provides information about the input location itself. This includes physical information like air quality, light pollution, and tree canopy, as well as population density and school catchments.
  • The /points_of_interest endpoint describes the surrounding neighborhood in terms of distance to amenities, measured as walking, biking, or driving time.
  • The /amenities_score endpoint gives a score representing the diversity of quality-of-life amenities (e.g. parks, bakeries, restaurants, grocery stores, recreation) around a location.

Our new Python library is designed to simplify the process of accessing the Iggy API within a Python script. Instantiating an iggyapi object requires an Iggy API key (which you can request via the "Get Early Access" link on our home page):
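Since the key is a credential, it is worth keeping it out of the notebook itself. The snippet below is a minimal sketch of one way to do that, reading the key from an environment variable before handing it to the client. The variable name IGGY_API_KEY and the helper load_api_key are our own assumptions for illustration, not part of the Iggy library.

```python
import os


def load_api_key(env_var="IGGY_API_KEY"):
    """Read an API key from the environment.

    The variable name is a hypothetical convention, not something the
    Iggy library requires.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"No API key found: set the {env_var} environment variable first."
        )
    return key
```

Calling load_api_key() once at the top of a notebook fails fast with a clear message if the key is missing, and the returned string can then be passed to whatever constructor the library expects.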

Then we can call any of the endpoints using built-in functions. Here is an example of a call to the /lookup endpoint to find out the air quality at an input point:

The Iggy Python library is built for data scientists

The Iggy Python library provides a way to define "features" based on information derived from the Iggy API, through the iggyfeature module. Since we’re starting with a data frame that contains geographic coordinates, we can use iggyfeature to define new columns for our data frame that rely on Iggy API calls.

For example, if we want to build a feature that calculates the distance to the closest grocery store, we can do so as follows:

In the feature definition above, we’re specifying that the feature is derived from Iggy’s /points_of_interest endpoint (IggyPOIFeature), that it’s looking specifically for grocery_stores, and that it will return the minimum (calc_method='min') distance to any grocery store within 30 miles of the input point.
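To make the calculation concrete, here is a minimal sketch of what a "minimum distance within a radius" feature computes. It is not Iggy’s implementation: the API does this work for you, and the helper names (haversine_miles, min_distance_within) are hypothetical. The sketch also assumes straight-line (great-circle) distance, which may differ from how the endpoint actually measures distance.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def min_distance_within(lat, lng, pois, radius_miles=30.0):
    """Smallest distance to any point of interest inside the search radius.

    `pois` is an iterable of (lat, lng) tuples. Returns None when no point
    of interest falls within `radius_miles`.
    """
    distances = (haversine_miles(lat, lng, p_lat, p_lng) for p_lat, p_lng in pois)
    in_range = [d for d in distances if d <= radius_miles]
    return min(in_range) if in_range else None
```

Returning None when nothing falls inside the radius keeps "no grocery store nearby" distinct from "a grocery store at distance zero", which matters once the column feeds a model.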

Further, the iggyfeature.IggyFeatureSet class helps us to combine a list of features and enrich an entire data frame with multiple features in one line of code.
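The idea behind a feature set can be sketched in a few lines of plain Python. The Feature and FeatureSet classes below are hypothetical stand-ins written for this illustration, not the library’s IggyFeatureSet: they work on a list of dicts rather than a data frame, and each feature is an ordinary function of latitude and longitude where a real feature would call the Iggy API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Feature:
    """A named column computed from a location (hypothetical stand-in)."""
    name: str
    compute: Callable[[float, float], float]


class FeatureSet:
    """Applies a group of features to every row in one pass."""

    def __init__(self, features: Iterable[Feature]):
        self.features = list(features)

    def enrich(
        self,
        rows: List[Dict],
        lat_key: str = "latitude",
        lng_key: str = "longitude",
    ) -> List[Dict]:
        enriched = []
        for row in rows:
            new_row = dict(row)  # copy so the input rows are left untouched
            for feature in self.features:
                new_row[feature.name] = feature.compute(row[lat_key], row[lng_key])
            enriched.append(new_row)
        return enriched
```

Because each feature carries its own name and computation, adding a column means adding one more item to the list, and the enrichment call itself stays the same however many features are defined.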

So let’s create some new Iggy features to add to our data. We’ll add the following to each row:

  • Distance to the nearest point of interest in each of 28 categories and 3 brands, including amenities (like restaurants, bakeries, and dry cleaners) and dis-amenities (including waste management and prisons). For a full list of POI types currently available, see our documentation.
  • Number of restaurants, bars, coffee shops, bakeries, and grocery stores within a 15-minute walk
  • Value of light pollution, tree canopy, air quality, and population density indices for that location
  • The house’s Amenity Score within a 10-minute walk and within a 10-minute drive. The Amenity Score is an Iggy index that measures the diversity of quality-of-life amenities (e.g. parks, restaurants, etc.) in a local area.

These definitions give us 42 new features, or columns, that describe each property’s local neighborhood: 31 nearest-POI distances, 5 counts, 4 indices, and 2 Amenity Scores. Let’s combine them into an IggyFeatureSet and use it to enrich the data frame in one go:

Now, if we use Featuretools again to examine the new features in the Iggy-enriched data frame, we see the new elements of information for each home sale, each providing additional context about the surrounding neighborhood.

There! We’ve added 42 new location-based features to the dataset without having to do any spatial joins or even look at a map. Now we’re free to experiment with these new features to quickly determine whether they’re helpful for the pricing model.

In an upcoming post, we’ll walk through some of these analyses to see which features turn out to be most important in our real estate pricing scenario. Until then…

We’re excited to see what you’ll build with Iggy!

