
Applying Machine Learning to Assess Florida’s Climate-Driven Real Estate Risk (Part 1)

Before she left for a White House job in the Trump Administration, Florida's first-ever – and short-lived – climate change czar, Chief…

Image created by author, inspired by source: https://sealevelrise.org/states/florida/. The sea level around the Florida Keys has risen by 8 inches since 1950. Its speed of rise has accelerated over the last ten years and it's now rising by 1 inch every 3 years. Scientists know this because sea levels are measured every 6 minutes using equipment like satellites, floating buoys off the coast, and tidal gauges to accurately measure the local sea level as it accelerates and changes

Florida’s short-lived Climate Change czar, Chief Resilience Officer Julia Nesheiwat, set a clear priority for the state: Protect the real estate market.

Image by source: JNW21 885, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0 via Wikimedia Commons

Nesheiwat’s [unpublished] January 2020 report is loaded with proposals aimed at keeping Florida’s most important industry, real estate, high and dry.

Her plan proposes stricter building codes, but also more controversial measures, such as disclosing flood risks to home buyers, providing state-sponsored home buyouts, and requiring vulnerability studies for cities and counties.

"Florida’s coastal communities and regions do not have time to waste and need a partner at the highest level to help manage and prepare against impending threats," wrote Nesheiwat, who took a job with the Department of Homeland Security after six months in Florida.

A case study by McKinsey echoed Nesheiwat’s dire projections for much of the state, including the Florida Keys, but also many coastal areas far north of Atlantis [formerly South Florida].

Homes in Vero Beach, FL. Submerged seawalls and docks, and sea-soaked lawns have become commonplace, image by author

"For nearly three months, the residents of [a Florida Keys neighborhood] Stillwright Point’s 215 homes have been forced to carefully plan their outings and find temporary workarounds to deal with the smelly, stagnant water – a result not of rain, but a rising sea – that makes their mangrove-lined streets look more like canals." – The New York Times

The McKinsey report’s authors estimate that the projected increase in tidal flooding frequency could devalue exposed properties by $10-30 billion as soon as 2030, and by $30-80 billion by 2050. By 2050, they expect the average price impact on affected homes to reach 15-35 percent, up from 5 percent today.

"That’s a conservative estimate", said case study co-author Mekala Krishnan, a senior fellow at the McKinsey Global Institute.

Based on reports like McKinsey’s, the common wisdom seems to be that buyers will only slowly come to see flood-prone properties as bad investments, rather than abandoning hard-hit areas all at once. Such an abrupt exodus could easily happen, though, if a hurricane were to hit during the fall King Tide season, which overlaps with hurricane season.

"Will mortgages and markets stay afloat in Florida?" – Krishnan, says yes, provided that the state actually does the sort of things suggested in Nesheiwat’s climate report.

"If we do, I think the risk can be managed," she said. But that doesn’t mean every pocket of the state will make it.

Image created by author, inspired by source: https://sealevelrise.org/states/florida/

The Dataset & Machine Learning Approach

There are many great Medium articles that outline how to apply ML to real estate valuations, so in an effort to provide original content, the remainder of this article explores a new approach with a never-before-seen real estate dataset. In doing so, the aim is to either confirm or refute claims that South Florida real estate prices are already taking a hit from sea-level-rise (SLR)-driven climate risks.

"Our results suggest a disconnect in coastal Florida real estate: From 2013–2018, home sales volumes in the most-Sea-Level-Rise(SLR)-exposed communities declined 16–20% relative to less-SLR-exposed areas, even as their sale prices grew in lockstep. Between 2018–2020, however, relative prices in these at-risk markets finally declined by roughly 5% from their peak. Lender behavior cannot reconcile these patterns, as we show that both all-cash and mortgage-financed purchases have similarly contracted, with little evidence of increases in loan denial or securitization." – The National Bureau of Economic Research‘s recently published report

The target we want to predict is "the difference in the rate of property value change of a given property vs the rate of change of similar properties nearby". In theory, for properties with a high degree of difference, we should find that a substantial part of that difference can be explained by flood risk.

To calculate the target value, we will use the following equations:
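In rough notation (the symbols are mine and mirror the columns computed later in this post, where P is the property’s price and ZHVI is the Zillow Home Value Index for the property’s zip code and bedroom count):

$$\text{property\_change} = \frac{P_{\text{list}} - P_{\text{last sold}}}{P_{\text{last sold}}}, \qquad \text{comps\_change} = \frac{\text{ZHVI}_{\text{now}} - \text{ZHVI}_{\text{last sold}}}{\text{ZHVI}_{\text{last sold}}}$$

$$\text{target} = \text{property\_change} - \text{comps\_change}$$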

In simple terms, we want to predict the difference in the rate of change in price between each property and its neighbors (comparables).

The above equation can be practically implemented with the following:
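Here is a minimal sketch of that calculation, assuming a dataframe df that already contains the listing Price, the LastSoldAmt, the ZHVI value at the last-sold date (zillow_price), and the current ZHVI value (the 1/31/21 column); all of these columns are built later in this post:

# Appreciation of the property itself since its last sale
df['last_sold_price_change_percent'] = (
    df['Price'].astype(float) - df['LastSoldAmt'].astype(float)
) / df['LastSoldAmt'].astype(float)

# Appreciation of comparable homes (ZHVI) over the same holding period
df['zillow_price_change_percent'] = (
    df['1/31/21'].astype(float) - df['zillow_price'].astype(float)
) / df['zillow_price'].astype(float)

# Target: how much the property out- or under-performed its comparables
df['yearly_price_delta_percent'] = (
    df['last_sold_price_change_percent'] - df['zillow_price_change_percent']
)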

In the dataset, approximately 66% of properties have a low flood risk (less than 5 on the Flood Factor scoring scale), and approx. 33% have a high flood risk.
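For reference, this split can be computed directly from the cleaned Flood Factor score built later in this post (a quick sketch; FloodFactorInfo is assumed to hold the 1-10 score as a string):

# Share of properties at high vs. low flood risk (Flood Factor of 5 or more = high)
high_risk = df_t['FloodFactorInfo'].astype(float) >= 5
print(high_risk.value_counts(normalize=True))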

We will test the hypothesis that a high flood risk score is a strong predictor for the difference in the rate of property value change by building an ML model, and then applying various model explainability tools to it, such as feature importance, and SHAP.

Feature Importance of the best trained XGBOOST pipeline, using EvalML AutoML library. image by author

All of the code and data used for this blog are available on GitHub, here.

Gathering The Data

In addition to standard real-estate features like bedrooms, bathrooms, area, etc., the data also includes:

  • 3 images per property (also contained in the Github repo)
  • 2 published flood risk ratings, one from FEMA, and one from First Street’s Flood Score
  • Each property is/was listed for sale with a published asking price on realtor.com, as of the 3rd week in February, 2021 (last week)
  • Each property is appended with the closest matching Zillow Home Value Index ZHVI (aggregated and merged on Zipcode + number of bedrooms), resulting in a monthly time-series index price of nearby homes, dating back to 1996
  • Additionally, many other public datasets, such as the relevant area Census data demographics, were appended for each property

This dataset includes records of over 1600 South Florida real estate properties, each with over 650 columns, and includes flood risk information, demographic information, listing images, listing descriptions, how long the property has been listed, sales history, and lots of other related information.

Load the Property Data

We need to clean up the raw data before we can build our model. Because it’s time-consuming to clean up everything, we’ll do a "quick pass" here; additional data could still be cleaned to obtain more training data. Please feel free to do this and provide a link!

import pandas as pd
import evalml
from evalml.preprocessing import load_data
property_data = pd.read_csv('../data/raw/property.csv').reset_index()

Load the Demographic Data

Geocodio provides a few data sets we can easily append to our property data, for additional studies and usage:

  • The US Census Bureau
  • Local city, county, and state datasets from OpenAddresses
  • OpenStreetMap
  • GeoNames
  • CanVecPlus by Natural Resources Canada
  • StatCan
  • Legislator information from the UnitedStates project on GitHub

demographics_data = pd.read_csv('../data/raw/demographics.csv', low_memory=False).reset_index()

Merge the Data

The data has been presorted and reindexed, so we can simply merge on the index id.

merged = pd.merge(property_data, demographics_data, on='index')

Note that the word sqft is contained within the Area column, so we will identify all rows that contain sqft and disregard rows that don’t. NOTE: rows that do not contain sqft would need to be cleaned in a separate workflow. For now, we will focus on cleaning the bulk of the data only, and later on we can come back to clean the stragglers.

word = 'sqft'
new_df = merged[merged["Area"].str.contains(word) == True]

Other columns need similar cleaning, such as Beds, FloodInfo, and YearBuilt.

word = 'bed'
new_df = new_df[new_df["Beds"].str.contains(word) == True]
word = 'Year Built'
new_df = new_df[new_df["YearBuilt"].str.contains(word) == True]
word = 'Flood Factor'
new_df = new_df[new_df["FloodInfo"].str.contains(word) == True]

The Style column is a total mess because many/most listings do not include this information. For now, we will simply drop it to save time parsing the mess.

new_df = new_df.drop('Style',axis=1)

Using Featuretools to speed up data cleaning.

The Featuretools library has a few great data cleaning tools we will use to save time. Specifically:

  • remove_low_information_features: Keeps only features that have at least two unique values and that are not all null.
  • remove_highly_null_features: Removes columns from a feature matrix that have more than a set threshold of null values.
  • remove_single_value_features: Removes columns in a feature matrix where all the values are the same.
  • remove_highly_correlated_features: Removes columns in a feature matrix that are highly correlated with another column.

from featuretools.selection import remove_low_information_features, remove_highly_null_features, remove_single_value_features, remove_highly_correlated_features
df = new_df.copy()
"""Select features that have at least 2 unique values and that are not all null"""
df_t = remove_low_information_features(df)
"""Removes columns from a feature matrix that have higher than a set threshold"""
df_t = remove_highly_null_features(df_t)
"""Removes columns in feature matrix where all the values are the same."""
df_t = remove_single_value_features(df_t)
"""Removes columns in feature matrix that are highly correlated with another column."""
df_t = remove_highly_correlated_features(df_t)

Clean up the Flood Risk Data

The Flood Risk Data is a prominent feature of this study, so we want to clean up the formatting a bit.

df_t[['FemaInfo','FloodFactorInfo']] = df_t.FloodInfo.str.split(' • ', expand=True) 
df_t['FloodFactorInfo'] = df_t['FloodFactorInfo'].astype(str).str.replace('/10 New','').str.replace('Flood Factor ','')
# regex=False so '(est.)' is removed literally rather than treated as a regex pattern
df_t['FemaInfo'] = df_t['FemaInfo'].astype(str).str.replace('FEMA Zone ','').str.replace('(est.)','', regex=False)

Clean up the Numerical Features

Columns like Area, Baths, YearBuilt, and DaysOnRealtor cannot be converted to numbers while they still contain text characters, so we need to strip those characters in order to correctly format the dataset for training our model.

df_t['Beds'] = df_t['Beds'].str.replace('bed','')
df_t['Baths'] = df_t['Baths'].str.replace('bath','')
df_t['Noise'] = df_t['Noise'].str.replace('Noise:','')
df_t['PropertyType'] = df_t['PropertyType'].str.replace('Property Type','')
df_t['DaysOnRealtor'] = df_t['DaysOnRealtor'].str.replace('Days on Realtor.com','').str.replace('Days','')
df_t['Area'] = df_t['Area'].str.replace('sqft','').str.replace(',','')
# regex=False so '$' is removed literally (as a regex it would only match end-of-string)
df_t['Price'] = df_t['Price'].str.replace('$','', regex=False).str.replace(',','')
df_t['PricePerSQFT'] = df_t['PricePerSQFT'].astype(str).str.replace(',','')
df_t['YearBuilt'] = df_t['YearBuilt'].astype(str).str.replace('Year Built','')

Split up the LastSoldAmt and LastSoldYear features

These columns were included together in the scraped data, so we need to split them up accordingly, in order to properly format them as model features.

df_t[['LastSoldAmt','LastSoldYear']] = df_t.LastSold.str.split(' in ', expand=True) 

Clean up the LastSoldAmt

The LastSoldAmt data uses text characters to indicate thousands and millions; for our purposes we need to replace these with their numerical equivalents.

df_t['LastSoldAmt'] = df_t['LastSoldAmt'].astype(str).str.replace('k','000')
# regex=False so '.' and '$' are removed literally rather than treated as regex metacharacters
df_t['LastSoldAmt'] = df_t['LastSoldAmt'].astype(str).str.replace('M','000000').str.replace('.','', regex=False).str.replace('Last Sold','').str.replace('$','', regex=False).str.replace('000000','0000')

Drop Unnecessary Columns and Save the Preprocessed Data

df_t = df_t.drop('LastSold',axis=1)
df_t = df_t.drop('index',axis=1)
df_t = df_t.reset_index()
drop_cols = [col for col in df_t.columns if 'url' in col.lower() or ' id' in col.lower()]
X_t = df_t
X_t = X_t.drop(drop_cols,axis=1)
t_datapath = '../data/processed/preprocessed.csv'
X_t.to_csv(t_datapath,index=False)

Download the Property Images for Later Use

Each property in the dataset comes with 3 URLs containing listing images. While these images are not immediately of concern, we will be using them in a later blog post. For now, we will simply download them.

import requests
def download_images(indx):
    """Download the three listing images for the property with the given index."""
    file_name = str(indx)+'.png'
    urldata = df_t[df_t['index']==indx]
    # Pull the image URLs for this specific property (not the first row of df_t)
    url1 = urldata['Image_URL'].values[0]
    url2 = urldata['Image_URL1'].values[0]
    url3 = urldata['Image_URL2'].values[0]
    urls = [url1,url2,url3]
    ct=0
    for url in urls:
        response = requests.get(url)
        with open('../data/images/_'+str(ct)+'_'+file_name, "wb") as file:
            file.write(response.content)
        ct+=1

df_t['index'].apply(download_images)

Merge the Zillow Data

ZHVI User Guide

One of Zillow’s most cited metrics is ZHVI, the Zillow Home Value Index. It tells us the typical home value in a given geography (metro area, city, ZIP code, etc.), now and over time, for specific property types and sizes. For general information about ZHVI, please refer to this methodology guide and this lighter-hearted video.

We will merge on the key created which concatenates the area Zip code and the property bedroom count in zipbeds:

zillow1beds = pd.read_csv('../data/raw/zillow1bed.csv')
zillow1beds['zipbeds'] = zillow1beds['RegionName'].astype(str)+'_'+str(1)
zillow2beds = pd.read_csv('../data/raw/zillow2bed.csv')
zillow2beds['zipbeds'] = zillow2beds['RegionName'].astype(str)+'_'+str(2)
zillow3beds = pd.read_csv('../data/raw/zillow3bed.csv')
zillow3beds['zipbeds'] = zillow3beds['RegionName'].astype(str)+'_'+str(3)
zillow4beds = pd.read_csv('../data/raw/zillow4bed.csv')
zillow4beds['zipbeds'] = zillow4beds['RegionName'].astype(str)+'_'+str(4)
zillow5beds = pd.read_csv('../data/raw/zillow5bed.csv')
zillow5beds['zipbeds'] = zillow5beds['RegionName'].astype(str)+'_'+str(5)
zillowdata = pd.concat([zillow1beds, zillow2beds, zillow3beds, zillow4beds, zillow5beds])
# load preprocessed data
t_datapath = '../data/processed/preprocessed.csv'
target = 'Price'
#set to None for production / actual use, set lower for testing
n_rows=None
#set the index
index='index'
X, y = load_data(t_datapath, index=index, target=target, n_rows=n_rows)
y = y.reset_index().drop('index',axis=1).reset_index()[target]
X_t = X.reset_index().drop('index',axis=1).reset_index()
X_t[target]=y
df_t['LastSoldDate'] = '1/31/' + df_t['LastSoldYear'].astype(str).str[2:4]
df_t['zipbeds'] = df_t['Zip'].astype(str).str.replace('zip_','')+'_'+df_t['Beds'].astype(str)
zipbeds = list(set(df_t['zipbeds'].values))
zillowdata['zipbeds'] = zillowdata['zipbeds'].astype(str)
df_t['zipbeds'] = df_t['zipbeds'].astype(str)
load_data prints a summary of the feature types and the target distribution:

Number of Features
Categorical                  60
Numeric                     640
Number of training examples: 1071
Targets
325000     1.49%
450000     1.21%
350000     1.21%
339000     0.93%
349900     0.84%
           ...  
245000     0.09%
5750000    0.09%
74995      0.09%
2379000    0.09%
256000     0.09%
Name: Price, Length: 567, dtype: object

Finally, merge the Zillow data onto the property data using the zipbeds key:

df_t = pd.merge(df_t, zillowdata, on='zipbeds')

Calculate the rate of change for each property and its comparables

In real estate, a comparable is a nearby property with similar features, such as the same number of bedrooms. In our case, for each property (merged on zipbeds) we want to train a model to predict the difference between the rate of change of its price and the rate of change in price of nearby comparables. To do this, we will look up the property’s LastSoldDate and the corresponding ZHVI value on that date, then compute the ZHVI’s rate of change from that date until now.

X_t = df_t.copy()
time_series_cols = [col for col in X_t.columns if '/' in col and 'Percentage' not in col and 'Value' not in col and 'Margin of error' not in col and 'Metro' not in col and col != '1/31/21']
l = []
for ct in range(len(X_t)):
    try:
        indx = X_t['index'].values[ct]
        last_sold_date = X_t['LastSoldDate'].values[ct]
        # ZHVI value for this property's zip/bedroom group at the time it last sold
        zillow_price = X_t[last_sold_date].values[ct]
        X_ts = X_t[X_t['index']==indx].copy()
        X_ts['zillow_price'] = zillow_price
        X_ts['zillow_price_change'] = X_ts['1/31/21'].astype(float) - X_ts['zillow_price'].astype(float)
        X_ts['zillow_price_change_rate'] = X_ts['zillow_price_change'].astype(float) / (2021.0 - X_ts['LastSoldYear'].astype(float))
        X_ts['zillow_price_change_percent'] = X_ts['zillow_price_change'].astype(float) / X_ts['zillow_price'].astype(float)
        l.append(X_ts)
    except:
        # Skip properties whose LastSoldDate falls outside the ZHVI time series
        pass

df = pd.concat(l)
df['last_sold_price_change'] = df['Price'].astype(float) - df['LastSoldAmt'].astype(float)
df['last_sold_price_change_percent'] = (df['Price'].astype(float) - df['LastSoldAmt'].astype(float)) / df['LastSoldAmt'].astype(float)
df['last_sold_price_change_rate'] = df['last_sold_price_change'].astype(float) / (2021.0 - df['LastSoldYear'].astype(float))
df['yearly_price_delta'] = df['last_sold_price_change_rate'].astype(float) - df['zillow_price_change_rate'].astype(float)

Defining the Target Variable we will train our model to predict.

For our initial blog entry, we will use the feature yearly_price_delta_percent as our target variable. It is defined as follows:

df['yearly_price_delta_percent'] = df['last_sold_price_change_percent'].astype(float) -  df['zillow_price_change_percent'].astype(float)

Final Cleanup & Save

A few straggler rows still contain text values in numerical columns, so we remove them and then save the merged dataset.

df = df.drop(time_series_cols, axis=1)
df = df[df['LastSoldAmt'] != 'Property TypeTownhome']
df = df[df['LastSoldAmt'] != 'Property TypeSingle Family Home']
df.to_csv('../data/processed/zillow_merged.csv',index=False)

Benchmarking with AutoML

While the raw dataset is much larger, for the purposes of this post, we will be focusing on a smaller sample of features.

image by author

Before diving into ColorFrames, it’s worthwhile to get a baseline for our target data’s predictability using traditional tabular modeling algorithms, such as XGBOOST, so that we can compare ColorFrame performance. But what is our target feature?

Remember, to calculate the target value we used the equations defined earlier: in simple terms, we want to predict the difference in the rate of change in price between each property and its neighbors (comparables). That calculation was implemented above as the yearly_price_delta_percent column.

The National Bureau of Economic Research recently published a report which we’ll use as a starting point and reference for planning our methodology. To be sure, the aim here is to see what they did in a very general sense, and NOT to replicate their study. Our study uses wholly different methods and data.

Florida zip codes. Image created by author, inspired by source

The Hypothesis

Our hypothesis is that published flood risk ratings/scores, such as First Street’s Flood Factor, should be among the most important predictive features for high-flood-risk properties, if indeed there is a strong/non-trivial price influence associated with high flood risk.

To test this hypothesis, we start out by building a basic pipeline using EvalML’s AutoML feature, as sketched below.

AutoML via EvalML. image by author
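A rough sketch of that AutoML search is below. EvalML’s API has evolved across releases, so treat the exact signatures as approximate; the variable names (X, y, best_pipeline, X_holdout, y_holdout) are assumed to line up with the rest of the code in this post:

import evalml
from evalml.automl import AutoMLSearch

# Hold out a test set for the explainability steps below
X_train, X_holdout, y_train, y_holdout = evalml.preprocessing.split_data(
    X, y, problem_type='regression', test_size=0.2)

# Search over candidate pipelines, optimizing Mean Absolute Error
automl = AutoMLSearch(X_train=X_train, y_train=y_train,
                      problem_type='regression', objective='MAE')
automl.search()

# Keep the best pipeline (an XGBoost regressor pipeline in this case)
best_pipeline = automl.best_pipeline
best_pipeline.fit(X_train, y_train)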

Feature Importance

best_pipeline.feature_importance
feature importance. image by author

We can get the importance associated with each feature of the resulting pipeline.

It’s easy to see that, as expected, the Flood Factor score from First Street does in fact show up near the top of the most important features! Wow!

We can also compute and plot the permutation importance of the pipeline.

from evalml.model_understanding.graphs import graph_permutation_importance
graph_permutation_importance(best_pipeline, X_holdout, y_holdout, 'MAE')
permutation importance. image by author

Interpreting Permutation Importances

"Permutation feature importance is a model inspection technique that can be used for any fitted estimator when the data is tabular. This is especially useful for non-linear or opaque estimators. The permutation feature importance is defined to be the decrease in a model score when a single feature value is randomly shuffled 1. This procedure breaks the relationship between the feature and the target, thus the drop in the model score is indicative of how much the model depends on the feature. This technique benefits from being model agnostic and can be calculated many times with different permutations of the feature." – sklearn

When interpreting permutation importances, the features towards the top are the most important and those towards the bottom matter least; the values indicate how much model performance decreased when each feature was randomly shuffled (in this case, using Mean Absolute Error as the performance metric).

In the above chart, the negative value for the FloodFactorInfo feature immediately stands out. While this could imply that the high feature importance we saw above amounts to little more than random chance, it’s important to note that permutation importance is highly susceptible to feature correlation.

To address the potential for correlation to interfere, let’s replace the FloodFactorInfo_flood_factor_high/low features with a boolean FloodFactorHigh=True/False feature. Let’s also drop the FemaInfo feature due to its potential interference with FloodFactorHigh. Then let’s perform a correlation study to remove any other potential conflicts.
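Here is a rough sketch of that step, assuming the cleaned feature frame is X_t and using Woodwork’s DataTable API from the time of writing (the 5+ threshold mirrors the high/low split described earlier):

import woodwork as ww

# Collapse the flood score into a single boolean flag: Flood Factor of 5+ = high risk
X_t['FloodFactorHigh'] = X_t['FloodFactorInfo'].astype(float) >= 5
X_t = X_t.drop(['FloodFactorInfo', 'FemaInfo'], axis=1)

# Pairwise mutual information between features, to spot remaining correlations
dt = ww.DataTable(X_t)
mi = dt.mutual_information()  # columns are typically column_1, column_2, mutual_info
print(mi[mi['mutual_info'] > 0.5])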

Feature Correlation Study Using Woodwork DataTable’s mutual_information() tool. image by author

In the correlation table above we can see that a few features are more than 50% correlated, so to drill down on what’s happening with our permutation chart, let’s try removing City_x and County, which should remove all correlations above 50%.
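In code this is a simple drop, after which the AutoML search above is re-run (again assuming the feature frame is X_t and that these column names come from the merged dataset):

# Remove the two features that were more than 50% correlated with others
X_t = X_t.drop(['City_x', 'County'], axis=1)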

Revised feature importance after removing the City_x and County features to reduce correlation between features. image by author

We can see in the revised feature importance chart that FloodFactorHigh becomes the most important feature in the XGBOOST model, but what about the permutation importance?

Revised permutation importance after removing the City_x and County features to reduce correlation between features. image by author

In the revised permutation chart we can see that while FloodFactorHigh is still near the bottom, it’s no longer negative, which is an improvement over the prior case. The feature with the greatest permutation importance is now Area, followed by Longitude. This result seems at least plausible, but as a matter of personal preference I tend to avoid over-relying on permutation importance due to its high sensitivity to feature correlations, which are largely unavoidable when dealing with real estate features, especially with smaller data samples.

Let’s turn to the SHAP library and see if we can get any other insights about our model and its feature importance…

Tree ensemble example with TreeExplainer (XGBoost/LightGBM/CatBoost/scikit-learn/pyspark models)

Image source: SHAP

SHAP can explain the output of any machine learning model, and its fast C++ implementations support XGBoost, LightGBM, CatBoost, scikit-learn, and pyspark tree models, so let’s try it out:

import xgboost
import shap
# load JS visualization code to notebook
shap.initjs()
# train XGBoost model
model = xgboost.train(best_pipeline.parameters['XGBoost Regressor'], xgboost.DMatrix(X_ttrain, label=y_train.to_series()))
# explain the model's predictions using SHAP
# (same syntax works for LightGBM, CatBoost, scikit-learn and spark models)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_ttest)

Now that we’ve created the explainer object, let’s make some explainability plots…

# visualize the first prediction's explanation (use matplotlib=True to avoid Javascript)
shap.force_plot(explainer.expected_value, shap_values[0:1], X_ttest.iloc[0:1])
SHAP plot. image by author

The above explanation shows features each contributing to push the model output from the base value (the average model output over the training dataset we passed) to the model output. Features pushing the prediction higher are shown in red, those pushing the prediction lower are in blue (details about force plots can be found here: Nature BME paper).

# visualize the test set predictions
shap.force_plot(explainer.expected_value, shap_values, X_ttest)
SHAP plot. image by author

To get an overview of which features are most important for a model, we can plot the SHAP values of every feature for every sample. The plot below sorts features by the sum of SHAP value magnitudes over all samples, and uses SHAP values to show the distribution of the impacts each feature has on the model output. The color represents the feature value (red high, blue low). This reveals, for example, that a high FloodFactorInfo_flood_factor_high value increases the predicted difference between the target home’s rate of price change and that of the Zillow Home Value Index.

# summarize the effects of all the features
shap.summary_plot(shap_values, X_ttest)
SHAP plot. image by author

To understand how a single feature affects the output of the model, we can plot the SHAP value of that feature vs. the value of the feature for all the examples in a dataset. Since SHAP values represent a feature’s responsibility for a change in the model output, the plot below represents the change in the predicted price delta as DaysOnRealtor.com changes. Vertical dispersion at a single value of DaysOnRealtor.com represents interaction effects with other features, in this case FloodFactorInfo_flood_factor_high.

shap.dependence_plot("DaysOnRealtor.com", shap_values, X_ttest, 
                     interaction_index="FloodFactorInfo_flood_factor_high"
                    )
SHAP plot. image by author

Key Takeaways

  1. The FloodFactorInfo feature is among the most important features when predicting the difference in the rate of change in price between a given property and its comparables. This implies that Florida real estate prices are already being impacted by climate change risks.
  2. Because the methodology used to calculate the FloodFactorInfo feature is specific to each property, whereas the FEMA flood ratings are not, it makes sense that the FEMA flood ratings are less predictive. Note that FEMA plans to roll out its first update to its flood ratings in approximately 50 years starting in the fall of 2021; thoughts on what’s to come?
  3. Are there other possible correlations with the FloodFactorInfo feature that could indirectly explain these results, such as proximity to the ocean or canals? Because we’ve included latitude, longitude, and other location-aware features, we would expect those features to be nearly as important if that were the case.

Final Thoughts

More related posts using this dataset are coming soon! Be on the lookout for additional analysis using ColorFrames, a novel computer vision approach inspired by SuperTML…

Real estate investors should definitely take note of this and related study results. In most cases, investors consider things like location, current asset valuations, and potential upside based on local area development trends. But increasingly, savvy investors are analyzing environmental events and impacts, and these types of considerations are fast becoming formalized in standards like those proposed by the Task Force on Climate-related Financial Disclosures (TCFD). It’s important to be aware of the impact that these changes and events can have on real estate investing.

Climate risks, such as sea-level-rise and wildfires, pose a significant threat to both residential and commercial buildings and negatively impact real estate valuations. Besides causing property damage, climate events can also spell expensive insurance, maintenance and operational costs. In a worst-case scenario, a natural disaster could cause a complete property loss and bankrupt insurance providers. Thankfully, there seems to be increased awareness surrounding climate risks in the real estate investment industry as a whole.

Best Practices For Identifying Climate Risks

If they’re not already, investors and investment managers need to implement practical strategies to address the potential impacts of climate risks. When considering adding a new asset to their portfolio, it’s prudent to explore its exposure to any form of climate risk. Proactive investors are already analyzing mountains of climate data and other available information to make the most educated decisions, and they will be tomorrow’s winners. Here are a few bullet points to help guide real estate investment decision-making:

• Formalize a process to evaluate climate risks as part of your broader investment decision-making.

• Educate yourself on mitigation and adaptation strategies (higher seawalls, HVAC, higher elevation in general, etc.).

• Contact your insurance provider to understand their plans for the future.

• Research regulations intended to address climate risks.

Climate risks shouldn’t strike fear into the heart of real estate investors, but they should be a consideration now and in the foreseeable future.

