The world’s leading publication for data science, AI, and ML professionals.

Landcover Analysis using Python and ArcGIS

Create data maps or perform spatial analysis all inside a Jupyter Notebook

Landsat image courtesy of the U.S Geological Survey
Landsat image courtesy of the U.S Geological Survey

ArcGIS is one of the top platforms for geostatistical analysis. Using ArcGIS libraries with python gives you the ability to analyze shapefiles, display Landsat satellite imagery, create data maps, and much more all within a Jupyter notebook! There are a limitless amount of Data Science projects you can create using this library but for this article, I’ll focus on a simple NDVI (Normalized Difference Vegetation Index) analysis.

Before we get started, make sure you have all the necessary libraries installed check out arcpy on ESRI’s site to learn more about what you need. Here are all the libraries I used for this analysis:

import pandas as pd
from datetime import datetime
from IPython.display import Image
from IPython.display import HTML
import matplotlib.pyplot as plt
import sys
import arcgis
from arcgis.gis import GIS
from arcgis.raster.functions import apply, clip, remap, colormap
from arcgis.geometry import lengths, areas_and_lengths, project
from arcgis.geometry import Point, Polyline, Polygon, Geometry
from arcgis.geocoding import geocode
from arcgis.features import FeatureLayer

1. Define Your Area

This part is entirely up to you, for this project I’m going to use New York City. After you’ve chosen an area you’ll then need to find the FeatureLayer file associated with that location. Usually, this can be found by using ESRI’s RESTful API, here’s how I found NYC’s FeatureLayer:

nyc_fl = FeatureLayer('https://gisservices.its.ny.gov/arcgis/rest/services/NYS_Civil_Boundaries/FeatureServer/4')
ny_df = pd.DataFrame.spatial.from_layer(nyc_fl)
nyc_df = ny_df.iloc[32:33]

This gave me a GeoDataFrame, I singled out the 32nd row which covered New York County. After I determined the area I just took the spatial component and converted it to a shapefile.

Image by Author
Image by Author

2. Get Satellite Imagery

This is how we can define the land cover of our FeatureLayer. NDVI analysis works by analyzing the color based on each pixel of a satellite image. The easiest way to do this is to use Landsat imagery from ESRI:

landsat_item = gis.content.search('title:Multispectral Landsat', 'Imagery Layer', outside_org=True)[0]
landsat = landsat_item.layers[0]

Then you can take the Landsat item, define the ideal cloud cover percentage (keep this as low as possible) and set a start and end date:

selected = landsat.filter_by(where="(Category = 1) AND (cloudcover <=0.05)",time=[datetime(2020, 4, 1), datetime(2020, 6, 30)],                              geometry=arcgis.geometry.filters.intersects(area['extent']))  
df = selected.query(out_fields="AcquisitionDate, GroupName, CloudCover, DayOfYear",                   order_by_fields="AcquisitionDate").sdf 
df['AcquisitionDate'] = pd.to_datetime(df['AcquisitionDate'], unit='ms')
Landsat image courtesy of the U.S Geological Survey
Landsat image courtesy of the U.S Geological Survey

3. Add NDVI filter and create a clip

Now that you have your satellite imagery you can apply the ‘NDVI Raw’ function to calculate the NDVI for each pixel

nyc_colorized = apply(nyc_image, 'NDVI Raw')

Then you can use the shapefile we defined early to get a perfect clip of your area

nyc_clip = clip(nyc_colorized,nyc_poly)
nyc_clip.extent = area['extent']
Image by Author
Image by Author

4. Color mapping

NDVI values range from -1 to 1, it’s up to you to decide how to classify your land cover but I would suggest reading more about what this value represents. For this project, I divided the land cover into three groups: water, concrete, and vegetation.

masked = colormap(remap(nyc_clip, 
                        input_ranges=[-1,0,     # water
                                     -0.1, 0.4, # Concrete
                                     0.4, 1],   # Vegetation, Trees      
                        output_values=[1, 2, 3]),
                        colormap=[[1, 1, 255, 248], [2, 144, 148, 148], [3,14,247,22]], astype='u8')
Image by Author
Image by Author

4. Analysis

Now that you have your colormap, you can analyze it!

xpixel = (nyc_clip.extent['xmax'] - nyc_clip.extent['xmin']) / 800
ypixel = (nyc_clip.extent['ymax'] - nyc_clip.extent['ymin']) / 400

full_res = masked.compute_histograms(nyc_clip.extent,
                                   pixel_size={'x':xpixel, 'y': ypixel})
total_pix = 0
hist = full_res['histograms'][0]['counts'][0:]
for x in hist[1:]:
    total_pix += x
colors=['#0EF716','#01FFF8','#909494']
labels =[ (hist[1]/sum(hist)), (hist[2]/sum(hist)), (hist[3]/sum(hist)) ]
plt.pie(hist, labels=['', 'Water', 'Concrete', 'Green Cover'],colors=colors,
        shadow=True)
plt.title('Landcover of New York City')
plt.show()
Image by Author
Image by Author

This is just one project idea, but I hope you can use it as inspiration to create something even better. Want to know how the land cover of your location has changed over the year? Change the start and end dates to create multiple colormaps. Want to know how cloud cover impacts accuracy? Raise the cloud cover percentage and compare the two. Want to add a unique, spatially-focused data science project to your resume? Considered using ArcGIS with Python!

Thank you for reading!

Full notebook: https://github.com/Bench-amblee/geo/blob/main/nyc_landcover.ipynb


Related Articles