Geospatial Analytics for Reassessing Urban Structures

Utilizing Geohash and data science models to solve traffic congestion and spatial inequality that has resulted from the current urban…

Image from Unsplash.com

Background

In today’s technological era, location data has become essential to the business operations of many tech companies. When users link their device location to the firm’s platform, data teams can build models and insight reports from the available data. Typical cases include setting prices based on a location’s demand or visualizing sales coverage by city. While working with such data, I cannot help but imagine potential use cases beyond business. As an urban enthusiast, the first use case that came to mind is generating insights for tackling traffic congestion and spatial inequalities within an urban area.

Traffic congestion generates high socioeconomic and environmental costs, from reduced workforce productivity to increased carbon emissions. For cities like Jakarta, the annual cost of traffic congestion is up to USD 4.5 billion (Roberts et al., 2019). It also deserves global attention because it contributes to climate change. Spatial inequalities can be defined as the unequal distribution of quality public goods and services, such as transportation, healthcare, and education. In urban areas, they result in disparities in quality of life, increased crime rates, and a lower overall happiness index among a city’s population (Glaeser et al., 2009).

Taking a closer look, both traffic congestion and spatial inequalities are the results of the urban structure, that is, the arrangement of land use and the degree of connectivity. A lack of transport accessibility, major roads, and proximity to work and amenities forces residents to take long commutes using personal vehicles. Meanwhile, those without personal vehicles have difficulty accessing such places, which gives them fewer income generation opportunities. Some households may move to informal settlements within districts of high economic activity for the sake of proximity. This is an example of inequality.

From the issues stated in the previous paragraph, I define two basic solutions: **(1) improve accessibility for underserved neighborhoods and (2) increase the distribution of urban amenities**. Current technology could be used to generate insights for local policymakers to determine the right solutions for each area. These insights would not only show the areas’ urgent needs but could also be used to formulate funding schemes.

In this article, I will discuss the use of geospatial analytics, particularly geohash, for generating insights and recommendations to improve urban structures. Before going into technical details, we first need to understand the interdependence across districts in terms of economic activity.

Structural interdependence within an urban economy

To determine the geospatial analytics model, we need to understand how urban economies work. According to Hewings (2021), urban economies are often characterized by a cursory understanding of them as a set of districts or regions. However, taking a deeper look, there is a strong interdependence across regions. A detailed analysis of the Chicago metropolitan area showed strong trading relationships in goods and services, flows of people (commuters), and flows of household expenditures (shopping). This revealed the reciprocal economic effects between regions in terms of income distribution and consumption expenditure.

Earlier research by Pan et al. (2018) reassessed urban structure and land-use patterns, taking the case of Chicago, based on employment, population, reviews indicating Places of Interest (POIs), and accessibility attraction, using data from the Hoover data directory, the US Census, yelp.com, and road density maps respectively. A key finding was that residents are attracted to places with high accessibility and "quality of life" amenities or POIs. This deviates from the distance to the central business district (CBD) model, in which distance to the CBD was the primary measure of attraction. Other findings show that new businesses prefer locations away from existing competitors in the CBD, creating new employment centers. My main takeaway from this article is that accessibility and the spatial distribution of amenities are important elements for residential areas. The former enables residents to access income generation and/or employment opportunities, while the latter indicates the quality of life. Hewings’ approach can be used as a framework for geospatial analytics projects, which is discussed in the following section.

The application of geospatial analytics

Hewings’ research inspired me to take a look at the structure of Bandung, Indonesia, the city where I have spent most of my life. Like most major cities in Southeast Asia, Bandung has a strong mall culture. Malls can be seen as main amenities or POIs for leisure, with locations reflecting the district’s level of development. Their density of business activity generates a considerable number of job opportunities. Therefore, the distribution of malls can be considered an indicator of spatial equity. To get an overview, I simply searched "malls in Bandung" on Google Maps. From the figure below, we can see that all malls are located on the West side of the city. Moreover, only two major roads connect these districts to the Eastside (marked in the red box), which mostly consists of residential areas. This is a clear indicator that the city’s current structure has generated issues of congestion and spatial inequality. From personal experience, these roads and those leading to malls are highly congested during peak commuting hours and weekends. The contrast in the overall level of prosperity between the West and East can also be seen through observation. The government of Bandung is indeed seeking to transform its single nucleus structure into a double nucleus, so geospatial analytics can be of great use.

Distributions of malls in Bandung, Indonesia (Image by Author, generated from Google Maps)

To build a stronger analysis, GIS tools can be used to display road congestion during peak hours. Relevant data can be extracted from crowdsourcing platforms, Google Earth imagery, or other available government data. For capturing inequality indicators, nightlight datasets generated from satellite imagery are currently being used for economic development analysis. A higher density of nightlights indicates higher economic activity, and since the data is real-time, users can also analyze its movement to indicate growth. The latter would be useful for measuring the success of policy interventions.

Theoretically, this is a common issue for urban planners as the Eastside is often used for manufacturing districts to mitigate the smog movement from wind flow. With the lack of synchronization between development and structural transformation plus population increase, areas like East Bandung have insufficient infrastructure that provides accessibility for residents and attracts businesses. To determine the right action for overcoming this issue, local governments and development practitioners can utilize geospatial analytics tools to generate insights at a neighborhood level.

Using Geohash as a primary dimension for analysis

A tool that I found highly useful is geohash, which maps a location coordinate (longitude and latitude) to a rectangular cell expressed as a short alphanumeric string. The geohash size is easily adjustable from a precision level of 1 to 12; in other words, more characters produce smaller cells. When visualized on a map, geohashes form a grid that makes it convenient for users to obtain insights (an example for Minneapolis can be found in this link). The levels I use most frequently are 7 and 6, which roughly correspond to neighborhoods and sub-districts respectively. Samples of these geohashes can be found in the images below. These were generated by inputting specific location coordinates (I referred to a popular mall in Bandung) into the Movable Type Scripts website. This site is useful for checking a geohash’s coverage, and it also provides JavaScript source code that is available on GitHub. There are also built-in functions on SQL servers to convert longitude and latitude data, which enable us to write simple aggregation queries once the data is transformed.
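To make the encoding concrete, here is a minimal pure-Python sketch of the standard geohash algorithm: it alternately halves the longitude and latitude ranges and packs every five bits into one base-32 character. The function name `geohash_encode` and the sample coordinate (an approximate point in central Bandung) are my own illustrative assumptions rather than any particular library's API; in practice a maintained package or a database function would do the same job.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)

def geohash_encode(lat, lon, precision=7):
    """Encode a latitude/longitude pair as a geohash string of the given length."""
    ranges = [[-180.0, 180.0], [-90.0, 90.0]]  # longitude range, latitude range
    values = [lon, lat]
    chars, bits = [], 0
    for i in range(precision * 5):      # five bits per output character
        axis = i % 2                    # even bits refine longitude, odd bits latitude
        lo, hi = ranges[axis]
        mid = (lo + hi) / 2
        bit = values[axis] >= mid
        ranges[axis][0 if bit else 1] = mid   # keep the half that contains the point
        bits = (bits << 1) | bit
        if i % 5 == 4:
            chars.append(BASE32[bits])
            bits = 0
    return "".join(chars)

# Assumed sample coordinate, for illustration only.
lat, lon = -6.9175, 107.6191
cell7 = geohash_encode(lat, lon, 7)   # neighborhood-sized cell
cell6 = geohash_encode(lat, lon, 6)   # sub-district-sized cell
print(cell7, cell6)
```

Because a geohash 6 string is simply the first six characters of the geohash 7 string for the same point, moving between the two levels of analysis is a matter of truncating the string.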

Example of a Geohash 7 grid (Image by Author, generated from Movable Type Scripts)
Example of a Geohash 6 grid (Image by Author, generated from Movable Type Scripts)

Since it is a convenient way to display a location, geohash can be used as a primary dimension for analyzing accessibility and spatial distribution within an urban structure. Geohash 7 can be used for accessibility, since accessibility is an important requirement for every neighborhood. For spatial distribution, geohash 6 is more appropriate, as amenities/POIs and other employment centers can serve a wider range of residents.

A limitation of geohash is that it cannot be customized to the size of an administrative area due to its grid form. The size of districts, counties, or boroughs may vary according to population density. For the sharper analysis needed by certain administrative offices, GeoPandas would be a more feasible tool. This data science library uses geometry columns derived from points and polygons. Location coordinates (longitude and latitude) can be converted into this geometry data format. A use case example showing a crime rate heat-map for Columbus, Ohio, can be found in this link.

Accessibility

The main enablers of accessibility are road and public transportation infrastructure. Because these form the backbone of an urban network, it is essential for neighborhoods to be well-connected to major roads and public transport nodes. To analyze the sufficiency of public transport coverage, we can simply add a geohash column to a dataset containing the longitude and latitude of every bus stop and railway station. This is usually available in the local transport authority’s database or archives. From here, we can generate a map that shows the availability of bus stops in each geohash grid. To determine the urgency of provision for grids that do not have bus stops, we can join the geohash column with a population dataset from census data, as done by Pan et al. (2018). Python libraries like polygon-geohasher can be used to create visualizations. The same approach can be applied to road infrastructure, taking the distance from the grid to a coordinate on the main road.
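The coverage check described above can be sketched in a few lines of plain Python. Everything here is a hedged illustration: the coordinates, resident counts, and variable names are invented for the example, and the small `geohash_encode` helper is my own implementation of the standard algorithm rather than a library call. The idea is to count stops per geohash 7 cell, sum residents per cell, and rank the cells that have residents but no stops.

```python
from collections import Counter

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=7):
    """Standard geohash encoding (illustrative helper, not a library API)."""
    ranges = [[-180.0, 180.0], [-90.0, 90.0]]
    values = [lon, lat]
    chars, bits = [], 0
    for i in range(precision * 5):
        axis = i % 2
        mid = (ranges[axis][0] + ranges[axis][1]) / 2
        bit = values[axis] >= mid
        ranges[axis][0 if bit else 1] = mid
        bits = (bits << 1) | bit
        if i % 5 == 4:
            chars.append(BASE32[bits])
            bits = 0
    return "".join(chars)

# Hypothetical inputs: bus stop coordinates and census blocks with resident counts.
bus_stops = [(-6.9147, 107.6098), (-6.9147, 107.6098)]
census_blocks = [
    (-6.9147, 107.6098, 12000),
    (-6.9400, 107.7000, 18000),
    (-6.9000, 107.6500, 5000),
]

# Add the geohash dimension and aggregate, as a SQL GROUP BY would.
stops_per_cell = Counter(geohash_encode(lat, lon, 7) for lat, lon in bus_stops)
population_per_cell = Counter()
for lat, lon, residents in census_blocks:
    population_per_cell[geohash_encode(lat, lon, 7)] += residents

# Cells with residents but no stops, most populous first.
underserved = sorted(
    ((cell, pop) for cell, pop in population_per_cell.items() if stops_per_cell[cell] == 0),
    key=lambda item: item[1],
    reverse=True,
)
for cell, pop in underserved:
    print(cell, pop)
```

Ranking by population is only one possible measure of urgency; a real analysis might also weigh income levels or the distance to the nearest served cell.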

For many cities, especially developing ones, infrastructure insufficiency might already be obvious to stakeholders. The main challenge is the lack of funding for providing the needed infrastructure, as most local governments have limited fiscal capacity. Infrastructure projects not only are costly to build, operate, and maintain, but also incur the cost of land acquisition. As an alternative solution, a mechanism that is becoming more widely used is land value capture (LVC). This is basically a method of financing transport infrastructure by capturing the added value of land due to increased accessibility (Medda, 2012). The accessibility created by infrastructure projects can only be enjoyed by the beneficiary (resident or private developer) if they make an upfront contribution in the form of a betterment tax or development fees.

Machine learning algorithms can be applied to calculate potential uplifts. The first step is creating a data frame in a Python notebook by importing a land value database, generally used for tax calculation. From here, we can set up a prediction model using linear regression with the distance to major roads and/or public transport nodes as main attributes, depending on the infrastructure category. To predict the value uplift from a planned project, we can create a simulation column that assumes its current existence. Results will reveal the expected average land values in geohashes surrounding the project. Another method is using historical data that contains values before and after recent infrastructure projects. Taking surrounding areas as a treatment group and others as control, we can run a causal impact analysis to calculate the land value uplift after the project.
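As a rough illustration of the first method, the sketch below fits a one-variable least-squares line of land value against distance to the nearest transit node, then adds a simulated distance that assumes a planned station already exists. The parcel records, cell labels, and planned distances are made-up values, and a real model would need many more attributes (land use, plot size, and so on) than this single-feature example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical land value database: (cell label, km to nearest transit node, value per m2).
parcels = [
    ("cell_a", 0.5, 95.0),
    ("cell_b", 1.0, 90.0),
    ("cell_c", 2.0, 80.0),
    ("cell_d", 4.0, 60.0),
    ("cell_e", 5.0, 50.0),
]

slope, intercept = fit_line([p[1] for p in parcels], [p[2] for p in parcels])

# Simulation column: distance to the planned station, assuming it already exists.
planned_distance_km = {"cell_d": 1.0, "cell_e": 1.5}

uplift = {}
for cell, distance, _ in parcels:
    simulated = min(distance, planned_distance_km.get(cell, distance))
    predicted_now = intercept + slope * distance
    predicted_after = intercept + slope * simulated
    uplift[cell] = predicted_after - predicted_now

for cell, gain in uplift.items():
    print(cell, round(gain, 2))
```

Cells whose nearest node does not change get zero predicted uplift, which is the pattern one would expect when mapping the results by geohash around the project.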

The results of the algorithms would be a useful reference for creating a tax-based value capture scheme. This is often referred to as a betterment levy, which has been implemented in Bogota. Landowners pay the levy based on the land gain at the time of sale or development. The proceeds are then used to fund infrastructure investment. Another mechanism is land consolidation, which is popular in Japan, where landowners are willing to give up parts of their land for road improvement in exchange for better accessibility. The main challenge for these schemes is convincing landowners of the economic uplift, as it is not yet concrete and is often relevant only to those interested in selling their property. An alternative scheme is development-based, which is correlated with the spatial distribution of POIs discussed in the next section.

Spatial Distribution of Amenities

With the number of social media check-ins, reviews, and GPS navigations done through mobile devices, there is an abundant amount of data relevant to this issue. We have seen that Pan et al. (2018) could utilize yelp.com review data to measure attraction, and I could simply search Google Maps to get a broad overview of the distribution of malls. Amenities are not limited to malls and restaurants but also include public spaces, fresh markets, and clinics/healthcare centers (a super-high priority today). The aforementioned platforms can generate quality insights on an area’s attractiveness, such as the areas that had the highest overall check-ins on Instagram in the last two weeks. Another interesting dataset would comprise the top destinations on Google Maps, including the users’ starting points. This information would help urban planners identify areas that are attracting businesses and commuter activity. The latter could also reveal congestion resulting from uneven spatial distribution. However, the main obstacle here is that most of these datasets are not publicly available. A technique that can be used is scraping, that is, extracting data from an accessible website. Google Maps would be a good platform for this case since it has rating data for a wide range of categories.

To commence an exploratory data analysis, we can extract the longitude and latitude of the top 500–1000 places on Google Maps per category. After converting the data points to geohash 6, we can generate a heatmap showing each grid’s attractiveness by category, using a filter column. A heatmap of the city’s population would also be necessary to complement the analysis. An ideal result would show warm colors across different parts of the city (excluding agricultural and green spaces). On the other hand, a concerning result would show most warm-colored grids concentrated in a particular area, in contrast to the population distribution. This would lead to the where and how questions of redevelopment.
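One simple way to put numbers behind such a heatmap comparison is to compute, per category, each cell's share of places minus its share of population. The sketch below assumes the scraped places already carry a geohash 6 column; the cell labels, categories, and counts are placeholders I made up, and the function name `attractiveness_gap` is hypothetical.

```python
from collections import Counter

# Hypothetical scraped places, already converted to geohash 6: (cell label, category).
places = [
    ("cell_w1", "mall"),
    ("cell_w1", "mall"),
    ("cell_w2", "mall"),
    ("cell_w1", "clinic"),
    ("cell_e1", "clinic"),
]

# Hypothetical residents per geohash 6 cell.
population = {"cell_w1": 20000, "cell_w2": 30000, "cell_e1": 50000}

def attractiveness_gap(places, population, category):
    """Share of a category's places in each cell minus the cell's share of residents."""
    counts = Counter(cell for cell, cat in places if cat == category)  # filter column
    total_places = sum(counts.values())
    total_population = sum(population.values())
    gap = {}
    for cell, residents in population.items():
        place_share = counts[cell] / total_places if total_places else 0.0
        gap[cell] = place_share - residents / total_population
    return gap

mall_gap = attractiveness_gap(places, population, "mall")
most_underserved = min(mall_gap, key=mall_gap.get)
print(most_underserved, round(mall_gap[most_underserved], 3))
```

A strongly negative gap marks a cell that holds many residents but few places of that category, which corresponds to the concerning pattern described above.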

For the first question, a k-means clustering model can be applied to understand the characteristics of the attractive areas and the propensity of currently underdeveloped areas. In other words, it identifies which less attractive areas have the highest level of similarity to the attractive areas. The hypothesis is that an area’s attractiveness is linked to major roads, air quality, green spaces, and pedestrian pavements. Potential locations for redevelopment (in a sustainable manner, of course) would have these attributes but currently a low number of amenities/POIs. To foster economic activity and growth, policymakers can give tax incentives for businesses to move to or set up operations in these locations. This will attract the private sector to invest in underserved areas and build equity in the long term.
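The clustering idea can be sketched with a small pure-Python k-means (Lloyd's algorithm). The cells, the two features (a road access score and a green space share, both assumed to be scaled to 0–1), the POI counts, and the threshold of 10 POIs are all hypothetical; the starting centroids are fixed so the example is reproducible, whereas a library implementation would normally choose them for you.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, init_indices, iterations=20):
    """Lloyd's algorithm with fixed starting points for reproducibility."""
    centroids = [list(points[i]) for i in init_indices]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical geohash 6 cells: (road access score, green space share), POI count.
cells = {
    "cell_a": ((0.90, 0.30), 40),
    "cell_b": ((0.80, 0.35), 35),
    "cell_c": ((0.85, 0.32), 3),
    "cell_d": ((0.20, 0.05), 2),
    "cell_e": ((0.10, 0.10), 1),
}
names = list(cells)
features = [cells[name][0] for name in names]
labels, _ = kmeans(features, k=2, init_indices=[0, 3])

# The "attractive" cluster is the one whose cells have the most POIs on average.
def mean_pois(cluster):
    counts = [cells[n][1] for n, label in zip(names, labels) if label == cluster]
    return sum(counts) / len(counts) if counts else 0.0

attractive = max(range(2), key=mean_pois)

# Candidates: cells that look like attractive areas but still have few POIs.
candidates = [n for n, label in zip(names, labels) if label == attractive and cells[n][1] < 10]
print(candidates)
```

Note that POI counts are deliberately left out of the clustering features, so that similarity is judged on the hypothesized attributes alone.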

A challenging situation would be if these potential locations are limited or even non-existent. An alternative is simply selecting brownfield areas in proximity to residential areas and using the key attributes from the k-means model as a planning framework. This would mean that these attributes, like major roads and green spaces, must be built. This brings us back to the funding challenge and LVC.

As mentioned earlier, there are development-based LVC schemes besides the discussed tax-based schemes. Instead of charging landowners, local governments can collaborate with private developers to help fund redevelopment projects, including required infrastructure, with development rights as an incentive. In simple terms, developers cannot enjoy revenues from new locations unless they help fund the infrastructure that will bring in tenants and consumers. This mechanism has been used in London to redevelop the Battersea area, which required underground rail extension. A machine learning algorithm similar to the one for accessibility can be applied, with more focus on economic activity. The results will help attract partnerships with developers.

Closing Remarks

Geospatial data can be analyzed in several ways, including reassessing an urban structure. There are several available tools that can help policymakers obtain insights related to underlying issues such as lack of accessibility and uneven spatial distribution. These insights can also be further analyzed using data science models to justify solutions for overcoming these issues.

However, geospatial analytics is actually just a tool to support policymakers. The fundamental aspects lie in the policies, institutions, and market. The first question to ask is whether the policymakers and market stakeholders have the interest to seek information and take necessary action. The main challenge is actually building the motivation for change and executing action plans. Geospatial analytics is indeed a powerful tool, but the deciding factors are the people consuming the data insights.

An interesting topic for further discussion on geospatial analytics is the pandemic. With a large number of people working from home and increased demand for online shopping, I keep wondering to what extent these patterns will change urban structures post-pandemic. Is this just an extreme event after which things will soon be back to normal, or has technology proven there is less need for commuting? Geospatial analytics can help provide insights on changes in behavior before and after social-distancing restrictions, which have been on and off in some cities. However, findings can also be biased toward the less vulnerable, less cautious, or more extroverted segments of the population. Nevertheless, such studies should continue until a few years after the pandemic.

References

Glaeser, E., Resseger, M., Tobio, K. 2009. Urban Inequality. Harvard Kennedy School: Taubman Center for State and Local Government.

Hewings, G. 2021. Sustainability and the Urban Economy. Webinars BIG! Series #11, University of Indonesia.

Medda, F. 2012. Land value capture finance for transport accessibility: a review. Journal of Transport Geography, 25, pp. 154–161.

Pan, H., Deal, B., Chen, Y., Hewings, G. 2018. A reassessment of urban structure and land-use patterns: distance to CBD or network-based? – Evidence from Chicago. Regional Science and Urban Economics, 70, pp. 215–228.

Roberts, M., Gil Sander, F., Tiwari, S. 2019. Time to ACT: Realizing Indonesia’s Urban Potential. Washington DC: The World Bank.

