
Clustering Villages and Finding Fresh Produce Suppliers in Metro Manila Using K-Means, Foursquare…

A possible business opportunity during the pandemic: fresh produce delivery to wealthy Metro Manila residents.

Due to the COVID-19 pandemic, Metro Manila, Philippines has been under quarantine since March 2020. Quarantine restrictions made it difficult for most businesses to operate, but they also greatly increased the demand for delivery services because going out in public was considered unsafe.

Since large delivery fees are unattractive to most customers, some businesses offer free delivery to certain areas, usually exclusive villages, if they have many customers in that area. These businesses often have a Viber chat group for each village, where residents send their orders.

This delivery model works well for businesses that sell essential, perishable goods such as vegetables, seafood, and meat because these are items that are bought frequently and by the same customers. Thus, delivering these goods to exclusive villages in Metro Manila is one of the possible business opportunities available during this pandemic.

Target Market

This report is targeted towards stakeholders who are interested in starting a fresh produce delivery business that caters to residents in exclusive villages in Metro Manila, Philippines.

Business Problem

If deliveries are made 6 days a week, how can one decide which villages to deliver to each day to optimize logistics, and which wet market supplier to source the fresh produce from?


1. Data

a. Data Needed

Based on the definition of the problem, here are the factors that will influence the decision:

  1. The locations of the exclusive villages in Metro Manila
  2. The names and ratings of wet markets nearest to each delivery group (since the customers are usually picky about the quality of the goods that they buy)

b. Data Sources

The following data sources will be used to extract or generate the required information:

  1. Nominatim API geocoding – for finding the longitude and latitude of each exclusive village in Metro Manila
  2. Foursquare API – for determining the wet markets around the areas of the exclusive villages and their ratings
  3. List of exclusive residential areas in Metro Manila from one of the top bread shops in the country that delivers to these villages – for identifying the exclusive villages which the target market will be delivering to
  • The residents of the villages listed here are the same ones the target market would want to attract.
  • This is public information because this list is available on the bread shop’s online order form.

2. Methodology

a. Gather data

One of the top bread shops in the country delivers bread weekly to the most exclusive villages in Metro Manila. The residential areas that they deliver to are included in their online order form, since customers will have to select the village where they live.

I took note of these 45 areas, looked up each village’s latitude and longitude using Nominatim API geocoding, and consolidated the results in a CSV file.

This is an example of how I got the longitude and latitude of each village:

from geopy.geocoders import Nominatim # geocoding client used for the lookup

#get latitude and longitude of North Greenhills
address = 'North Greenhills, Metro Manila'
geolocator = Nominatim(user_agent="gh_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

I then uploaded this CSV file into the notebook. The only data included in the CSV file were the names, longitude, and latitude of each village.
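The single lookup above could be extended to all villages with a small loop. The following is a minimal sketch rather than the code used in the project: the helper names `geocode_villages` and `write_villages_csv`, the `pause_seconds` parameter, and the output path are assumptions made for illustration. The column names match the ones used later in the notebook.

```python
import csv
import time

def geocode_villages(names, geocode, pause_seconds=1.0):
    """Return one row per village that the geocoder could resolve.

    `geocode` is any callable that takes an address string and returns an
    object with `.latitude` and `.longitude`, or None when nothing is found
    (for example, the `geocode` method of a geopy Nominatim instance).
    """
    rows = []
    for name in names:
        location = geocode(f"{name}, Metro Manila")
        time.sleep(pause_seconds)  # pause between lookups to respect rate limits
        if location is None:
            # Skip names the geocoder cannot resolve instead of crashing.
            continue
        rows.append({
            "Village": name,
            "Latitude": location.latitude,
            "Longitude": location.longitude,
        })
    return rows

def write_villages_csv(rows, path):
    """Write the geocoded rows to a CSV with Village/Latitude/Longitude columns."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["Village", "Latitude", "Longitude"])
        writer.writeheader()
        writer.writerows(rows)
```

With geopy installed, this could be called as `geocode_villages(names, Nominatim(user_agent="gh_agent").geocode)`. Passing the geocoder in as a callable keeps the loop easy to check without making network requests.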

b. Import libraries and data

These are the libraries I used for this project:

i. Requests: for handling requests

ii. Pandas: for data analysis and dataframe-making

iii. Numpy: to handle data in a vectorized manner

iv. Json: to parse JSON files into a Python dictionary or list

v. json_normalize: to flatten JSON data into a pandas dataframe

vi. Matplotlib: for plotting points in the map

vii. Folium: for creating maps

viii. Nominatim: for geocoding the longitude and latitude of different areas needed

ix. KMeans: for creating a k-means clustering model to cluster the villages

import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import json # library to parse JSON files into a Python dictionary or list
import matplotlib.cm as cm # color maps used to color the clusters on the map
import matplotlib.colors as colors # converts colors to hex codes for the map markers
from pandas import json_normalize # flattens JSON into a pandas dataframe (pandas >= 1.0)
!python3 -m pip install folium
import folium # library for creating maps
from geopy.geocoders import Nominatim # library for geocoding the longitude and latitude of different areas needed
from sklearn.cluster import KMeans # library for creating a k-means clustering model
print('Libraries imported.')

After importing these libraries, I also defined my Foursquare API credentials because the names and ratings of the different wet markets near the villages would be requested from Foursquare API.

#Foursquare credentials (hidden cell)
# @hidden_cell
CLIENT_ID = '[insert Foursquare ID here]' 
CLIENT_SECRET = '[insert Foursquare Secret here]' 
ACCESS_TOKEN = '[insert Access Token here]'
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

I then loaded the CSV file of the village location data into the notebook as a pandas dataframe named "df_villages".
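Loading the file and checking it before mapping could look roughly like this. It is a sketch under assumptions: the file name `villages.csv` is hypothetical, and the two rows below (including their coordinates) are illustrative stand-ins rather than the project's data.

```python
import io
import pandas as pd

# Stand-in for the CSV file; the rows and coordinates are illustrative only.
csv_text = """Village,Latitude,Longitude
North Greenhills,14.60,121.05
Ayala Alabang,14.41,121.02
"""

# With a real file this would be pd.read_csv("villages.csv") (hypothetical name).
df_villages = pd.read_csv(io.StringIO(csv_text))

# Basic sanity checks before mapping or clustering.
required = ["Village", "Latitude", "Longitude"]
missing_columns = [c for c in required if c not in df_villages.columns]
rows_without_coordinates = df_villages[["Latitude", "Longitude"]].isna().any(axis=1).sum()
```

An empty `missing_columns` list and a zero count of rows without coordinates suggest the dataframe is ready for the steps that follow.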

c. Visualize village locations

I did some exploratory data analysis by visualizing the villages in a map using Folium. I generated a map around Metro Manila and plotted the villages as blue dots.

#get latitude and longitude of Metro Manila
address = 'Metro Manila'
geolocator = Nominatim(user_agent="mm_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
# generate map of villages in Metro Manila
village_map = folium.Map(location=[latitude, longitude], zoom_start=13)
# add the villages as blue circle markers
for Latitude, Longitude, label in zip(df_villages.Latitude, df_villages.Longitude, df_villages.Village):
    folium.CircleMarker(
        [Latitude, Longitude],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(village_map)
# display map
village_map

At first glance, I saw that most villages were located in the center of Metro Manila, while there were a few outliers in the north and south.

From here, it looked like there were 4 possible village clusters, but since there are 6 working days a week for delivery, I wanted to split all these villages into 6 clusters.

d. K-means clustering of villages

Because the goods being delivered are perishable and could spoil easily, each day’s deliveries should go only to villages that are near each other.

The k-means clustering algorithm was used to group the unlabeled data points (in this case, the villages) based on their proximity to each other.

#get k-means clusters of Metro Manila exclusive villages
#6 clusters because one cluster for each working day of the week

kclusters = 6
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_villages[["Latitude", "Longitude"]])

Setting "kclusters = 6" tells the algorithm to create 6 clusters. After the villages were divided into 6 groups, a new column was added to the dataframe for the cluster labels.

#add cluster labels to dataframe

df_villages.insert(0, 'Cluster Labels', kmeans.labels_)
#show first 5 rows of dataframe
df_villages.head()
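Once the labels are in the dataframe, it can help to check how many villages landed in each cluster, since each cluster corresponds to one delivery day. The sketch below uses a toy dataframe, `df_demo`, with made-up values in place of "df_villages".

```python
import pandas as pd

# Toy stand-in for df_villages after the labels were added; values are made up.
df_demo = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 2, 2, 2],
    "Village": ["A", "B", "C", "D", "E", "F"],
})

# Number of villages per cluster, ordered by cluster label.
cluster_sizes = df_demo["Cluster Labels"].value_counts().sort_index()

# Village names per cluster: the list needed for each delivery day.
villages_by_cluster = df_demo.groupby("Cluster Labels")["Village"].apply(list)
```

A very uneven split would show up here as one day with many stops and another with only a few.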

To visualize the clusters, a new map was created called "cluster_map" where each cluster label was assigned a specific color and plotted on a map using folium.

#map the clusters
cluster_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# set colors for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]
centers = kmeans.cluster_centers_

# put markers
cluster_markers = []
for lat, lon, village, cluster in zip(df_villages['Latitude'], df_villages['Longitude'], df_villages['Village'], df_villages['Cluster Labels']):
    label = folium.Popup(str(village) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(cluster_map)

cluster_map


3. Results and Analysis

Now that it was clear which villages would be delivered to on the same day, a specific wet market had to be assigned to each village cluster so as to minimize the time the goods spend in transit.

It is important that the chosen wet markets be as close as possible to the clusters, especially if there are customers who order seafood or other produce that spoils easily.

Farmers Market in Cubao, Quezon City is one of the more high-end wet markets in Metro Manila where residents of these exclusive villages often get their fresh produce.

For this reason, I did not search for wet markets around Clusters 0 and 4, since Farmers Market is situated between these clusters.

I recommend Farmers Market as the wet market supplier for villages in these clusters because 1) it already has a good reputation and 2) "suki" culture in the Philippines means that wet market vendors usually give lower prices to customers who consistently buy from their stalls.

Because the target market will be buying goods from the same stalls in Farmers Market for the residents of these two clusters, prices would be minimized as well.

I then searched only for the best wet markets to supply villages in Clusters 1, 2, 3, and 5.

Finding wet market supplier candidates for Cluster 1

Because Cluster 1 includes only 2 villages (Ayala Alabang and Ayala Southvale), I picked one to use as the reference point for finding wet markets near Cluster 1, and this will be the first village to be delivered to.

I got the latitude and longitude of Ayala Alabang and queried the Foursquare API for venues matching "wet market" within a 2,000-meter radius of that point.

#get latitude and longitude of Ayala Alabang Village
address_1 = 'Ayala Alabang, Metro Manila'
geolocator_1 = Nominatim(user_agent="1_agent")
location_1 = geolocator_1.geocode(address_1)
latitude_1 = location_1.latitude
longitude_1 = location_1.longitude
#search wet markets near each selected address
search_query = 'wet market'
radius = 2000
print(search_query)
#define the url to find wet markets near Cluster 1
url_1 = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&oauth_token={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude_1, longitude_1,ACCESS_TOKEN, VERSION, search_query, radius, LIMIT)
#get the results of wet markets near Cluster 1
results_1 = requests.get(url_1).json()
# assign relevant part of JSON to venues
venues_1 = results_1['response']['venues']

# transform venues into a dataframe
df_results_1 = json_normalize(venues_1)
df_results_1
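One alternative to formatting the URL by hand is to build the query string from a dictionary, so that values such as the space in "wet market" are percent-encoded explicitly. This is a sketch, not the notebook's code: the helper name `build_search_url` is an assumption, and it reuses only the endpoint and parameter names that appear in the URL above.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, oauth_token, lat, lng, query,
                     radius=2000, limit=100, version="20180605"):
    """Build the venue-search URL with every parameter percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": f"{lat},{lng}",
        "oauth_token": oauth_token,
        "v": version,
        "query": query,
        "radius": radius,
        "limit": limit,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)
```

The same dictionary could instead be passed to `requests.get(url, params=...)`, which performs the encoding itself.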

Now that there was a list of wet markets near Cluster 1, I cleaned up the data so it would be easier to understand. I saved the cleaned-up data as "df_markets_1".

# keep only columns that include venue name, and anything that is associated with location
filtered_columns_1 = ['name', 'categories'] + [col for col in df_results_1.columns if col.startswith('location.')] + ['id']
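df_markets_1 = df_results_1.loc[:, filtered_columns_1].copy() # copy so later edits do not touch df_results_1

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # some Foursquare responses nest the venue fields under 'venue.'
        categories_list = row['venue.categories']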

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
df_markets_1['categories'] = df_markets_1.apply(get_category_type, axis=1)

# clean column names by keeping only last term
df_markets_1.columns = [column.split('.')[-1] for column in df_markets_1.columns]

df_markets_1

I could see here that not all of the search results are actually wet markets; "Filinvest Corporate City," for example, is tagged as a neighborhood. But since most of the results were wet markets, I plotted these points on the map and looked for the markets nearest to Ayala Alabang.

# add the wet markets to the map as yellow circle markers
for lat, lng, label in zip(df_markets_1.lat, df_markets_1.lng, df_markets_1.name):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='yellow',
        popup=label,
        fill = True,
        fill_color='yellow',
        fill_opacity=0.6
    ).add_to(cluster_map)

# display map
cluster_map

Based on the map, the nearest market to Ayala Alabang would be the Saturday Market on University Ave.
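Reading distances off a map can be cross-checked numerically. The sketch below ranks candidate markets by great-circle (haversine) distance from a reference point. The function names are assumptions for illustration, and straight-line distance is only a rough proxy for actual travel time by road.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def rank_by_distance(ref_lat, ref_lng, markets):
    """Return (name, distance_km) pairs sorted from nearest to farthest.

    `markets` is an iterable of (name, lat, lng) tuples.
    """
    distances = [(name, haversine_km(ref_lat, ref_lng, lat, lng))
                 for name, lat, lng in markets]
    return sorted(distances, key=lambda pair: pair[1])
```

With the notebook's data, the tuples could come from `zip(df_markets_1.name, df_markets_1.lat, df_markets_1.lng)` and the reference point from `latitude_1, longitude_1`.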

I checked the rating of Saturday Market on University Ave. to see if this market would meet the standards of the potential customers, the exclusive village residents.

#check the rating of Saturday Market on University Ave.

venue_id_SMUA = '4b9c7413f964a520d96936e3' # Saturday Market on University Ave.
url_SMUA = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&oauth_token={}&v={}'.format(venue_id_SMUA, CLIENT_ID, CLIENT_SECRET,ACCESS_TOKEN, VERSION)

result_SMUA = requests.get(url_SMUA).json()

try:
    print(result_SMUA['response']['venue']['rating'])
except KeyError:
    # the 'rating' key is absent when a venue has no rating
    print('This venue has not been rated yet.')
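Since the same rating check is repeated for several venues, it could be wrapped in a small helper that returns `None` when any level of the response is missing. The name `extract_rating` is hypothetical; the nested keys are the ones used in the lookup above.

```python
def extract_rating(result):
    """Return the venue rating from a venue-details response, or None if absent."""
    return result.get("response", {}).get("venue", {}).get("rating")
```

A caller could then write `rating = extract_rating(result_SMUA)` and treat `None` as "not rated yet".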

Because this venue has not been rated yet, I tried getting the ratings of the next nearest wet markets. However, none of them had ratings either, so I checked whether I could get a photo of the venue to gauge whether it looks orderly and sells good-quality products.

url_SMUA_photo = 'https://api.foursquare.com/v2/photos/{}?client_id={}&client_secret={}&oauth_token={}&v={}'.format(venue_id_SMUA, CLIENT_ID, CLIENT_SECRET,ACCESS_TOKEN, VERSION)
result_SMUA_photo = requests.get(url_SMUA_photo).json()

result_SMUA_photo

No photo was available through the Foursquare API either, so I did an external image search on Google. The photos showed that the market looked clean and seemed to cater to the right customers (promotional materials were in English, the tarpaulin sign was well designed, and there was ample walking space).

Therefore, I would recommend that the target market use Saturday Market vendors as the suppliers for Cluster 1.

I repeated the same process for choosing the wet market suppliers for Clusters 2, 3, and 5.
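Because the clean-up is repeated for every cluster, it could be factored into one reusable function. The sketch below works directly on the list of venue dictionaries returned by the search and keeps only the fields used in this report. The function name `tidy_venues` is an assumption, and the sample values in the test data would need to be replaced by real responses.

```python
def tidy_venues(venues):
    """Flatten Foursquare venue dictionaries into simple rows.

    Each row keeps the venue name, its first category name (or None),
    its coordinates, and its id.
    """
    rows = []
    for venue in venues:
        categories = venue.get("categories") or []
        location = venue.get("location", {})
        rows.append({
            "name": venue.get("name"),
            "categories": categories[0]["name"] if categories else None,
            "lat": location.get("lat"),
            "lng": location.get("lng"),
            "id": venue.get("id"),
        })
    return rows
```

The result can be turned into a dataframe with `pd.DataFrame(tidy_venues(venues))`, giving each cluster a table with the same columns.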

Finding wet market supplier candidates for Cluster 2

Because Cluster 2 includes only 3 villages, I picked one that wasn’t the middle village (Serendra One) to use as the reference point for finding wet markets near Cluster 2, and this will be the first village to be delivered to.

After creating a new dataframe called "df_markets_2" for wet markets near Serendra One, I plotted these out on "cluster_map."

As seen on the map, there were many markets plotted near Cluster 2 because it is right next to a shopping mall called "Market! Market!" This might be one of the limitations of using the Foursquare API: even the non-food shops in the mall were included in the search results because their names contain the word "Market."

Upon checking the nearby points, though, I saw that there was also a Farmers Market within Market! Market!, so this could be the potential supplier for Cluster 2.
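One way to reduce the name-matching noise described above is to filter on the venue category instead of the venue name. This is a sketch under assumptions: the category labels and the shop name in the toy dataframe are made up for illustration, and real Foursquare categories would need to be inspected first.

```python
import pandas as pd

# Toy results; the shop name and category labels are illustrative assumptions.
df_demo = pd.DataFrame({
    "name": ["Market! Market! Farmers Market", "Some Shop at Market! Market!"],
    "categories": ["Farmers Market", "Clothing Store"],
})

# Keep rows whose category (not name) mentions "market", ignoring case.
is_market = df_demo["categories"].fillna("").str.contains("market", case=False)
df_markets_only = df_demo[is_market].reset_index(drop=True)
```

This would still keep supermarkets and night markets, whose categories also contain the word, so a manual check remains necessary.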

I checked the rating of the Market! Market! Farmers Market and saw that there was none. I decided to see whether Foursquare users had left any tips that would hint at its quality.

## MMFM Tips
limit_MMFM = 15 # set limit to be greater than or equal to the total number of tips
tips_MMFM = 'https://api.foursquare.com/v2/venues/{}/tips?client_id={}&client_secret={}&oauth_token={}&v={}&limit={}'.format(venue_id_MMFM, CLIENT_ID, CLIENT_SECRET,ACCESS_TOKEN, VERSION, limit_MMFM)

results_MMFM = requests.get(tips_MMFM).json()
results_MMFM

There were no tips available through the Foursquare API either. Since neither this wet market nor the next nearest wet markets (which were already a bit far from the cluster) had ratings, I based my decision on photos of Market! Market! Farmers Market found in Google Images.

Judging from the photos, it looked like a clean and reputable wet market, especially because it is located in a popular shopping mall. Therefore, I recommend Market! Market! Farmers Market as a supplier for Cluster 2 residents.

Next, I looked for the best wet market supplier for Cluster 3.

Finding wet market supplier candidates for Cluster 3

Because Cluster 3 only included 3 villages, I picked one that wasn’t the middle village to use as reference for finding wet markets near Cluster 3, and this will be the first village to be delivered to.

I specifically picked La Vista Village as the reference because the other two villages are close to each other and on one side of a river, so logistically, it is easier to deliver to La Vista first.
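The reference village here was chosen by looking at the map. A purely geometric first guess could be the village farthest from the cluster’s mean point, as in the sketch below. The function name `edge_village` is hypothetical, and this heuristic knows nothing about rivers or roads, so it would not replace the judgment described above.

```python
def edge_village(villages):
    """Return the name of the village farthest from the group's mean point.

    `villages` is a list of (name, lat, lng) tuples. Plain differences in
    degrees are used, which is a rough approximation over a small area.
    """
    mean_lat = sum(lat for _, lat, _ in villages) / len(villages)
    mean_lng = sum(lng for _, _, lng in villages) / len(villages)

    def squared_offset(village):
        _, lat, lng = village
        return (lat - mean_lat) ** 2 + (lng - mean_lng) ** 2

    return max(villages, key=squared_offset)[0]
```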

Again, the neighboring wet markets were placed in a dataframe "df_markets_3" and plotted on "cluster_map."

Based on the map, the nearest market to La Vista was the Viaga Public Market. Since this venue did not have a rating, tips, or photos on Foursquare, I checked the next nearest wet markets to see if they had any.

Unfortunately, none of them did. I tried checking Google Images for photos of Viaga Public Market, Tumana Public Market, and Taboan Public Market (the next nearest wet markets to La Vista), but no images of these were available either.

I suppose that there are not too many venues in Metro Manila (or even in the Philippines) with reviews, tips, and photos on Foursquare API just yet. Either that or not too many people review and leave tips on wet markets in this area.

In this case, I will leave it to the user of this report to explore the suggested wet markets themselves and gauge whether these are suitable for their customers.

Lastly, I repeated the process for Cluster 5.

Finding wet market supplier candidates for Cluster 5

Because Urdaneta Village is generally in the center of Cluster 5, I used it as a reference for searching for the list of wet markets nearest to the cluster. This is the map of wet markets near Cluster 5:

Looking at the map of Cluster 5, I noticed that many of the points plotted were not actually wet markets, as some were night markets or supermarkets. It might be best if the wet market supplier is close to at least one of the villages on the border of the cluster, such as Rockwell Makati, which is on the top right corner of the cluster.

This is so that the wet market supplier could be near an "entry point" of the delivery route for that group. The points nearest to Rockwell Makati are "Marketplace by Rustans," which is a high-end supermarket, and "The Grid Food Market," a high-end food court.

Since neither is a real wet market for fresh produce, I would recommend Poblacion Public Market as a wet market supplier for this cluster because it is the next nearest point to Rockwell Makati.

Since this venue also had no ratings, tips, or photos available, I again judged the quality of this market based on Google Images.

I only saw one image of the market on Google and its interior was similar to that of Farmers Market Cubao, so I would assume that the produce in this wet market should be of similar quality too. Therefore, I would recommend Poblacion Public Market as the wet market supplier for Cluster 5.


4. Conclusion

The 45 exclusive villages in Metro Manila were clustered into 6 delivery groups (Clusters 0–5) according to their proximity to each other.

Using Foursquare API and some additional human knowledge (like knowing the reputation of Farmers Market Cubao with the intended customers), I was able to identify some recommended wet markets where the target market could buy supplies which will be sold to the residents of these exclusive villages. These wet markets are:

1. Cluster 0 (West Greenhills, Wack Wack Village, etc.) – Farmers Market Cubao

2. Cluster 1 (Ayala Alabang, Ayala Southvale) – Saturday Market on University Ave.

3. Cluster 2 (Serendra One, Serendra Two, and Mckinley Hill) – Market! Market! Farmers Market

4. Cluster 3 (La Vista, Loyola Grand Villas, and Ayala Heights) – Viaga Public Market, Tumana Public Market, or Taboan Public Market

5. Cluster 4 (Corinthian Gardens, Valle Verde 1, etc.) – Farmers Market Cubao

6. Cluster 5 (Dasmarinas Village, Forbes Park, etc.) – Poblacion Public Market
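The list above could be turned into a simple weekly plan. In the sketch below, the cluster-to-supplier mapping comes from the list, while the assignment of clusters to specific weekdays is purely an assumption for illustration; the analysis only establishes that there are 6 delivery days.

```python
# Suppliers per cluster, taken from the list above.
suppliers = {
    0: ["Farmers Market Cubao"],
    1: ["Saturday Market on University Ave."],
    2: ["Market! Market! Farmers Market"],
    3: ["Viaga Public Market", "Tumana Public Market", "Taboan Public Market"],
    4: ["Farmers Market Cubao"],
    5: ["Poblacion Public Market"],
}

# Assumed ordering: which cluster goes on which day is not part of the analysis.
delivery_days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]

# One cluster per delivery day, paired in label order.
schedule = {
    day: {"cluster": cluster, "suppliers": suppliers[cluster]}
    for day, cluster in zip(delivery_days, sorted(suppliers))
}
```

In practice, the two clusters supplied by Farmers Market Cubao might be placed on days that suit that market’s stock, which is a decision left to the business owner.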

I have also concluded through this project that although this kind of data and technology is available to aid business decisions, the bulk of the analysis still relies on human experience and intuition.

For example, data from Foursquare could show us that certain venues tagged as wet markets would be the most practical choice as suppliers based on their proximity to the generated clusters, but this would not account for the savings a businessman could make by choosing the supplier that allows him to haggle for bulk orders.

At the end of the day, technology is a tool to make decision-making easier, but it can only be optimized by integrating it with real-world human knowledge.


_This report was written as part of the capstone project for my IBM Data Science Professional Certificate. You may view the full code on my notebook._

Any suggestions on how I could improve this? Leave a comment below or connect with me on LinkedIn! Feedback is much appreciated.

