Scraping a list of Singapore Michelin-star restaurants

Hannah Yan Han
Towards Data Science
2 min read · Sep 13, 2017

Today I want to practice web scraping by extracting the name, address, star rating, and cuisine type of Michelin-star restaurants in Singapore, to understand their geographic and cuisine-type distributions.

There isn’t one website that neatly contains both addresses and cuisine types, so I identified a news site that lists the addresses and the official Michelin site that lists the cuisine types. I used SelectorGadget and rvest to collect the data, then did some cleaning, because a handful of addresses are encoded with different CSS ids in the frame and the cuisine types include both English and Chinese words.
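As a rough illustration of the cleaning step for the bilingual cuisine labels, here is a minimal sketch. The post used rvest in R; this sketch uses Python's standard library instead, and the sample labels and the `english_only` helper are made up for illustration, not taken from the scraped data.

```python
import re

# Matches runs of characters in the common CJK Unified Ideographs range.
# Assumption: this range covers the Chinese text mixed into the labels.
CJK = re.compile(r"[\u4e00-\u9fff]+")


def english_only(label: str) -> str:
    """Drop Chinese characters and collapse any leftover whitespace."""
    without_cjk = CJK.sub(" ", label)
    return " ".join(without_cjk.split())


# Hypothetical raw labels, not the actual scraped values.
raw_labels = ["Cantonese 粤菜", "街头小吃 Street Food", "French"]
cleaned = [english_only(label) for label in raw_labels]
print(cleaned)
```

Labels that are already English-only pass through unchanged, so the same function can be applied to the whole column.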

Then I used BatchGeo to geocode the addresses. It is rather accurate and conveniently provides a map, although it isn’t obvious whether its output includes latitude/longitude data.

click for interactive version

The restaurants appear to be mostly centrally located, in three clusters:

  • from Tanjong Pagar all the way to City Hall (city center)
  • in Orchard/Botanic Gardens area (shopping/touristy area)
  • three in Sentosa, one at each level: 1 star (Osia), 2 stars (L’Atelier de Joel Robuchon), and 3 stars (Joel Robuchon) (touristy/affluent residential area)

Click for interactive version

As a next step, we can fuzzy-match restaurant names (which are spelled differently in the two sources) to attach cuisine types to addresses, group the restaurants by geographical district, and measure their distance to different residential areas.
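The fuzzy-matching step could be sketched roughly as follows. This is only one possible approach, using Python's standard `difflib` rather than anything from the post. The `normalize` and `match_names` helpers, the 0.6 cutoff, and the alternative spellings in `michelin_names` are all assumptions for illustration.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Lowercase, straighten apostrophes, and strip accents before comparing."""
    name = name.replace("\u2019", "'").lower()
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_names(source_names, target_names, cutoff=0.6):
    """Map each source name to its closest target name, or None if none is close."""
    by_normalized = {normalize(t): t for t in target_names}
    matches = {}
    for name in source_names:
        close = difflib.get_close_matches(
            normalize(name), list(by_normalized), n=1, cutoff=cutoff
        )
        matches[name] = by_normalized[close[0]] if close else None
    return matches


# Names as written in the post, and hypothetical spellings from a second source.
news_names = ["L\u2019Atelier de Joel Robuchon", "Joel Robuchon", "Osia"]
michelin_names = [
    "L'Atelier de Jo\u00ebl Robuchon",
    "Jo\u00ebl Robuchon Restaurant",
    "Osia Steak and Seafood Grill",
]

matches = match_names(news_names, michelin_names)
for source, target in matches.items():
    print(f"{source!r} -> {target!r}")
```

With these sample spellings, the short name "Osia" falls below the cutoff against its much longer counterpart and comes back as `None`, so pairs like that would still need a lower cutoff or a manual check.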

Looking at cuisine types, contemporary and innovative food is quite a big deal. French and Cantonese top the Western and local food lists, respectively. Two hawker/street food entries also made the 1-star list this year.

cuisine types by star rating
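A tally like the one behind this chart can be sketched in a few lines. The rows below are placeholders, not the real scraped data, and the (name, stars, cuisine) layout is an assumption about how the cleaned table might look.

```python
from collections import Counter

# Placeholder rows (name, stars, cuisine); not the real scraped data.
rows = [
    ("Restaurant A", 1, "Cantonese"),
    ("Restaurant B", 1, "Street Food"),
    ("Restaurant C", 1, "Cantonese"),
    ("Restaurant D", 2, "French"),
    ("Restaurant E", 3, "French"),
]

# Count restaurants for each (star rating, cuisine) pair.
counts = Counter((stars, cuisine) for _, stars, cuisine in rows)

for (stars, cuisine), n in sorted(counts.items()):
    print(f"{stars}-star  {cuisine:<12} {n}")
```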

This is #day57 of my #100dayprojects on data science and visual storytelling. The scraped data can be found here. Thanks for reading. Suggestions for new topics and feedback are always welcome.
