Cooking up a data science project using Kaggle Datasets and Kernels

On this episode of AI Adventures, I’ve asked Megan Risdal, Product Lead of Kaggle Datasets, to give us a tour of some of the latest features of Kaggle Kernels and Kaggle Datasets, and to showcase some of the ways to collaborate on Kaggle Kernels. You can catch the video of our adventure into Kaggle down below.

We’ll use the freshest ingredients (data), prepare them using different tools, and work together to come up with a delicious outcome: a published dataset and some cool analysis that we’ll share with the world.

Working with Datasets and Kernels

We’ll pull down public data from the City of Los Angeles’s open data portal, containing the Environmental Health Violations from restaurants in Los Angeles. Then we’ll create a new dataset using that data, and collaborate on a kernel together, before releasing it to the world.
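To give a rough flavor of the kind of first pass a kernel over this data might take, here is a minimal sketch that counts violations per month using only Python’s standard library. The column names (`activity_date`, `facility_name`, `violation_description`), the date format, and the sample rows are all assumptions made up for illustration; the real file from the open data portal may be laid out differently, so adjust them to match what you actually download.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical sample rows standing in for the downloaded CSV.
# The real column names and date format may differ.
SAMPLE_CSV = """activity_date,facility_name,violation_description
03/01/2018,CAFE A,Food contact surfaces not clean
03/01/2018,CAFE B,Improper holding temperatures
04/15/2018,CAFE A,No hot water
"""


def violations_per_month(csv_file, date_column="activity_date",
                         date_format="%m/%d/%Y"):
    """Count rows per month, keyed as 'YYYY-MM'."""
    counts = Counter()
    for row in csv.DictReader(csv_file):
        day = datetime.strptime(row[date_column], date_format)
        counts[day.strftime("%Y-%m")] += 1
    return counts


if __name__ == "__main__":
    counts = violations_per_month(io.StringIO(SAMPLE_CSV))
    for month, n in sorted(counts.items()):
        print(month, n)
```

To run this against a real file, you would pass an opened file object (for example `open("violations.csv", newline="")`, where the filename is again a placeholder) in place of the `io.StringIO` wrapper.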

In this episode, you’ll learn:

  • How to create a new, private Kaggle Dataset from raw data
  • How to share your dataset with your collaborators before making it public
  • How to collaborate on Kaggle Kernels by adding collaborators to a private Kernel
  • How to properly annotate and configure your Kaggle dataset so that others can easily discover and contribute to it

Data is most powerful when it is shared alongside reproducible code and a community of experts and learners. By having data and code on a shared, consistent platform, you get the best of collaboration together with productive, high-performing notebooks.

Resources

Now that you’ve seen how to make a dataset and kernel, it’s time for you to make your own! Head over to Kaggle to get started making a new dataset today.

The dataset: https://www.kaggle.com/meganrisdal/exploring-la-county-health-code-violations-by-date

Our kernel: https://www.kaggle.com/meganrisdal/exploring-la-county-health-code-violations-by-date

Megan Risdal is the Product Lead on Kaggle Datasets, which means she works with engineers, designers, and the Kaggle community of 1.7 million data scientists to build tools for finding, sharing, and analyzing data. She wants Kaggle to be the best place for people to share and collaborate on their data science projects.


Thanks for reading this episode of Cloud AI Adventures. If you’re enjoying the series, please let me know by clapping for the article. If you want more machine learning action, be sure to follow me on Medium or subscribe to the YouTube channel to catch future episodes as they come out. More episodes coming at you soon!