Kaggle for Beginners: Getting Started

Munira Omar
Towards Data Science
6 min read · Jul 1, 2019


The aim of this article is to help you to get started on Kaggle and join the world’s largest machine learning and data science community.

So what is Kaggle?

Kaggle, as they say, is “Your Home for Data Science”. It is a great place to learn and expand your skills through hands-on data science and machine learning projects.

So what are you waiting for? Head over to Kaggle and register with just one click 🏃.

Programming Languages on Kaggle

Both Python and R are popular on Kaggle, and you can use either of them for Kaggle competitions.

Kaggle Services

1. Machine Learning Competitions

This is what Kaggle is famous for. Pick the problems that interest you and compete to build the best algorithm.

Common Types of Kaggle Competitions

You can search for competitions on Kaggle by category. Below, I will show you how to get a list of the “Getting Started” competitions for newbies, which are always available and have no deadline 😃.

  • Featured competitions are the types of competitions that Kaggle is probably best known for. They are usually sponsored by companies, organizations, or even governments. They offer prize pools going as high as a million dollars.
  • Research competitions feature problems which are more experimental than featured competition problems. They do not usually offer prizes or points due to their experimental nature.
  • Getting Started competitions are structured like featured competitions, but they have no prize pools. They feature easier datasets, plenty of tutorials, and no deadline, which is just what a newcomer needs to get started! 😃 One example of a Getting Started competition is:

Titanic: Machine Learning from Disaster (predict survival on the Titanic)

  • Playground competitions are a “for fun” type of Kaggle competition that is one step above Getting Started in difficulty. Prizes range from kudos to small cash prizes. One example of a Playground competition is:

Dogs vs. Cats (create an algorithm to distinguish dogs from cats)
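To give a feel for what a Getting Started entry involves, here is a minimal sketch of a Titanic-style baseline: learn the survival rate per sex from the training rows, predict for the test rows, and write a submission with the `PassengerId` and `Survived` columns. The rows below are made-up toy data in the shape of the competition files, not real passenger records, and the helper names are hypothetical. In a real kernel you would more likely read the files with pandas.

```python
import csv
import io

# Made-up toy rows shaped like the Titanic files (NOT real passenger records).
TRAIN_CSV = """PassengerId,Survived,Sex
1,0,male
2,1,female
3,1,female
4,0,male
5,1,male
6,0,female
"""
TEST_CSV = """PassengerId,Sex
7,male
8,female
9,female
"""


def survival_rate_by_sex(train_rows):
    """Return the fraction of survivors for each value of the Sex column."""
    totals, survivors = {}, {}
    for row in train_rows:
        sex = row["Sex"]
        totals[sex] = totals.get(sex, 0) + 1
        survivors[sex] = survivors.get(sex, 0) + int(row["Survived"])
    return {sex: survivors[sex] / totals[sex] for sex in totals}


def make_submission(train_text, test_text):
    """Build the submission file contents as a CSV string."""
    train_rows = list(csv.DictReader(io.StringIO(train_text)))
    test_rows = list(csv.DictReader(io.StringIO(test_text)))
    rates = survival_rate_by_sex(train_rows)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for row in test_rows:
        # Predict survival when that group mostly survived in training.
        prediction = 1 if rates.get(row["Sex"], 0.0) >= 0.5 else 0
        writer.writerow([row["PassengerId"], prediction])
    return out.getvalue()


submission = make_submission(TRAIN_CSV, TEST_CSV)
print(submission)
```

A baseline this simple will not top any leaderboard, but it shows the whole loop from training data to a file you can upload.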

how to find getting started competitions

Kaggle Competition Environment


Here’s a quick run-through of the tabs:

  • Overview: a brief description of the problem, the evaluation metric, the prizes, and the timeline.
  • Data: where you can download the data used in the competition and learn more about it. You’ll use a training set to train models and a test set for which you’ll need to make your predictions. In most cases, the data or a subset of it is also accessible in Kernels.
  • Kernels: previous work done by you and other competitors. Reviewing popular kernels can spark more ideas. You can read through other scripts and notebooks and then copy the code (known as “forking”) to edit and run.
  • Discussion: another helpful resource where you can find conversations both from the competition hosts and from other competitors. A great place to ask questions and learn from the answers of others.
  • Leaderboard: every competition has a public and a private leaderboard, and be warned, the two can be VERY different. The public leaderboard shows submission scores computed on a representative sample of the test data and is visible throughout the competition. Although it gives you a good idea of where you stand, it does not always reflect who will win and lose. The private leaderboard is what really matters: it tracks model performance on data unseen by participants, so it has the final say on whose models are best and who the winners and losers of the competition will be. It is not calculated until the end of the competition.
  • Rules: contains the rules that govern your participation in the sponsor’s competition. It’s extremely important to read the rules before you start.
  • Team: you can perform a number of different team-related actions on this tab.
  • My Submissions: view your previous submissions and select the final ones to be used for the competition.
  • Submitting Predictions: to submit a new prediction, use the Submit Prediction button. This opens a modal where you can upload your submission file.
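The difference between the two leaderboards from the Leaderboard tab is easier to see in code. The sketch below scores one set of predictions on a "public" slice and a "private" slice of the same test set. The labels are synthetic, accuracy stands in for whatever metric a competition uses, and the 30% public fraction is a made-up value, since each competition sets its own split.

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def leaderboard_scores(y_true, y_pred, public_fraction=0.3, seed=0):
    """Score one submission on a public and a private slice of the test set.

    public_fraction is a hypothetical value; real competitions choose their own.
    The fixed seed means every submission is scored on the same split.
    """
    indices = list(range(len(y_true)))
    random.Random(seed).shuffle(indices)
    cut = int(len(indices) * public_fraction)
    public_idx, private_idx = indices[:cut], indices[cut:]

    def score(idx):
        return accuracy([y_true[i] for i in idx], [y_pred[i] for i in idx])

    return score(public_idx), score(private_idx)


# Synthetic labels and a pretend model that is wrong about 20% of the time.
rng = random.Random(42)
y_true = [rng.randint(0, 1) for _ in range(200)]
y_pred = [label if rng.random() > 0.2 else 1 - label for label in y_true]

public_score, private_score = leaderboard_scores(y_true, y_pred)
print(f"public:  {public_score:.3f}")
print(f"private: {private_score:.3f}")
```

The two numbers usually differ a little even for the same predictions, which is why tuning a model only to climb the public leaderboard can backfire when the private scores are revealed.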

2. Datasets

Kaggle Datasets is a great place to discover, explore, and analyze open data. You can find many interesting datasets of different types and sizes that you can download for free to sharpen your skills.

how to choose a dataset

3. Kaggle Learn courses

Free micro-courses taught in Jupyter Notebooks to help you improve your current skills.

4. Discussion

A place to ask questions and get advice from the thousands of data scientists in the Kaggle community.

There are six general site Discussion Forums:

Types of Kaggle discussions

5. Kernels

Kaggle Kernels are essentially Jupyter notebooks in the browser. They are entirely free to run (you can even add a GPU), so you can save yourself the hassle of setting up a local environment. They also allow you to share code and analysis in Python or R, and you can use them to compete in Kaggle competitions and complete the Kaggle Learn courses. Exploring and reading other Kagglers’ code is a great way to both learn new techniques and stay involved in the community.

Choosing a dataset and spinning up a new kernel with a few clicks

Click the Kernels tab of the competition, then click New Kernel

Kaggle Kernel Environment

Here is how to turn on the GPU, change the kernel language, make your kernel public, add collaborators, and install packages that are not preinstalled. Kaggle kernels come preloaded with the most popular Python and R packages 😃.

Remember that Kaggle’s run-time limit is currently 9 hours.
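Before installing anything, it can help to check what the kernel already has. This is a small sketch using only the Python standard library; the function name and the package names in the example are hypothetical placeholders.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "json" ships with Python; the second name is a deliberately fake package.
wanted = ["json", "not_a_real_package_for_demo"]
print(missing_packages(wanted))
```

Anything this reports as missing is a candidate for the package installation step in the kernel settings.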

Adding a dataset to your kernel

You can add datasets to your kernel from your computer, from Kaggle competitions, or from other Kagglers’ public kernels.
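Once a dataset is attached, a common first step is to see which files you actually got. The sketch below walks a directory and lists every file in it. It assumes attached data shows up under an input directory such as `../input`; that location is an assumption here and may differ, so the demo runs against a temporary folder with placeholder files instead.

```python
import os
import tempfile


def list_data_files(input_dir):
    """Return the relative paths of all files under input_dir, sorted."""
    found = []
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, input_dir))
    return sorted(found)


# Demo on a throwaway folder that mimics one attached dataset.
with tempfile.TemporaryDirectory() as demo_dir:
    os.makedirs(os.path.join(demo_dir, "titanic"))
    for name in ("train.csv", "test.csv"):
        with open(os.path.join(demo_dir, "titanic", name), "w") as handle:
            handle.write("PassengerId\n")
    demo_files = list_data_files(demo_dir)

print(demo_files)
```

In a kernel you would call `list_data_files` on the real input directory instead of the temporary one.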

Kernel Versions

When you commit and run a kernel, Kaggle runs all your code and saves the result as a stable version you can refer to later. Your code is also always saved as you go 😃.

Committing a kernel

Forking Kaggle Kernels

You can copy and build on existing kernels from other users 😃.

Don’t forget to vote for kernels you liked
Click the three dots to learn about what else Kaggle has to offer you 😃

You made it all the way here?! Thanks for reading. 😆

Congratulations!
You are now a kaggler 👏🎉.

If you have any questions or comments, feel free to leave your feedback below, or you can always reach me on Twitter. Till then, see you in the next post! ✋
