
A Novice Journey in Kaggle: A Story of Success and Struggles

See the struggles and challenging experiences in Kaggle usually faced by a non-technical beginner and how to overcome them.

Photo by Patrick Hodskins on Unsplash

Introduction

Not every Kaggle story is a success story. Here, I want to share my own experience, some success but mostly struggles, in the hope that beginners like me will know what to expect and how to face it with grit and perseverance. As I like to say:

"If you want to run fast, you can run by yourself. But if you want to run far, you need to go with other people who can support you."

Photo by Kristopher Roller on Unsplash

I joined the M5 Forecasting – Accuracy competition with only basic knowledge of algebra, statistics, and programming. It was intimidating at first, knowing how good the other competitors in Kaggle competitions were. But I took a leap of faith and pushed on to complete the project. My intention was to learn how an end-to-end data science project works, and what better way to do that than joining a Kaggle competition?

I. Mastering the Kaggle Features

When you work on a competition, you’ll be forced to navigate and master the features of Kaggle. Here are some of the things I learned:

  • Creating a notebook – Kaggle gives a more flexible platform where I can code and write about my work without having any HTML knowledge. It’s as easy as that.
  • Notebook Versions – Your outputs are saved every time you make changes, and you can review every change you’ve made. You also have a complete history of your work, so if you want to retrieve something from the past, you can just open the version history.
  • External data can be uploaded and added easily.
  • The datasets and visualizations are downloadable.
  • Large RAM capacity – Most competition data comes in large files, and running and processing them requires a lot of RAM. Thankfully, Kaggle offers up to 16 GB of RAM, so big datasets can be run and processed with ease.
  • Discussion Forums – I got some of the best input about the data, analysis, and models in the forums. Many people share their insights and ideas and are willing to lend a helping hand to other teams.

Since then, I have become comfortable working on Kaggle and on other platforms such as Google Colab.


II. Finding the Right Direction is Challenging

Photo by Jordan Rowland on Unsplash

By the right direction, I mean having a clear picture of the case outline and my expected output. I had all the outputs in my head, but doing the work to get there was the challenge. I thought this was going to be like a thesis, but then I realized I was solving a different problem here. A unique problem requires a unique solution; one solution doesn’t fit all.

Thankfully, I read Randy Lao’s data science pipeline on Medium. It was the pipeline I understood and the one that worked for me, so I followed it. Here’s an excerpt from his Medium post (https://medium.com/breathe-publication/life-of-data-data-science-is-osemn-f453e1febc10):

  1. Obtaining the data
  2. Scrubbing or cleaning the data
  3. Exploring the data to identify significant behavior, trends, and insights.
  4. Modeling the data for prediction.
  5. Interpreting the data and results.
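
To make the five steps above more concrete, here is a minimal Python sketch of how they might fit together. Everything in it is an assumption made for illustration: the tiny inline dataset, the column names, the function names, and the naive mean forecast are hypothetical and are not taken from Randy Lao’s post or from the M5 data.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data: daily unit sales for one item, with one missing value.
RAW_CSV = """day,units_sold
1,12
2,15
3,
4,11
5,14
"""


def obtain(raw_text):
    """Step 1: obtain the data (here, parse an inline CSV string)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def scrub(rows):
    """Step 2: clean the data by dropping rows with a missing sales value."""
    return [
        {"day": int(r["day"]), "units_sold": int(r["units_sold"])}
        for r in rows
        if r["units_sold"].strip()
    ]


def explore(rows):
    """Step 3: explore the data with a few summary numbers."""
    units = [r["units_sold"] for r in rows]
    return {
        "n_days": len(units),
        "min": min(units),
        "max": max(units),
        "mean": mean(units),
    }


def model(rows):
    """Step 4: model the data. Naive baseline: forecast the historical mean."""
    return mean([r["units_sold"] for r in rows])


def interpret(summary, forecast):
    """Step 5: interpret the results in plain language."""
    return (
        f"Over {summary['n_days']} days, sales ranged from {summary['min']} "
        f"to {summary['max']} units; the baseline forecast for the next day "
        f"is {forecast:.1f} units."
    )


rows = scrub(obtain(RAW_CSV))
summary = explore(rows)
forecast = model(rows)
print(interpret(summary, forecast))
```

A real competition entry would replace each function with far more work (loading the competition files, handling calendar and price data, fitting a proper model), but keeping one function per step makes it easier to see which stage you are stuck on.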

III. Don’t Rely On Other People’s Work (Invent Yourself)

I started the project on my own. At first it went fine, but I was still drawing a blank on key parts like EDA and machine learning. I felt I was missing something. Honestly, I looked into other people’s work for insight and inspiration. There are two sides to this coin:

The Good: Since Kaggle supports collaboration, it’s acceptable to check on other people’s work, give feedback, and draw inspiration. You can pick up interesting insights and feedback that help your project.

The Bad: Take what you see on Kaggle with a grain of salt. You may box yourself into what others have done instead of inventing your own way to solve the problem.

My advice here is to check the forum, get advice from other professionals, and do some research. Looking at other people’s work wasn’t helpful for me; it just hindered my creativity.


IV. Every Problem is Unique

Just because a model or an EDA approach worked in other projects doesn’t mean it will work here (or that it’s appropriate to use the same model again). I learned this the hard way in M5 Forecasting – Accuracy. Initially, I did my EDA based on my stock knowledge of statistics: correlation, statistical summaries, and basic plots.
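
For readers who haven’t met these basics yet, here is a minimal Python sketch of a statistical summary and a correlation check (plots are left out to keep it short). The price and sales figures and the function names are invented for illustration; they are not real M5 data and not the code from my notebook.

```python
from math import sqrt
from statistics import mean, median, stdev

# Hypothetical daily figures for one item (not real M5 data).
prices = [2.0, 2.0, 2.5, 2.5, 3.0, 3.0]
units_sold = [20, 22, 17, 16, 12, 11]


def summarize(values):
    """Return a basic statistical summary of a list of numbers."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


print(summarize(units_sold))
r = pearson(prices, units_sold)
print(f"Correlation between price and units sold: {r:.2f}")
```

A strongly negative correlation in made-up data like this only says that higher prices go together with lower sales. It doesn’t tell you which questions matter for the problem at hand, and that is where generic EDA stops being enough.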

One truth I realized is that every problem is unique. I shouldn’t treat this case the way I treated my past work. At some point, I couldn’t think of anything else to do for EDA and became paralyzed.

To overcome this, I took a short break and wrote in my journal about the subject behind the case (in my case, retail sales for M5 Forecasting – Accuracy). I wrote down everything I could think of about retail sales and every question I had about it. Eventually, I was able to organize my EDA and find many interesting insights in the sales data, far better than what I had done before.


V. Information Overload

While I was working on M5 Forecasting, I came across models such as ARIMA, Sporadic Demand, Gradient Boosting, and LightGBM. Understanding the math behind them proved too difficult, so I resorted to reading about how these models are applied in finance. That helped me build a good understanding of them.

Photo by Glenn Carstens-Peters on Unsplash

Building machine learning solutions with sophisticated models wasn’t easy. My lesson from this part was not to overload myself with too much information. I had to focus on the models and solutions I could understand and build, and put them in a context that helped me understand them further (in my case, finance).


VI. Having a Mentor is a BIG HELP

I was lucky to have a great mentor and a community that gave me support and advice throughout my journey.

I needed guidance on things I didn’t know, and I needed reviews and feedback on my work so I would know where to improve. I received all of this from my mentors. Honestly, the challenge was tough, and the obstacles I faced could have driven me to give up, but my mentors pushed me to keep going and finish, guiding and supporting me throughout the competition.

Working together with a professional or practitioner allowed me to go faster than if I were to go alone.

My biggest realization: if you want to move fast, you can go by yourself, but if you want to reach something far, you should work with other people.

Photo by NeONBRAND on Unsplash

Final Note

Taking part in the Kaggle competition was a tough journey that demanded patience, grit, and hard work. I learned my strengths and weaknesses, found areas where I could improve, and managed to complete the Kaggle project from start to finish. Even though I didn’t score well in the finals, I achieved my objectives with Kaggle, and I couldn’t have done it without the support of my mentors and the community. I thank them for it, and I’m going to use this experience to take on more projects and grow further in the field.

Thanks for reading to the end of my blog post. I hope beginners out there learn that taking on a data science project is a tough road. You need the right resources and support to survive. Learn from my journey, and good luck!

In case you want to see my work in M5 Forecasting – Accuracy, here is the link: M5 Forecasting – Accuracy (R) | Kaggle. Feel free to share your comments and feedback! 🙂

