My passion for data science started about 3 years ago. After a long and exciting learning journey, I landed my first job as a data scientist. I have written several articles to share my learning experience and the process that led to my getting that dream job.
In this article, I will write about my experience after starting my first job as a data scientist. It has been thrilling and fun so far and I’m glad I made a career change into data science.
However, I have also faced some challenges and made some mistakes. It is ok to make a mistake but if you make the same mistake a few times, it might turn into a problem. Thus, we should learn from our mistakes. This is how we can get better at what we do.
As I just mentioned, I made some mistakes and learned from them. I would like to share 2 of them with you so that you can be prepared for them at your first job or, even better, avoid them altogether.
Not spending enough time on data preparation
We do retail analytics. I was assigned a task to create a model to see the relationships between certain features and sales amount. The details of the model are not important here. The focus is on how I approached the problem and the steps I followed.
After I obtained the raw data, I did some data cleaning and preprocessing. I also created an additional feature that I thought was important for the model. Then, I thought I was ready to work on the model.
I spent quite a long time on model creation. I did several iterations of creating a model, evaluating it, and trying to improve it with modifications. However, the end result was not acceptable. I was not able to create a model that worked as expected.
Then, I went back to the exploratory data analysis phase. I spent more time on the data to understand it. I added a couple more features. Going back to the data exploration phase made me realize that I had built the model with the wrong features.
My mistake was not spending enough time on the data. I actually knew that the most important part of model building is understanding the data well. However, I did the opposite. It might have been due to my extraordinary excitement.
It was a good lesson learned from experience. I will never forget it. The next time I’m assigned a similar task, I will definitely organize my plan and effort accordingly.
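As a rough illustration of the kind of quick look at the data that can come before any modeling, here is a minimal sketch with pandas. The data frame and its column names (price, discount, sales_amount) are made up for this example and are not the data from my task.

```python
import pandas as pd

# Hypothetical retail data; the columns and values are illustrative only
df = pd.DataFrame({
    "price": [10.0, 12.0, 9.0, 15.0, 11.0, None],
    "discount": [0.0, 0.1, 0.2, 0.0, 0.1, 0.2],
    "sales_amount": [100.0, 95.0, 130.0, 70.0, 105.0, 125.0],
})

# How many values are missing in each column?
missing_counts = df.isna().sum()

# Basic statistics (count, mean, std, min, quartiles, max) per column
summary = df.describe()

# Correlation of each feature with the target, strongest first
corr_with_target = (
    df.corr()["sales_amount"]
    .drop("sales_amount")
    .sort_values(key=abs, ascending=False)
)

print(missing_counts)
print(summary)
print(corr_with_target)
```

A correlation table is only a first look and it only captures linear relationships, but it is a cheap way to question a feature set before spending days on a model.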
Duplicate rows
This is a more specific mistake than the previous one and has a simpler solution: Just check for duplicates.
When I was learning data science, I was usually practicing with one dataset. Thus, I did not have to use join, merge, or concatenation operations a lot. However, in real-life tasks, the data is spread out among many tables or data frames and you need to do many operations to combine the required data.
Our tech stack is quite rich. I often use Python, SQL, R, and PySpark in my daily tasks. The functions and methods for combining data with these tools are similar but each has its own syntax.
I think the biggest risk when combining data is creating duplicate data points (or rows). If you apply the function properly, the resulting data frame or table will not have any duplicates. However, this process is error-prone. Thus, you need to be extra careful and place duplicate checks in your scripts.
I had a task that included collecting data from many different sources. I forgot to check for duplicates. I realized the problem when I did an analysis on the resulting data frame. The quantities were far off.
Even if you are sure about your code, there is no harm in being extra cautious. Adding a duplicate check might save you from doing repetitive work.
Duplicate data points may cause serious issues, especially if cumulative quantities are important for your case. I suggest checking your data frame or table for duplicate entries several times. It is better to run an extra check than to redo the entire operation from the beginning.
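To make this concrete, here is a minimal sketch in pandas of how a merge can silently create extra rows, and two ways to catch it. The tables and column names are hypothetical. The idea is that a lookup table with a repeated key multiplies the matching rows on the other side.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical tables: one row per sale, and a product lookup table
sales = pd.DataFrame({
    "product_id": [1, 2, 2, 3],
    "quantity": [10, 5, 7, 2],
})
products = pd.DataFrame({
    "product_id": [1, 2, 2, 3],  # product 2 appears twice by accident
    "category": ["A", "B", "B", "C"],
})

# A plain merge succeeds, but every sale of product 2 is now doubled
merged = sales.merge(products, on="product_id", how="left")
rows_before, rows_after = len(sales), len(merged)  # 4 and 6
total_before = sales["quantity"].sum()             # 24
total_after = merged["quantity"].sum()             # 36

# Check 1: look for repeated keys in the lookup table before merging
dup_keys = products[products.duplicated(subset="product_id", keep=False)]

# Check 2: let pandas enforce the expected relationship during the merge
try:
    sales.merge(products, on="product_id", how="left", validate="many_to_one")
    merge_was_rejected = False
except MergeError:
    merge_was_rejected = True

# After removing the repeated key, the row count is preserved
clean_products = products.drop_duplicates(subset="product_id")
clean_merged = sales.merge(
    clean_products, on="product_id", how="left", validate="many_to_one"
)
```

Comparing the row count before and after a left merge is the cheapest check of all. If the number goes up, some key on the right side is not unique.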
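One way to make the check a habit is a small helper that you call after every step that combines data. The sketch below assumes pandas. The helper name assert_no_duplicates and the sample order data are my own inventions for illustration.

```python
import pandas as pd

def assert_no_duplicates(df, subset=None, step=""):
    """Raise a ValueError if df has duplicate rows (or duplicate keys in subset)."""
    n_dups = int(df.duplicated(subset=subset).sum())
    if n_dups > 0:
        raise ValueError(f"{step}: found {n_dups} duplicate row(s)")
    return df

# Hypothetical data: two sources that overlap on order 3
source_a = pd.DataFrame({"order_id": [1, 2, 3], "amount": [20.0, 35.0, 15.0]})
source_b = pd.DataFrame({"order_id": [3, 4], "amount": [15.0, 50.0]})

combined = pd.concat([source_a, source_b], ignore_index=True)
n_dups = int(combined.duplicated().sum())  # order 3 is counted twice

# Drop the exact duplicate, then verify before computing any totals
deduped = combined.drop_duplicates().reset_index(drop=True)
assert_no_duplicates(deduped, subset="order_id", step="after concat")

inflated_total = combined["amount"].sum()  # 135.0, includes order 3 twice
correct_total = deduped["amount"].sum()    # 120.0
running_total = deduped["amount"].cumsum()
```

Raising an error instead of silently dropping rows is a deliberate choice here: a duplicate usually points to a problem in an earlier step, and it is better to find out where it came from.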
Conclusion
We all make mistakes. The ones I have mentioned in this article are kind of rookie mistakes. But that does not mean only rookies or juniors make them. It can happen at any time in your career. The consequences might be different though 🙂
The important thing is to spend enough time understanding the root cause of a mistake so that you learn from it. This way, you will learn more than you would from a tutorial or from reading.
Thank you for reading. Please let me know if you have any feedback.