
5 Key Things I Wish I Knew Before Starting My First Data Science Job

These five key things are invaluable to know in your first Data Science job

Photo by Zan on Unsplash

As of today, I have completed my first month working as a Data Scientist. It has been a steep learning curve to say the least, but also one of the most rewarding and exciting experiences!

However, throughout these introductory weeks I had to quickly become competent in some essential technologies that I previously had little or no understanding of. In this article I hope to shed some light on the tools and topics that will help new Data Scientists in their first job.

Object-Oriented Programming

Chances are that you will be working with Python, which supports Object-Oriented Programming (OOP) throughout: everything in Python is an object. This coding paradigm is very useful in real-life Data Science problems, where you typically have very large datasets and notebooks with thousands of lines of code. Using OOP helps to condense the script and gives your program a cleaner structure than the typical procedural style. In fact, much of the code in industry is written this way, as are most of the common libraries and packages.

I had limited experience with OOP, and I wish I had practised coding in this paradigm a great deal more before starting my job. My advice to any budding Data Scientist is to learn the basics of OOP, such as classes, self and inheritance, and then try coding a basic Machine Learning algorithm using this paradigm. There are many tutorials online to get you started, with different people explaining OOP in various ways, so there will be an explanation out there that works for you!
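To make that advice concrete, here is a minimal sketch of what such an exercise might look like. The class names (BaseRegressor, MeanBaseline, SimpleLinearRegression) and the toy data are my own illustrative choices, not taken from any library, and the code uses only plain Python so the OOP ideas (a class, self, and inheritance) stay in focus.

```python
class BaseRegressor:
    """Shared interface: every model exposes fit, predict and score."""

    def fit(self, x, y):
        raise NotImplementedError

    def predict(self, x):
        raise NotImplementedError

    def score(self, x, y):
        """Mean squared error of the predictions (lower is better)."""
        predictions = self.predict(x)
        return sum((p - t) ** 2 for p, t in zip(predictions, y)) / len(y)


class MeanBaseline(BaseRegressor):
    """Always predicts the mean of the training targets."""

    def fit(self, x, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, x):
        return [self.mean_ for _ in x]


class SimpleLinearRegression(BaseRegressor):
    """One-feature least-squares fit: y = intercept + slope * x."""

    def fit(self, x, y):
        x_mean = sum(x) / len(x)
        y_mean = sum(y) / len(y)
        covariance = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
        variance = sum((xi - x_mean) ** 2 for xi in x)
        self.slope_ = covariance / variance
        self.intercept_ = y_mean - self.slope_ * x_mean
        return self

    def predict(self, x):
        return [self.intercept_ + self.slope_ * xi for xi in x]


# Toy data (made up for illustration): exactly y = 1 + 2x.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.0, 5.0, 7.0, 9.0]

model = SimpleLinearRegression().fit(x_train, y_train)
baseline = MeanBaseline().fit(x_train, y_train)

print("slope:", model.slope_, "intercept:", model.intercept_)
print("model MSE:", model.score(x_train, y_train))
print("baseline MSE:", baseline.score(x_train, y_train))
```

Both models inherit score from the base class, so comparing a new model against the baseline only requires writing its own fit and predict. That reuse is the kind of structure that becomes valuable once a notebook grows to thousands of lines.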

Git and GitHub

"Have you heard of GitHub?"

"Yeah!"

"Do you know how to use it?"

"Not really"

This is how my first conversation about Git and GitHub went. As far as I can tell, almost every company uses Git, often through GitHub, for something or other, and it is an essential skill for any tech professional.

I had used GitHub before, but only as a portfolio to showcase my work. However, Git and GitHub offer so much more than that, with very useful functionality that I am still becoming accustomed to.

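For those of you who may not know, Git is a version control system that tracks changes to your code, and GitHub is a hosting platform for Git repositories that makes it easier to structure, manage and collaborate on coding projects. There are other version control systems and hosting platforms, but Git and GitHub are by far the most widely used.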

As with OOP, there is a plethora of online resources for learning Git and GitHub. The basics are fairly straightforward but, like anything, require practice to become proficient. I recommend studying the fundamentals such as push, pull, merge and branching.

Command Line / Terminal

Even though we are not Software Engineers or Developers, Data Scientists do use the command line semi-frequently for certain tasks. A good percentage of Data Scientists do not come from a computer science background, so they probably have limited experience with the terminal.

Again, as with Git and OOP, a simple tutorial can cover most of the functionality a Data Scientist would use. So learn things like changing directories, installing packages and running or compiling code. These are all fairly basic commands, but I think any tech professional should know them and be comfortable using them.

Modelling Is Not Everything

Implementing the newest Machine Learning algorithms is typically the most exciting part of a project and is why most people get into Data Science. I remember I used to spend hours fine-tuning my algorithm to squeeze as much performance as I could out of my model.

However, in industry this is not always the go-to approach. The most common reason your model is underperforming is the quality, type and size of the data you are training on. You might have heard of the ‘Data-Centric’ movement emerging in the data community, which focuses on improving the data in order to improve the model. This approach is becoming increasingly common in industry.

The idea is to focus on your data, in terms of its origins and quality, and to improve the feature engineering process in order to generate a better model. Therefore, make sure you understand, and even focus your learning on, the pre-processing of data in your projects.

The mistake I made was learning all about the ML algorithms, which is good to know, while neglecting to put significant time into the pre-processing steps. So split your learning equally between the data side and the ML modelling side, so that you are proficient in both.
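As a rough illustration of what time spent on the data side can look like, here is a small sketch of two common pre-processing steps: dropping exact duplicate rows and filling missing values with the column median. The records, column names and helper functions (drop_duplicates, impute_median) are all hypothetical and written in plain Python for clarity. In practice you would most likely reach for a library such as pandas, and median imputation is only one of several reasonable choices.

```python
from statistics import median

# Hypothetical raw records: one exact duplicate and two missing values.
raw_rows = [
    {"id": 1, "age": 34, "income": 52000.0},
    {"id": 2, "age": None, "income": 48000.0},
    {"id": 2, "age": None, "income": 48000.0},  # exact duplicate of the row above
    {"id": 3, "age": 29, "income": None},
    {"id": 4, "age": 41, "income": 61000.0},
]


def drop_duplicates(rows, columns):
    """Keep the first occurrence of each distinct row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[column] for column in columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, column):
    """Return new rows where missing values in `column` become the median."""
    observed = [row[column] for row in rows if row[column] is not None]
    fill_value = median(observed)
    return [
        {**row, column: fill_value if row[column] is None else row[column]}
        for row in rows
    ]


clean_rows = drop_duplicates(raw_rows, columns=["id", "age", "income"])
clean_rows = impute_median(clean_rows, "age")
clean_rows = impute_median(clean_rows, "income")

for row in clean_rows:
    print(row)
```

Even on a toy dataset like this, the order of the steps matters: removing duplicates first stops a repeated row from skewing the median used for imputation.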

Know And Learn Your Industry

You may be the most technically gifted Data Scientist in the world but if you have no idea about your business area, your work will not amount to much. The job of a Data Scientist is to answer business questions and provide invaluable insight. This means you must know how your industry works and keep up to date with it.

This is hard to learn before you start, as you may not know what business area you will be working in. However, once you know your industry, I would recommend spending around half an hour a day reading up on its recent news and developments. Even a simple Wikipedia dive will benefit you massively, and I have noticed that it has really helped me in my projects. It is also worth mentioning that you will learn a good amount through simple osmosis by working there every day. However, there is no harm in accelerating this process.

To be even more general, just listen to and read the news, as this will broaden your general knowledge and make you a more rounded professional!

Conclusion

Every Data Science job is different, and tools vary between companies and industries. Still, I believe the five topics listed above are essential and will benefit you no matter where you end up working.

Another Thing!

I have a free newsletter, Dishing the Data, where I share weekly tips for becoming a better Data Scientist. There is no "fluff" or "clickbait," just pure actionable insights from a practicing Data Scientist.

Dishing The Data | Egor Howell | Substack
