10 Points to Make it Big in the Data Industry

People want to make careers here. But they are often deafened by the noise that surrounds them.

Kunj Mehta
Towards Data Science


Photo by Joshua Sortino on Unsplash

Suppose you have just been awed by the flashy terms artificial intelligence, machine learning and data science, and have decided either to get a degree in one of these fields or to pivot your career and enter the data industry. You get in on the hype, jump on the bandwagon, enroll in Andrew Ng’s courses on Coursera and some more courses on Udacity, buy some detailed books and scour them, start Kaggling, implement some projects and publish research papers. You start feeling good about what you have accomplished. But when you apply for a job or an internship, you don’t get it, and you wonder why.

Well, all of the above is good for learning the basics and being exposed to what the industry has to offer. However, the data industry is inherently interdisciplinary, broad and constantly evolving, and the ecosystem around it is very noisy. Just search for any of the buzzwords and you will get thousands of articles blasting opinions, advice, tutorials and whatnot, all of them claiming to be different from the rest (this one does not). I recently read an article whose author bemoaned the lack of helpful online resources for experts in the field: almost everything is targeted towards newcomers. Imagine that!

What newcomers actually need is to develop the ability to navigate around this huge industry, decide for themselves what they want within it and how to get it.

In the data industry, be an expert vertically, not an ‘expert’ horizontally

Honing the Thought Process

To make it big in data and to filter the noise, it is necessary for every individual to know at every stage of their never-ending learning process:

  1. How the data industry works
  2. What technology it uses
  3. How it's moving forward

and align that with:

  1. The role they imagine for themselves within the industry
  2. Their plan for getting to that position
  3. The resources available to them for getting there

All of this is a cyclical process.

Thought Process of a Data Industry Veteran (Image by Author)

That being said, here is a deep dive into the available types of interconnected knowledge resources.

The 10 Resources

  1. MOOCs / Certifications: Online courses and certifications are a great way to get started, much like a crash course. One advantage of the noise here is the immense number of options available. Provided one starts with a highly recommended course, MOOCs familiarize one with the concepts, terminology, differences and layout of the data industry and its various sub-industries. To top it off, there is a certificate to show for it, and many universities have started recognizing these for credits!
  2. Reference Books: Reference books on AI, deep learning and machine learning can be good for a deep dive into any selected topic. This type of resource covers theory in depth and gives a solid understanding of the underlying concepts, plus an introduction to the underlying math!
  3. Mathematics: Maths, in itself, is not necessary. However, for a well-rounded understanding of any concept, algorithm or technique in the data industry, one must know the underlying mathematics. This allows one to understand the concept at a fundamental level and to understand and tweak how it works at the programmatic level (think: implementation libraries).
  4. Technology: Every day, better and more optimized libraries and frameworks are built for developers, in many different languages, on top of the concepts in the data industry. For someone looking to build a career developing ML models or data pipelines, it is imperative to keep up to date with the newest offerings while also understanding the underlying concepts and mathematics, so that they can use these tools to the fullest extent.
  5. Research Papers: By contrast, for someone looking forward to a career as a researcher in the industry, keeping up with the newest developments from the theoretical perspective is imperative, which is why reading research papers and publishing one's own work goes a long way. Here too, the mathematical understanding helps.
  6. Kaggle: After getting into the theory and the concepts, the next step is to apply the theory and the libraries learnt to some actual data. Kaggle is the best platform for working on real-world data from real-world use cases, from data analysis to hyperparameter tuning. Its biggest downside is that deploying a model and building an application around it are not possible on the platform. Kaggle can prove best for future data analysts and machine learning engineers.
  7. Projects: To get around Kaggle’s limitation, stand-alone projects need to be built. Data from Kaggle can be used for this, but in any project done outside Kaggle, make it a point to wrap an application around the model and deploy it. This way one gains some experience in end-to-end development of applications in the data industry.
  8. Cloud: Speaking of end-to-end development, much of the development that happens and the services that are offered now leverage the cloud. Hence, it is imperative that someone who wants to make a career in deploying data applications (think MLOps) is familiar with the various cloud technologies in the space.
  9. Internships: All of the implementation and theoretical knowledge can then be leveraged in internships or jobs to ideate, plan, architect and build something from scratch to client requirements. The difference from projects is that in projects the problem definition is already given and there is no need to think from the business perspective. In essence, internships and jobs provide an opportunity to expand one's brainstorming capacity, as the requirements are seldom clear-cut or straightforward. This can be an ideal experience for future data scientists.
  10. Domain Knowledge: Remember, building a career in the data industry is like running a marathon. Marrying any type of business requirement to data is tough, and gaining enough knowledge to do that consistently for a business domain requires years of association with that domain. Once achieved, one will be on top of the data pyramid for that domain!

This article is based on my journey and experience in the industry so far. I would be grateful if someone improved on this post or pointed out anything I have missed or got wrong.

Lastly, I would love to connect on LinkedIn!
