
7 Things You Need To Know If You Want to Become a Data Engineer ☄

Strategies to help you land your first Data Engineer job

Magic sunrise in the Bromo mountain [Digital Image] by Irham Bahtiar https://unsplash.com/photos/Z1A2U0vo8uY

There is a ton of knowledge out there on how to be a 10x data engineer, but what is actually worth learning to land your first data engineer job? There are so many tools and concepts to learn that it can feel scary 😱. In this article, I will give you some tips on how to hack your learning journey and point you to extra resources 📌. This is not a shortcut, but it will help you prioritize and avoid getting swamped.

If the bar is too high, try another data role. 👷

Depending on your experience, getting your first job as a Data Engineer may be challenging. If you are new to Software Engineering, the technical barrier to entry will be pretty high. The range of technical skills required for a Data Engineer is much broader than for a Data Analyst.

Therefore, a Data Analyst role is a good start, as it requires fewer hard skills (though sometimes more domain knowledge). Knowing SQL and mastering a dashboarding tool (Tableau/Power BI/Metabase) should already put you in a good position.

On top of that, Data Analysts often work closely with Data Engineers. This gives you an opportunity to understand what they do, and when you feel ready, you can apply for an internal move. That is usually easier than coming in through the front door. Many data engineers I know followed that path.

📌 Check out this article for a story about such a move and some insights about becoming a data analyst.

Don’t start with Streaming & Machine Learning 🌊

Depending on the company’s data maturity, these concepts are often not must-haves. Don’t be fooled by how many buzzwords you find in a job posting. Wired reported last year that only 9 percent of firms use tools like machine learning. While AI adoption is growing fast, many companies are still struggling with basic Data Engineering.

As a junior, there is a baseline of knowledge that covers a lot of use cases and will get you pretty far. If you learn to write Python and SQL with the classic pipeline frameworks (Pandas, Spark, dbt), you will cover most analytical batch use cases. Get experience with an analytical database such as BigQuery (or Redshift/Snowflake, though neither offers a free tier to play with) and pick an orchestration tool. Airflow is currently the industry standard here.
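To make the "Python and SQL for batch" baseline concrete, here is a minimal sketch of a batch pipeline using only the Python standard library. It is an illustration under stated assumptions, not a production pattern: the CSV data, table, and function names are hypothetical, and an in-memory SQLite database stands in for an analytical warehouse such as BigQuery. In a real project, Pandas or Spark would typically handle the transform step and dbt would manage the SQL model.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in a real project this would come from an API, a file drop, or a source database.
RAW_CSV = """order_id,country,amount
1,FR,19.99
2,BE,5.00
3,FR,
4,NL,42.50
"""


def extract(raw: str) -> list[dict]:
    """Parse the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows: list[dict]) -> list[tuple]:
    """Drop rows with a missing amount and cast each field to a proper type."""
    return [
        (int(r["order_id"]), r["country"], float(r["amount"]))
        for r in rows
        if r["amount"]
    ]


def load(conn: sqlite3.Connection, records: list[tuple]) -> None:
    """Write the cleaned records into a table (SQLite stands in for the warehouse here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)


def revenue_by_country(conn: sqlite3.Connection) -> list[tuple]:
    """The kind of SQL model a dashboard would read from."""
    return conn.execute(
        "SELECT country, ROUND(SUM(amount), 2) FROM orders GROUP BY country ORDER BY country"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract(RAW_CSV)))
    print(revenue_by_country(conn))
```

Keeping extract, transform, and load as separate functions mirrors how orchestration tools split a pipeline into tasks, and it makes each step easy to test on its own.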

📌 Have a look at these data engineer roadmaps; they should give you a solid learning path.

Did you notice that these roadmaps start with Software Engineering basics?

Software Engineering basics matter. A lot. 💾

We often forget that data engineers, at their very essence, are just another type of software engineer.

I believe this oversight happens because the job has evolved, and many Data Engineers today come from a non-software-engineering background (BI developer, Data Analyst). However, if you master these basics, you will stand out among your software engineering peers and gain an edge in understanding how to deliver a production-ready project.

These include (non-exhaustive list):

  • CI/CD concepts & tooling (GitHub Actions / Jenkins / CircleCI)
  • Git (GitHub / GitLab)
  • Testing (unit/integration/system testing)
  • Infrastructure as code (Terraform, Pulumi)
  • DevOps (Kubernetes/k8s, Docker, etc.)
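To illustrate the testing bullet, here is a minimal sketch of a unit test for a typical pipeline transformation, written with Python’s built-in unittest module. The `keep_latest` function, its parameters, and the sample rows are hypothetical examples, not taken from any specific project or library.

```python
import unittest


def keep_latest(rows: list[dict], key: str, ts: str) -> list[dict]:
    """Deduplicate rows on `key`, keeping the row with the greatest `ts` value.

    ISO-8601 date strings compare correctly as plain strings, so no parsing is needed here.
    """
    latest: dict = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[ts] > current[ts]:
            latest[row[key]] = row
    # Sort by key so the output order is deterministic.
    return sorted(latest.values(), key=lambda r: r[key])


class KeepLatestTest(unittest.TestCase):
    def test_keeps_most_recent_version(self):
        rows = [
            {"id": 1, "status": "pending", "updated_at": "2023-01-01"},
            {"id": 1, "status": "shipped", "updated_at": "2023-01-03"},
            {"id": 2, "status": "pending", "updated_at": "2023-01-02"},
        ]
        result = keep_latest(rows, key="id", ts="updated_at")
        self.assertEqual([r["status"] for r in result], ["shipped", "pending"])

    def test_empty_input(self):
        self.assertEqual(keep_latest([], key="id", ts="updated_at"), [])
```

Saved in a file with a hypothetical name such as `test_keep_latest.py`, this runs with `python -m unittest test_keep_latest`. Keeping transformations as small pure functions is what makes them easy to test, and a test suite like this is exactly what a CI pipeline (GitHub Actions, Jenkins, CircleCI) would run on every push.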

📌 Here’s an excellent article on how DevOps relates to Data Engineering and what you should take from it.

Learn to build things end to end with a side project 🗺

Take a pen and design how you would take data from point A, transform it, consume it (with a dashboarding tool), and make decisions based on it. Try to answer these questions:

  • Where is my data coming from? How do I get it? API? Database? Scraping?
  • How do I orchestrate the pipeline?
  • How will I consume it? Which dashboarding tool can I use? How does the connection work? What are the limitations/costs behind this? How do I model my data?
  • What happens if I want to change a feature in the data pipeline? How do I manage access? How do I manage versioning?
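For the orchestration question, it helps to see what an orchestrator does at its core: it runs tasks in an order that respects their dependencies. The sketch below is a toy illustration of that idea using Python’s standard-library `graphlib` (Python 3.9+). It is not Airflow’s API, and the task names and `run_pipeline` helper are hypothetical. Airflow adds scheduling, retries, backfills, and a UI on top of this basic idea.

```python
from graphlib import TopologicalSorter

# Hypothetical side-project pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract_api": set(),
    "extract_db": set(),
    "transform": {"extract_api", "extract_db"},
    "load_warehouse": {"transform"},
    "refresh_dashboard": {"load_warehouse"},
}


def run_pipeline(dag: dict[str, set[str]], tasks: dict) -> list[str]:
    """Run each task only after all of its upstream tasks; return the execution order."""
    executed = []
    for name in TopologicalSorter(dag).static_order():
        tasks[name]()  # call the (placeholder) task function
        executed.append(name)
    return executed


if __name__ == "__main__":
    # Placeholder tasks that only log; real ones would call an API, run SQL, etc.
    tasks = {name: (lambda n=name: print(f"running {n}")) for name in PIPELINE}
    print(run_pipeline(PIPELINE, tasks))
```

Declaring dependencies as data, rather than hard-coding the call order, is the same design choice behind Airflow DAGs: adding a new source only means adding one entry and its downstream edges.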

Having a clear picture of the high-level design and understanding how the components talk to each other is an excellent way to learn how to turn your skill set into actionable value.

📌 Check out this article for side project ideas for data engineers.

Focus on one cloud provider, and learn the similarities with the others ☁️

All the major cloud providers offer very similar tooling; the fancy names are just there to get you lost. Focus on one provider, and look up the equivalent of each service you use on the other providers. While there are sometimes significant feature differences, you will grasp how each tool fits into an end-to-end pipeline even without work experience on that provider.
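As a sketch of that approach, you can keep your own small mapping of equivalent services as you learn. The three rows below are commonly cited equivalences across AWS, Google Cloud, and Azure; the `equivalent` helper is hypothetical, and the providers’ official comparison tables remain the reference, since features can differ significantly.

```python
from typing import Optional

# A few commonly cited equivalences; verify against each provider's official comparison docs.
EQUIVALENTS = [
    {"category": "object storage", "aws": "Amazon S3", "gcp": "Cloud Storage", "azure": "Azure Blob Storage"},
    {"category": "data warehouse", "aws": "Amazon Redshift", "gcp": "BigQuery", "azure": "Azure Synapse Analytics"},
    {"category": "event streaming", "aws": "Amazon Kinesis", "gcp": "Pub/Sub", "azure": "Azure Event Hubs"},
]


def equivalent(service: str, target: str) -> Optional[str]:
    """Return the `target` provider's counterpart ("aws", "gcp" or "azure"), or None if unknown."""
    for row in EQUIVALENTS:
        if service in (row["aws"], row["gcp"], row["azure"]):
            return row[target]
    return None


if __name__ == "__main__":
    print(equivalent("BigQuery", "aws"))
```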

AWS dominates the market with more than 32% market share, according to Statista, so it is a good bet for landing your first job.

📌 Google keeps an up-to-date table comparing its services with their equivalents on competing clouds.

Target young companies without too much legacy and reasonable data maturity 🏢

Following from the previous point, you probably want to focus on cloud-native companies. As a junior, there are several reasons to do so.

First, you have probably already spent quite some time learning cloud services. If the company relies on many old frameworks or on-premises clusters, that is additional knowledge you will need to pick up.

Beyond that, you ensure that the time you invest in learning the modern data stack will stay relevant for at least a couple of years before becoming obsolete.

📌 Crunchbase is a great resource to quickly see how big and how old a company is. Checking its engineering blog and GitHub organization will also give you a sense of its maturity.

Soft skills matter as much as hard skills, or even more. 👨‍🏫

"Engineering is easy – it’s the people problems that are hard." Google VP Bill Coughran

Data engineers are NOT technical gurus living in a basement. They work with many stakeholders: business teams, software engineers, data scientists, data analysts, etc. Teamwork and communication are therefore key to breaking down the silos between them.

Good soft skills (or rather human skills, because there is nothing soft about them) will give you powerful leverage once you are in the industry.

📌 There are plenty of articles worth reading for practical tips on soft skills in a data role.

Conclusion 🚀

Don’t focus on being the next technical superstar. Step back, see the bigger picture, and, depending on market trends and your experience, focus on what you need to strengthen to get your first role in the data world.

Don’t give up after the first failure. Keep going, and good luck! ❤️

Mehdi OUAZZA aka mehdio 🧢

Thanks for reading! 🤗 🙌 If you enjoyed this, follow me on 🎥 YouTube, Medium, or 🔗 LinkedIn for more data/code content!
