A day in the life of a data engineer

Breaking down the main activities of a data engineer in 2021

Coding [Digital Image] https://unsplash.com/@jefflssantos | Spongebob Cleaning [Digital Image] https://imgflip.com/meme/81959717/Spongebob-Cleaning

In 2021, the data engineer’s role has been expanding beyond its original scope, for better or for worse. As a result, multiple definitions of the role are popping up. Does the data engineer do more analytics (the new "analytics engineer" role), build data pipelines, handle more infrastructure (DevOps), or do machine learning engineering? Basically, it’s getting a bit blurry what an average data engineer spends their time on. However, these categories are all technical activities, and we often forget that they represent just a chunk of the time spent. In this article, we will break down a typical day in the life of a data engineer into its different activities.

Coding – 30 to 40%

Let’s define what we actually mean by coding:

  • Developing a data pipeline/API/microservice.
  • Setting up/maintaining infrastructure.
  • Fixing bugs, improving the code base, writing documentation.

Depending on the project phase, you will work on different coding aspects: new features, debugging, maintenance, and stability.

It’s also worth remembering that coding is not only about "more" (adding lines of code) but also about "less" (removing code). A good example is the list of top committers of Apache Spark. Most of them actually have a negative net line count: they removed more lines than they added!
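To make the added-versus-removed idea concrete, here is a minimal sketch that sums added and removed lines per author. It assumes text in the shape produced by `git log --numstat --pretty=tformat:"author:%an"`; the function name `net_lines_by_author` and the sample log are hypothetical and made up for illustration, not taken from the Apache Spark repository.

```python
from collections import defaultdict


def net_lines_by_author(numstat_log: str) -> dict:
    """Sum (added, removed) line counts per author from numstat-style text.

    Assumed input shape (one commit after another):
        author:<name>
        <added>\t<removed>\t<path>
    """
    totals = defaultdict(lambda: [0, 0])
    author = None
    for line in numstat_log.splitlines():
        if line.startswith("author:"):
            author = line[len("author:"):].strip()
            continue
        parts = line.split("\t")
        if author is None or len(parts) != 3:
            continue  # blank line or anything that is not a numstat row
        added, removed, _path = parts
        if added == "-" or removed == "-":
            continue  # git reports "-" for binary files
        totals[author][0] += int(added)
        totals[author][1] += int(removed)
    return {name: (a, r) for name, (a, r) in totals.items()}


# Hypothetical sample data, for illustration only.
SAMPLE_LOG = (
    "author:alice\n\n"
    "10\t2\tsrc/pipeline.py\n"
    "-\t-\tdocs/logo.png\n"
    "author:bob\n\n"
    "3\t40\tsrc/legacy.py\n"
    "author:alice\n\n"
    "0\t25\tsrc/old_job.py\n"
)

for name, (added, removed) in net_lines_by_author(SAMPLE_LOG).items():
    print(f"{name}: +{added} -{removed} (net {added - removed})")
```

A negative net value here simply means the author removed more lines than they added over the commits in the log.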

So no, coding does not take up most of the day! Multiple studies tend to show that a software engineer spends 30 to 40% of their day coding. That number matches my own experience.

Project and time management – 20 to 30%

This part is challenging, as it’s fairly easy to be unproductive here. Measuring project/time management efficiency is hard, and you are often not the only variable in the equation.

These activities fall mainly into two types:

  • Writing: ticket grooming, roadmap, etc.
  • Meetings: standup, sprint planning, etc.

Writing is (almost?) a prerequisite for every meeting. A proper pre-read or agenda speeds up the discussion and gets everyone on the same page.

Data Evangelism – 10 to 15%

Most of the time, data engineers sit between the hammer (data consumers, aka data analysts/data scientists/business/microservices) and the anvil (data producers). If something goes wrong for the data consumer, the first to be blamed will be the data engineer.

Angry Lady Cat [Digital Image] https://imgflip.com/meme/195076787/Angry-lady-cat

In that situation, you are the bad cop. You need to play your role by setting the rules and spreading the data culture. You sometimes have to say no. You may have to bring people back to reality. Being able to communicate realistic milestones politely and gently is an invaluable skill.

Write best practices, communicate with stakeholders and data producers, and show them that these guidelines are there to help everyone and improve productivity, not to block them.

Review – 10 to 20%

Review is an important category, as it’s basically when you learn the most. When you are learning new things on your own, it’s pretty hard to know if you are on the right track. Having a tight feedback loop with people (peers, stakeholders) is crucial. You learn what you are doing well and what you need to adapt.

Review can be split into three categories:

  • Code review
  • Project review
  • Performance review (team or peer-to-peer review)

There are days when I spend more time reviewing code than coding. And that’s not a bad thing. It may be that I need to get familiar with a new code base, or there’s some big feature I would like to double-check.

Project review can be a post-mortem or a demo for your stakeholders. It’s basically everything related to a specific project: understanding what is/was going wrong and what is/was going well. It’s also an opportunity to share best practices and establish conventions: coding style, documentation, etc.

Technology watch – 5 to 15%

Even if it’s not a daily activity, keeping a technology watch is essential for a data engineer today: new tools and frameworks are popping up so fast that you need to follow the trends if you don’t want to fall behind.

When people think about technology watch, they sometimes think, "that’s the kiddo who just hypes about new toys." But technology watch is not only about looking at groundbreaking new tech. It also includes:

  • Reading articles, books.
  • Improving your current setup with new libs/frameworks or design patterns.
  • Following new cloud services or features that could simplify your setup or reduce costs.

You can read my blog post and our data tech skills radar if you want more insight into the trends.

Non-productive – 1 to 10%

"It takes a lot of effort to be this unproductive"

Let’s be honest. We all have days when we feel that nothing we have been doing falls into any of the categories above.

Scrolling through your LinkedIn feed, talking about the last game you played during a coffee break, sitting in non-productive meetings: these are the kinds of activities that can drag you down on some days.

If you count this as part of your time, there’s nothing wrong with having it and acknowledging it, as long as it doesn’t take up too big a part of your time.

Conclusion

As you can see, coding is just the tip of the iceberg. And yes, in the end, communication is key to almost every category. I try to keep these ratios in mind every week to make sure I’m spending my time accordingly. These ratios will of course change depending on the culture and the size of your company.
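If you want to check your own week against these ratios, a minimal sketch could look like the following. The ranges come from the section headings of this article; the category keys, the function name `compare_to_targets`, and the sample week are hypothetical and only there for illustration.

```python
# Target ranges in percent, taken from the section headings of this article.
# The category keys are shorthand chosen for this sketch.
TARGET_RANGES = {
    "coding": (30, 40),
    "project and time management": (20, 30),
    "data evangelism": (10, 15),
    "review": (10, 20),
    "technology watch": (5, 15),
    "non-productive": (1, 10),
}


def compare_to_targets(hours_by_category):
    """Return {category: (share_in_percent, status)} for one logged week."""
    total = sum(hours_by_category.values())
    if total <= 0:
        raise ValueError("no hours logged")
    report = {}
    for category, (low, high) in TARGET_RANGES.items():
        share = 100 * hours_by_category.get(category, 0) / total
        if share < low:
            status = "below"
        elif share > high:
            status = "above"
        else:
            status = "within"
        report[category] = (round(share, 1), status)
    return report


# Hypothetical 40-hour week, for illustration only.
week = {
    "coding": 18,
    "project and time management": 10,
    "data evangelism": 4,
    "review": 5,
    "technology watch": 1,
    "non-productive": 2,
}

for category, (share, status) in compare_to_targets(week).items():
    print(f"{category}: {share}% ({status} target range)")
```

In this made-up week, coding lands above its range and technology watch below it, which is the kind of drift the weekly check is meant to surface.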

Is something missing? Feel free to share your own ratios and/or any categories I may have forgotten!


Mehdi OUAZZA aka mehdio 🧢

Thanks for reading! 🤗 🙌 If you enjoyed this, follow me on 🎥 YouTube, Medium, or 🔗 LinkedIn for more data/code content!
