
How to Manage a Junior Data Scientist

A personal reflection on keeping you and your team happy and productive

Do you ever wish that you could just manage a clone of yourself? Messages would never get lost in translation, you would know exactly what work your line report is capable of, and they would find your jokes hilarious.

Photo by frank mckenna on Unsplash

But we can’t do that, and notwithstanding an inflated opinion of myself, it’s not particularly desirable. Teams instead work best when they have a diversity of background and thinking. Plus, it can be a useful exercise in humility to know that your manager is probably having the same thoughts.

So in this clone-less world, you need to embrace the challenges and rewards attached to managing a Junior Data Scientist at the start of their journey in the field.

A little judgement, and a lot of trial & error, have allowed me to pause for thought and reflect on the things that have worked well for me, and those that fell flat on their face.

At the risk of sounding like clickbait for a self-help article, I have picked out 6 key points that I will keep with me as a manager.

This article is primarily for a Data Scientist (or Senior) managing more junior members of the team, but hopefully, there is something for everyone.

Let’s take a tour of my 6 observations.

Pair and pair again

Pairing is a common technique amongst software engineers, where two engineers use dual monitors or a shared screen to work on the same code. Typically, one person is the pilot, steering the work and writing the code, and the other is a navigator who makes observations or suggestions along the way and can write code directly when appropriate.

Photo by Amir-abbas Abdolali on Unsplash

The beauty of pairing is that you learn from one another, spot bugs that you wouldn’t have picked up on yourself, and write higher quality code as a collective.

It feels like pairing is becoming more common in Data Science as the lines blur more with software engineering. Pairing has produced great results across the team but particularly so with junior members.

If you are introducing them to something new or daunting, pairing is a great opportunity to start your project as the pilot to demonstrate the code in practice, before handing over the reins so that your team-member can learn by doing, with you alongside them for guidance.

It may be something that you do less formally already, but I find it adds more rigour to how you work together if you book in some time and say…

let’s pair and write this together

…as opposed to simply reviewing some code afterwards or throwing a new topic at them without helping them get started.

Give it a go, find the dynamic that works for you both, and make sure that you are setting aside an appropriate block of time in your calendars to pair productively without distractions.

Pick your battles

As I started to pair more and more, I also noticed that I was talking more and more. Whilst communication is key, it is easy to start nitpicking and to keep correcting work until it looks exactly as you would have written it.

Photo by James Pond on Unsplash

This pattern got me thinking (worrying), so I pulled up the last 5 pieces of written feedback I had given to my line report. After re-reading my messages, I soon realised that much of it was rooted in personal preference. In essence, I was saying that I wouldn’t have written code a particular way, or I would have presented some slides in my style.

Instead of asking "how does this work compare to my own?", I should have been asking the only question that matters, "does this work do what we set out to achieve?".

For example, who am I to say that this code…

df.rename(columns={"old_name": "new_name"}, inplace=True)

…is better than this…

df = df.rename(columns={"old_name": "new_name"})

Getting practical, I made a list of the aspects of data science that I consider universally good practice and expect my team to follow. It serves as a useful electric shock if I’m providing feedback on a topic that isn’t on the list, because if it’s not a must-have then it might not matter so much.

Priorities when it comes to feedback & reviewing work. Image by the author.

What would make it on to your list? You might just find that letting go of the less important stuff makes your reviews more efficient and less likely to leave your team feeling bombarded by feedback.

Learning on and off the job

Data science is a broad and dynamic field. It is impossible to keep up to date with all of the advances, and nobody should expect you to.

Nonetheless, whether you’re a data science generalist, ML engineer, or researcher, it is an inescapable truth that you need to know your onions and possess a certain amount of technical skill.

Photo by Tim Mossholder on Unsplash

Imposter syndrome can strike any member of the team, not just junior ones, but Junior Data Scientists in particular should be afforded plenty of patience and support as they steadily get to grips with the role. More important than prior experience is their willingness, passion, and inquisitiveness to learn the trade.

Make it clear that your team can take some time to focus on new skills that may benefit them e.g. one Friday a month dedicated to training.

A degree of learning also has to happen outside of work. Sharing tutorials, subscribing the team to newsletters, attending meetups, and running journal clubs are great ways to encourage continuous learning. It is also a useful acid test; alarm bells should be ringing if your team is not committed to personal development in some form.

Define progression

This one took some candid feedback from a member of my team. During one of our weekly catch-ups, they said…

I’m not sure where I’m heading in the company or what I would need to do to progress

Not a great place to be, and not great to hear as a manager. But they were absolutely right: we had never really sat down and discussed what progression in the team would look like.

Photo by Lindsay Henwood on Unsplash

This feedback served as a helpful wake-up call to perform two actions.

  1. Create a document which expresses the responsibilities and traits of a Junior Data Scientist and what that looks like in the next step up. This does not need to be overly prescriptive, but it helps to transparently show what they should be aiming towards.
  2. Ask what makes them happiest in the role. When did they feel most proud? What type of work would they most like to do if they were given the choice? Understanding whether someone wants to remain an individual contributor or sees themselves as a future team leader means you can tailor their progression accordingly.

Do you have a clear outline for progression within your team? You owe it to yourself and your Junior Data Scientists to create a transparent description of your expectations and what they need to demonstrate to make a compelling case for promotion.

Example career ladder from The Care and Feeding of Data Scientists. https://oreilly-ds-report.s3.amazonaws.com/Care_and_Feeding_of_Data_Scientists.pdf

Seek out the right opportunities

As mentioned above, team members should be given some freedom to express what type of data scientist they want to be and what they feel their speciality is.

Nonetheless, you still need a team with an understanding of the full end-to-end process of delivering data science projects. It’s no use having a machine learning specialist with no awareness of how their model could be put into production, or a strong coder who cannot engage with stakeholders.

If you have defined what progression looks like, the next step is to make sure you are prepared to help your team get there.

Photo by Markus Winkler on Unsplash

I found it a useful exercise to sit down together and create a scorecard for the different skills expected of a Data Scientist, and how much exposure they have had in each area.

This exercise is less about strengths and more about whether they are being presented with the opportunities to pick up the skills demanded of them. Doing so together is win-win because they will have a clear development plan and you will have a more well-rounded team as a result.

For example, your line report may say…

  • "I have been buried in SQL code and ingestion pipelines for the past few months and I’d like to get back to some machine learning."
  • "I’ve worked on plenty of models but someone else always takes over and deploys them in production for me."

Creating a scorecard allows you to hear their concerns or development needs first-hand, and then come up with a plan to address any blind spots.

Here is a high-level example of a scorecard. The exercise can take any shape you want, but you should do it together.

Example scorecard. Image by the author.
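
If you prefer to keep the scorecard in code rather than a slide, here is a minimal sketch of one way to do it. The skill names, the 0 to 3 exposure scale, and the `development_priorities` helper are all my own assumptions for illustration; swap in whatever you and your line report agree on.

```python
from dataclasses import dataclass


@dataclass
class ScorecardEntry:
    skill: str
    exposure: int      # assumed scale: 0 = none, 1 = some, 2 = regular, 3 = extensive
    wants_more: bool   # did they ask for more of this when you sat down together?


def development_priorities(entries, max_exposure=1):
    """Return the skills to plan opportunities for, least exposure first."""
    gaps = [e for e in entries if e.exposure <= max_exposure or e.wants_more]
    return sorted(gaps, key=lambda e: e.exposure)


# Hypothetical scorecard, loosely based on the two quotes above.
scorecard = [
    ScorecardEntry("SQL and ingestion pipelines", exposure=3, wants_more=False),
    ScorecardEntry("Machine learning modelling", exposure=1, wants_more=True),
    ScorecardEntry("Model deployment", exposure=0, wants_more=True),
    ScorecardEntry("Stakeholder engagement", exposure=2, wants_more=False),
]

priorities = development_priorities(scorecard)
for entry in priorities:
    print(f"{entry.skill}: exposure {entry.exposure}")
```

The output is only a starting point for the conversation. The value is in filling in the entries together, not in the sorting.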

OKRs

You’re a Data Scientist. You like to measure stuff. You should apply that thinking when setting objectives for the people you manage.

There are countless articles about SMART objectives, OKRs, BHAGs, and any other acronym you can think of. Suffice it to say, an objective should be relevant, ambitious, and measurable.

Photo by Fleur on Unsplash

Data Scientists are typically a passionate bunch, and we can get distracted by wanting to learn the latest and greatest frameworks or looking for chances to use deep learning. But we do not work in a bubble, and ultimately, our employers are expecting something of value back.

So outside of personal development objectives, it makes sense to specify a target with a commercial outcome associated with it.

Let’s say that your business runs a monthly email campaign to attract new customers, and 10% of the people who receive that email typically become a customer.

The Data Science Team have been asked to improve its performance, and it’s a perfect project for your Junior Data Scientist. Ahead of starting the work, you could set this objective and measurable outcome…

Objective: Improve the conversion of the email campaign

Key Result: One model which predicts who should be targeted

Seems like a good place to start. We have a project to work on, and it makes sense that the person working on it needs to deliver a model. But what happens to that model, how is it put to use, and how will we know whether the work was successful?

Maybe the next iteration of the key result would be…

Key Result: One model which predicts who should be targeted with 60% recall

We’re getting a bit closer to capturing success but it is tied to a statistical measurement like recall rather than something that affects a business KPI.
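
To make that concrete, here is a minimal sketch of what the 60% recall in that key result measures. The labels are made up for illustration, and the `recall` helper is my own rather than anything from a library.

```python
def recall(actual, predicted):
    """Share of the people who really converted that the model flagged."""
    true_positives = sum(1 for a, p in zip(actual, predicted) if a and p)
    actual_positives = sum(1 for a in actual if a)
    if actual_positives == 0:
        raise ValueError("recall is undefined when nobody converted")
    return true_positives / actual_positives


# Hypothetical hold-out data: 1 = became a customer, 0 = did not.
actual = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

campaign_recall = recall(actual, predicted)
print(f"Recall: {campaign_recall:.0%}")
```

The model found three of the five real converters, which hits the key result, yet nothing here tells you whether the campaign itself converted any better.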

If you force yourself to think commercially, you might speak to the CRM and Finance Teams to understand what a meaningful improvement would be. Perhaps they tell you that conversion would have to increase from 10% to 15% to justify the additional work of updating their workflow and incorporating your model in production.

Great. Next, you’ll have to think about how you would measure that, and what the experiment would look like. If you arrive at something you can’t measure, ask yourself whether you should be working on it at all, because you’ll never know if it was worthwhile.
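
One part of designing that experiment is checking that the campaign is big enough to detect the uplift you care about. Below is a rough sketch using the standard normal-approximation formula for comparing two proportions. The 5% significance level and 80% power are conventional defaults that I have assumed, not figures from the CRM or Finance Teams, and `sample_size_per_group` is a hypothetical helper.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(p_control, p_target, alpha=0.05, power=0.8):
    """Approximate recipients needed in each group of a two-sided A/B test
    comparing two conversion rates (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_power = NormalDist().inv_cdf(power)          # desired chance of detecting the uplift
    p_pooled = (p_control + p_target) / 2
    numerator = (
        z_alpha * sqrt(2 * p_pooled * (1 - p_pooled))
        + z_power * sqrt(p_control * (1 - p_control) + p_target * (1 - p_target))
    ) ** 2
    return ceil(numerator / (p_target - p_control) ** 2)


# Conversion rates from the example: 10% today, 15% to justify the work.
recipients_per_group = sample_size_per_group(p_control=0.10, p_target=0.15)
print(recipients_per_group)
```

Under these assumptions the answer comes out at several hundred recipients per group. A smaller target uplift needs a far larger campaign, which is worth knowing before you commit to the key result.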

This key result feels more solid…

Key Result: A/B test of campaign using data science model vs existing approach, demonstrating increased conversion to at least 15% by using the model
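
Here is a minimal sketch of how you might check that key result once the campaign has run. It uses a one-sided two-proportion z-test, which is one reasonable choice rather than the only one. The counts are invented, and `evaluate_campaign_test` and its 5% significance level are my own assumptions.

```python
from math import sqrt
from statistics import NormalDist


def evaluate_campaign_test(control_sent, control_converted,
                           model_sent, model_converted,
                           target_rate=0.15, alpha=0.05):
    """Compare the model-targeted group against the existing approach."""
    control_rate = control_converted / control_sent
    model_rate = model_converted / model_sent

    # Pooled conversion rate under the assumption that both groups perform the same.
    pooled = (control_converted + model_converted) / (control_sent + model_sent)
    std_error = sqrt(pooled * (1 - pooled) * (1 / control_sent + 1 / model_sent))

    z = (model_rate - control_rate) / std_error
    p_value = 1 - NormalDist().cdf(z)  # one-sided: is the model group better?

    return {
        "control_rate": control_rate,
        "model_rate": model_rate,
        "z": z,
        "p_value": p_value,
        # Met only if the uplift is unlikely to be noise AND reaches the target.
        "key_result_met": model_rate >= target_rate and p_value < alpha,
    }


# Hypothetical results: 1,000 emails per group, 100 vs 160 new customers.
result = evaluate_campaign_test(1000, 100, 1000, 160)
print(result)
```

Checking both conditions matters: a statistically significant uplift from 10% to 11% would pass the test but still fall short of what the business asked for.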

Things will always happen that are outside of your control. You may deliver the best model in the world but stakeholders don’t use it in the right way. On those occasions, at least you started with the right intentions and can take into account mitigating factors in your review.

By starting with a commercially-minded objective in the first place, and thinking about the experiment you would like to run to validate your work, your team will start to become much more valuable to you and your company.

That’s a wrap

As with everything in life, there’s plenty more to learn when it comes to data science and management, but I hope some of these 6 suggestions can help you along the way.

  • Pair together
  • Define your must-haves, and let go of personal preferences
  • Place value in learning outside of the day-to-day role
  • Communicate the traits required to progress from junior to senior
  • Create a scorecard for exposure to different elements of the job
  • Set commercial objectives

Best of luck to you and your team!
