There are a lot of frustrated data scientists out there right now. A number of recent surveys indicate that data science is among the fields most likely to have discontented employees who feel their work is not valued. In fact, it's quite likely that you, the reader of this article, are unhappy or frustrated in your job right now or have been at some time in the past.
One thing to consider is whether there is anything you can do about it. Sometimes data scientists find themselves systemically disenfranchised, and that is very difficult to change unless you are in a senior leadership position. Other times, the answer lies closer to home. I've seen data scientists act as their own worst enemy, adopting practices that almost guarantee their work will go nowhere in their organisation.
Here are five common reasons I see for data science work going nowhere, as well as some thoughts on how to avoid these things happening to you.
1. The work was not useful in the first place
It seems like a crazy idea that someone would do work that is not useful, but this is a very common symptom of poor management practices or poorly designed reporting structures. Incapable or poorly qualified managers inherit teams they don't know how to manage, or they don't know how to line work up against organisational priorities. Every check-in becomes a chore for them, so they give you work simply because you have to be doing something to justify your pay. This is an awful, demotivating situation for any employee to be in, data scientist or otherwise, but in my experience it happens more often in data science.
If you work for an organisation with decent values, you should always be willing to ask your manager to associate a task with an objective, and then ask yourself whether that objective makes sense. You need to see a straight line between the specific computational task assigned to you and some sort of strategic imperative. Maybe that imperative is a tangible business objective like revenue generation or cost reduction, or maybe it is something intangible like building skills or assets, but you have to see the link and it has to make sense to you. Otherwise, what's the point?
2. The problem was not well defined
Another common situation is where there is a link to an important objective, but the problem has not been defined well, and as a result you have been led up the garden path about what the deliverables are. A typical example, which I see very frequently, is imprecise language in defining the problem. Maybe someone asked for a model that predicts something, when it turns out they wanted a model that explains something. As a result, you've run a bunch of machine learning algorithms and calculated accuracy metrics, when they don't care about accuracy and instead want to know which coefficients matter in an inferential statistical model.
I find it useful to ask the following question before embarking on a project: if the results are useful, how would you use them? This is a very revealing question that can bring out the true purpose of the work. For example, if the answer is "we want to put the predictive model into production to automate decisions", then you can be sure that predictive metrics like accuracy are of prime importance in your modelling approach. If instead the answer is "we want to understand what drives this outcome so we can act on it", then the size and reliability of the coefficients matter more than raw accuracy.
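The difference between the two requests can be sketched in a few lines. The article itself contains no code, so the Python below, the simulated data and the helper names fit_simple_ols and rmse are all assumptions for illustration. It fits one ordinary least squares line and then reads it two ways: the slope, its standard error and its t-statistic answer the "explain" question, while the error on held-out cases answers the "predict" question.

```python
import math
import random


def fit_simple_ols(x, y):
    """Fit y = a + b*x by ordinary least squares (hypothetical helper).

    Returns the intercept a, the slope b and the standard error of b.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Residual variance uses n - 2 degrees of freedom (two fitted parameters).
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return a, b, se_b


def rmse(a, b, x, y):
    """Root mean squared error of the line a + b*x on the given points."""
    return math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x))


# Simulated data for illustration only: true intercept 2, true slope 0.5, noisy.
rng = random.Random(1)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 3) for xi in x]

# Hold out the last 50 points so prediction is judged on unseen cases.
train_x, test_x = x[:150], x[150:]
train_y, test_y = y[:150], y[150:]
a, b, se_b = fit_simple_ols(train_x, train_y)

# "Explain" question: how large is the effect, and how certain are we of it?
print(f"slope = {b:.3f}, standard error = {se_b:.3f}, t = {b / se_b:.2f}")
# "Predict" question: how far off are predictions on cases the model never saw?
print(f"hold-out RMSE = {rmse(a, b, test_x, test_y):.3f}")
```

Neither number is "the" answer on its own: a slope can be clearly different from zero while the hold-out error stays large, which is why it pays to know which question the person asking actually has.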
3. Your results were not well explained
Data science methods are complex, and you can't assume that because you know what you are doing, your manager, your teammates or, god forbid, laypeople will too. The best data scientists carefully document their methods. When I say "document", I don't just mean a few hashtagged lines of terse commentary in your code. I mean a research-style document that outlines the approach, the data and the method, with a discussion of the results.
If you are not in the habit of writing integrated documents using R Markdown or Jupyter Notebooks, get into that habit now. And next time you have a substantial deliverable, don't just deliver the results and the code; take a few extra hours to write it all up and explain your method. Some of my highest-impact work owes a lot to how I wrote it up. People will trust your work more if you explain it well.
4. You didn’t think about reproducibility
Don't create work with no regard for someone else picking it up and trying to reproduce it. There is no surer way for your work to be forgotten than to make it hard to reproduce. I cannot tell you how many times I have been put off reproducing someone's work simply because the original creator made it far too hard to do so.
Take care over the quality of your code, address randomness by setting seeds, comment important decisions, and save environment variables or workspaces. Keep the git repo up to date. Capture a record of the system and environment you ran your code in (through functions like sessionInfo() in R, or its equivalent in other languages). Show others that you have made every effort to make your work reproducible and they will trust you and admire you in equal measure.
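As a minimal sketch of what "its equivalent in other languages" might look like, the Python below (an assumption: the article itself uses no Python) seeds the built-in random number generator and collects the interpreter version, the platform and the versions of named packages into a record that can be saved as JSON next to the code. The helper names set_seed, session_info and save_session_info are hypothetical, not part of any library; the standard-library calls they wrap (random.seed, importlib.metadata.version, platform.platform, sys.version) are real.

```python
import json
import platform
import random
import sys
from importlib import metadata


def set_seed(seed: int) -> None:
    """Seed Python's built-in RNG so random draws repeat from run to run.

    Libraries with their own generators (NumPy, for example) need
    seeding separately; this helper does not do that.
    """
    random.seed(seed)


def session_info(packages=()) -> dict:
    """Return a rough Python analogue of R's sessionInfo() as a dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python_version": sys.version,
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


def save_session_info(path: str, packages=()) -> None:
    """Write the session record to a JSON file to commit alongside the code."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(session_info(packages), f, indent=2, sort_keys=True)


if __name__ == "__main__":
    set_seed(2024)
    print([random.random() for _ in range(3)])  # same three numbers every run
    print(json.dumps(session_info(["pip"]), indent=2))
```

Calling save_session_info("session_info.json", packages=[...]) at the end of an analysis and committing the file to the git repo gives a later reader a record of what the code ran on. If a handful of named packages is not enough, pip freeze lists everything installed in the environment.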
5. Your work was misinterpreted
Your job doesn't finish when you have handed off the results. It's very easy for laypeople to misinterpret your work once it starts circulating around the organisation. Weakly significant factors can suddenly become highly predictive factors. Models with poor fit can suddenly become reference models.
Attach your name to the work, and make an effort to stay involved in the follow-up, even if it is just to listen in for reassurance. If you hear misinterpretation, don't be shy about following up with the people involved to correct it. Remember that your colleagues often have their own agendas and will be tempted to use your work to build a narrative that supports those agendas. Ask to see or hear their narratives and make sure your work is not politically abused (even if unintentionally). Over time, these sorts of corrective interventions can lead to your opinion being more valued and to you being seen as a protector of truth.
Do these situations sound familiar to you? Are there other factors you think I've missed? Do feel free to drop a comment if so.
Originally I was a Pure Mathematician, then I became a Psychometrician and a Data Scientist. I am passionate about applying the rigor of all those disciplines to complex people questions. I'm also a coding geek and a massive fan of Japanese RPGs. Find me on LinkedIn or on Twitter. Also check out my blog at drkeithmcnulty.com or my textbook on People Analytics.