The Critical, but Often Overlooked, Skills in Data Science & Analytics

Looking beyond technical skills

Jack C
Towards Data Science


Photo by Lukas Blazek on Unsplash

There are now more ways than ever to learn the technical skills needed to get into data science & analytics: online courses, Kaggle competitions, YouTube tutorials, and the gold mine that is Stack Overflow, to mention a few.

People looking to break into the field will often ask questions such as “what programming skills do I need to do X data job?”, “which machine learning techniques should I learn?”, and “what projects should I do to build my skills/portfolio?”.

And technical skills are of course important: you couldn’t become a car mechanic without first knowing how a car works (hopefully).

But focussing solely on technical skills misses the crucial aspect that makes a data professional truly useful — being able to use their technical skills to have a business impact.

Bridging the gap

Photo by David Martin on Unsplash

A data professional’s role in any business is to bridge the gap between the technical bits of the job and helping influence decisions in the business.

The first part, the technical, is well defined — you take courses to learn coding and software, or you learn on the job.

The second part, helping influence the business, is far fluffier and less defined. It involves making the technical part of the role easy for the non-technical to understand, and obsessing over the needs of your end users.

In this article I’ll try to distil the main learnings from my own experience on how to have a larger business impact — and the types of non-technical skills that separate good candidates from great ones in recruitment processes.

These are the main points that I’ll run through:

  • Know your audience
  • Keep it simple
  • Focus on the outcome

Know your audience

Photo by Product School on Unsplash

In my experience, most data professionals come from a college / university background and are used to being surrounded by people with a similar level of knowledge to them.

Sharing your work with people who have a similar background to you is pretty straightforward: you don’t need to worry about being overly technical or even having to explain core concepts.

But in any business I’ve worked in, data professionals have made up at most 2% of the workforce, and the remaining 98% likely neither know nor care what hyperparameters are, or how long it took to wrangle that horrendous dataset into a workable format.

Knowing who your audience is, what they do and don’t care about, and translating what your work is into how it benefits them, is a crucial skill to have.

Steve Jobs didn’t promote the original iPod by talking about its 5GB hard drive, or even its smaller size. He marketed it as “1,000 songs in your pocket”. People don’t care about the technical detail; they care about what you can do for them.

Whenever I’ve been involved in hiring new data analysts / scientists this is invariably the biggest stumbling block — being able to translate their technical work so that someone non-technical understands why it’s important.

So how does this work in practice?

Let’s say you’ve built a model that predicts whether a customer is going to buy from your business, or not. It’s a very good model that is 80% accurate.

You’re going to present this back to your head of sales, and your summary is this:

Our random forest machine learning algorithm was trained on 10,000 rows of sales data, and has an accuracy of 80% on the classification of customer sales outcomes.

The head of sales will likely care about none of that, except maybe the 80% part.

Instead, think about the ‘so what?’: put yourself in their shoes and focus on what they care about.

So why don’t we try a different approach:

We’ve built a model that can identify, 80% of the time, whether a customer is going to buy from us or not. We can use this to upsell to customers who are likely to buy from us, and target marketing to those who aren’t.

We’ve done 3 things here:

  • Got rid of unnecessary jargon
  • Put the model performance in terms they understand
  • Given a ‘so what’ as to what can be done with the outputs

Put the findings of your work in terms other people understand. Overloading them with irrelevant information or jargon can cause your main points to be missed.
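If it helps to see that translation step written down, here is a minimal sketch in Python. The function name `plain_language_summary` and its wording are my own illustration, not part of any library, and it assumes the only number you want to surface is overall accuracy.

```python
def plain_language_summary(accuracy: float, action: str) -> str:
    """Turn a model accuracy (between 0 and 1) into a jargon-free sentence.

    Hypothetical helper for illustration only.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    # "N times out of 10" is easier to picture than a decimal or a percentage.
    out_of_ten = round(accuracy * 10)
    return (
        "Our model correctly predicts whether a customer will buy "
        f"about {out_of_ten} times out of 10. {action}"
    )


summary = plain_language_summary(
    0.80, "We can use it to decide which customers to upsell to."
)
print(summary)
```

The second argument is deliberately mandatory: the sentence is not finished until it says what the audience can do with the result.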

This brings us nicely onto the second point.

Keep it simple

Photo by Fabrizio Chiagano on Unsplash

“If you cannot explain it simply, you don’t understand it well enough” — Albert Einstein, probably

The head of data at my old job said a phrase that stuck with me when he gave advice on an (overly complicated) bit of analysis I’d written:

Don’t make people think

I was a bit puzzled at first, as it came across like he was suggesting that people reading it needed to be spoon-fed. But then it clicked: make it as easy as possible for someone to get to the point you’re trying to make.

An infamous example of poorly thought out presentation is the case of “death by PowerPoint”, where Boeing engineers aimed to communicate to NASA the risk of the Columbia space shuttle (that had been damaged on takeoff) disintegrating upon re-entering the earth’s atmosphere.

This was the key slide where they tried to make their case:

Source: NASA

It’s certainly not the most visually appealing slide ever made — but crucially they didn’t get their warning message across (this article goes into more depth) and the shuttle ended up crumbling upon re-entry.

The most relevant messages on the slide were buried in the last 3 points, and even then — it takes you a few reads to grasp it.

Whilst this tragedy won’t have been caused solely by this slide, it really didn’t help. The main offences:

  • The title of the slide didn’t summarise the message they wanted to give
  • The most important points were at the bottom in small font
  • The slide itself was 6th in the order of the presentation, rather than 1st
  • It was full of jargon (‘SOFI’ and ‘ramp’ both mean foam)

So how do we ‘not make people think’? Let’s run through a basic example.

Let’s say you work for Company Ltd. (full marks for creativity), and in week 9 of 2021 they started a marketing campaign — and you’ve been asked to show how effective it is.

So, you plot a line chart (after all, if you’re looking at a change over time — isn’t a line chart the best way?), and you end up with this.

Image by author

Now, this contains all of the key information — sales go up in week 9 and stay there, great! But let’s think, if you had no line chart and you had to talk through your findings — what would you say?

After launching our marketing campaign, sales have gone up by X%/£

If this is the message we want to get across, then why don’t we show something that gets to the point?

Image by author

This gets the main point across in the title, and the bar chart is useful to show the ‘before vs. after’ difference. Minor tweaks, but it requires far less thought to get to the exact same point.
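As a rough sketch of where that headline number could come from, the snippet below compares average weekly sales before and after the campaign launch and turns the result into a chart title. The sales figures are made up for illustration, and a simple before vs. after average is my assumption about how you might measure the change; it doesn’t prove the campaign caused it.

```python
from statistics import mean

# Hypothetical weekly sales in £ thousands (week number -> sales); not real data.
weekly_sales = {
    5: 100, 6: 104, 7: 98, 8: 98,        # weeks before the campaign
    9: 120, 10: 118, 11: 122, 12: 120,   # weeks from the campaign launch
}
CAMPAIGN_START_WEEK = 9

before = [sales for week, sales in weekly_sales.items() if week < CAMPAIGN_START_WEEK]
after = [sales for week, sales in weekly_sales.items() if week >= CAMPAIGN_START_WEEK]

avg_before = mean(before)
avg_after = mean(after)

# Percentage change in average weekly sales, relative to the "before" period.
uplift_pct = (avg_after - avg_before) / avg_before * 100

# The headline doubles as the chart title, so the message is read first.
headline = f"Average weekly sales are up {uplift_pct:.0f}% since the campaign launched"
print(headline)
```

The two averages are all the bar chart needs, and the headline string is what goes in the title.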

Granted, this is a very simple example — this doesn’t mean you can replace every line chart with a bar equivalent, but the point is to think about how you can distil your message into as simple a format as you can.

So often I’ll see presentation slides with a title of “analysis” or “findings”. The title of a page is the first thing people read, so why not make it a summary of what you’re about to tell them?

People rarely have the time or the attention span to read through reams of methodology and background material. Get straight to the point, otherwise the important bits of your work will get overlooked.

Focus on the outcome

Photo by 金 运 on Unsplash

Businesses care about money (shock horror), either making more of it or spending less of it. The work you do as a data professional has to contribute to that in some way.

As someone technical, you’ll likely take joy in building things and solving difficult problems. However, this comes with the risk of becoming the perfectionist.

The perfectionist wants to spend time endlessly tweaking their models, or fully automating that data input, but this isn’t always what has the largest impact.

“Don’t let perfect get in the way of good enough”

Most of the impact your work has will come from the initial part of your effort, whereas spending all of the extra time on perfection rarely has as much of an additional benefit.

This trade-off is better known as the Pareto principle, or the 80:20 rule: roughly 80% of the impact of your work comes from 20% of the effort.

But what does this actually mean in practice?

Let’s run through a few data specific examples:

  • You’re building a predictive model, using the 5 most important features gives an 80% accuracy, but using an additional 5 features gives 85% accuracy [do you spend the extra time on those extra 5?]
  • You’re building a customer profitability model, and you can either spend a long time bringing in the exact cost of onboarding a customer into your model, or you can use a hard coded assumption [do you hard code it?]
  • You’re building a dashboard for a sales team, and they’ve asked you for 10 different charts/graphics — you can build 7 of them easily, but the last 3 will take a lot of work [do you spend the time on those last 3?]

You’re probably looking at that list and thinking “well, in each example doesn’t it depend on the context?”, and you’d be absolutely right.

This is more of an exercise in taking a step back and thinking “is what I’m doing right now the most impactful thing I could be doing?”.
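One way to make that question concrete is to jot down a rough impact and effort estimate for each option and rank them by impact per day. The sketch below does this for the examples above; the `Task` class and every number in it are hypothetical guesses for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    impact: float       # estimated benefit on a rough 0-100 scale (a judgement call)
    effort_days: float  # estimated effort in working days (also a guess)

    @property
    def impact_per_day(self) -> float:
        # Benefit gained for each day of effort spent.
        return self.impact / self.effort_days


# Hypothetical estimates for the kinds of trade-offs listed above.
tasks = [
    Task("Model with the top 5 features", impact=80, effort_days=5),
    Task("Add 5 more features", impact=5, effort_days=10),
    Task("Build the 7 easy dashboard charts", impact=70, effort_days=3),
    Task("Build the 3 hard dashboard charts", impact=30, effort_days=12),
]

# Highest impact per day first.
ranked = sorted(tasks, key=lambda task: task.impact_per_day, reverse=True)
for task in ranked:
    print(f"{task.name}: {task.impact_per_day:.1f} impact per day")
```

The estimates will always be rough, and context can override the ranking, but writing them down makes it harder to drift into the low-return items by default.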

Endlessly tinkering with something you’ve built is always tempting, and it can easily become something of a comfort zone. Make sure you periodically take the time to take stock of what you’re doing to avoid being sucked into the rabbit hole of the perfectionist.

Now, this doesn’t mean all of a sudden that you can accrue tech debt like there’s no tomorrow...

Obligatory xkcd comic (source)

...but as with everything in life, there’s a balance to be struck somewhere in the middle.

Keep yourself to targets on when something you build will be ‘done’, or what level of functionality / performance you’d be happy with. Stick to it ruthlessly and keep asking yourself whether what you’re doing is truly the thing that will have the most impact.

Summary

Hopefully this has given you a good overview of some of the often overlooked skills in the field of data.

To reiterate, this doesn’t mean you can skimp on the technical knowledge. It’s just to make you think about the wider skills that can make a huge difference in how effective you are.

Hope you enjoyed reading!
