How to outsource data science effectively

Tips for businesses looking for professional data science help

Carl Dawson
7 min read · Oct 23, 2018


Technical expertise is a necessary, but not sufficient, condition for success in a data science project. So, if you’re considering hiring external machine learning or predictive analytics capabilities, whether from solo freelancers on Upwork, companies like ours, or the analytics wing of a Big 4 consultancy, here are a few things to bear in mind.

You want business-minded people.

While it’s exceedingly easy to get swept along in a conversation about green-field, blue-sky tech, I’m assuming you’re hiring an analytics team to help with your business. If you’re not, you can safely skip this part and, in fact, the whole article. After all, data science is an applied discipline. Machine learning research at the very cutting edge is all about mathematics and moving the needle on the time it takes to invert a particularly problematic matrix. If that sounds like gibberish to you, it should.

Any outsourced project should have some tangible results, and data science is no exception. While ex-academics would prefer it to not be this way, the results we’re talking about are delivered in scalable, maintainable code and not in research papers. And while I definitely do not think that you should treat your project as another software engineering exercise (an article in itself, trust me), you should receive a complete system that you can implement, audit and improve upon once the pros have left the building.

But how can you be sure that you’ll be able to make changes or even know when things are broken?

Because your outsourced analytics team should set you up so that you can monitor the business impact (read: cold hard cash) of their innovations.

You can (and should!) learn about things like cross-validation, receiver operating characteristic (ROC) and precision-recall curves, but that must not stop you from asking your team of experts what all of it means for the lifeblood of your business.
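As a toy illustration of that translation exercise, here is how precision and recall from a validation set might be turned into an expected cash figure. Every number below is invented for the sake of the sketch; substitute your own counts, conversion value and contact cost.

```python
# Toy example: turning a model's confusion matrix into a cash figure.
# All counts and prices here are invented placeholders.

tp, fp, fn, tn = 400, 100, 50, 9450   # confusion-matrix counts from a validation set

precision = tp / (tp + fp)            # of the customers we targeted, how many converted
recall = tp / (tp + fn)               # of all converters, how many did we catch

value_per_conversion = 120.0          # assumed revenue from a true positive
cost_per_contact = 5.0                # assumed cost of acting on any positive prediction

expected_profit = tp * value_per_conversion - (tp + fp) * cost_per_contact

print(f"precision={precision:.2f} recall={recall:.2f}")
print(f"expected profit on this cohort: ${expected_profit:,.0f}")
```

A team worth hiring should be able to walk you through a calculation like this for your own numbers, rather than stopping at the metrics.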

Similarly, you shouldn’t expect MBAs from the people you send RFPs to. What you can do, though, is have a frank business discussion with those you’re considering for the project. If they don’t break out in a flop sweat when you start talking about revenue, see if they can come up with ideas or questions that tie your business goals to their development and statistical know-how. That’s a good sign that they’re trying to help you as a business and not just as a potential case study.

If you’re designing a new product on the side, or if you need really deep expertise in one technical domain or another, it’s okay to occasionally forgo business savvy, as long as someone with that knowledge is around to help keep the expert focussed.

Occam’s razor always applies

You can’t do everything in Excel. But you shouldn’t use neural networks for everything either.

Coming back to what I said above about case studies, a lot of analytics teams want to use the latest and greatest to make their portfolios as on-trend as possible. I’m sure some graphic designers suffer from the same tendencies. This isn’t helped by marketplaces like Upwork where hundreds of contractors chase a single opportunity and have to differentiate themselves on the basis of a portfolio and a cover letter.

You should be careful that the money you’re paying (and it could be a lot) is buying the wisdom of knowing which solution to use, rather than being just the premium you pay for following the tech hype. I’d wager that a large percentage of the data science project failures we’re all reading about are a case of using cutting-edge methods to deliver simple results.

Whenever I approach a new project, I always ask ‘how can this be done heuristically?’ Which is a fancy way of asking ‘if we guessed at some rules or sketched out an algorithm, how close would we get?’

It usually turns out that there is already such a system in place; a raft of Excel spreadsheets, some sort of Top 10 or randomised selector, or perhaps a rules engine left over from before the AI Winter.
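The ‘how close would a guessed rule get?’ question above can be sketched in a few lines. The data and the rule here are entirely made up; the point is that a scored heuristic gives you a baseline any paid-for model must beat.

```python
# A minimal sketch of scoring a guessed rule before buying anything fancier.
# Customers and the churn rule below are invented for illustration.

customers = [
    {"spend": 900, "visits": 12, "churned": False},
    {"spend": 40,  "visits": 1,  "churned": True},
    {"spend": 650, "visits": 8,  "churned": False},
    {"spend": 80,  "visits": 2,  "churned": True},
    {"spend": 500, "visits": 2,  "churned": True},
    {"spend": 700, "visits": 9,  "churned": False},
]

def heuristic_predicts_churn(c):
    # Guessed rule: low spenders who rarely visit are churn risks.
    return c["spend"] < 200 and c["visits"] < 3

correct = sum(heuristic_predicts_churn(c) == c["churned"] for c in customers)
accuracy = correct / len(customers)
print(f"heuristic accuracy: {accuracy:.0%}")
```

If a guessed rule already gets you most of the way there, that reframes the conversation about what the consultants are actually being paid to improve.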

This is always a good place to start. This is what is currently giving you, as a business, the results that you want to improve upon. So, when it comes to your initial call with your chosen consultants, ask them to review it and learn about how it works. Most importantly, ask them if all the money you’ll spend to improve it will be worth it for your business.

The cloud costs money

When I first started out, I always budgeted for the initial expense of setting up cloud compute power and data storage, but I never fully prepared clients for the costs of running this system day in, day out.

It’s an easy mistake to make.

Data people forget that they’re working on a radically attenuated selection of the data. They forget that there may have been a series of complex encryption, anonymisation and lookup procedures performed on the data before they got it.

They forget that they’re querying the API they designed at human speed, from the terminal.

They forget that businesses grow.

Ask any data engineer: ETL costs money, machine learning models can be huge, poor indexing kills join speeds. And so on.

It’s crucial that the person or team you hire have experience of taking data from its raw state through to predictions, and of doing that efficiently. These are the skills that many data science types lack. Those who are used to CSVs in, CSVs out may not be the best option for your project, unless you want a bunch of CSVs.

When you’re getting a bid on a project, make sure you ask what the ongoing cloud costs will be, how they’ll grow as your data grows, and how many API servers they think you’ll need when you’re making 10,000 requests a second.
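A back-of-envelope projection is enough to start that conversation. Every rate below is an invented placeholder, not any vendor’s actual pricing; swap in real figures from your provider’s price list.

```python
# Back-of-envelope projection of ongoing cloud costs as traffic and data grow.
# All prices and growth rates below are invented placeholders.

requests_per_second = 10_000
seconds_per_month = 60 * 60 * 24 * 30

cost_per_million_requests = 0.20      # assumed per-request (gateway/invocation) pricing
storage_gb = 500                      # current data volume
storage_cost_per_gb_month = 0.023     # assumed storage pricing
storage_growth_per_year = 2.0         # assume data doubles annually

for year in range(1, 4):
    gb = storage_gb * storage_growth_per_year ** (year - 1)
    monthly = (
        (requests_per_second * seconds_per_month / 1e6) * cost_per_million_requests
        + gb * storage_cost_per_gb_month
    )
    print(f"year {year}: ~${monthly:,.0f}/month")
```

Even a crude model like this makes it obvious when request volume, not storage, dominates the bill, and it gives you a number to hold a bid against.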

Easy integrations aren’t always easy

Things are getting so much better. Stitching together radically different systems used to be a headache-inducing, months-long activity. And while we’re not completely out of the woods yet, it’s easier than ever to pull the data from two or more systems together in one place.

All that being said, data science is not data warehousing.

Feature engineering, the most data- and creativity-intensive part of the data science endeavour, requires strange data. By strange I mean unaggregated, un-relational, and essentially useless to humans.

This means that the textbook integrations aren’t always going to work smoothly and I believe that any serious analytics company should be willing to hire an expert in whatever niche CRM or ERP you’re using. While trawling through LinkedIn in search of one is no fun, neither is wasting client money picking apart documentation that isn’t fit for purpose.

The rub here is that your analytics partner won’t know about all of these systems unless you tell them. So please tell them.

Don’t split development (wherever possible)

Those last two points were basically long ways of saying that whoever you outsource analytics to should have some engineering chops. At least enough to know what they don’t know.

In my experience, as a data guy, it’s always easier for the client if I find and work alongside the development team. And this ease increases exponentially with the number of projects we’ve completed together.

Data scientists view everything functionally: numbers in -> numbers out. Software engineers have a different skillset: reducing complexity by splitting things into their composite parts. Those approaches don’t exactly gel, and the downstream effects of talking at cross-purposes in every technical meeting include nonsensical user interfaces, poorly designed data infrastructure and subpar product performance overall.

Hiring a team diverse enough to cover areas such as design and data engineering will always work way better than sitting a data scientist near a software developer and hoping for the best.

So when you’re talking to potential vendors, especially about an end-to-end project, ask them who they know who can help build a great solution to your problem.

A word is worth a thousand data points

Tips on questions to ask in the early stages are all well and good, but how can you choose a partner that you’ll have a fruitful relationship and solid ongoing communications with?

As I’ve mentioned, I don’t treat data science projects like software development ones. And I don’t think that daily stand ups are the correct format to discuss ROC improvements.

On the other hand, I don’t think that a PowerPoint presentation with a flashy looking graphic on it once every two weeks is the best method of sharing progress either.

I think one of the best things you can do is treat a data science project a little like a design one: take a moment to process each analysis on an emotional level whenever you receive an update.

That means discussing the intangibles and the impact the work will have on your business. While it might be silly to say that you don’t like a particular analysis, the way you can with logos or business card layouts, you can use your intuition and deep knowledge of your own business to raise pertinent questions for your data team to consider.

Any team that continually points you towards esoteric metrics you don’t understand is unlikely to buy in to your objection that a particular result feels wrong.

You can screen for this by asking how the project could fail. If they say, ‘we’ll fail when we provide no improvement to xyz business metric’, then they’re a keeper. If they say, ‘even though you’re losing money, our system has over 89% accuracy, stable under cross-validation’, well, I think you know what I’m going to say.

Overall, you should know that while data science projects take time and cost money, they should generally move the business in the right direction. If an analysis gives some clearly bogus results, say so. That’s far preferable to following a blind alley.
