How to Prioritize and Execute your Machine Learning Research

A summary of how to get it done: taking ideas and putting them into production

Sam Black

Published in Towards Data Science

7 min read · Dec 8, 2020

I generally have lots of ideas when I’m thinking about machine learning. I dream up new architectures and new methods all the time, but often find myself with a combinatorial explosion of ideas to test.

If you’re a researcher, there are likely 5–10 different ideas you are working with in your head at any one time. Within just one of those ideas, there are probably 5–10 more variations or offshoots.

In this article, I introduce a framework that I use to help me prioritize and, most importantly, execute my ideas. I hope that this framework will help you be more successful, whether you are an independent researcher, a working data scientist or machine learning engineer, or a researcher at an institution.

Framework

1. Start with the goal
2. Iterate through all the possibilities
3. Prioritize on the expected value
4. Organize, organize, organize
5. Know when to pivot

These steps sound intuitive and simple, but they need to be applied consistently to achieve success in your projects and research tracks.

Start with the goal

First and foremost, you need to have a target. Whether your goal is to beat the state of the art on an industry benchmark or to solve a specific problem, you need to be clear on what that goal is and why you are pursuing it. The goal should be clear enough to fit in a single statement.

Achieving this goal should be the thing that delivers value for you or your team, whatever domain you happen to be working in.

Example:

Achieve SOTA on GLUE

or:

Build a recommender engine that achieves at least a 65% purchase rate for our company

Again, this sounds simple, but when you’re “in the weeds” and thinking about graph traversal algorithms or some other esoteric or unrelated realm of research, this goal statement will bring you back to your original intent and keep you from going too far down the rabbit hole. It should almost become a mantra as you work through your research, guiding your decision making and ensuring that you keep the end state in mind.

Iterate through all the possibilities

Here’s where a spreadsheet comes in handy. You and your team will have tens to hundreds of viable options for achieving your goal, and each of them has two factors: ease of execution and likelihood of success.

At the beginning of your project, you’ll need to sit down for a few days, or a week at most, and iterate through all of the possibilities. Time-box this step so that you can actually deliver instead of getting stuck in analysis paralysis.

This is where you or your team will need to comb through the latest research and get a lay of the land. You may come across promising research tracks that you haven’t encountered before, or learn some new techniques or methodologies.

Track anything and everything that may make sense to apply. If an idea has significant applicability, you should aim to understand the research well enough to judge how difficult or easy it would be to execute and to explain its value to the rest of your team. This is the time to list every possibility and idea that you have. Keep note of the sources (papers) and any enablers (code/repos/examples) that you can evaluate or implement later.

Prioritize on the expected value

At the end of the process in step 2, I find it’s useful to sit down and have one or multiple sessions where you discuss all of the ideas. Some ideas will have immediate promise and applicability, while others will be unclear.

As mentioned, each idea needs to be prioritized on two factors: ease of execution and likelihood of success. Unfortunately, there is no easy way to determine either one, as each researcher and team is different.

Some considerations:

How complex is the method or idea?
Did this method show success in another area?
Do I/we have any expertise in applying this method?

At the end of your prioritization process, your ideas should be ranked first by their likelihood of showing results and then by their ease of execution. If you were to create a score for sorting, it would weight likelihood of success more heavily than ease of execution.

It’s also important to add in any tools or processes you need to develop in order to realize some of the ideas. Some tools or processes will likely have an impact on multiple areas of research, so they should be prioritized as well, especially when you’re thinking about how to distribute work across your research team.
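To make the scoring idea concrete, here is a minimal sketch. The 1 to 5 rating scale, the 0.7/0.3 weights, and the example ideas are all assumptions chosen for illustration; the only point taken from the text is that likelihood of success should count for more than ease of execution.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    likelihood: int  # estimated likelihood of success, 1 (low) to 5 (high)
    ease: int        # estimated ease of execution, 1 (hard) to 5 (easy)


def priority_score(idea: Idea, w_likelihood: float = 0.7, w_ease: float = 0.3) -> float:
    """Weighted score; the weights are illustrative, with likelihood dominating."""
    return w_likelihood * idea.likelihood + w_ease * idea.ease


def prioritize(ideas: list[Idea]) -> list[Idea]:
    """Return ideas sorted from highest to lowest priority score."""
    return sorted(ideas, key=priority_score, reverse=True)


# Hypothetical ideas and ratings, not taken from a real project.
ideas = [
    Idea("fine-tune a pretrained model", likelihood=4, ease=4),
    Idea("novel architecture from scratch", likelihood=2, ease=1),
    Idea("better data augmentation", likelihood=3, ease=5),
]

for idea in prioritize(ideas):
    print(f"{priority_score(idea):.1f}  {idea.name}")
```

The same calculation works as a formula column in a spreadsheet; what matters is that the team agrees on the rating scale and the weights before sorting.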

Organize, organize, organize

This step can make or break your project, as it’s crucial to remain disciplined through this entire process.

You need to clearly organize the following elements:

Who is working on which idea/tool
What is the critical path (the expected time from start to finish)
The status of each
The results of each (should be distributed to the rest of your research team)

This is where project management and research meld. Without this level of organization and accountability, it’s likely that you will “go down a rabbit hole” on one track or idea and burn through a lot of valuable time.

Use something like Weights & Biases to track your outcomes, and make sure that the name of each model and every hyperparameter tested is clearly recorded. I’m probably not the only one who has wasted cycles retraining a model architecture I had already trained because I didn’t track its overall accuracy. Lesson learned. Now, organization is paramount, and these tools make it easy to keep track.

Use something like Trello to track the status of each research track and to provide accountability to your team. Make sure your researchers are keeping it up to date.
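If you are not using a dedicated tracking tool yet, even a small run log helps avoid retraining a configuration you have already tried. The sketch below is a standard-library illustration only, not the Weights & Biases API; `RunLog`, `config_key`, and the example architecture, hyperparameters, and accuracy value are hypothetical.

```python
import hashlib
import json


def config_key(architecture: str, hyperparams: dict) -> str:
    """Stable fingerprint for an (architecture, hyperparameters) combination."""
    payload = json.dumps({"arch": architecture, "hp": hyperparams}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


class RunLog:
    """In-memory record of finished runs, keyed by configuration fingerprint."""

    def __init__(self) -> None:
        self._runs: dict[str, dict] = {}

    def lookup(self, architecture: str, hyperparams: dict):
        """Return the stored record if this exact configuration was run, else None."""
        return self._runs.get(config_key(architecture, hyperparams))

    def record(self, architecture: str, hyperparams: dict, accuracy: float) -> None:
        """Store the outcome of a finished run."""
        self._runs[config_key(architecture, hyperparams)] = {
            "architecture": architecture,
            "hyperparams": hyperparams,
            "accuracy": accuracy,
        }


log = RunLog()
# Hypothetical run and result, for illustration only.
log.record("resnet18", {"lr": 0.001, "batch_size": 64}, accuracy=0.91)

# Key order does not matter, so this is recognized as the same configuration.
previous = log.lookup("resnet18", {"batch_size": 64, "lr": 0.001})
if previous is not None:
    print(f"Already trained: accuracy={previous['accuracy']}")
```

Checking the log before launching a run is the habit that matters; a hosted tracker adds persistence and sharing across the team on top of it.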

Every team/researcher is different. Use whatever tools make sense, but ensure that you are keeping everything tightly organized.

Know when to pivot

Knowing when to shelve an idea is a valuable skill that is difficult to apply. Sometimes we fall in love with an idea, but that idea doesn’t love us. I’m especially vulnerable to this; theories are almost always beautiful and elegant, but occasionally they fail hard in real experiments. When that happens, we need to know how to test the idea quickly and move on when it fails to bear fruit.

Avoid the sunk cost fallacy. You should know how complex or simple the idea is before investing time in it. You should also set a time limit for working through this area of research, so you don’t keep chasing an idea that may never work out.

This can be especially hard. Sometimes, shelving an idea is harder than breaking up with a significant other. You spend weeks/months/years theorizing about something and it doesn’t materialize, no matter how you work it. It could be that there’s a piece missing, or maybe it’s the wrong domain, or a number of other factors. You don’t have to divorce yourself from the idea entirely, because it may prove useful later on, but you do need to ensure you can reach the goal stated in step 1. Know how to pivot, and when to do it.

Some key questions to ask:

Are there any promising results that I’ve seen from this idea/method?
Do I believe that there is a modification or set of modifications that would improve my outcomes? If so, why?
Can I quickly prove out my hypotheses?

If you answer “No” to at least 2 of the 3 questions above, it’s probably time to move to a different avenue or on to the next area.
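The rule of thumb above is simple enough to write down as a checklist function. This is only a sketch: the parameter names are made up for illustration, and the optional time-box check (`days_spent`, `time_box_days`) is an added assumption based on the earlier advice to set a time limit, not part of the three-question rule itself.

```python
def should_pivot(
    promising_results: bool,
    plausible_fix: bool,
    quick_to_test: bool,
    days_spent: int = 0,
    time_box_days=None,
) -> bool:
    """Return True if it is probably time to move on from an idea.

    The three booleans are the answers to the key questions (True means "Yes").
    Pivot when at least two answers are "No", or, as an added assumption,
    when an explicit time box has been exceeded.
    """
    no_count = [promising_results, plausible_fix, quick_to_test].count(False)
    over_budget = time_box_days is not None and days_spent > time_box_days
    return no_count >= 2 or over_budget


# Two "No" answers: time to move on.
print(should_pivot(promising_results=False, plausible_fix=True, quick_to_test=False))

# Only one "No" answer and no time box set: keep going for now.
print(should_pivot(promising_results=True, plausible_fix=True, quick_to_test=False))
```

Writing the answers down at a regular checkpoint, rather than evaluating them in your head, makes it harder to rationalize staying on a track that is not delivering.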

Remember — stay focused on the goal. Just because one of your elegant theories didn’t work out, doesn’t mean your next set of ideas won’t. If a theory didn’t work, it’s not because you aren’t capable and intelligent. Try to divorce your emotions from your research.

It’s better to say:

“This idea didn’t work. Although we had high hopes, when we tested it, it didn’t deliver what we expected”

than to say:

“We need more funding because we need to continue iterating on an idea that we’re not sure will work”

Summary

Achieving results isn’t magic, it’s just consistency and discipline. As researchers, we like to dwell in possibilities, think about elegant solutions to difficult problems. However, at the end of the day, you are held accountable for the results you can deliver — especially if you’re at a private company and not a research institution.

Talk the talk — and walk the walk.

Thanks for reading!
