Dos and Don’ts for Testing a New Data Scientist Candidate

Casey Whorton
Towards Data Science
8 min read · Jan 1, 2021


Photo by JESHOOTS.COM on Unsplash

Testing candidates for a Data Scientist position gives a hiring organization a good sense of how well each candidate can do job-related tasks and manage time effectively. The skills that Data Scientists need to succeed vary by company, or even by team within a company, so tests should be tailored to the role. In general, though, Data Science is a process that includes many steps and independent skills that add up to something greater than the sum of its parts. That makes it difficult to test for, so in this article I will share the guidelines I feel are important to follow when testing a new Data Scientist candidate.

Technically, testing can be done at any part of the interview process, even before the interview process begins, but only if you want to give a candidate a reason NOT to work for you. For me, there are two times at which to give tests to Data Scientist Candidates:

  1. After an initial screening interview with Human Resources or somebody else at the company that is hiring a Data Scientist, or
  2. During the interview process with the department that is hiring a Data Scientist

After an initial screening is when you can test for bare-bones skillsets with short, to-the-point tests. If the job requires Python and SQL, a test that asks candidates to do basic data manipulation or to write down how they would query a dataset using a WHERE clause is appropriate. The point here is to filter out those who are obviously not capable of functioning independently on the team.
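As a rough illustration, a screening question of this kind could look like the minimal sketch below. It uses Python's built-in sqlite3 module with an in-memory table. The `orders` table, its columns, and the filter values are all made up for this example and are not taken from any real test.

```python
import sqlite3

# Hypothetical toy table for a screening question; all names and values are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "west", 120.0), (2, "east", 35.5), (3, "west", 15.0), (4, "north", 310.0)],
)

# Task for the candidate: return the ids of orders in the west region above 100.
rows = conn.execute(
    "SELECT id FROM orders WHERE region = ? AND amount > ? ORDER BY id",
    ("west", 100),
).fetchall()
west_large_ids = [row[0] for row in rows]
print(west_large_ids)
conn.close()
```

A question at this level should take a capable candidate only a few minutes, which keeps the screening test short.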

When the candidate has already spoken with somebody on the team and looks appropriate for the role, it is a good time to give an in-depth test of their data science abilities. This is when your expectations that the candidate could actually perform on your team are at their highest, so you want them to walk you through their data science approach. Live. At this point, I suggest creating a test that checks both their ability to function in the role and their ability to be creative or innovative.

Remember, reciting what hyperparameters can be used to train a model or answering probability theory questions doesn’t always directly translate into innovative problem solving or being a team player. Make sure you test for what your organization values in a Data Scientist.

Using a Pre-Built Data Science Skills Test

No matter what Data Science skillset you want to test for, there is probably a test already designed for it. If you work in Human Resources and/or don’t have the time or experience to design a test of your own, search online for “data science candidate test” and you will be greeted with a dozen or so hits for websites of companies that provide them.

I have taken these types of tests as a candidate, and I feel that they are great for screening large numbers of candidates. (I once took a test that basically asked how to select a column in a pandas DataFrame.) This is the kind of test you give candidates before first-round interviews, but be sure to follow the Dos and Don’ts at the end of the article.

Even though you can outsource the actual administering of the test and analyzing of results, here are some questions to ask yourself before engaging:

  • What skills should be tested?
  • What length of time do I want candidates to spend on this test?

Obviously, the results of the tests should tell you who is right for the job you are hiring for, not some other job. Also, respect the candidates’ time by not expecting them to give up a full day for your test, unless, of course, they are on the very short list of candidates (a handful or so).

As a general rule: the length and involvement of the test should directly correlate with how seriously you are considering the candidate for the job.

Create Your Own Test

Creating your own test is reserved for top candidates because you won’t have the time to administer or review custom tests for every candidate as a screening mechanism; rely on resumes or basic testing for that purpose instead. I personally find this approach more appealing because you can find out how a candidate will behave working on your team. You can design questions specifically around how they would tackle an actual problem in the role, or something hypothetical.

Here are questions to ask yourself while creating the test:

What Skills Should We Test For?

Note: When I say “we”, I mean the team the new Data Scientist will be working on. This applies even when just one person is making the test.

A comprehensive skills list needs to include basic skills, advanced skills, and nice-to-have skills that can expand your team’s horizons. I suggest writing these in a column and highlighting the basic skills because these have the option of being tested earlier in the interview process. Here is an illustrative example to work from (add and remove whatever you want, don’t get salty in the comments):

Skills (Basic):

  • Python: dataframe manipulation
  • Python: scikit-learn model fit & predict
  • SQL: Joins
  • Missing value imputation

Skills (Advanced):

  • Python: Keras or TensorFlow
  • Python: time series with statsmodels
  • SQL: stored procedures
  • LASSO fit explanation

Skills (Nice-to-have):

  • Natural Language Processing Expert (NLP)
  • Docker expert

Creating your own list should be easy. My suggestion is to enlist the help of the team with a half-hour meeting on the subject. Just ask. Frame the meeting request as defining which skills are basic, advanced, and nice-to-have for the role; the skills that get the most votes are the most important.
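If the votes are collected in writing, tallying them takes a few lines. The sketch below is only an illustration: the ballots are hypothetical, and it assumes each team member simply lists the skills they consider important.

```python
from collections import Counter

# Hypothetical ballots: each team member lists the skills they consider important.
ballots = [
    ["SQL joins", "dataframe manipulation", "time series"],
    ["SQL joins", "dataframe manipulation", "Docker"],
    ["SQL joins", "missing value imputation", "time series"],
]

# Count one vote per skill per ballot, then rank from most to fewest votes.
votes = Counter(skill for ballot in ballots for skill in ballot)
ranked = votes.most_common()
for skill, count in ranked:
    print(f"{count}  {skill}")
```

Where to draw the line between basic, advanced, and nice-to-have is still a judgment call for the team; the tally only shows where people agree.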

What Data Should We Use?

Part of being a Data Scientist is being comfortable handling data and making inferences from data. In my opinion, your test has to have data in it to be viable. Give your candidate something to get their hands on (metaphorically) and manipulate with code or whatever tool you want to see them work with.

For the data you provide, here are some guidelines:

The data is relevant to the industry they would work in. If the job is in financial technology, give them financial time-series data. If it’s something in higher education, give them anonymized demographic data.

The data is not confidential or sensitive. This might be a no-brainer, but don’t release sensitive data to non-employee candidates over email. When in doubt, build the test from publicly available data, or make up your own.

Make the data a little dirty. This is a great way to see what candidates do with dirty data, a step in testing that is often overlooked.
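One way to dirty up a clean dataset is sketched below, using only the Python standard library. The `make_dirty` helper, its parameters, and the sample rows are hypothetical and invented for this example. It blanks out some values, messes with the casing and whitespace of some strings, and appends a few duplicate rows.

```python
import random

def make_dirty(rows, missing_rate=0.1, n_duplicates=2, seed=0):
    """Return a copy of rows (a list of dicts) with common data problems injected."""
    rng = random.Random(seed)  # fixed seed so every candidate gets the same file
    dirty = [dict(row) for row in rows]  # copy, so the clean data stays untouched
    for row in dirty:
        for key, value in list(row.items()):
            if rng.random() < missing_rate:
                row[key] = None  # missing value
            elif isinstance(value, str) and rng.random() < 0.2:
                row[key] = f"  {value.upper()} "  # inconsistent casing and whitespace
    # Append exact copies of a few randomly chosen rows.
    duplicates = [dict(rng.choice(dirty)) for _ in range(n_duplicates)]
    dirty.extend(duplicates)
    return dirty

# Made-up sample rows, for illustration only.
clean = [
    {"customer": "alice", "balance": 100.0},
    {"customer": "bob", "balance": 250.5},
    {"customer": "carol", "balance": 75.25},
]
dirty = make_dirty(clean)
print(len(clean), len(dirty))
```

Keeping the clean version on hand also gives you an answer key when you review what the candidate did with the dirty one.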

How Long Should the Test Be?

How long would you give yourself to take this test? That’s your answer. If you assign twenty tasks that each call for justification down to the basic premise, then you have to give more time. Based on tests I have taken and constructed, here are some guidelines:

  • Open-ended analysis of dataset with some guiding questions (3 hours)
  • Business problem with data analysis, visualizations (8 hours)
  • Machine Learning problem, including data cleansing/transformation (24 to 48 hours)

How Do We Analyze the Test Results?

Ask yourself: If this were the end of the month, or the end of a sprint, would this submitted work be at, above, or below the team’s expectations? In my opinion, this is the most natural way to assess the results. Basically, you want to check off that the candidate can handle the easy tasks, do well on the advanced tasks, and give a little indication that they will expand the team’s horizons in at least one direction. Maybe they are better at time-series analysis than you, are a good communicator, actually add comments to their code, or do anything else that stands out.

Dos and Don’ts When Testing New Data Scientist Candidates

DO give feedback. No matter what.

There’s nothing worse than getting no follow-up for 3 months and then receiving a robotic email saying the position is filled. Everyone reading this knows what I am referring to, and for that reason alone you need to avoid doing it. Job seekers, especially Data Scientists, understand the competitiveness of the market, but they also remember those who treated their time as valuable.

DO test for the skillset relevant to the job.

Along with the job description and interview process, candidates need a realistic image of the job. Giving a test that quizzes on Docker containers and the latest in image recognition neural network architecture when you are really looking at data cleansing and automating reports will ensure turnover. Data Scientists leave companies all the time when the expectations are different from reality. Don’t make your organization look like posers.

DO give the test to your current employees.

Seriously, see how well your current employees do on the test. It’s a great barometer for what to expect from your candidates. It’s also a great reality check for the expectation that your new Data Scientist will be an expert at everything data related when none of your team members are.

DO NOT administer tests prior to any direct human contact.

If your response to receiving an application is to generate an email prompting them to take a test, then you might as well just say: “Thank you for your interest, and even though you’re just a number to us, take time out of your schedule to do this while we provide no genuine interest in you and no indication of any further meaningful contact.” You need to give something, even if in an email, showing your interest in them as a candidate that justifies them spending their time on any testing. In my opinion, if they haven’t spoken to somebody in the department that is conducting the hiring, then they shouldn’t be taking any tests.

DO NOT administer long tests as a part of your screening process.

When I say “screening process”, I am referring to the process of screening out candidates before going to first or second stage interviews. If you want to screen out people that can’t do basic tasks, then a short test for that is appropriate. We are talking a maximum of 30 minutes. Hours of testing just to have the information on file in case you need it is bad practice and a drain on the collective time of the industry.

DO NOT administer those “intelligence tests” that ask you to guess the next sequence of shapes in a series. (You know what I’m talking about.)

For real, what the &%$! is this? I’ve been working as a Data Scientist for years and this never comes up except in the context of what bothers candidates. You never hear back about how the test went anyway, and it’s not like companies need it as justification for not calling you back for interviews, so why bother?

“Thank you for your interest, but your aptitude at guessing the next image in a series of shapes is, by itself, a reason for us to consider other candidates.” — Nobody

Conclusion

Testing is a great tool to determine how well a Data Scientist candidate will fit in with your team, but it can also tarnish your organization’s reputation if you do not respect candidates’ time and make them jump through unnecessary hoops. You can easily make sure the tests you administer align with the job you are hiring for by writing down what skills the job needs with the help of the other Data Scientists. Following the dos and don’ts in this article gives your organization a solid outline for testing new Data Scientist candidates.


Data Scientist | British Bake-Off Connoisseur| Recovering Insomniac | Heavy Metal Music Advocate