The world’s leading publication for data science, AI, and ML professionals.

Guide into Data Science Interview Questions & Answers (1/5)

When you go to a Data Scientist interview, you will encounter questions covering a wide range of tools, algorithms and technologies that…

Starting a new adventure into DS? Then you need a few things…

Contents: (1), 2, 3

Every day the concept of Data Science keeps evolving and with it we find more concepts of other fields assimilated into data science. That is because we keep discovering new ways of applying the tools that Data Science provides.

As consequence, when you go to a Data Scientist interview, you will encounter questions covering a wide range of tools, algorithms and technologies that try to replicate what you are going to use in your day to day work.

Photo by Christina @ wocintechchat.com on Unsplash
Photo by Christina @ wocintechchat.com on Unsplash

The goal of this article and the following series is to explore together little by little some of the questions and skills that you need to cover to apply for a Data Science Position.

SQL

The first step working with data is…. to be able to gather the datasets that you require so that you can create analytics, reports and models. This includes the data retrieval but as well aggregations, basic data cleaning and filtering.

So let’s start with some examples of simple problems that you may be asked to solve on the spot:

*Students enrolled []**

You have a table with records of students but there are faulty records…

/*
  A table containing the students enrolled in a course has
  incorrect data in records with ids between 40 and 100 (inclusive).
  This is the definition of the table:
    TABLE course_enrollments
      id INTEGER NOT NULL PRIMARY KEY
      year INTEGER NOT NULL
      studentId INTEGER NOT NULL

  Write a query that updates the field 'year' 
  of every faulty record to 2019.
*/

This requires to update the table by filtering the specific ids and specifying the new value as per the requirements:

UPDATE course_enrollments
SET year = 2019
WHERE id BETWEEN 40 AND 100

*Session Duration []**

Now suppose that you are managing a website and you want to understand how your users behave and how successful is your website…

/*
  Write a query that selects the userId and the average 
  session duration for each user who has more than one session.
  Sessions are kept in the following table:

  TABLE sessions
    id INTEGER PRIMARY KEY,
    userId INTEGER NOT NULL,
    duration DECIMAL NOT NULL
*/

The main two points are first the aggregation across userId so that we can calculate the average and second the condition to apply in the aggregation…

SELECT userId, avg(duration)
FROM sessions
GROUP BY userid
HAVING COUNT(*) > 1

*Race participants []**

Retrieve how many race participants we have with the name Jackson.

/*
  Write a query that returns the number of race participants
  whose first name is Jackson.
  String comparisons should be case sensitive.
You are given to following data definition:

  TABLE participants
    id INTEGER PRIMARY KEY,
    firstName VARCHAR(30) NOT NULL,
    lastName VARCHAR(30) NOT NULL
*/

This is quite simple, isn’t it?

SELECT COUNT(*)
FROM participants
WHERE firstname = 'Jackson'

*Faulty records []**

/*
  A table containing the students enrolled in a course has
  incorrect data in records with ids between 40 and 100 (inclusive).

  This is the definition of the table:

    TABLE course_enrollments
      id INTEGER NOT NULL PRIMARY KEY
      year INTEGER NOT NULL
      studentId INTEGER NOT NULL

  Write a query that updates the field 'year' of every faulty record to 2019.
*/

Again the problem definition is longer than the solution…

UPDATE course_enrollments
SET year = 2019
WHERE id BETWEEN 40 AND 100

*Distinct company names []**

/*
  We have information about companies in two separate tables:

  TABLE uk_companies
    id INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(50) NOT NULL

  TABLE usa_companies
    id INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(50) NOT NULL

  Write a query that select all distinct company names.
*/

The title is already a big give-away of the problem and the only thing left is to join together the two tables…

SELECT DISTINCT name
FROM (
  SELECT name
  FROM uk_companies
  UNION ALL
  SELECT name from usa_companies
)
Photo by Richard Tao on Unsplash
Photo by Richard Tao on Unsplash

*Create table []**

Sometime we have need to create new tables. In this case there are going to be variations depending on the database (PrestoDB, MySQL, PostgreSQL…)

/*
  We have the following two tables 
  to allocate users and their roles:

  TABLE users
    id INTEGER NOT NULL PRIMARY KEY,
    userName VARCHAR(50) NOT NULL
  TABLE roles
    id INTEGER NOT NULL PRIMARY KEY,
    role VARCHAR(50) NOT NULL

  The new users_roles table should contain the mapping between each user and their roles.
  Each user can have many roles, and each role can have many users.

  Write a SQLite create table statement so that:
  - Only users from the users table can exist within users_roles.
  - Only roles from the roles table can exist within users_roles.
  - A user can only have a specific role once.
*/

Keeping everything tidy, we need to consider the new key that we will consider in our table as well as the primary keys of the existing tables that will become our foreign keys…

CREATE TABLE users_roles (
  userId INTEGER NOT NULL,
  roleId INTEGER NOT NULL,
  FOREIGN KEY(userId) REFERENCES users(id),
  FOREIGN KEY(roleId) REFERENCES roles(id),
  PRIMARY KEY (userId, roleId)
)

*Company Employees []**

/*
  We own a company with the following employee hierarchy.
  We consider that an employee is a manager if other employee 
  includes in his/her managerId the first employees id.
  And any manager may or may not also have a manager.

  TABLE employees
    id INTEGER NOT NULL PRIMARY KEY
    managerId INTEGER REFERENCES employees(id)
    name VARCHAR(50) NOT NULL

  Write a query that selects the names of employees 
  who are not managers.
*/

This can be solved with an inner join of the table with itself as follows:

SELECT name
FROM employees
WHERE id NOT IN (
  SELECT emp1.Id
  FROM employees emp1
  INNER JOIN employees emp2 ON emp1.id = emp2.managerid
)
Photo by Dallas Reedy on Unsplash
Photo by Dallas Reedy on Unsplash

Well done arriving at this point. With this, we close the chapter on SQL and I hope you enjoyed it, and maybe you even learnt something new with these simple examples.

If you already use SQL on your daily routine, then probably this has been too easy. It is true that there is much more to explore in SQL queries (going into the performance of the queries and more complex joins and filters for example) but interviews are time limited. So unless your role specifically focuses on the data management…. your interviewer will move on to other topics like the ones we are about to cover in the following articles.

That will include probability, machine learning models, deep learning and much more….

Contents: (1)

[*] These queries are examples similar to the queries that I use on initial assessments of the people that I interview, but we aware of other queries that may involve more complex tasks… I cannot give everything away, right?

Thanks for reading

I hope you have enjoyed this article. Please leave your thoughts and ideas if you are interested in the topic.


Related Articles