What is Data Science?

Interviewing for a Data Scientist position

Sheenal Srivastava
Towards Data Science

A long time ago, I graduated with a degree in Computer Bioinformatics. When people asked me what I had studied, I would tell them the name of my major; they would try to pronounce it, stumble a couple of times, and then give up. I found it tiresome to explain that my major had taught me to write software that lets biologists analyze biological data and evaluate it using statistics.

After I graduated, I struggled to find a job that allowed me to combine my research, programming, and statistical skills. I looked far and wide and kept falling into roles that did not deliver what they had promised. One role involved a lot of research, but that research consisted of reading government publications. It also involved some statistics that were only possible with the rule-of-thumb minimum sample size of 30, which allowed me to assume that the central limit theorem applied and the sampling distribution was approximately normal (which it never was).

I moved from job to job looking for the golden egg, and despite the increasing demand for analysts, programmers, and now data scientists and engineers, I couldn’t find a role that allowed me to do much. I ended up managing projects, building dashboards, preparing elaborate PowerPoint presentations with a plethora of pie charts, bar charts, and scatter plots, and participating in hackathons. None of these roles satisfied me, and I came home feeling dejected, with a lost sense of purpose. On many a day, when people marveled at how lucky I was to be a “Data Scientist”, I cynically replied that all I did was refresh my inbox during the work day. On some days, and even for weeks and months at a time, that really was all I did, as I waited to receive data, waited for those above me to make up their minds, and waited for the tools I needed to do any work.

Most recently, I worked as a Data Scientist where all I did was query a database and put together tables. These tables were then restructured over a number of weeks to make them more efficient to access. All the field names were changed to meet some obscure formatting requirements. The tables were then loaded and reloaded into different sections of the database. I felt myself drowning in meaningless work, and it was leading me to despair. People kept remarking that I was lucky to have a job where I did not have to do anything; it was relaxed and paid well. But without family and friends, the 40 hours of emptiness at work only added to my loneliness and gloom.

I do not want anyone else to go through what I have been through after landing the most touted and sexiest job of the century: the Data Scientist. Currently, job descriptions list every programming language, data visualisation software, and data ETL tool under the sun as required skills. These job ads also ask for presentation and storytelling skills, statistical knowledge, and the ability to explain and carry out machine learning. However, despite completing Kaggle-like case studies and going through seven rounds of interviews, you might end up in a job where, for months and even years, all you ever do is “SELECT * FROM A”.

A lot of organisations hiring Data Scientists are only doing so because everyone else is, and they don’t want to fall behind in this rat race. They want the brightest, the smartest, and the most talented individuals to work for them. And they want them right now: before they have an analytics platform, before they even have accessible data, and even before they figure out why they need Data Scientists. What are they essentially doing? To put it in no-nonsense terms, they are putting lipstick on a pig.

Lipstick on a pig

So how do you avoid falling for glamorous job descriptions that, within a few days of starting, turn out to be nothing but grunt data ETL roles? Maybe you first need to take a step back and work out what data science means for you.

For me, it is a scientific way of approaching a business problem. This means spending some time trying to understand the situation, the current business context, and the end goal the business is trying to achieve. If the aim is to predict customer churn, then understand why the business wants to do so, how quickly it needs this information, and whether it will be feasible to implement the solution.

The next task for you as a Data Scientist is to attack this problem by formulating hypotheses about the data, similar to the null and alternative hypotheses you developed in STAT101 at university. An example null hypothesis is: there is no relationship between whether or not a customer will churn and their tenure with the company, their age, and <insert all features>.
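To make the STAT101 analogy concrete, here is one way to test the tenure part of that null hypothesis: a permutation test on the difference in mean tenure between customers who churned and those who stayed. This is a minimal Python sketch using only the standard library, not the author's method; the function name `permutation_test` and the tenure values are hypothetical, made up purely for illustration. A small p-value suggests the observed difference would be unlikely if churn and tenure were unrelated.

```python
import math
import random


def _mean(values):
    # math.fsum is correctly rounded, so the result does not depend on order
    return math.fsum(values) / len(values)


def permutation_test(churned, retained, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in mean tenure.

    Null hypothesis: the churn label carries no information about tenure,
    so randomly reassigning the labels should produce differences at least
    as large as the observed one fairly often.
    """
    rng = random.Random(seed)
    observed = _mean(churned) - _mean(retained)
    pooled = list(churned) + list(retained)
    n_churned = len(churned)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign churn labels at random
        diff = _mean(pooled[:n_churned]) - _mean(pooled[n_churned:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical tenure values in months, made up for illustration only.
churned_tenure = [2, 3, 5, 4, 1, 6]
retained_tenure = [24, 36, 18, 30, 40, 12, 28]
diff, p = permutation_test(churned_tenure, retained_tenure)
print(f"difference in mean tenure: {diff:.2f} months, p = {p:.4f}")
```

A permutation test is shown here because it does not assume the data are normally distributed. Age and the other features could be tested the same way, although testing many features at once calls for a multiple-comparison correction.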

Your next step is to create an analytical dataset that combines existing variables from the data you have with engineered features, based on conversations with the business about factors that may influence customer churn. Once you have this engineered dataset, the world is your oyster: you can run various machine learning models and tune them to improve their accuracy. The approach is iterative; you relay your findings to the business, and add and remove variables from your model, before you are ready to have it implemented. Keep the tech people within the business on your radar, as you will need to liaise with them to ensure that the model can be implemented and provide the business with timely results.
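As a rough illustration of the analytical-dataset step, the sketch below joins two raw tables into one row per customer, keeping an existing variable (age) and adding two engineered features (tenure in months and the number of support tickets opened in a recent window). Everything here is an assumption for illustration: the `Customer` and `SupportTicket` record types, the feature names, the 90-day window, and the example data are hypothetical, and in practice the features would come out of conversations with the business.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Customer:
    """Hypothetical raw customer record."""
    customer_id: str
    signup_date: date
    age: int
    churned: bool


@dataclass
class SupportTicket:
    """Hypothetical raw support-ticket record."""
    customer_id: str
    opened_on: date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end, clamped at 0."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:  # the last month is not yet complete
        months -= 1
    return max(months, 0)


def build_analytical_dataset(customers, tickets, as_of, window_days=90):
    """Return one dict per customer with existing and engineered features."""
    # Engineered feature: tickets opened in the last `window_days` days.
    recent_tickets = {}
    for ticket in tickets:
        days_ago = (as_of - ticket.opened_on).days
        if 0 <= days_ago < window_days:
            recent_tickets[ticket.customer_id] = recent_tickets.get(ticket.customer_id, 0) + 1

    rows = []
    for c in customers:
        rows.append({
            "customer_id": c.customer_id,
            "age": c.age,  # existing variable
            "tenure_months": months_between(c.signup_date, as_of),  # engineered
            "recent_tickets": recent_tickets.get(c.customer_id, 0),  # engineered
            "churned": c.churned,  # target label
        })
    return rows


# Hypothetical example data, for illustration only.
customers = [
    Customer("c1", date(2020, 1, 15), 34, churned=False),
    Customer("c2", date(2023, 6, 1), 27, churned=True),
]
tickets = [
    SupportTicket("c1", date(2024, 6, 1)),
    SupportTicket("c2", date(2024, 5, 20)),
    SupportTicket("c2", date(2024, 6, 10)),
    SupportTicket("c2", date(2024, 1, 1)),  # outside the 90-day window
]
rows = build_analytical_dataset(customers, tickets, as_of=date(2024, 6, 30))
for row in rows:
    print(row)
```

Keeping each row as a plain dictionary makes it easy to add or remove variables between iterations; in a real project this table would more likely live in a database or a DataFrame.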

So, this is my version. Now, before you apply for that shiny, diamond-quality Data Scientist position, ask yourself and the people hiring you whether this is what you will do in the role and whether you can really carry out the steps above.
