The 2x2 Data Science Skills Matrix that Harvard Business Review got completely wrong!

Harveen Singh Chadha
Towards Data Science
Nov 14, 2018


Data Science is the current buzzword in the market. Every company right now is looking to hire data science professionals to solve some data problem that it cannot yet even articulate. Machine Learning has taken the industry by storm, and we have a bunch of self-taught Data Scientists in the market.

Since Data Science is an altogether different universe, it is very difficult to set priorities on what to learn and what to skip. To that end, the Harvard Business Review published an article on what you, as a company or an individual, should give importance to. Let’s have a look.

Figure 1. The empty matrix taken from the original HBR article (Source)

This is the empty matrix shared by HBR so that you can prioritise your learning path. Now let’s look at what they filled in for Data Science:

Figure 2. The Data Science Matrix proposed by HBR.

Now let’s look at the inferences one can make from this matrix.

  1. Plan to learn Machine Learning, but ignore Predictive Analytics as it is not useful.
  2. Plan to learn Machine Learning, but ignore Mathematics as it is not useful and very time-consuming to learn.
  3. Plan to learn Statistical Programming, but ignore Statistics and Mathematics as they are very time-consuming to learn.
  4. Learn Data Science now, but don’t care much about Data Cleaning.

I am not an expert, but even to a novice these four statements would sound like sarcasm. Let’s take an example.

Suppose you are working on a dataset that has 800 features and 1,000 records. Now you find out that you need to reduce the features, because most of them are redundant (and you only know that if someone told you). But keep in mind that you ignored all the mathematics and statistics, so neither is available to you. And since you only skimmed Data Cleaning, you are not even aware of how to clean the data. So how will you approach this situation?
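To make the scenario concrete, here is a minimal synthetic stand-in for such a dataset. The construction is entirely hypothetical: the 800 features are noisy linear mixes of just 60 underlying factors, so most columns are redundant.

import numpy as np

rng = np.random.default_rng(0)

# 1,000 records, 800 features: every feature is a noisy linear mix
# of only 60 latent factors, so most of the 800 columns are redundant
latent = rng.normal(size=(1000, 60))
mixing = rng.normal(size=(60, 800))
X = latent @ mixing + 0.01 * rng.normal(size=(1000, 800))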

OK, so you google “reduce features in dataset”, or somewhere while learning Machine Learning you found out that there is something called PCA that does this job for you.

So the next step is to google “PCA sklearn”, check the documentation, and apply the following:

from sklearn.decomposition import PCA

# keep enough principal components to explain 99% of the variance
X = PCA(n_components=0.99).fit_transform(X)
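Here is the same trick as a small, self-contained example, using the kind of synthetic data sketched above. A float passed as n_components tells scikit-learn to keep however many components are needed to reach that fraction of the variance.

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# same construction as the sketch above: 800 redundant features, 60 factors
X = rng.normal(size=(1000, 60)) @ rng.normal(size=(60, 800))

pca = PCA(n_components=0.99)                 # keep 99% of the variance
X_reduced = pca.fit_transform(X)

print(pca.n_components_)                     # close to the 60 underlying factors
print(pca.explained_variance_ratio_.sum())   # at least 0.99

One line does the whole job, and that is exactly the problem: it works even if you have no idea why.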

Great, now your feature set is reduced to 60 and your training results are good. Now my questions to you are:

  1. Do you want to be the type of developer who just gets things done?
  2. Do you want to be a developer who just knows how to do things, rather than how they work behind the scenes?

If you answered ‘yes’ to both questions, then you are a perfect fit for companies that just want to get things done rather than master them. ‘Software Engineer’ is a good post for you.

If you answered ‘no’ to both questions, then you are a perfect fit for companies that are expanding their research divisions at the moment. ‘Research Engineer’ is a good post for you.

If you wanted to approach the PCA problem from a research point of view, you would use exactly the same code, but with different thinking behind it. You would visualise the relationships between features, engineer new ones, calculate the correlations, compute the eigenvalues and eigenvectors, and finally select the components that contribute 99% of the variance.
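For illustration, here is a rough by-hand sketch of that route with NumPy, on the same kind of synthetic data. This is the textbook eigendecomposition view of PCA, not production code.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 60)) @ rng.normal(size=(60, 800))  # redundant features again

# centre the data and compute the feature covariance matrix
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)                # 800 x 800

# eigenvectors are the principal directions; eigenvalues are the
# variance each direction explains
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]             # sort by descending variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# smallest number of components whose cumulative variance reaches 99%
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.99)) + 1

X_reduced = Xc @ eigvecs[:, :k]               # project onto the top-k directions
print(k, X_reduced.shape)

Same result as PCA(n_components=0.99), but now the 99% is something you computed, not something you pasted.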

The choice to follow this matrix is yours. Choose wisely!

If you have any points, please do let me know in the comments or on LinkedIn.

