
You Need This to Thrive as a Data Scientist – In a Business Environment

Drastically Increase Your Value Contribution and Business Understanding with this Must-Read Book

Picture from Unsplash

Introduction and Motivation

Thanks to Data Science and Machine Learning, with just a few lines of code we can build a model that detects and classifies objects in images, a movie recommender system, or a predictor that estimates the price of a house from its characteristics alone.

But, if there is one thing that Machine Learning allows, it is the complete revolution of the productive environment, of the business and industry world, and in short, of everything that produces value.

The automation of repetitive and monotonous tasks and the prediction of variables with only the introduction of certain input data are marking (and will mark) the path towards a democratic and fair future in which people will be able to engage in creative tasks to satisfy their deep passions and leave the unfulfilling tasks to the machines.

And we, as Data Scientists, have both the power and responsibility to do our part, in the form of applying our knowledge and our art to develop and implement predictive and analytic models, to accelerate the arrival of that future.

One key step of the path towards that exciting future is the implementation of Data Science solutions in business environments. Traditionally, Machine Learning resources like books, courses, videos, and articles lack a clear way to bridge this gap between technical and business knowledge.

Moreover, as Data Maturity grows in organizations, strong knowledge of how to correctly implement Data Science solutions is becoming of paramount importance. As we can see in the image below, the more advanced the Data Analysis technique, the greater the value it provides, but also the greater its complexity.

Figure by the Author

Therefore, there is a growing need for both leaders with strong analytical knowledge and developers who understand the business rules and can adapt their analytical skills to the context of the organization.

And this is the subject we will dive into in today’s article. So if you want to elevate your career and learn how to bridge this gap, this article is for you!

Data Science for Business

Picture from Amazon

Heads up: This article contains affiliate links so that you can comfortably buy this book without any extra charge while contributing to the creation of more posts like this one.

This well-known book, written by Foster Provost and Tom Fawcett, is a collection of the most important fundamental concepts of data science. Its target audience is the following:

  • Business people working with, managing, or investing in data science initiatives
  • Aspiring Data Scientists as a way to prepare for interviews
  • Data Scientists who will benefit from the business perspective offered

It is important to note that the book deliberately avoids an overly technical approach and instead introduces its principles in a fundamental, conceptual way.

These principles, once well understood, will enable the practitioner to apply the proper solution when faced with a specific problem, and to start solving the puzzle of bringing together the technical and the business worlds.

The book is presented in a way that clearly lays out the correct development of an end-to-end Data Science solution:

  • Problem definition
  • Application of data science techniques
  • Deployment of results to improve decision making

Contents

Introduction: Data Analytic Thinking: Sets the foundation for understanding today’s context and data opportunities, and why they should be seized. It defines Big Data, Data Science, Data Engineering, and Data-Driven Decision Making.

Business Problems and Data Science Solutions: Introduces a set of canonical data analytic tasks, the data learning process, and the differences between Supervised and Unsupervised learning.

Predictive Modeling: From Correlation to Supervised Segmentation. Shows how to identify informative attributes and how to segment data by progressive attribute selection. Lays out methods for finding correlations and shows how tree induction works.

Fitting a Model to Data: Finding optimal parameters based on data. Definition of objective and loss functions. Study of some popular algorithms: Linear Regression, Logistic Regression, Support Vector Machines.

Overfitting and its Avoidance: Introduces the concept of generalization, what fitting and overfitting are, and how to avoid the latter through Complexity Control. Presents techniques such as Cross-Validation, Attribute Selection, Tree pruning, and Regularization. (We discussed some concepts here)

Similarity, Neighbors, and Clusters: Calculating the similarity of objects described by data, and using similarity for prediction (Clustering as similarity-based segmentation). Searches for similar entities, introducing nearest-neighbor methods, clustering methods, and distance metrics for calculating similarity.

Decision Analytic Thinking I: What is a Good Model? What is desired from data science results? Expected value as a key evaluation framework. Comparative baselines, evaluation metrics, costs and benefits, expected profits.

Visualizing Model Performance: Under various kinds of uncertainty. Profit curves, cumulative response curves, Lift curves, ROC curves.

Evidence and Probabilities: Explicit evidence combined with Bayes’ Rule, probabilistic reasoning via assumptions of conditional independence, Naive Bayes Classification, Evidence Lift.

Representing and Mining Text: Representation of text for data mining, Bag of Words, TFIDF calculation, N-grams, Stemming, Named entity extraction, Topic models.

Decision Analytic Thinking II: Towards Analytical Engineering. Designing an analytical solution, based on the data, tools, and techniques available.

Data Science and Business Strategy: Principles of success for a data-driven business, acquiring and sustaining competitive advantage via data science, the importance of careful curation of data science capability.

Proposal Review Guide: Lays out a set of key questions to structure and tackle the development of a data solution. Highly recommended to take into account in any business project.
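To make the expected value framework from Decision Analytic Thinking I a bit more tangible, here is a minimal sketch in Python. It is my own illustration, not code from the book: the function names and the numbers (a profit of 99 when a targeted customer responds, a loss of 1 when they don't) are hypothetical. The idea is simply to weight the value of each possible outcome by its probability and act only when the result is positive.

```python
def expected_value(probabilities, values):
    """Weight each outcome's value by its probability and sum the results."""
    if len(probabilities) != len(values):
        raise ValueError("probabilities and values must have the same length")
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * v for p, v in zip(probabilities, values))


def should_target(p_response, value_response, value_no_response):
    """Target a customer only if the expected value of doing so is positive."""
    ev = expected_value(
        [p_response, 1.0 - p_response],
        [value_response, value_no_response],
    )
    return ev > 0


# Hypothetical numbers: a response is worth 99, a non-response costs 1.
print(should_target(0.05, 99, -1))   # a customer likely enough to respond
print(should_target(0.005, 99, -1))  # a customer very unlikely to respond
```

With these made-up values, the break-even response probability is 1%, so the decision depends on the business costs and benefits and not only on the model's accuracy.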

Conclusions and Final Words

If I had to sum up this book in one phrase it would look something like this:

The essential book for leaders and technicians of current, or aspiring, data-driven organizations

I personally believe that the greatest value of this book is how well it links data science concepts to business rules and objectives. As I have experienced myself as an Engineer and Data Scientist, it is very common for us technical people to focus so much on the details of the tasks at hand that we lose sight of the big picture.

Data Science for Business does a great job of never losing sight of the goal of bringing these two worlds together, business and data science.

It is also very well written, in an easy and digestible style, so it reads lightly even though it covers a lot of concepts. It is a true delight for those used to much more technical resources, and it won’t scare managers or leaders who aren’t as technical.

After a first read, it is a great reference to keep at hand, and I encourage you to come back to it to refresh the related concepts every time you face a Data Science challenge.

As always, I hope that you have enjoyed the post and that you will give this amazing book a try. You can find it at the following link:

If you liked this post then you can take a look at my other posts on Data Science and Machine Learning here.

If you want to learn more about Machine Learning, Data Science, and Artificial Intelligence, follow me on Medium and stay tuned for my next posts.

