Basic Business Concepts You NEED to Know as a Data Scientist

A beginner's guide and a refresher on essential business fundamentals

Photo by Sean Pollock on Unsplash

Table of Contents

  1. The Primary Goal of a Business
  2. Revenue Maximization: Product and Marketing
  3. Cost Minimization: Operations and Customer Service
  4. OKRs and KPIs
  5. Planning Phase of the Machine Learning Life Cycle

If I were to ask you what data science is all about, what would you think?

I bet a lot of you would think about big data, machine learning models, and artificial intelligence. But I bet that you didn’t think about business, which is completely reasonable. If data science were all about business, companies wouldn’t be hiring people with STEM backgrounds.

That being said, having a basic understanding of business fundamentals is essential for all data scientists. By understanding business, you’ll better understand the motive(s) behind each data science project and how each project benefits the overall company.

So let’s start with the main objective of a business.


The Primary Goal of a Business

Photo by Sharon McCutcheon on Unsplash

The primary goal of a business is to maximize profits for its owners.

Owners are the founders of the company and, if the company is public, its shareholders as well. It’s essential that you understand this point because every decision a company makes revolves around maximizing profits.

Profit is equal to the difference between revenue and cost. Revenue is the income a company earns from its business activities, such as selling goods and services. Cost, also known as expenses, is the money spent on company operations to generate that revenue.

Why is this important for you to know? Because every initiative in business (e.g., a data science project) is undertaken to either increase revenue or decrease costs in order to maximize profits. By nature, different departments of the business tend to fall into one of two categories: revenue-maximizing and cost-minimizing.


Revenue Maximization

The two main departments that fall into this category are product and marketing:

Product and Data Science

Photo by Rachit Tank on Unsplash

A product can be a good, service, or idea and is created to satisfy a want or a need. For example, a car is a good and is created to satisfy the need for transportation.

However, in the case of business departments, product refers to all activities related to product R&D, development, and maintenance. Don’t get this definition of "product" confused with the one above!

Data science serves two purposes when it comes to product: creating new products and enhancing existing ones. Autonomous cars are an example of how data science has helped create new products. More commonly, however, data science is used to enhance existing offerings. A popular example of this is how Netflix developed a state-of-the-art recommender system.

Marketing and Data Science

Photo by Merakist on Unsplash

Marketing is formally defined as all activities related to promoting the buying or selling of a product. Marketing includes things like advertising, pricing, and understanding one’s target market. Marketing used to be more of an art than a science, but data science has revolutionized the way companies do marketing.

Examples of how data science is used in marketing include the following:

  • Marketing attribution: There are dozens of marketing channels that a company can use, including social media, affiliate marketing, SEO, SEM, blogging, TV, radio, and more. But how does a company know which marketing channels are more effective than others? Data science is used to try to measure the impact of various marketing channels through methods like attribution modeling and marketing mix modeling.
  • Pricing and Discount Optimization: Raising the price of a product makes each sale more profitable, but fewer customers will want to buy it. Lowering the price has the opposite effect. This means that there’s an optimal price for every product that maximizes profit. Data science is used to find that optimal price.
  • Customer Segmentation: While it may seem surprising, many businesses don’t know their customers that well. Clustering methods are commonly used to better understand the characteristics of one’s customer base. For example, if a company realizes through clustering that their product is popular among teenagers instead of middle-aged adults, then the company can refocus their advertising strategies.
  • Churn Prediction Modeling: When a customer churns, it means that they decided to stop doing business with the company. For example, if a customer was with cellphone company A but moves to company B, then the customer churned from company A. Companies are now developing models to predict when a customer will churn and are creating retention plans to keep their customers.
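
To make the pricing idea above a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the linear demand curve, the numbers, and the names `expected_demand` and `profit` are all hypothetical. In practice, the demand curve would have to be estimated from real sales data, and it is rarely this simple.

```python
# Toy price optimization. All numbers and function names are made up for illustration.

def expected_demand(price, base_demand=1000.0, sensitivity=20.0):
    """Hypothetical linear demand curve: units sold fall as the price rises."""
    return max(base_demand - sensitivity * price, 0.0)


def profit(price, unit_cost=10.0):
    """Profit = (price - cost per unit) * number of units sold."""
    return (price - unit_cost) * expected_demand(price)


# Try every candidate price from $10.00 to $50.00 in 25-cent steps
# and keep the one with the highest profit.
candidate_prices = [10 + 0.25 * i for i in range(161)]
best_price = max(candidate_prices, key=profit)

print(f"Best price: ${best_price:.2f}, profit: ${profit(best_price):,.2f}")
# Best price: $30.00, profit: $8,000.00
```

Under these made-up assumptions, pricing below $30 sells more units but earns less on each one, while pricing above $30 earns more per unit but sells too few. That trade-off is exactly what the optimal price balances.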

Cost Minimization

Data science activities that fall into this category can be described in one word: automation. The two main departments where data science plays a huge role are operations and customer service.

Operations and Data Science

Photo by Ruchindra Gunasekara on Unsplash

Operations in business refers to the activities that a business engages in on a daily basis. Typically, the main goal in operations is to make tasks and processes as efficient as possible, which means minimizing the time it takes to complete things while also minimizing errors. Humans are inefficient and costly, but automation can replace humans in repetitive and mundane tasks. A popular example of automation in operations is Amazon’s warehouse robots.

Customer Service and Data Science

Customer service refers to the support that a company offers to its customers. As a company grows in size, it’s fair to assume that there will be a proportionate increase in the number of customers that will need support. As I said before, humans are inefficient and costly, which is why data science has also revolutionized customer service. The most prevalent example is the development of chatbots. While chatbots can’t completely replace humans (yet), they can use NLP algorithms to answer simple questions and redirect customers to the right service representatives.


OKRs and KPIs

Now that you know the primary goal of a business and how every initiative is conducted to increase revenue or decrease costs, we can talk about OKRs and KPIs.

OKR is short for Objectives & Key Results, a framework for defining and tracking objectives and their outcomes. You can think of OKRs as goals. Typically, each department sets its own OKRs every quarter. For example, the marketing department might have an OKR to increase conversions from X to Y.

KPI is short for Key Performance Indicator. KPIs are essentially metrics that show how effectively a company is achieving its business objectives. Continuing with the marketing example, a KPI for the objective above can be an increase in website traffic or an increase in marketing ROI.
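
As a rough illustration of the marketing example above, here is a small Python sketch that tracks two KPIs against an objective. The quarterly figures, the 4% target, and the function names are all made up, and these are only the common textbook definitions of conversion rate and marketing ROI. Individual companies often define them differently.

```python
# Toy KPI tracking. All figures and names are hypothetical.

def conversion_rate(conversions, visitors):
    """Share of website visitors who converted (e.g., made a purchase)."""
    return conversions / visitors if visitors else 0.0


def marketing_roi(attributed_revenue, marketing_spend):
    """Return on marketing spend: (revenue attributed to marketing - spend) / spend."""
    return (attributed_revenue - marketing_spend) / marketing_spend


# Made-up numbers for one quarter.
quarter = {
    "visitors": 40_000,
    "conversions": 1_200,
    "attributed_revenue": 90_000.0,
    "marketing_spend": 60_000.0,
}
target_conversion_rate = 0.04  # the "Y" in "increase conversions from X to Y"

rate = conversion_rate(quarter["conversions"], quarter["visitors"])
roi = marketing_roi(quarter["attributed_revenue"], quarter["marketing_spend"])

print(f"Conversion rate: {rate:.1%} (target: {target_conversion_rate:.1%})")
print(f"Marketing ROI: {roi:.0%}")
print("Objective met" if rate >= target_conversion_rate else "Objective not met yet")
```

The objective says where the department wants to go, and the KPIs are the numbers it watches to see whether it is getting there.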

These two terms are important to know as a data scientist because they’re a part of the planning phase of the Machine Learning life cycle, which I’ll talk about next.


Planning Phase of the Machine Learning Life Cycle

Before you start any machine learning project, there are a number of things that you need to plan. The main point of this step is so that the company can understand how a given project will add value to the company, whether it be through increasing revenue or decreasing costs.

Planning includes the following tasks:

  • State the problem that you are trying to solve. This may seem like an easy step, but you’d be surprised at how often people try to come up with a solution to a problem that doesn’t exist or a problem that isn’t really a problem.
  • Define the business objective that you are trying to achieve in order to solve the problem. This sounds like defining an OKR, doesn’t it? It’s not a coincidence.
  • Determine the target variable if applicable and potential feature variables that you may want to look at. For example, if the objective is to decrease the number of fraudulent transactions, you’ll most likely want labeled data of both fraudulent and non-fraudulent transactions. You may also require features like the time of the transaction, the account ID, and the user’s ID.
  • Consider any limitations, contingencies, and risks. This includes, but is not limited to, things like resource limitations (lack of capital, employees, or time), infrastructure limitations (e.g., lack of computing power to train a complex neural network), and data limitations (unstructured data, lack of data points, uninterpretable data, etc.).
  • Establish your success metrics. And this sounds like a KPI, doesn’t it? How will you know that you’ve been successful in achieving your objective? Is it a success if your machine learning model is 90% accurate? What about 85%? Is accuracy the most suitable metric for your business problem? Check out my article on several metrics that data scientists use to evaluate their models.
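
To see why accuracy isn’t always the most suitable metric, consider the fraud example from the list above. Here is a minimal sketch in plain Python with made-up data: 1,000 transactions, of which only 10 are fraudulent. The labels, both sets of predictions, and the helper names `confusion_counts` and `evaluate` are all hypothetical.

```python
# Toy comparison of accuracy, precision, and recall on imbalanced data.
# All labels and predictions are made up. 1 = fraudulent, 0 = legitimate.

def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn), treating label 1 (fraud) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Compute accuracy, precision, and recall from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# 10 fraudulent transactions followed by 990 legitimate ones.
y_true = [1] * 10 + [0] * 990

# A "model" that never flags anything as fraud.
always_legit = [0] * 1000

# A model that catches 8 of the 10 frauds but wrongly flags 12 legitimate transactions.
model_pred = [1] * 8 + [0] * 2 + [1] * 12 + [0] * 978

print("Never flags fraud:", evaluate(y_true, always_legit))
print("Fraud model:      ", evaluate(y_true, model_pred))
```

With these made-up numbers, the model that never flags fraud is 99% accurate but catches no fraud at all (a recall of 0). The second model is slightly less accurate at 98.6%, yet it catches 80% of the fraud. If the business objective is to decrease fraudulent transactions, recall and precision say far more about success than accuracy does.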

Thanks for reading!

If you like my work and want to support me, check out my website, terenceshin.com.