
On-Premises or Cloud? Where Should Manufacturers Implement Machine Learning Platforms?

Financial and application perspectives when building an ML platform

Notes from Industry

A customer contacted me recently regarding his machine learning platform. As a manager, he oversees not only the operation and maintenance of the platform but also compliance and user requests. Sometimes it can be challenging to fulfill the expectations of different stakeholders. Moreover, there’s a remarkable trend of businesses moving to the cloud. “Is there any best practice for enterprises choosing between on-premises and cloud?” he asked.

In this article, we will cover the following topics:

  • What is Machine Learning?
  • What’s the benefit of implementing Machine Learning?
  • What are the scenarios in manufacturing?
  • Comparison between on-premises and cloud from a financial perspective
  • Comparison between on-premises and cloud from other perspectives
  • Summary

What is Machine Learning?

Machine Learning is part of artificial intelligence. It is the study of computer algorithms that can make predictions and decisions based on sample data, and the algorithms improve as the amount of data grows. For example, social media platforms such as Instagram predict our preferences based on our views and likes, and even put tags on our profiles. As time goes by, Instagram builds more comprehensive tags and customer profiles, which lead to more precise predictions.
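To make this concrete, here is a minimal sketch of the idea, assuming scikit-learn and its bundled digits dataset (both are my own choices for illustration, not something prescribed here): the same classifier usually scores better on unseen data as it is trained on more examples.

```python
# Minimal sketch: a classifier's accuracy tends to improve as training data grows.
# The library (scikit-learn) and dataset are illustrative assumptions.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

for n in (100, 500, len(X_train)):               # grow the training set step by step
    model = LogisticRegression(max_iter=2000)
    model.fit(X_train[:n], y_train[:n])           # learn from the first n samples
    print(n, round(model.score(X_test, y_test), 3))  # accuracy usually rises with n
```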

What’s the benefit of implementing a Machine Learning platform?

As with most technologies, ML helps us create more value in two ways. The first is increasing revenue by increasing capacity, creating new services or products, and enhancing customer engagement. The other is decreasing cost by streamlining business processes and improving efficiency.

Let’s see in which scenarios machine learning can help manufacturers.

What are the scenarios in manufacturing?

The manufacturing process can be very complex, all the way from raw materials to final products. In general, the big phases include raw material extraction, manufacturing production, transportation, and disposal and recycling. If we take a closer look at manufacturing production, it contains product design, scheduling, the manufacturing itself, and supply chain management. In this article, we will focus on how to use machine learning to reduce errors and increase capacity. If you are interested in learning more about different scenarios, feel free to check here.

Comparison between on-premises and cloud from a financial perspective

The most important factor for enterprises when evaluating a move to the cloud is probably the financial one. Rapidly growing companies, mature companies, and conservative companies each have different top-of-mind concerns:

A rapidly growing company focuses on revenue growth

A rapidly growing company focuses on how to acquire more customers. Many companies gain more resources, including talent and capital, to achieve that goal through fundraising. For example, Shopify needs to expand its global footprint to acquire more customers, and it also spends on R&D for new features to differentiate itself from competitors. All of this needs resources, and investors evaluate the company to decide how much they should invest. For a company at this stage, it is more important to focus on business development than to worry about IT infrastructure planning on a global scale. Cloud service providers are good partners to team up with, as they have data center coverage globally.

A mature company wants to keep its margin

A mature company has a rather stable business model, so the top-of-mind concern for the C-suite is maintaining the margin. Take Foxconn as an example: when sales of its electrical components are stable, the next question is how to keep costs low. One characteristic of cloud computing is its flexible cost structure, which helps companies lower costs when sales decrease and thereby keep their margin.

A conservative company optimizes free cash flow

Risk management is crucial for conservative companies, especially in exceptional situations. Such a company prefers to hold enough free cash flow for uncertainty so that it can keep operating in the market. For example, many companies were affected during the pandemic and couldn’t keep their business running due to lockdowns. If they don’t have enough cash to pay the bills, such as salaries or rent, it is an alarming sign. Cloud computing turns CAPEX into operating cost, avoiding upfront investment and keeping cash on hand.

Comparison between on-premises and cloud from other perspectives

There are other benefits to consider when moving to the cloud:

Flexibility

In the past, IT expense was considered CAPEX: funds used by a company to acquire, upgrade, and maintain physical assets such as technology or equipment. The company needed to plan ahead for forthcoming demand from the business side, such as storage and computing power. The consequence of this approach was wasted budget and idle machines. Take the World Cup as an example: broadcasters would invest heavily to fulfill streaming demand globally while the games took place, but the resources became obsolete after the season. If a broadcaster leverages cloud computing, the budget is allocated more efficiently because machines can be switched on and off on an hourly basis, and the broadcaster doesn’t pay when the machines are not in use. This strategy also applies to shopping festivals for e-commerce or surges in demand for manufacturers.
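A rough back-of-the-envelope comparison illustrates the point. All the numbers below are hypothetical assumptions, not vendor pricing:

```python
# Hypothetical numbers only: compare buying servers for a short peak event
# versus renting equivalent cloud capacity by the hour.
UPFRONT_SERVER_COST = 200_000      # buy enough machines for peak demand (assumed)
EVENT_HOURS = 30 * 24              # the peak lasts roughly one month
CLOUD_HOURLY_RATE = 50             # assumed hourly cost of an equivalent cluster

cloud_cost = CLOUD_HOURLY_RATE * EVENT_HOURS   # pay only while the machines run
print(f"On-premises (CAPEX): ${UPFRONT_SERVER_COST:,}")
print(f"Cloud (pay per hour): ${cloud_cost:,}")  # idle time after the event costs nothing
```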

Time to market

TTM, or time to market, describes the period it takes to go from initial idea to finished product. When expanding business overseas, companies demand fast provisioning of applications to speed up time to market. By leveraging the global footprint of cloud service providers, they save the time otherwise spent purchasing property and equipment and on construction.

Security

Many companies manage data security by themselves. However, it’s worthwhile to consider leveraging the insights of cloud service providers, as they have been targeted by hackers for a long time. These actionable insights come from vast sources, including billions of web pages, emails, updates, and authentications. After gathering the data, thousands of global cybersecurity experts use machine learning to analyze it and generate insights that help companies detect threats faster. We used to keep our valuables in a safe deposit box at a local bank, so why not apply the same strategy to your valuable data?


Case Study

A customer of mine recently migrated its machine learning platform, including the data warehouse, to the cloud:

Problem Statement

Some industries make a large variety of products in small quantities, or high-mix low-volume (HMLV), in response to the numerous orders placed by their customers. Compared with the other end of the spectrum, low-mix high-volume, a large number of orders doesn’t equal high profit: the production line becomes much busier, but profit doesn’t grow because of defect losses and changeovers. On top of that, manufacturers are looking for new business models to diversify their portfolios and create more value for the company. To tackle these challenges, the customer moved its machine learning platform to the cloud as the first step.

Solution

We used C# with the Azure Storage SDK to split and upload the data. PolyBase allowed processing data from external sources using native SQL queries. Databricks provided a ready-to-go environment for building, training, managing, and deploying machine learning models. The last step went through Azure Machine Learning service and Container Registry, which streamline the whole model training process, including containerization and deployment. As a next step, the customer plans to implement Azure Analysis Services and Power BI for visualization.
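For illustration, here is a minimal Python sketch of the split-and-upload step using the azure-storage-blob SDK. The project itself used C#, and the chunk size, container name, and environment variable below are hypothetical assumptions.

```python
# Hedged sketch of the split-and-upload step (the project used C#; this shows the
# equivalent idea with the Python azure-storage-blob SDK, v12).
import os
from azure.storage.blob import BlobServiceClient

CHUNK_ROWS = 1_000_000  # assumed split size
service = BlobServiceClient.from_connection_string(os.environ["AZURE_STORAGE_CONNECTION_STRING"])
container = service.get_container_client("raw-sensor-data")  # hypothetical container name

def split_and_upload(csv_path: str) -> None:
    """Split a large CSV into chunks and upload each chunk as its own blob."""
    with open(csv_path, "r", encoding="utf-8") as f:
        header = f.readline()
        chunk, part = [], 0
        for line in f:
            chunk.append(line)
            if len(chunk) == CHUNK_ROWS:
                _upload(header, chunk, csv_path, part)
                chunk, part = [], part + 1
        if chunk:
            _upload(header, chunk, csv_path, part)

def _upload(header, rows, csv_path, part):
    name = f"{os.path.basename(csv_path)}.part{part:04d}.csv"
    container.upload_blob(name=name, data=header + "".join(rows), overwrite=True)
```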

Reference Architecture by Microsoft on Azure Icons

Result

Since the project went live, I’ve collected user feedback as follows:

The first is an optimized user experience. In the past, data scientists needed to request data access when training a model or publishing it, due to internal regulations, and the process could be a hassle. Now, with the streamlined process, data scientists can collect, download, analyze, and push a model live all in one place.

Time to market has decreased significantly. Before the platform moved to the cloud, data scientists spent time waiting for data to download and upload, and computing power was limited to personal computers. As a result, it took days to finish training a model. With the platform live on the cloud, data scientists can ingest, train, test, and deploy a model within hours. This allows the customer to respond to root causes faster and, as a result, improve capacity.

The team now conducts more comprehensive analyses as more data sources are available. Over time, various data sources have come into use in the fab, such as web services, databases, and VMs. We built the platform to absorb this varied data and trained the users to analyze it easily. On top of that, there are many services on the cloud that the users weren’t aware of before, such as API Management and the Kubernetes service. This helps the customer understand its data better and produce quality analyses.

Finally, the project also had a positive impact on employer branding for recruitment and employee retention. The world has gone massively digital, and companies must support their tech talent, including giving them meaningful work and offering them opportunities to learn. Tech talent sees itself becoming more impactful over the journey to the cloud.

Challenges

Looking back, there were challenges during the project implementation:

Rewriting jobs and scripts

There were many jobs and scripts in the existing on-premises data warehouse, and not many local consultancies could maintain them. Fortunately, the customer had engineers who could work with us; it’s important to have someone who understands the logic and the code. The team spent extra hours working with the customer to rewrite everything and move it to the cloud.

Parallel Processing

In the as-is environment, much of the Python code wasn’t designed for parallel processing. To enjoy the power of parallelization on the cloud, we needed to redesign the code, so we conducted workshops to educate the customer on the concepts and practices of parallel processing and Spark. It’s a necessary step when moving from a single machine to clusters.
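As a sketch of what that redesign can look like, the fragment below contrasts a single-machine, row-by-row loop with the same computation expressed as a PySpark job that Databricks can spread across the cluster. The paths and column names are hypothetical.

```python
# Hedged sketch: the same transformation, first as a single-machine loop,
# then re-expressed so Spark can parallelize it across the cluster.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("yield-analysis").getOrCreate()

# Before (single machine): iterate over a pandas DataFrame row by row.
# for _, row in pdf.iterrows():
#     results.append(row["measured"] / row["target"])

# After (cluster): declare the computation; Spark splits it over partitions.
df = spark.read.parquet("/mnt/raw-sensor-data/")            # hypothetical path
df = df.withColumn("yield_ratio", F.col("measured") / F.col("target"))
summary = df.groupBy("equipment_id").agg(F.avg("yield_ratio").alias("avg_yield"))
summary.write.mode("overwrite").parquet("/mnt/curated/yield-summary/")
```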

Network Bandwidth determines the speed of data upload

We calculated the data upload speed needed to keep up with the project timeline. There are different solutions for transferring data from an on-premises environment to the cloud: one is uploading over the network, and the other is offline transfer using shippable devices. Consider the data size, transfer frequency, and network to determine which solution works for you; for more detail, please check Azure Data Box.
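The estimate itself is simple arithmetic. The figures below are hypothetical assumptions, used only to show the calculation:

```python
# Hypothetical figures: estimate how long an online upload takes
# compared with the lead time of an offline transfer device.
data_tb = 50             # data to migrate, in terabytes (assumed)
bandwidth_mbps = 500     # usable uplink, in megabits per second (assumed)
efficiency = 0.7         # real-world throughput rarely hits line rate

seconds = (data_tb * 8 * 1_000_000) / (bandwidth_mbps * efficiency)
print(f"Network upload: ~{seconds / 86_400:.1f} days")
print("Offline device: shipping plus ingestion, typically one to two weeks")  # rough assumption
```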

Complex on-premises system integration

Not every on-premises system fits the cloud. For example, when all the related or linked systems are on-premises, and given the complexity of the architecture or internal regulations, it’s recommended to implement in phases. In this project, we picked one specific fab and used it as a model for the rest to follow.

The learning curve

Engineers are usually busy, occupied by different tasks such as cross-team meetings, production yield analysis, overall equipment effectiveness analysis, optimization plans, and system development. Even though the toolchain on the cloud is comprehensive, engineers typically use only a limited set of functions at first. Little by little, their proficiency increases with the hands-on experience they gain.

Summary

Moving to the cloud isn’t the most cost-effective option for some companies, such as server manufacturers or original design manufacturers, because their cost of owning storage or computing power is much lower than everyone else’s. In other words, the total cost of ownership (TCO) of moving to the cloud isn’t necessarily ideal for them. However, there are many other key factors to consider, such as flexibility, time to market, and security, and their impact is usually not easy to convert into a dollar amount. In general, I recommend that companies experiment with cloud computing to reduce costs, improve processes, meet compliance requirements, gain agility, and even create new services.

Next Step

This project is just the beginning of the journey. The chief digital officer plans to collect real-time data inside the fab, including equipment, environment, and telemetry data, and outside the fab, such as sales data. The mid- and long-term goals are data governance and a common data platform. For more information, please check the Common Data Model.
