Hybrid — Cloud/Edge-based — AIML Model Deployment Architecture for Data Scientists

Huzaifa Kapasi
Towards Data Science
6 min readJan 19, 2021


AIML Application Service Unified Architecture (Image by Author)

Introduction

The deployment of Machine Learning models is one of the key elements of the Data Science process. Data Scientists often struggle with the deployment step: exposing the model as a seamless API that can be consumed by a number of endpoints. In this article, I will explain the architecture and design of such an API, both on-prem and in the cloud. The most popular frameworks for Data Scientists are the Python-based Flask API server and Plumber for R users. We will explore how to use these frameworks in a production environment to make the API more robust, scalable, and fault-tolerant.

Platform as a service (PaaS) or On-prem servers

The core of a web application is the server on which the application resides and executes. It forms the foundation of the web application and is also referred to as the production environment. It can be a single server or a cluster of servers, deployed either on-prem or in the cloud. The hosting could be bare-metal hardware, such as Dell or IBM machines for on-prem deployments, on which we build the applications, or Infrastructure-as-a-Service, which is cloud-based hardware such as Amazon EC2 or Azure Virtual Machines. Another option for cloud deployment is Platform-as-a-Service, where all components of the web application, from the bare-metal hardware and OS to the web framework and web servers, are provided as a fully managed, off-the-shelf service. As shown in the diagram, Azure App Service, AWS Elastic Beanstalk, and GCP App Engine are a few popular cloud-based PaaS offerings.

OS: The operating system provides control and management of the hardware and software resources on which computer programs execute. One can use a Linux, Windows, or macOS based OS for web applications; Linux-based flavors are the most popular for both on-prem and cloud applications.

Web-Servers

Web servers respond to HTTP requests from clients, as shown in the figure, returning a status code and content in JSON, XML, or HTML format. From a Data Science perspective, JSON is the most popular form of web server response. A web server also works as a reverse proxy in front of WSGI servers: a middle layer between the WSGI server and the outside world. The web server faces the outside world and routes requests to the WSGI servers. As your application grows, you will want to distribute it across servers (VPS) to handle more connections simultaneously, and having a reverse proxy in front of your application server(s) helps you scale seamlessly.

Load balancing across multiple application instances is a commonly used technique for optimizing resource utilization, maximizing throughput, reducing latency, and ensuring fault-tolerant configurations.

Web servers have an efficient HTTP load balancer to distribute traffic to several application servers and to improve the performance, scalability, and reliability of web applications.
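As a sketch of this pattern, a minimal NGINX configuration that reverse-proxies and load-balances requests across two application instances might look like the following (the ports, upstream name, and server name are illustrative assumptions, not part of the article's setup):

```nginx
# Reverse proxy + load balancing sketch for an NGINX server block.
upstream flask_app {
    # Application servers running the WSGI app (e.g. Gunicorn workers)
    server 127.0.0.1:8000;
    server 127.0.0.1:8001;
}

server {
    listen 80;
    server_name example.com;

    location / {
        # Route incoming requests to the upstream pool
        proxy_pass http://flask_app;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

By default NGINX balances the upstream pool round-robin; directives such as `least_conn` can change the strategy.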

Popular web servers are Apache and NGINX. Apache has been a leading web server for decades, but NGINX has now captured a comparable share of web traffic. NGINX has a performance edge over Apache and is relatively easy to configure. If you want to use Apache, it comes preinstalled with many Linux distributions.

According to data from W3Techs, NGINX's market share has been steadily growing, pushing Apache from first place.

If you want more details on Apache vs. NGINX, you can visit https://kinsta.com/blog/nginx-vs-apache/

For HTTP requests, if you do not want clients to enter the server IP directly but to use a URL instead, it is best to register a domain name and obtain SSL/TLS certificates to secure and encrypt the traffic to your web server. Since the web server is world-facing, it is exposed to a number of security threats, so it is important to secure it with HTTPS encryption.

For on-prem deployment, where the API server is used as a microservice for other internal applications, direct IP-based HTTP access is used. We can secure the server with SSH keys and a firewall if needed.

WSGI Servers

Apache and NGINX do not understand how to run Python web applications on their own. WSGI, the Web Server Gateway Interface, is designed specifically to connect Python applications to web servers like Apache and NGINX. It is standardized to promote web application portability across different WSGI server implementations such as Gunicorn, mod_wsgi, and uWSGI. If you are using Apache, then mod_wsgi is the usual choice; for NGINX, Gunicorn is popular as it works with minimal configuration.
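To make the interface concrete, here is a minimal, framework-free WSGI application (a sketch; the module name `minimal_wsgi` is an assumption). Gunicorn could serve it with `gunicorn minimal_wsgi:application`:

```python
# minimal_wsgi.py - a bare WSGI callable, the contract every
# WSGI server (Gunicorn, uWSGI, mod_wsgi) expects.

def application(environ, start_response):
    # environ is a dict of CGI-style request variables;
    # start_response sets the HTTP status line and headers.
    body = b'{"status": "ok"}'
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    # The return value is an iterable of byte strings.
    return [body]
```

Flask and other Python web frameworks ultimately expose exactly this kind of callable, which is why they plug into any WSGI server.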

Image by Author

If you are using a cloud-based app service, it typically uses an off-the-shelf Gunicorn WSGI server by default.

More details on WSGI can be found at:

https://wsgi.readthedocs.io/en/latest/servers.html

https://www.appdynamics.com/blog/engineering/an-introduction-to-python-wsgi-servers-part-1/

Web Frameworks

Web frameworks are libraries that take care of the utilities needed to build web applications: HTTP request/response handling, URL routing, authentication, data handling, security, and sessions.

Popular web frameworks for Data Scientists are Flask for Python developers and Plumber for R developers. Both Flask and Plumber can connect with production-grade servers seamlessly; for Flask this is done through WSGI.

Note how the web framework interacts with the application source code, models, and backend database, if any. Typically a top-level wrapper, in Python or R, will call web framework instances along with other dependencies to create the application.
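As an illustration of such a wrapper, here is a minimal Flask sketch. The `/predict` route and the placeholder `predict()` function are assumptions standing in for a real trained model (which you might load with joblib or similar):

```python
# app.py - a minimal Flask wrapper exposing a model as a JSON API.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    # Placeholder "model": sum of the inputs. Swap in your real
    # model's inference call here.
    return sum(features)

@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Parse the JSON request body and run the model on it.
    payload = request.get_json(force=True)
    score = predict(payload["features"])
    return jsonify({"prediction": score})

# For local development you could call app.run(); in production,
# serve the app object through a WSGI server, e.g.:
#   gunicorn app:app
```

A client would then POST `{"features": [1, 2, 3]}` to `/predict` and receive a JSON prediction back.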

Also note that cloud app services only provide containerization, WSGI support, and inbuilt web servers as fully managed services. The application itself, built with a web framework, is the low-level code that Data Scientists need to write.

Supervisor

Ref- https://www.opensourceforu.com/2019/10/how-to-run-multiple-services-inside-a-single-container-using-supervisord/

Imagine the situation when your Flask or Plumber app crashes or the actual server is rebooted. To maintain continuous availability and persistence of your application, and to make it more fault-tolerant, we need a process manager that takes care of all this.

Supervisor is a client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.
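A minimal Supervisor program definition for such an app might look like the sketch below (the program name, paths, and Gunicorn bind address are illustrative assumptions):

```ini
; /etc/supervisor/conf.d/flask_app.conf
[program:flask_app]
; Launch the WSGI server for the app
command=gunicorn --workers 3 --bind 127.0.0.1:8000 app:app
directory=/opt/flask_app
; Start on boot and restart automatically if the process dies
autostart=true
autorestart=true
stderr_logfile=/var/log/flask_app.err.log
stdout_logfile=/var/log/flask_app.out.log
```

With this in place, `supervisorctl status flask_app` shows the process state, and Supervisor restarts the app if it crashes.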

Workflow for REST Webserver

1. Register a domain name and get an SSL certificate from a provider like Namecheap.com. For internal microservice calls, make HTTP calls through the server IP address with the necessary firewall rules and SSH keys.

2. Set up the web server. Apache and NGINX are the two most popular web servers.

3. Set up the WSGI server. If using Apache, use mod_wsgi; if using NGINX, use Gunicorn. Most cloud providers use Gunicorn as the default WSGI server.

4. Install a web framework that supports Python or R. If using Python, go for Flask; if using R, go for Plumber.

5. Write your application wrapper that creates web framework instances, loads your model, and connects to the backend database, with request/response endpoints.

6. Install Supervisor to take care of fault tolerance in the web server and application.

7. Create a Dockerized container for dynamic load-balancing and scaling requirements.
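For step 7, a container image for the Flask/Gunicorn stack could be sketched as below (the file names `app.py` and `requirements.txt`, the Python version, and the port are assumptions to adapt to your project):

```dockerfile
# Dockerfile - containerize the Flask app served by Gunicorn.
FROM python:3.9-slim

WORKDIR /app

# Install dependencies first to take advantage of layer caching
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application source, including app.py and the model
COPY . .

EXPOSE 8000
CMD ["gunicorn", "--workers", "3", "--bind", "0.0.0.0:8000", "app:app"]
```

Identical containers built from this image can then sit behind the web server's load balancer and be scaled up or down as traffic demands.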

In the next article, we will go through the setup and installation of each block of the architecture. We shall see the setup of both Apache and NGINX.

References

  1. https://wsgi.readthedocs.io/en/latest/servers.html
  2. https://www.appdynamics.com/blog/engineering/an-introduction-to-python-wsgi-servers-part-1/
  3. https://www.opensourceforu.com/2019/10/how-to-run-multiple-services-inside-a-single-container-using-supervisord/
  4. https://www.digitalocean.com/community/tutorials/how-to-install-and-manage-supervisor-on-ubuntu-and-debian-vps


Huzaifa Kapasi holds a double MS (full-time research) from Warwick University and has 15+ years' experience in Machine Learning, AI, Big Data, Cloud, and Signal Processing Algorithms.