Real-Time Simulation-based Analytics Services

Proof of concept using SimPy on AWS, ft. Terraform and LocalStack

Wladimir Hofmann
Towards Data Science


Photo by Wolfgang Hasselmann (Unsplash)

In his highly interesting, recently published PhD thesis (in German), Toni Donhauser from the University of Erlangen-Nürnberg gives an excellent example of how a production-synchronous digital twin can be used for automated, simulation-based order scheduling in masonry plants. As a core feature, the developed simulation allows the work-in-process of the manufacturing system to be initialized so that it precisely mirrors the current state, enabling accurate short-term forecasts which serve as a basis for comparing alternatives and optimizing production plans in case of unexpected disruptions. Tecnomatix Plant Simulation (Siemens) is used to implement the simulation model.

Since Plant Simulation is known for extensive features as well as for extensive licensing fees, this blog post will present an alternative implementation of such a production-synchronous digital twin, based on open-source frameworks and building on easy-to-operate, pay-per-use AWS infrastructure. The complete setup can be deployed and tested locally using Docker, Terraform, and LocalStack (no AWS account required).

Scenario & Scope

The diagram below shows a fictional, simplified order processing flow, which will serve as a minimal example to illustrate how a digital twin of the system can be implemented.

Simple order processing flow

After being created, orders are received and accepted by the company (ingest-step), and the order-specific raw material is ordered (order_material); the order then waits until the corresponding material arrives (wait_for_material). When the material is delivered, the order proceeds to a buffer queue (wait_for_sop), waiting to be processed in a capacity-constrained production-step, which can process only one order at a time. Eventually, the finished order is delivered to the customer and leaves the system.
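
To make the flow more tangible, here is a minimal SimPy sketch of this process (the processing times, the printed messages, and the omission of explicit material/ETA events are simplifying assumptions for illustration, not the model used later):

import simpy

MATERIAL_LEAD_TIME = 10  # assumed time until the ordered material arrives
PROCESSING_TIME = 5      # assumed deterministic production time per order

def order(env, name, production):
    print(f"{env.now}: {name} ingested, material ordered")         # ingest / order_material
    yield env.timeout(MATERIAL_LEAD_TIME)                          # wait_for_material
    print(f"{env.now}: {name} material arrived, waiting for SOP")  # wait_for_sop
    with production.request() as slot:                             # capacity-constrained production
        yield slot
        yield env.timeout(PROCESSING_TIME)
    print(f"{env.now}: {name} delivered")

env = simpy.Environment()
production = simpy.Resource(env, capacity=1)  # only one order at a time
for i in range(3):
    env.process(order(env, f"Order-{i + 1}", production))
env.run()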

Whenever material for an order is requested, an initial estimated time of arrival (ETA) is assigned. However, unexpected supplier-specific process deviations or other delivery problems may introduce delays at any point in time, so ETA-updates are possible while the order waits for material delivery. Since the production step uses a capacity-constrained resource and represents a possible bottleneck of the system, any unplanned under-utilization here may delay every upcoming order and diminish the system throughput (depending on how tight the schedule is). It is therefore desirable to quantify the effects of any shift in time as soon as an ETA-update for an order occurs.

Synchronized Digital Twin: Concept and Implementation

The next figure shows a simple event-processing pipeline, able to ingest defined events and to persist the system state (event tracking), which in turn enables the simulation-based creation of forecasts for expected order completion times and delays (event analytics). A simple web dashboard will be used to visualize the results.

Processing pipeline for simulation-based forecast creation (icons by AWS)

1. Publishing events of data producers

During the processing of an order in the production system, data producers publish information on its progress, e.g. the start or finish of the order's processing steps. While these events would actually happen in the physical manufacturing system, a simulation model might be used to create test data for the digital twin during development (see this post on Virtual Commissioning for another example of this use case for logistics simulation).

2. Capturing events with AWS Kinesis

Kinesis is an AWS service for continuous buffering and real-time processing of streaming data. A Kinesis stream decouples data producers and consumers and consists of a configurable number of shards, each of which can ingest up to 1 MB or 1000 records of data per second. Each record is put into one shard based on its specified partition key value, which matters because in-order processing of records is guaranteed only at the partition key level.
In the described scenario, in-order processing becomes critical for ETA-updates of orders, since the message about an expected delay must not be processed before any earlier submitted update.
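
As an illustration, a producer might put order events onto the stream along these lines (the stream name and event fields are assumptions for this post, not the exact schema used in the repository); using the order id as the partition key keeps all updates of one order on the same shard and therefore in order:

import json
import boto3

kinesis = boto3.client("kinesis")  # point endpoint_url at LocalStack for local testing

def publish_order_event(order_id, event_type, eta=None):
    record = {"order_id": order_id, "event": event_type, "eta": eta}
    kinesis.put_record(
        StreamName="order-events",  # assumed stream name
        Data=json.dumps(record),
        PartitionKey=order_id,      # guarantees in-order processing per order
    )

publish_order_event("Order-2", "eta_update", eta="2021-01-01T12:00:00")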

3. Processing events with AWS Lambda

Lambda is AWS's function-as-a-service offering, which allows code to be run on demand, paying for the number of invocations as well as for execution time. Lambda functions can easily be coupled with other services such as SQS and DynamoDB. Since AWS provisions the function runtime on demand, the short cold-start times of Node.js and Python make them a popular choice for implementing Lambdas.
The Lambda implemented for processing order updates is simple: it just updates the item of the affected order in a specified DynamoDB table with the data from the event provided as part of the invocation.
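
A stripped-down version of such a handler could look like this (table name, key schema, and event fields are assumptions; note that Kinesis delivers record data base64-encoded):

import base64
import json
import boto3

table = boto3.resource("dynamodb").Table("orders")  # assumed table name

def handler(event, context):
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # update the item of the affected order with the fields from the event
        table.update_item(
            Key={"id": payload["order_id"]},  # assumed partition key
            UpdateExpression="SET #ev = :ev, #eta = :eta",
            ExpressionAttributeNames={"#ev": "last_event", "#eta": "eta"},
            ExpressionAttributeValues={":ev": payload["event"], ":eta": payload.get("eta") or ""},
        )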

4. Persisting the system state with DynamoDB

DynamoDB is used as a fast, flexible, and managed NoSQL database. While this type of database by design lacks some of the amenities of relational databases (such as proper means to enforce referential integrity on the database level, or the availability of sophisticated ORMs and schema management tools), it is fine for our simple use case, which just involves updating single items and running basic queries. A DynamoDB table requires a partition key and optionally a range key, which in combination uniquely identify a stored item. For orders, the string id can be used as the partition key. A nice feature of DynamoDB is the option to enable streams, which automatically provide information on table updates. This way, order ETA-updates can trigger new forecasts.
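
In the repository the tables are defined with Terraform; just to illustrate the key schema and the stream option, an equivalent sketch in boto3 might look as follows (table and attribute names are assumptions):

import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="orders",  # assumed table name
    AttributeDefinitions=[{"AttributeName": "id", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "id", "KeyType": "HASH"}],  # string id as partition key, no range key
    BillingMode="PAY_PER_REQUEST",
    StreamSpecification={
        "StreamEnabled": True,  # publish table updates to a stream
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)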

5. Simulating the future

AWS allows Lambda functions to be used as DynamoDB stream event consumers, so that simulation runs can forecast future order completion times on every state change. For each run, the current system state is fetched from DynamoDB (which might actually need multiple requests, since a single scan can only return a page of up to 1 MB of data; see the sketch below).
Based on the registered process timestamps, the currently relevant process step of each order can be identified.
The simulation model is generated from the process diagram shown above using Casymda. For the sake of simplicity of this proof of concept, processing times are assumed to be deterministic. Model blocks are implemented to account for the already elapsed processing time of work-in-process entities at the start of the simulation (one of the possibilities for initializing online simulation models discussed in the often-cited paper by Hanisch and Tolujew (2005) and further explored by Hotz (2007)). During the execution, forecast metrics are collected in the form of predicted process step completion times.
Currently, AWS allows a Lambda function execution to take up to 15 minutes, so even complex models can be run this way. However, frequent and long-running calculations might make it more attractive to create a dedicated service.
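
Fetching the complete state requires following the pagination of the scan operation; a minimal sketch (assuming the table name used above) could look like this:

import boto3

table = boto3.resource("dynamodb").Table("orders")  # assumed table name

def fetch_all_orders():
    # a single scan returns at most 1 MB, so follow LastEvaluatedKey until the table is exhausted
    items, kwargs = [], {}
    while True:
        response = table.scan(**kwargs)
        items.extend(response["Items"])
        if "LastEvaluatedKey" not in response:
            return items
        kwargs["ExclusiveStartKey"] = response["LastEvaluatedKey"]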

6. + 7. Forecast persistence and visualization

At the end of each run, the gathered results are persisted in a second DynamoDB table, from where a dashboard application can access and visualize the data.
Plotly Dash is a popular framework for analytics web apps. It enables the quick creation of dynamic dashboards just by writing Python code. Under the hood, it uses Flask to serve React websites with Plotly charts to a browser. Data queries and analysis are done on the backend using Python. The implemented dashboard just contains a simple Gantt chart. Automatic dashboard refreshes are implemented using an interval callback to cyclically poll the database for updates. The dashboard's Docker container could be run on AWS (e.g. on ECS/Fargate), but since the free version of LocalStack does not include this, it will just be run locally for the demonstration.
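
A minimal Dash skeleton with such an interval callback might look like this (the forecast table name, its attributes, and the two-second polling interval are assumptions):

import boto3
import pandas as pd
import plotly.express as px
from dash import Dash, dcc, html
from dash.dependencies import Input, Output

forecasts = boto3.resource("dynamodb").Table("forecasts")  # assumed table name

app = Dash(__name__)
app.layout = html.Div([
    dcc.Graph(id="gantt"),
    dcc.Interval(id="poll", interval=2000),  # poll the database every 2 seconds
])

@app.callback(Output("gantt", "figure"), Input("poll", "n_intervals"))
def refresh(_):
    # read the latest forecast items and render them as a Gantt-style timeline
    items = forecasts.scan()["Items"]
    df = pd.DataFrame(items, columns=["order", "step", "start", "end"])
    return px.timeline(df, x_start="start", x_end="end", y="order", color="step")

if __name__ == "__main__":
    app.run_server(port=8050)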

Result

To run the setup locally from within the cloned repository, Docker and Terraform need to be installed.
Even though the performance is not comparable to the actual cloud service, LocalStack is an awesome option to mock a multitude of AWS services locally, including Kinesis, Lambda, and DynamoDB. LocalStack can be started in a Docker container, spawning more containers as needed, e.g. for executing Lambdas:

docker-compose up localstack
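
Application code and test scripts then only need to point their AWS clients at the local endpoint instead of the real cloud; assuming the default configuration of recent LocalStack versions, which expose all mocked services on a single edge port, this looks roughly like:

import boto3

# dummy credentials and region are sufficient for LocalStack
kinesis = boto3.client(
    "kinesis",
    endpoint_url="http://localhost:4566",  # assumed LocalStack edge endpoint
    region_name="us-east-1",
    aws_access_key_id="test",
    aws_secret_access_key="test",
)
print(kinesis.list_streams())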

Before the Lambda functions can be deployed, the function code and its dependencies need to be packaged:

docker-compose up package-ingest-lambda package-simulation-lambda

Terraform is a great and widely used tool that can automatically provision infrastructure resources described in configuration files (however, have a look at this article for a more nuanced analysis). To create all required resources, two Terraform commands are needed from within the corresponding directory:

cd terraform
terraform init # (only required once)
terraform apply
# (enter 'yes' when prompted to confirm the changes,
# or use -auto-approve)
cd ../ # return to project root

(To prevent 404 errors when calling apply after a restart of LocalStack without having called terraform destroy, first delete the terraform.tfstate files next to main.tf.)

After the successful creation, two more containers can be started, one serving the dashboard and one running a simulation model to emulate real event producers:

docker-compose up dashboard emulation

Before (re-)starting any test-run, the DynamoDB-tables need to be cleared:

docker-compose up truncate-tables

http://localhost:8050 should now show the empty dashboard, while http://localhost:5001 should show the Casymda web canvas animation controls.

Sample flow

When starting the emulation, orders will be created at the source and flow through the defined process. At the same time, the dashboard should update with a minor delay and visualize the completion times (x-axis) of the relevant process steps of all orders which are currently present in the system (y-axis). A vertical line in the chart indicates the point in time when the simulation run started and the forecast was created.

Screen-cast of emulation model and forecast dashboard

An interesting situation arises when an expected future material delivery delay occurs for Order-2 (orange):

Forecasted effects of an expected material delivery delay (orange) of Order-2 on Order-3 (red)

Due to the capacity constraint of the production step (max. one order at a time), the delay (orange) of Order-2 should be expected to also delay the start of production of Order-3 (red).

While the presented example is simplistic, plenty of extensions are imaginable. From a business perspective, it would be interesting to examine a more complex process, including e.g. inventory for raw materials and different replenishment strategies. Similarly, the impacts of stochastic or planned machine maintenance intervals might be evaluated. This might also call for the automatic determination of optimal decision alternatives, considering e.g. order-specific due dates or throughput goals. Interesting technical extensions could include preliminary analytics of ingested event data, using stream processing solutions such as AWS Kinesis Data Analytics, in order to identify relevant patterns and trigger forecast/optimization runs only in case of critical process deviations.
