How To Axe DB2 But Keep Your Code

Monte Zweben
Towards Data Science
Mar 10, 2020


Monte Zweben, Gene Davis and Carol Pearson

Source: BillionPhotos.com/Adobe

This is the story of modernizing a legacy application powered by DB2: a journey of migrating that application to the cloud and extending it with AI, WITHOUT changing any application or SQL code within the legacy system.

Background

We recently partnered with Accenture to work with one of the most iconic financial services companies in the world, which had realized that, to rapidly enter new emerging markets, it needed to make its enterprise business system more nimble. This application runs significant portions of its business, managing customers, touchpoints, products, contracts, and payments/disbursements across several businesses and countries. The company had invested heavily in the application’s business logic and underlying SQL code over decades and did not want to throw away this highly competitive asset.

Source: Splice Machine

The problem the company faced was that it wanted to expand swiftly into new emerging markets. To maximize its speed to market, the company needed to implement this comprehensive application in record time, with newly configured products and services that met the local demands and regulations of each new market. Unfortunately, this traditionally required building a new data center in each territory, an approach that was overly restrictive.

Additionally, the company had launched a new SaaS business offering to other companies globally. It now faced the need to rapidly scale the implementation and operation of the application as it acquired new SaaS customers.

Solution: Cloud Migration

Source: ktsdesign/Adobe

A cloud-based architecture would enable the business to meet its goal of significantly accelerating its time to market by configuring its products and services according to local requirements. For the IT organization, the benefits of moving to the cloud were also obvious: IT wanted to avoid vendor lock-in and retain cloud sovereignty, spin up storage and compute resources instantly without going through procurement for new hardware, and choose from a wide variety of readily available services. By migrating to the cloud, the company would have the predictability and flexibility of consumption-based pricing, so it could avoid blowing up its IT budget.

A cloud-based architecture would uniquely solve the business and IT problems without building new data centers in every market.

Extending the Application: Injecting AI

But the company had greater aspirations than cloud agility alone. It wanted to employ machine learning (ML) directly in the real-time business application. It had already invested in AI and ML, building several predictive models that rivaled human teams at detecting fraud, personalizing marketing offers, pricing risk, and the like. But it could not operationalize these models directly in the enterprise business system due to architectural constraints.

Most deployments of ML have separate stages of computation. Usually, raw operational data is extracted from transactional systems, transformed into summaries of customer and other relevant data by analytical engines, and then sent to statistical machine learning frameworks to train models. This is an inherently slow process, exacerbated by delays in model deployment, which typically happens separately from the business application. Sometimes this process takes so long that the operational data changes too fast for the models to predict accurately.

The company had a broader vision. What if you could inject predictive models directly into the business application, predicting claim fraud at first notice of loss, or presenting an extremely timely and personalized offer to a customer as they call the call center about a billing problem?

For this, the company wanted to explore a new approach. Call it an “application retrofit”: one where the cloud migration would deliver agility, but perhaps also serve as the foundation of a new scalable platform that could power in-line intelligence, with ML native to the business application.

So the question became whether there are modern databases that can facilitate this cloud and AI journey.

From DB2 to Cloud

For decades, DB2 databases have reigned supreme in industry, and with good reason: DB2 powers applications with reliable, ACID-compliant performance and plenty of horsepower. Architects and developers love its dependability, comprehensive SQL support, and low-latency operations.

The company realized that, if it wanted to take advantage of tectonic shifts in the availability of diverse data, AI, and the cloud, there might be other modern database options. Beyond not being a cloud-native solution, DB2 carries scale-up costs that could quickly become untenable. Even companies that can afford these costs still have to build and maintain an ETL pipeline between operational DB2 instances, analytical DB2 instances, and separate ML/AI frameworks. Building ML models on a separate platform makes operationalizing them cumbersome and hampers the ability to take in-the-moment actions at scale.

The Cloud Lock-In Problem

In addition to these functional requirements, the company did not want to be locked in to one cloud infrastructure vendor. Every cloud vendor offers innumerable compute and storage primitives and lures companies into composing them together, until those companies realize their feet are stuck in proprietary cement. They are left with piles of integration scripts and added latency connecting proprietary cloud services for operational, analytical, and machine learning workloads. The company wanted a multi-cloud solution: one where its new “smart” business system could be nimbly provisioned, operated, and maintained on any cloud.

Evaluating a Modern Solution: Splice Machine

The company’s strategic technology consulting partner, Accenture, introduced Splice Machine’s scale-out SQL database to the company as a new modern solution that could:

  1. Run the enterprise business application originally designed for DB2 with little to no modification;
  2. Support multiple clouds;
  3. Scale elastically.

As an added bonus, Splice Machine’s ML Manager could enable the company to inject AI directly into its application to make it smarter, and perhaps make the entire data science process more productive.

So the company engaged with Accenture and Splice Machine on a technical evaluation for this combined cloud migration and application modernization.

The company wanted to run the enterprise application with zero modifications, given the cost and risk of changing the original legacy code.

Our position was that any other database might require extensive rewrites.

The POC

To minimize the risk of any serious blockers, the company engaged in a Proof of Concept (POC) with Splice Machine. The objective of the POC was to validate Splice Machine as a viable alternative to the existing DB2 platform for powering certain applications. The company provided a DB2 backup of the test database for its enterprise application, as well as execution logs for about 5,000 of its most common SQL queries, which involved creating users and contracts, updating customer information, and the like. It also provided some performance targets.

Splice Machine used its Data Migration Tool to migrate the schema, data, indexes, triggers, keys, and so on into our platform. This represented several hundred tables and thousands of indexes. The process surfaced some gaps, for which workarounds were created; we go into more detail on these below.
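As a rough sketch of what the migration mechanics look like (the table, file paths, and parameter values below are hypothetical), DDL captured from the DB2 backup is replayed against Splice Machine, and exported data files are then bulk-loaded, for example with Splice Machine’s built-in import procedure:

```sql
-- Hypothetical table recreated from DDL extracted from the DB2 backup
-- (IBM's db2look utility is one way to capture such DDL):
CREATE TABLE APP.CUSTOMERS (
    CUSTOMER_ID  BIGINT       NOT NULL PRIMARY KEY,
    FULL_NAME    VARCHAR(128) NOT NULL,
    COUNTRY_CODE CHAR(2)
);
CREATE INDEX IDX_CUSTOMERS_COUNTRY ON APP.CUSTOMERS (COUNTRY_CODE);

-- Bulk-load the exported data; NULL arguments take the defaults:
CALL SYSCS_UTIL.IMPORT_DATA(
    'APP', 'CUSTOMERS', null,        -- schema, table, column list
    '/data/export/customers.csv',    -- exported data file (hypothetical)
    ',', '"', null, null, null,      -- delimiters and date/time formats
    0, '/data/bad', null, null);     -- bad-record tolerance and log dir
```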

Once the Splice Machine database was created, the company could run the 5,000 queries against it. Most of these queries were transactional in nature and were processed by Splice Machine’s transactional engine, though some were routed to the analytical engine when the Splice Machine cost-based optimizer deemed that appropriate based on statistics. Please refer to Splice Machine’s dual-engine architecture below, which seamlessly integrates open-source components through optimized interfaces, surrounded by a unique distributed ACID transaction engine.

Source: Splice Machine
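To make the routing concrete, here is a hedged illustration (the tables are hypothetical, and actual routing depends on the optimizer’s cost estimates and statistics): a keyed single-row lookup stays on the transactional, HBase-backed engine, while a full-table aggregation is sent to the analytical, Spark-backed engine.

```sql
-- Keyed, single-row lookup: low estimated cost, handled by the
-- transactional (HBase-backed) engine.
SELECT STATUS FROM APP.CONTRACTS WHERE CONTRACT_ID = 42;

-- Full-table aggregation: high estimated cost, routed by the
-- cost-based optimizer to the analytical (Spark-backed) engine.
SELECT COUNTRY_CODE, SUM(PREMIUM)
FROM APP.CONTRACTS
GROUP BY COUNTRY_CODE;
```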

The POC revealed several places where we needed to tune queries to streamline performance. For example, the team profiled and improved the most expensive parts of statements involving joins. Sometimes the system mistook a query for an analytical one and chose the wrong join algorithm. The POC team worked around this by hinting the use of a nested-loop join, which is critical to the fast performance of OLTP-style operations. They also created additional indexes and supplied hints to the optimizer to achieve particular join orderings for some queries.
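A sketch of that kind of tuning follows. The table and index names are hypothetical, and the hint comment reflects Splice Machine’s convention of annotating a table reference with a specially formatted comment (check the current documentation for exact syntax):

```sql
-- Add an index to support the lookup side of the join:
CREATE INDEX IDX_PAYMENTS_CONTRACT ON APP.PAYMENTS (CONTRACT_ID);

-- Hint a nested-loop join for an OLTP-style, single-contract lookup;
-- the hint comment must be the last thing on its line:
SELECT c.CONTRACT_ID, p.AMOUNT
FROM APP.CONTRACTS c,
     APP.PAYMENTS p --splice-properties joinStrategy=NESTEDLOOP
WHERE p.CONTRACT_ID = c.CONTRACT_ID
  AND c.CONTRACT_ID = 42;
```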

After tuning these queries, Splice Machine was able to meet or beat the performance objectives on all but one query. The volume of data in this test database was relatively small (approximately 1.5 TB), so the test did not show off Splice Machine’s scale-out architecture on larger datasets (5 TB to petabytes), where we believe we have shown substantially superior performance over traditional databases.

But to get there, we had to modify the application’s SQL with syntax rewrites and hints, and add extra indexes.

As a result of the successful POC, the company challenged us to address the identified gaps and show that Splice Machine could literally run its application WITHOUT additional indexes, syntax rewrites, or hints.

If this follow-on “launchpad” project were successful, we would have the potential to power the company’s applications in new emerging markets, and maybe even in large existing operating entities. This was a huge challenge, but one well worth undertaking.

Powering the Application Without Modifications — The Real Test

Source: Mariusz Blach/Adobe

The “launchpad” project was significantly more challenging than the POC, as the company required the application to run without any changes to its SQL.

During the POC, Splice Machine was allowed to modify some queries, or in rare cases hint the optimizer, to execute them properly or achieve the target performance. These options were no longer available during the implementation phase of the project.

As described previously, during the POC, Splice Machine’s team identified a number of gaps in the platform. These included:

  • Simple extensions to SQL syntax (e.g. certain built-in functions, special characters in identifiers)
  • FULL OUTER JOIN support
  • More complex trigger processing capabilities not yet supported (conditional logic, SIGNAL, multiple actions per trigger, etc.; see the example after this list)
  • Error code support to match the legacy database’s error codes and handling
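To illustrate the trigger gap, here is a hedged, DB2-style sketch (tables, columns, and the SQLSTATE value are hypothetical) combining conditional logic, a SIGNAL that raises an application-visible error code, and a second action in the same trigger body:

```sql
CREATE TRIGGER APP.TRG_PAYMENT_CHECK
AFTER INSERT ON APP.PAYMENTS
REFERENCING NEW AS n
FOR EACH ROW
BEGIN ATOMIC
    -- Conditional logic inside the trigger body:
    IF n.AMOUNT <= 0 THEN
        -- Raise an error code the legacy application already handles:
        SIGNAL SQLSTATE '75001'
            SET MESSAGE_TEXT = 'Payment amount must be positive';
    END IF;
    -- A second action in the same trigger:
    UPDATE APP.CONTRACTS
       SET LAST_PAYMENT_DATE = CURRENT DATE
     WHERE CONTRACT_ID = n.CONTRACT_ID;
END
```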

The first phase of the “launchpad” project addressed these gaps in Splice Machine’s database engine.

In this phase, Splice Machine started with a recent backup of the company’s database and applied its Data Migration Tool again to create a new Splice Machine database hosted on AWS. The migration included mapping over the full set of foreign keys, triggers, and value constraints, since maintaining referential integrity within the application was critical.
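For a flavor of what mapping over constraints means in practice, here is a minimal, hypothetical sketch of the kinds of declarations that had to survive the migration intact:

```sql
-- Referential integrity between migrated tables:
ALTER TABLE APP.PAYMENTS
    ADD CONSTRAINT FK_PAYMENTS_CONTRACT
    FOREIGN KEY (CONTRACT_ID) REFERENCES APP.CONTRACTS (CONTRACT_ID);

-- A value constraint enforced by the database rather than the application:
ALTER TABLE APP.PAYMENTS
    ADD CONSTRAINT CHK_PAYMENT_AMOUNT CHECK (AMOUNT > 0);
```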

The customer’s application was a “rich client” Windows application, which meant that much of the data needed to populate forms and create user selection options was also stored in the database, and needed to be retrieved before the application could run properly.

The “no changes to the application” requirement triggered additional challenges, as the team encountered SQL syntax that was unsupported by Splice Machine at the time and had not been exposed during the original POC. One example was a query shorthand for specifying aliases for multiple subqueries in a UNION statement. Splice Machine addressed these exceptions as they were encountered. Less than three weeks after the application was configured to use Splice Machine as its data store, it ran successfully without modification.
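The exact shorthand is not reproduced here, but a common DB2 pattern of that flavor, shown purely as a hypothetical illustration, aliases a UNION as a derived table together with an explicit column list:

```sql
SELECT u.ID, u.NAME
FROM (
    SELECT CUSTOMER_ID, FULL_NAME FROM APP.CUSTOMERS_ACTIVE
    UNION ALL
    SELECT CUSTOMER_ID, FULL_NAME FROM APP.CUSTOMERS_ARCHIVED
) AS u (ID, NAME)   -- correlation name plus column-list alias
WHERE u.ID = 42;
```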

To validate functional equivalence, the company identified 22 multi-step test scenarios reflecting its core business functionality. It then executed these complex, multi-statement workflows, which exercised value constraints, multiple triggers, and foreign keys, without changing a single line of SQL in the application. The company successfully validated each of these scenarios.

While the application was now running on Splice Machine without any changes, some workflows still ran noticeably slower than on the operational version. This was not surprising: Splice Machine was now executing many new queries beyond those seen in the original POC, and those queries had been written with the existing database’s optimizer in mind. In one example, there was a query pattern involving left outer joins that the application used extensively. The original database could run this pattern in less than a second, while Splice Machine required 60 seconds or more.

But it did not take long for the Splice Machine optimization team to update the optimizer’s plan-search logic for this class of queries, achieving sub-second times as well. A few queries still performed slightly slower than on the original database at this point, but this performance was generally considered easy to tune in the optimizer later. Further, Splice Machine’s ability to improve these outer-join queries by more than 50x gave the company confidence that later performance tuning, when necessary, should be no problem.
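As a hypothetical sketch of the shape involved (the actual application queries are not public), picture a chain of left outer joins fanning out from a single keyed row, a pattern that is trivially fast with a nested-loop plan but painfully slow if planned as large scans:

```sql
SELECT c.CONTRACT_ID, p.AMOUNT, d.DISBURSED_ON
FROM APP.CONTRACTS c
LEFT OUTER JOIN APP.PAYMENTS p
       ON p.CONTRACT_ID = c.CONTRACT_ID
LEFT OUTER JOIN APP.DISBURSEMENTS d
       ON d.PAYMENT_ID = p.PAYMENT_ID
WHERE c.CONTRACT_ID = 42;   -- a single keyed row on the driving table
```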

Source: Splice Machine

The Next Phase: Operational Readiness

The “launchpad” project was delivered successfully and on schedule, even as requirements and implementation priorities evolved. The company is now in discussions with us about a new project to make Splice Machine operationally ready to deploy where and when the company chooses. This will entail extensive testing and shadowing of the application under realistic conditions, and the execution of operational procedures such as backup, replication, failover, and upgrades.

Extending The App: Incorporating Real-time AI

Migrating certain applications to a scalable, distributed, and cloud-native platform will accelerate the company’s primary business objective of expanding into new high-growth markets. But the story doesn’t end there. After migrating its applications to the Splice Machine platform, the company will be able to take the ML models it has already built and execute them in-database, greatly reducing the data latency that has prevented these models from being operationalized in the application. It will be able to scale out, virtually without limit, the volume of data it stores and uses for these models, on inexpensive cloud-based, on-premises, or hybrid infrastructure. Using Splice Machine’s triggers and stored procedures, the company will be able to act in the moment: detecting fraudulent transactions, cross-selling and up-selling personalized products and services based on real-time information, and much more.

In parallel with the operational readiness project, we plan to work with the company’s data science team to inject its ML models directly into the business system, first to prove the approach and then to make it operational in a later phase. Here is a description and demo of the Splice Machine ML Manager, which will be instrumental in the project.

Source: Splice Machine
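As a minimal sketch of what acting in the moment could look like (the procedure and tables are hypothetical; the actual deployment mechanics belong to ML Manager), a trigger on a claims table could invoke a stored procedure that scores each new claim with the deployed model:

```sql
CREATE TRIGGER APP.TRG_SCORE_CLAIM
AFTER INSERT ON APP.CLAIMS
REFERENCING NEW AS n
FOR EACH ROW
    -- PREDICT_FRAUD stands in for an in-database model invocation
    -- (hypothetical); it writes a fraud score for the new claim so
    -- downstream logic can flag it for review before disbursement.
    CALL APP.PREDICT_FRAUD(n.CLAIM_ID);
```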

Summary

This is just one example of an application modernization journey. Here we showed the initial steps of migrating a legacy database application to Splice Machine and making it run without modification.

To get a demo of Splice Machine, or to start your own cloud trial and investigate your modernization journey, click here.


CEO and co-founder of Splice Machine. Carnegie Mellon CS Advisory Board, NASA AI Deputy Chief, CEO Blue Martini Software, Red Pepper Software, Rocket Fuel