
Unlocking Data Access: Harnessing Triggers in the Absence of API Endpoints

Fill in the missing pieces of your data puzzle using triggers

SOURCE pexels.com (https://www.pexels.com/photo/abstract-accuracy-accurate-aim-262438/)

Overview

Have you ever faced a scenario where you tried to extract a crucial data point from a transactional system (such as an e-commerce system) using its API, only to discover that the necessary information was not accessible through the provided endpoints? If so, read on to see how this challenge can be tackled effectively using triggers.

In the absence of endpoints, querying the transactional tables directly might seem like an option. It is not a good idea: direct queries can have a significant impact on the performance and stability of the transactional system, particularly when an e-commerce system is involved. Querying a live e-commerce database is likely to degrade the user experience (imagine waiting 5–10 minutes for your cart to be retrieved while shopping on Amazon!).

In addition, running extraction jobs against a transactional system’s tables can disrupt ongoing transactions. The issue becomes even more critical if you are considering a daily ‘Truncate-Load’ operation on your data warehouse tables. Direct querying also falls short as a sustainable solution: if the transactional system purges data periodically, there is no smooth way to load historical data.

Therefore, it becomes crucial to automate the process of extracting data from the transactional system and seamlessly integrating it into the data warehouse while also not adversely impacting the system. Database triggers provide an effective solution in this scenario. But before we dive into the solution, here is an introduction to triggers.

A Short Introduction to Triggers

Database Triggers

An often overlooked concept, database triggers have been around since the inception of relational databases. A database trigger is a piece of code that fires automatically every time a record is created, updated, or even deleted in the source table (in this case, a transactional table).

Database triggers are of two types: DDL Triggers and DML Triggers.

DDL Triggers are set up whenever you want to get notified of structural changes in your database. For example, DDL triggers are useful when you wish to get alerted every time a new schema is defined; or when a new table is created or dropped. Hence, the name DDL (Data Definition Language) triggers.

DML Triggers are fired when new records are inserted, deleted, or updated. In other words, you get notified anytime a Data Manipulation Language (DML) change happens in the system. An important point here is that database triggers can be programmed not just to alert you about a change but also to perform actions such as moving data into a staging table.

Specialized Triggers

Modern cloud platforms such as Azure and AWS provide specialized triggers as part of their services. It is to be noted that specialized triggers are not the same as Database Triggers. While database triggers are specific to database management systems (DBMS) and operate within the database itself, specialized triggers have a wider scope. They can be used for a variety of automation tasks, for event-driven workflows, and also to create smooth integrations between cloud services and their components.

Here are some specialized triggers offered by AWS as part of its cloud services:

  • AWS Lambda Triggers: These triggers help initiate a lambda function when a specified event happens. In other words, you can specify an event that should trigger the lambda function. Events can be internal to AWS, or external in nature. Internal events can be related to AWS services such as Amazon S3, Amazon DynamoDB streams, or Amazon Kinesis. External events can come in from the database trigger of a transactional system outside of AWS or an IoT event.
  • Amazon S3 Event Notifications: These triggers enable you to get notified whenever objects in an S3 bucket are created, modified, or deleted. The notifications can be delivered through AWS’ Simple Notification Service (SNS), among other destinations.
  • AWS CloudWatch Events: If you have used standalone relational databases such as Microsoft SQL Server with SQL Server Management Studio (SSMS), you may have used SQL Server Agent to notify users of a job failure. CloudWatch Events is specific to AWS and is used not only to notify users of a job failure but also to trigger Lambda functions and to respond to events (it has since been folded into Amazon EventBridge). The important difference between a CloudWatch Event and a Lambda trigger is that Lambda triggers refer to the capability of AWS Lambda to respond to events, while CloudWatch Events is a broader event management service that can handle events from sources beyond Lambda. On a side note, while SQL Server Agent requires an email server to be configured, CloudWatch has no such requirement.

Here are some specialized triggers offered by Azure as part of its cloud services:

  • Blob Trigger – Azure Blob Storage is Azure’s counterpart to Amazon S3: blobs correspond to S3 objects and blob containers to S3 buckets. Just as Amazon S3 Event Notifications can be used to get alerts about changes in S3 buckets, blob triggers can be used to get notified of changes in Azure blob containers.
  • Azure Function Trigger – These are the Azure equivalent of AWS Lambda function triggers. They can be used to initiate an Azure Function in response to an event within Azure or an external event, such as an external transactional database trigger, an HTTP request, or an IoT Event Hubs stream. Azure Functions can also be initiated on a pre-defined schedule using a Timer Trigger.

Now that we’ve looked at the different types of database triggers and the specialized triggers offered by AWS and Azure, let’s revisit the use case.

The use case – You see a couple of data points in your transactional system’s tables that you require for your reporting metrics, but these data points are not exposed by the system’s API endpoints. So there is no way to write a script in Python or Java that grabs them through the API. You also cannot query the transactional system directly, as doing so can negatively impact its performance.

To tackle this, we use a combination of database triggers and specialized triggers offered by cloud services. Here is a high-level approach:

Pre-requisite: Identify the table in your transactional system’s database that has the data points which are not accessible via API endpoints. Once you have identified the table, follow the steps below –

Step 1: Create a staging table with the same columns as the transactional table. Ensure that no additional constraints are copied over from the source transactional table; this keeps the impact on the transactional system as small as possible. In addition, include a column that indicates the operation performed (Insert, Update, or Delete). Assuming your transactional system runs on a SQL Server backend, here is an example of the transactional table and the staging table that would need to be created.

-- Sample transactional table
CREATE TABLE Pricing_info (
    ProductID INT PRIMARY KEY,
    ProductName VARCHAR(50),
    Quantity INT,
    UnitPrice DECIMAL(10, 2),
    OperationDate DATE
);

The staging table would then be:

-- Create a Staging table without constraints
CREATE TABLE StagingTable_pricing (
    ProductID INT,
    ProductName VARCHAR(50),
    Quantity INT,
    UnitPrice DECIMAL(10, 2),
    OperationDate DATE,
    OperationType VARCHAR(10)
);

Step 2: Set up a DML trigger directly on the ‘Pricing_info’ table (main transactional table).

The trigger would need to be programmed to insert data into a staging table whenever a new record comes in, or an existing record is updated or deleted. The idea behind using a staging table is that it will avoid any unnecessary strain on the main transactional tables.

As seen in the examples below, the two most important aspects of the DML trigger (in fact, of any database trigger) are the Trigger Event and the Trigger Timing. The Trigger Event refers to the action that activates the trigger; in this case, we are interested in all three DML events, namely Insert, Update, and Delete, on the transactional table ‘Pricing_info’. The Trigger Timing refers to whether the trigger should act before or after the event occurs; for our use case, it is clearly an ‘After’ trigger. We create three triggers, one for each DML event.

Below is the Trigger for Insert:

-- Create the trigger
CREATE TRIGGER TransactionTrigger_pricing_Insert
ON Pricing_info
--Trigger Event
AFTER INSERT
AS
BEGIN
    -- Insert new records into the staging table
    INSERT INTO StagingTable_pricing (ProductID, ProductName, Quantity, UnitPrice, OperationDate, OperationType)
    SELECT ProductID, ProductName, Quantity, UnitPrice, OperationDate, 'INSERT'
    FROM inserted;
END;

Next is the trigger for Update:

-- Create the trigger
CREATE TRIGGER TransactionTrigger_pricing_update
ON Pricing_info
--Trigger Event
AFTER UPDATE
AS
BEGIN

    -- Insert record in the staging table with the data that was updated
    INSERT INTO StagingTable_pricing (ProductID, ProductName, Quantity, UnitPrice, OperationDate, OperationType)
    SELECT ProductID, ProductName, Quantity, UnitPrice, OperationDate, 'UPDATE'
    FROM inserted;
END;

Lastly, we create the trigger for Delete:

-- Create the trigger
CREATE TRIGGER TransactionTrigger_pricing_Delete
ON Pricing_info
--Trigger Event
AFTER DELETE
AS
BEGIN
    -- Insert record in the staging table with the data that was deleted
    INSERT INTO StagingTable_pricing (ProductID, ProductName, Quantity, UnitPrice, OperationDate, OperationType)
    SELECT ProductID, ProductName, Quantity, UnitPrice, OperationDate, 'DELETE'
    FROM deleted;
END;

Step 3: Let’s now set up the specialized triggers.

Step 3a. If your data warehouse is housed in AWS, below is a high-level solution that can be implemented.

In this solution, the source is the transactional system. We set up a DML database trigger in the transactional system’s database. Whenever a new record comes into the transactional table, the trigger inserts the data into a staging table within the same database. On a schedule (set up using AWS CloudWatch Events), a Lambda trigger invokes a Lambda function that moves the data from the staging table to a table in the data warehouse (Redshift). Let’s walk through the steps.

Pre-requisite: Create a table in the data warehouse that will hold the transactional information.

Step 3a. (i) Create an AWS Lambda function: Write the code for your Lambda function that grabs the records from the staging table and inserts them into the data warehouse table, performing any underlying calculations necessary along the way.
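To make this step more concrete, here is a minimal Python sketch of what such a Lambda function could look like. It assumes the transactional database is SQL Server (reached via pyodbc) and the warehouse is Redshift (reached via psycopg2), that connection details are supplied through environment variables, and that the warehouse table is named dw_pricing_info; all of these names are placeholders, and both drivers would need to be packaged with the function (for example, as a Lambda layer). Clearing the staging table after a successful copy is one possible housekeeping choice, not a required part of the approach.

import os
import pyodbc      # SQL Server driver; must be packaged with the function (e.g., as a Lambda layer)
import psycopg2    # Redshift speaks the PostgreSQL protocol; also needs to be packaged


def lambda_handler(event, context):
    # Connect to the transactional database (placeholder connection string
    # supplied via an environment variable)
    src = pyodbc.connect(os.environ["SOURCE_CONN_STRING"])
    # Connect to the Redshift data warehouse (placeholder settings)
    tgt = psycopg2.connect(
        host=os.environ["REDSHIFT_HOST"],
        dbname=os.environ["REDSHIFT_DB"],
        user=os.environ["REDSHIFT_USER"],
        password=os.environ["REDSHIFT_PASSWORD"],
        port=5439,
    )

    src_cur = src.cursor()
    tgt_cur = tgt.cursor()

    # Pull everything currently sitting in the staging table
    src_cur.execute("""
        SELECT ProductID, ProductName, Quantity, UnitPrice, OperationDate, OperationType
        FROM StagingTable_pricing
    """)
    rows = src_cur.fetchall()

    if rows:
        # Copy the captured changes into the (hypothetical) warehouse table dw_pricing_info
        tgt_cur.executemany(
            """
            INSERT INTO dw_pricing_info
                (product_id, product_name, quantity, unit_price, operation_date, operation_type)
            VALUES (%s, %s, %s, %s, %s, %s)
            """,
            [tuple(r) for r in rows],
        )
        tgt.commit()

        # Housekeeping: clear the staging table so the next run only sees new changes
        src_cur.execute("TRUNCATE TABLE StagingTable_pricing")
        src.commit()

    src.close()
    tgt.close()
    return {"rows_moved": len(rows)}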

Step 3a. (ii) Create the AWS Lambda Trigger – Schedule a Lambda trigger using AWS CloudWatch Events to run the Lambda function on a nightly schedule. (It is advisable to run the Lambda function during off-business hours or periods of low activity in the transactional system.)

Step 3a. (iii) Set up Event Mapping using EventBridge – Configure the Lambda trigger against an event so that the trigger gets initiated whenever the specified event condition is met. A typical scenario is to trigger the Lambda on a schedule, such as once per day.
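As a rough illustration of steps 3a (ii) and 3a (iii) together, here is a short Python sketch that creates a nightly schedule with boto3 and points it at the Lambda function. The rule name, function name, ARN, and schedule are all placeholders; in practice you might accomplish the same thing through the AWS console or infrastructure-as-code instead.

import boto3

events = boto3.client("events")          # CloudWatch Events / EventBridge API
lambda_client = boto3.client("lambda")

# Placeholder ARN of the Lambda function created in step 3a (i)
function_arn = "arn:aws:lambda:us-east-1:123456789012:function:staging-to-redshift"

# Fire once per day at 02:00 UTC, a presumed low-activity window
rule = events.put_rule(
    Name="nightly-staging-load",
    ScheduleExpression="cron(0 2 * * ? *)",
    State="ENABLED",
)

# Map the rule to the Lambda function
events.put_targets(
    Rule="nightly-staging-load",
    Targets=[{"Id": "staging-load-target", "Arn": function_arn}],
)

# Allow EventBridge to invoke the function
lambda_client.add_permission(
    FunctionName="staging-to-redshift",
    StatementId="allow-eventbridge-nightly",
    Action="lambda:InvokeFunction",
    Principal="events.amazonaws.com",
    SourceArn=rule["RuleArn"],
)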

AWS offers thorough documentation on setting up Lambda functions, so the detailed setup is out of scope for this article.

Step 3b. If your data warehouse is housed in Azure, we can use Azure Functions together with a Timer Trigger (or another Azure Function trigger).

In this case, it is a good idea to use a timer trigger. When the timer trigger fires, it runs the Azure Function, which then picks up the new, updated, or deleted records from the staging table. (Note: the staging table has an additional column indicating whether the record was inserted, updated, or deleted.)

Here are the steps to follow:

Step 3b. (i): Create an Azure Function: This is analogous to setting up the AWS Lambda function. Set up the code to gather records from the staging table and insert them into the data warehouse table, along with any underlying calculations.

Step 3b. (ii): Set up the Azure Function Trigger: Using the Azure Function app, set up a timer trigger and specify the schedule and a timestamp parameter name.
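Here is a minimal Python sketch that combines steps 3b (i) and 3b (ii) using the Azure Functions Python v2 programming model, where the timer schedule is declared as a decorator on the function itself. The connection strings, the warehouse table name (dw_pricing_info), and the 02:00 UTC schedule are all assumptions; pyodbc would need to be listed in the function app’s requirements.txt, and clearing the staging table is an optional housekeeping step rather than something the approach requires.

import os
import logging
import pyodbc                    # list in requirements.txt; used for both source and warehouse here
import azure.functions as func

app = func.FunctionApp()

# Timer trigger (step 3b (ii)): NCRONTAB expression = every day at 02:00 UTC
@app.timer_trigger(schedule="0 0 2 * * *", arg_name="nightly_timer", run_on_startup=False)
def load_staging_to_warehouse(nightly_timer: func.TimerRequest) -> None:
    # Placeholder connection strings supplied as Function App application settings
    src = pyodbc.connect(os.environ["SOURCE_CONN_STRING"])      # transactional database
    tgt = pyodbc.connect(os.environ["WAREHOUSE_CONN_STRING"])   # e.g. Azure SQL / Synapse

    src_cur = src.cursor()
    tgt_cur = tgt.cursor()

    # Pick up all changes captured by the database triggers
    src_cur.execute("""
        SELECT ProductID, ProductName, Quantity, UnitPrice, OperationDate, OperationType
        FROM StagingTable_pricing
    """)
    rows = src_cur.fetchall()

    if rows:
        # Copy them into the (hypothetical) warehouse table
        tgt_cur.executemany(
            """
            INSERT INTO dw_pricing_info
                (ProductID, ProductName, Quantity, UnitPrice, OperationDate, OperationType)
            VALUES (?, ?, ?, ?, ?, ?)
            """,
            [tuple(r) for r in rows],
        )
        tgt.commit()

        # Optional housekeeping: empty the staging table for the next run
        src_cur.execute("TRUNCATE TABLE StagingTable_pricing")
        src.commit()

    src.close()
    tgt.close()
    logging.info("Moved %d rows from staging to the warehouse", len(rows))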

Step 3b. (iii): Set up Event Mapping using Azure Eventgrid: Configure the trigger to map the event data to the appropriate parameters in your Azure Function. This enables the trigger to get initiated based on the event condition specified.

Handling Historic Data Loads

The solution that we discussed so far is for new data coming in for the data points of interest. How do we bring in the historic data?

For this, one option would be to do a ‘CREATE TABLE AS SELECT’ (SELECT * INTO in SQL Server) while creating the staging table. This will create the staging table pre-populated with all the data currently available in the transaction table. The remaining steps would remain the same (via a specialized Azure timer trigger/AWS Lambda trigger as the case may be).

Another option would be to do an ‘EMPTY UPDATE’ on all the records in the transaction table. Here is a sample empty update based on the current example/use case –

UPDATE Pricing_info SET OperationDate = OperationDate;

As you can see, none of the values actually change. From the database’s perspective, however, every record was updated, so the trigger fires for each one and all of the data reaches the staging table. Note that this is not a recommended approach, as it can bog down the transactional system due to the sheer number of updates and the transaction log (undo) activity generated. Moreover, the transaction table will be locked for the duration of the update and unavailable to other processes, further impacting the transactional system. This method is acceptable only if your transaction table is very small.

Concluding Remarks

In this article, we explored triggers’ versatility in capturing critical data points not readily available through standard API endpoints of the transactional system.

Triggers help ensure data integrity and eliminate the need for manual intervention, thereby making it possible to include crucial reporting metrics effectively. The solution described here is asynchronous so that the transactional system is not burdened in any way. If your transactional system does not have a lot of traffic, or is not directly used by end-user applications, the process can instead be set up synchronously; in that case, the Lambda or Azure Functions would use the transactional database’s staging table as the trigger event, and the appropriate database connection information would need to be provided.

Hope this article was helpful. Let me know if you have any questions in the comments. Happy learning!

