The world’s leading publication for data science, AI, and ML professionals.

Timelines

A common timeline view helps us understand the various interactions users have with our systems.

Notes from Industry

Photo by Peter Stumpf on Unsplash
Photo by Peter Stumpf on Unsplash

This is an excerpt from my book Data Engineering on Azure. The code examples in this article use Azure Data Explorer. If you want to run them, create an Azure Data Explorer cluster and database. Once this is set up, you can use the Data Explorer web UI to connect to your cluster and database and run the code samples. Azure Data Explorer uses KQL, the Kusto Query Language. If you are not familiar with it, you can check out the quick reference. The book covers Azure Data Explorer in more depth, but this article is meant as a stand-alone.

#

In my previous article, Identity keyrings, we saw how an identity keyring helps us connect datasets across an enterprise. Another useful view is one that shows all events that occur across our systems – a timeline.

A common timeline view helps us understand the various interactions users have with our systems. Let’s say we’re trying to understand how customer support issues are correlated with user retention. To do this, we need to see when support tickets are opened and closed, and when users cancelled their subscriptions. These data points again come from different teams: the customer success team handles support tickets, and the payments team knows when users cancel their subscriptions. Figure 1 shows how this can be plotted on a timeline.

Figure 1: Timeline view of multiple events: when the support ticket opens and closes and when the subscription is cancelled.
Figure 1: Timeline view of multiple events: when the support ticket opens and closes and when the subscription is cancelled.

Building a timeline view

We can define a common schema for these events with a Timestamp column to capture when an event occurs, KeyType and KeyValue columns to capture the type and value of the identity tied to the event, an EventType column, and a dynamic EventProperties column to capture event-specific properties. We’ll leverage semi-structured data to package different types of properties because various event types have their own associated properties. The next listing creates this table.

.create table Timeline (Timestamp: datetime,
  KeyType: string, KeyValue: string, 
  EventType: string, EvenProperties: dynamic)

Now let’s assume we want to ingest a SupportTickets table from the customer success team and a SubscriptionOrders table from our payments team that captures new and cancelled subscriptions. The following listing populates these tables with some sample data:

.set SupportTickets <|
datatable (Timestamp: datetime, SupportCustomerId: int,
  TicketId: int, Status: string, Message: string) [
    datetime(2020–07–01), 21, 5001, 'Opened', '...',
    datetime(2020–07–03), 21, 5002, 'Opened', '...',
    datetime(2020–07–04), 21, 5001, 'Updated', '...',
    datetime(2020–07–05), 21, 5001, 'Closed', '...',
    datetime(2020–07–19), 21, 5002, 'Closed', '...',
]
.set SubscriptionOrders <|
datatable (Timestamp: datetime, CustomerId: int,
  SubscriptionId: guid, Order: string) [
    datetime(2020–06–01), 1001,
      'fd10b613–8378–4d37-b8e7-bb665999d122', 'Create',
    datetime(2020–07–19), 1001,
      'fd10b613–8378–4d37-b8e7-bb665999d122', 'Cancel'
]

The following listing shows how we can ingest these tables into our Timeline table:

.append Timeline <| SupportTickets
| where Status == 'Opened'
| project Timestamp, KeyType='SupportCustomerId',
    KeyValue=tostring(SupportCustomerId),
    EventType='SupportTicketOpened',
    EventProperties=pack("Message", Message)
.append Timeline <| SupportTickets
| where Status == 'Closed'
| project Timestamp, KeyType='SupportCustomerId',
    KeyValue=tostring(SupportCustomerId),
    EventType='SupportTicketClosed',
    EventProperties=pack("Message", Message)
.append Timeline <| SubscriptionOrders
| where Order == 'Create'
| project Timestamp, KeyType='CustomerId',
    KeyValue=tostring(CustomerId),
    EventType='SubscriptionCreate',
    EventProperties=pack("SubscriptionId", SubscriptionId)
.append Timeline <| SubscriptionOrders
| where Order == 'Cancel'
| project Timestamp, KeyType='CustomerId',
    KeyValue=tostring(CustomerId),
    EventType='SubscriptionClose',
    EventProperties=pack("SubscriptionId", SubscriptionId)

pack() creates a dynamic value from a set of property names and values, which enables us to store different shapes of data in the EventProperties column.

Using timelines

If we query the Timeline table, we get something like the following:

We have the various events on a timeline with their specific properties captured in the EventProperties column. Combining this with the keyring gives us a great perspective on how our systems are used.

For the next query, you will need the Keyring table build in the previous article:

Let’s go back to our example where we want to correlate support tickets with subscription cancellations. The query in the following listing retrieves all support tickets opened within 30 days of a subscription being cancelled.

Timeline
| where EventType == 'SubscriptionClose' // #1
| join kind=inner (Keyring
    | where KeyType == 'CustomerId') on KeyValue  // #2
| join kind=inner (Keyring
    | where KeyType == 'SupportCustomerId') on GroupId  // #3
| join kind=inner (Timeline
    | where EventType == 'SupportTicketOpened') on
      $left.KeyValue2 == $right.KeyValue  // #4
| project Delta=(Timestamp - Timestamp1), CustomerId=KeyValue,
  SupportCustomerId=KeyValue2    // #5
| where Delta < timespan(30d)    // #6

We first get all SubscriptionClose events (#1). Then we join with the Keyring table on CustomerId to get GroupId (#2). We then join again with the Keyring table on GroupId to get the associated SupportCustomerId (#3). Finally, we join again with the Timeline table on the SupportCustomerId (KeyValue), looking for SupportTicketOpened events (#4).

We now have all the data we need – we compute the time difference between SubscriptionCloseand SupportTicketOpenedas Delta. We have both CustomerId and SupportCustomerId (#5). Finally, we filter by a time difference less than 30 days (#6).

The key takeaway here is that these data points (support tickets opened/closed, subscriptions cancelled) come from different systems of our enterprise and are identified using different IDs. Using an identity keyring and a timeline, we aggregate these into a common schema from which we can produce our business reports.

To recap, both keyrings and timelines are generalized data models we can build on top of the raw data available in our system. These help connect the dots and bring together otherwise disjoint datasets into a unified view of how users interact with our system. It falls to the data engineers to build and maintain such models.


Related Articles