Understanding Events with Artificial Intelligence
We have come across a lot of client requirements that boil down to using AI to understand events. Some systems need to classify events into types, others need to listen for specific events, and some need to predict events. These requirements often involve assigning a score to an event, and then ranking all the events according to the assigned score.
These events can be Google Calendar events, medical alarms, dates on a dating website, or, in the case of GenRush, signals that indicate new potential clients for a company based on events in the real world.
These event understanding problems are classic classification and regression requirements hiding behind the complexity of the word “event”. For sequence prediction we use LSTM/GRU/RNN models, and sometimes a DNN (e.g. when the sequence of events forms a pattern you can graph and see). But let’s focus this article on the non-sequence and non-location parts of event processing, and look in some detail at how to turn an event into a set of features that an AI can use.
So, we have a whole lot of events, and we want the AI to notify human users when specific events happen. Immediately a classic false-positive versus false-negative conundrum emerges. I saw a lot of this in hospital/medical notification systems back when I was working on medical devices with Mathieu Lemay. If the AI notifies incorrectly too often (false positives), then the users will ignore the notifications. Whereas if the system misses key events (false negatives), then users will think, and rightly so, that the AI is not paying attention.
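To make the conundrum concrete, here is a small sketch of how a notification threshold trades one kind of error for the other. The function and variable names are illustrative, not from any real system: `scores` stands for model confidences and `labels` for whether an event truly deserved a notification.

```python
def precision_recall(scores, labels, threshold):
    """Illustrative only: measure the false-positive/false-negative
    trade-off of notifying whenever a score clears a threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # High precision = few false alarms, so users keep trusting the alerts.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # High recall = few missed events, so the AI looks like it's paying attention.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Raising the threshold pushes precision up and recall down; where you sit on that curve depends on how costly a missed event is versus an annoying alert.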
Let’s go one level deeper and discuss some of the features of an “event” that the AI can observe in order to make notification decisions. Models observe features of an object in order to understand it. The job we do in the machine learning world is to identify the features and expose them to the machine learning model in a way that it “likes”. Think of this mapping of events to features like a pinup girl (boy?) magazine where height and cup size are “features” of models. Just like the pinup models, we need some way to compare and contrast events. Think of each event like a baseball card that has a bunch of crunchable info. An event can contain the following types of information: sequence, location, numerical, categorical, image, text, and relationship.
We decided not to think about sequence data (e.g. what happens before and after the event), so let’s talk about the other event features.

For GenRush we get location data from a Google API that turns addresses into longitude and latitude. I mentioned this sort of thing in this past article in a bit more detail. This is generally referred to as GIS. Location data can be used in classification and regression, but, like sequence data, there are a lot of caveats to leveraging it, so let’s not focus on this aspect.

Numerical data is the stuff that makes sense as a number AND that you can safely compare to other numbers. An example of numerical event data is a field called “capacity”. The capacity of two events is a thing you can compare. A house number, for example, is NOT numerical. It is a number, but the house at 888 Broadview Ave is not smaller than the house at 900 Lady Ellen Pl. Perhaps the even/odd split can tell you which side of the street the house fronts, but the value is not useful as a number. House number, ZIP code, and country are examples of categorical data. To avoid having a zillion categories, most with a frequency of 1 in the dataset, this data needs to be organized by frequency and binarized. Low-frequency categorical features don’t really tell you much about the event, so we can safely throw that stuff into an “other” bucket.

Image data in an event can be the group of pictures associated with a Google Calendar event listing. We have some cool CNN-based techniques for crunching this data into a fixed-size vector, but that’s beyond the scope of this article. Text data is typically the body of the event description, but can also include text in metadata related to the event, such as social media posts and event responses. We use word embedding models to better understand the meaning of the text.
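The frequency-then-binarize step for categorical data can be sketched in a few lines. This is a minimal illustration, not our production binarizer map; the `min_count` cutoff is an assumed knob.

```python
from collections import Counter

def build_binarizer_map(values, min_count=5):
    """Assign a one-hot index to each frequent category; everything
    rarer than min_count shares a single 'other' slot."""
    counts = Counter(values)
    frequent = sorted(v for v, c in counts.items() if c >= min_count)
    mapping = {v: i for i, v in enumerate(frequent)}
    other_index = len(mapping)  # shared bucket for low-frequency categories
    return mapping, other_index

def binarize(value, mapping, other_index):
    """One-hot encode a single categorical value using the map above."""
    vec = [0.0] * (other_index + 1)
    vec[mapping.get(value, other_index)] = 1.0
    return vec
```

The payoff is a fixed-width vector: unseen or rare ZIP codes all land in the “other” slot instead of exploding the feature space.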
Relationship data on events is pulled from a knowledge graph built from the relationships that a crawler can see. More specifically, we can set up a crawler to build a graph of all the attendees of all events, where each node (i.e. vertex) in the graph is an email address or an event, and the arcs (i.e. edges) connect an email address to an event. So three people (emails) who attended the same event can be pulled into one relationship in the graph. This simple graph tells you who attends events with whom. It has all sorts of useful information for classifying events, like how many people attend an event, what the groups of people who attend events together are, who attends lots of events, and so on. Cardinality is an important classification feature, because when we want to understand events by labeling them “SMALL” or “BIG”, it really helps to know how many people attend. I know. It seems obvious. But take a step back and think about how many features we extracted from a simple “event” item.
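That bipartite email-to-event graph fits in a few lines of plain Python. This is a toy sketch under the assumption that the crawler emits `(email, event_id)` pairs; a real knowledge graph would live in a graph database.

```python
from collections import defaultdict

def build_event_graph(attendance):
    """attendance: iterable of (email, event_id) edges from the crawler.
    Returns a mapping of event_id -> set of attendee emails."""
    graph = defaultdict(set)
    for email, event_id in attendance:
        graph[event_id].add(email)
    return graph

def event_cardinality(graph, event_id):
    """How many people attend this event (the SMALL/BIG feature)."""
    return len(graph.get(event_id, set()))

def co_attendees(graph, email):
    """Everyone who shares at least one event with this email address."""
    return {e for attendees in graph.values() if email in attendees
            for e in attendees} - {email}
```

From this one structure you get cardinality, co-attendance groups, and per-person event counts, all usable as classification features.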
So now that we’ve exposed these features, let’s zoom back out and see how an event becomes a feature vector with some pseudocode:
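Something like the following sketch. The field names (`capacity`, `zip`, `attendees`, `description`) are illustrative, `zip_map`/`other_index` come from a frequency-based binarizer, and `embed_text` stands in for any function returning a fixed-size text embedding.

```python
import numpy as np

def event_to_feature_vector(event, zip_map, other_index, embed_text):
    """Sketch: flatten one event dict into a fixed-size feature vector
    by concatenating the feature types discussed above."""
    # Numerical: a real, comparable quantity.
    numerical = [float(event.get("capacity", 0))]
    # Categorical: one-hot via the binarizer map, rare values -> "other".
    one_hot = [0.0] * (other_index + 1)
    one_hot[zip_map.get(event.get("zip"), other_index)] = 1.0
    # Text: fixed-size embedding of the event description.
    text_vec = embed_text(event.get("description", ""))
    # Relationship: cardinality pulled from the knowledge graph.
    cardinality = [float(len(event.get("attendees", [])))]
    return np.concatenate([numerical, one_hot, text_vec, cardinality])
```

Every event, no matter how messy, comes out the other end as a vector of the same length, which is exactly what the model downstream needs.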
And here below is a simple DNN in Keras for classifying feature vectors (x) against the accompanying ground truth data (y):
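A minimal version might look like this. The layer sizes, dropout rate, and training settings are illustrative placeholders, not tuned values.

```python
import numpy as np
from tensorflow import keras

def build_classifier(input_dim, n_classes):
    """Small dense network mapping event feature vectors to class labels.
    Architecture is a sketch; sizes are not tuned."""
    model = keras.Sequential([
        keras.layers.Input(shape=(input_dim,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dropout(0.2),          # light regularization
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# x: array of feature vectors, y: integer class labels (ground truth)
# model = build_classifier(x.shape[1], n_classes=2)
# model.fit(x, y, epochs=10, batch_size=32, validation_split=0.2)
```

Sparse categorical cross-entropy lets y stay as plain integer labels (e.g. 0 = SMALL, 1 = BIG) instead of one-hot vectors.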
Now there are some thorny issues that we didn’t get into here, like building the knowledge graph, running several models in parallel (CNN with DNN), building a notification framework with AWS SES/SNS, building the binarizer map, etc. However, I hope you get a good sense from this article of how to build an event classifier, starting with a feature extraction function and moving on to a DNN that classifies the feature vector.
So there you have it. Events can be turned into features that an AI can understand.
I’m writing this article on the heels of some great news. Our recent work on deep learning AI got two awards! First, the best paper award for the Visual Deep Learning Recommender System. And, to top it off, we also got a top paper award for an unsupervised deep learning paper; the one I discussed in a previous article. Fun Times!
I’m listening to Avicii at an unsafe 98 dB and climbing, and hammering away at 2 different projects due tomorrow. Why am I telling you this? I want to tell you how I feel, right now, in the here and now of October 8, 2017. After fasting for 25 hours on Yom Kippur and taking a 3-day break this past weekend, I’m back into the swing of things, up at 2am scheming the next big thing with a new client. Why? I’m super motivated.
As expected, we are growing to keep up with this spike in projects. The reason for the lag in new articles (and some late work product) is the pace of work we have taken on in the past month, combined with a heavy travel and holiday calendar. We have 11 active projects. Our usual in the past year was 5 at any given time. Thank God for JIRA and Slack.
We are now a staff of 6 engineering ninjas. We’ve got a PhD and a PhD candidate, a master’s and a master’s candidate, a senior dev, and an MBA. All with engineering undergraduate degrees. Why so much higher education? Well, this machine learning stuff is hard, and we need the heavy hitters. And I’ve been doing way too much myself. Time to delegate more.
While trying to keep this article general, I started getting a bit deep into the details. Too deep? I’m open to some constructive criticism. If you enjoyed this article on artificial intelligence, then please try out the clap tool. Tap that. Follow us on medium. Go for it. I’m also happy to hear your feedback in the comments. What do you think?