Why You Should Learn About Streaming Data Science

Mark Palmer
Towards Data Science

Adaptive learning and the unique use cases for data science on streaming data. By Dr. Tom Hill and Mark Palmer

Traditional machine learning trains models based on historical data. This approach assumes that the world essentially stays the same: that the same patterns, anomalies, and mechanisms observed in the past will happen in the future. In that sense, predictive analytics really looks to the past rather than to the future.

Recently available tools help business analysts “query the future” based on streaming data from any source including IoT sensors, web interactions, transactions, GPS position information or social media content. Similarly, we can now apply data science models to streaming data.

No longer bound to look only at the past, the implications of streaming data science are profound.

Data science models based on historical data are good, but not for everything

The majority of applications for machine learning today seek to identify repeated and reliable patterns in historical data that are predictive of future events. When the relationships between dimensions and “concepts” are stable and predictive of future events, then this approach is practical.

For example, the number of visitors expected at a beach can be predicted from the weather and the season — fewer people will visit the beach in the winter or when it rains, and these relationships will be stable over time.

Likewise, the numbers, amounts, and types of credit card charges made by most consumers will follow patterns that are predictable from historical spending data, and any deviations from those patterns can serve as useful triggers for fraud alerts.

And, even when the relationships between variables change over time — for example when credit card spending patterns change — efficient model monitoring and automatic updates (referred to as recalibration, or re-basing) of models can yield an effective, accurate, yet adaptive system.
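
As a rough illustration of monitoring plus recalibration, here is a minimal Python sketch. Everything in it is an assumption made for the example: the class name, the toy "model" (it simply predicts the mean of a recent window), the window size, and the error tolerance. A production system would monitor a real model and use a more careful trigger.

```python
from collections import deque
from statistics import mean


class RecalibratingBaseline:
    """Toy model that predicts the mean of its training window (hypothetical)."""

    def __init__(self, history, window=20, tolerance=10.0):
        self.window = window            # how many recent points to keep
        self.tolerance = tolerance      # mean absolute error that triggers a refit
        self.recent = deque(history[-window:], maxlen=window)
        self.errors = deque(maxlen=window)
        self.prediction = mean(self.recent)
        self.recalibrations = 0

    def observe(self, value):
        """Score one new observation, then recalibrate if recent error is too high."""
        self.errors.append(abs(value - self.prediction))
        self.recent.append(value)
        window_full = len(self.errors) == self.window
        if window_full and mean(self.errors) > self.tolerance:
            self.prediction = mean(self.recent)   # re-base on the newest data
            self.errors.clear()
            self.recalibrations += 1


# Synthetic spending amounts: the typical charge shifts from 50 to 80.
model = RecalibratingBaseline(history=[50.0] * 20, window=5, tolerance=10.0)
for amount in [50.0] * 5 + [80.0] * 10:
    model.observe(amount)

print(model.prediction, model.recalibrations)
```

On this synthetic series the model re-bases twice and ends up predicting 80.0, the new typical amount. The monitoring step (tracking recent error) and the update step (refitting on recent data) are deliberately separate, so either could be swapped for something more sophisticated.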

Streaming Data Science applies algorithms in-stream

In some cases, however, there are advantages to applying learning algorithms to streaming data in real time. Sometimes, a critical factor that drives application value is the speed at which newly identified and emerging insights are translated into actions.

In some use cases, there are advantages to applying adaptive learning algorithms to streaming data, rather than waiting for it to come to rest in a database.

For example, to identify the critical factors that predict public opinion, fashion choices and consumer preference, an adaptive approach to continuous modeling and model updating can be helpful.

Streaming BI — an enabling technology for Streaming Data Science

To understand streaming data science, it helps to understand Streaming Business Intelligence (Streaming BI) first.

The video below shows Streaming BI in action for a Formula One race car. Embedded IoT sensors stream data as the car speeds around the track. Analysts see a real-time, continuous view of the car’s position and data: throttle, RPM, brake pressure, and potentially hundreds or thousands of other metrics.

By visualizing some of those metrics, a race strategist can see what static snapshots could never reveal: motion, direction, relationships, the rate of change. It works like a surveillance camera for analytics.

Streaming Business Intelligence allows business analysts to query real-time data. By embedding data science models into the streaming engine, those queries can also include predictions from models scored in real time.

The innovation of Streaming BI is that you can query real-time data, and since the system registers and continuously reevaluates queries, you can effectively query the future.

That is, once you create a visualization, the system remembers your questions that power the visualization and continuously updates the results. You just set it and forget it.

In this case, the BI tool registers this question:

“Select Continuous * [location, RPM, Throttle, Brake]”

When any data changes on the stream — location, RPM, throttle, brake pressure — the visualization updates automatically. Computations change. Relationships change. Visual elements change.
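
To make query registration concrete, here is a small Python sketch of the idea. It is not how any particular Streaming BI product works: the `ContinuousQueryEngine` class, its `register` and `publish` methods, and the sample field values are all hypothetical, and a real engine would add windowing, filtering, and model scoring.

```python
class ContinuousQueryEngine:
    """Hypothetical in-memory stand-in for a streaming engine."""

    def __init__(self):
        self.state = {}      # latest known value for every field
        self.queries = []    # registered (fields, callback) pairs

    def register(self, fields, callback):
        """Remember a query; it is re-evaluated on every relevant change."""
        self.queries.append((tuple(fields), callback))

    def publish(self, event):
        """Apply one stream event and push fresh results to affected queries."""
        self.state.update(event)
        for fields, callback in self.queries:
            if any(name in event for name in fields):
                callback({name: self.state.get(name) for name in fields})


engine = ContinuousQueryEngine()
results = []   # each entry is one refreshed result set
engine.register(["location", "rpm", "throttle", "brake"], results.append)

engine.publish({"location": (12.5, 48.1), "rpm": 11200})
engine.publish({"throttle": 0.87})
engine.publish({"tire_temp": 96})   # not part of the query, so no update
```

The question is asked once, at registration time. After that, every event that touches one of the selected fields produces a new result without the analyst asking again, which is the "set it and forget it" behavior described above.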

The ground-breaking innovation of Streaming BI is that you can query for both real-time and future conditions.

New questions become possible

What questions would you ask if you could query the future? A race team can ask when the car is about to take a suboptimal path into a hairpin turn, figure out when the tires will start showing signs of wear given track conditions, or understand when the weather forecast is about to affect tire performance.

So, by registering continuous queries, business analysts can effectively query the future.

But what if those queries could also incorporate data science algorithms? Well, they can!

Adaptive learning use cases

Adaptive learning with streaming data is the data science equivalent of how humans learn by continuously observing the environment.

For example, in high-tech manufacturing, a nearly infinite number of different failure modes can occur. To avoid such failures, streaming data can help identify patterns associated with quality problems as they emerge, and as quickly as possible.

When never-before-seen root causes, such as particular machines or manufacturing inputs, begin to affect product quality (that is, when there is evidence of concept drift), staff can respond more quickly.

Adaptive learning from streaming data means continuously learning and recalibrating models on the newest data, and sometimes applying specialized algorithms that improve the prediction models while making the best available predictions at the same time.

Other examples where continuous adaptive learning is instrumental include price optimization for insurance products or consumer goods, fraud detection applications in financial services, or the rapid identification of changing consumer sentiment and fashion preferences.

Towards the Future of Streaming Data Science

Learning from continuously streaming data is different from learning based on historical data or data at rest. Most implementations of Machine Learning and Artificial Intelligence depend on large data repositories of relevant historical data and assume that historical data patterns and relationships will be useful for predicting future outcomes.

However, when streaming data is used to monitor and support business-critical continuous processes and applications, dynamic changes in data patterns are often expected. Different analytic and architectural approaches are required to analyze data in motion, compared to data at rest.

Streaming BI provides unique capabilities enabling analytics and AI for practically all streaming use cases. These capabilities can deliver business-critical competitive differentiation and success.

Dr. Thomas Hill is Senior Director for Advanced Analytics (Statistica products) in the TIBCO Analytics group. He previously held positions as Executive Director for Analytics at Statistica, within Quest’s and at Dell’s Information Management Group.

Mark Palmer is the SVP of Analytics at TIBCO Software. As the CEO of StreamBase, he was named one of the Tech Pioneers that Will Change Your Life by Time Magazine.

Board Advisor for Correlation One, Data Visualization Society, and Talkmap | World Economic Forum Tech Pioneer | Data Science for All Mentor