Data Jenga is a risky game

Why a sustainable data foundation is key for any AI initiative

Matthias Graeber
Towards Data Science


A scientist in a white coat building an unstable Jenga tower.
Image generated with DALL-E 2.

There is huge potential in bringing AI into manufacturing. Companies are investing large amounts into smart manufacturing and Industry 4.0 initiatives — yet there is still a significant gap between expectations and actual successes delivered.

There are several reasons for this, but this article focuses on one of the most fundamental: a data foundation that is unfit for purpose. One may call it Data Jenga because, just as in the game Jenga, continuing to build on top makes the structure progressively more unstable.

The Quick-Wins-Dilemma

AI initiatives are often kicked off as innovation projects, somewhat disconnected from the core of the company. To demonstrate success and gain more acceptance and support, project teams typically use a pragmatic approach, going for quick wins — opportunities around available data with some (but not massive) business value.

While this approach is understandable (and often expected by management), there is one big problem with it:

The focus lies on what is possible with the current data and existing (meta-)data management, rather than on what could be possible, with the right data foundation in place.

The value creation may be neither scalable nor sustainable because the necessary data foundation has not been put in place. Just like in the final stages of a thrilling Jenga game — you may still be ok removing a block and putting it on top — but the tower may collapse for the next player.

Is the data foundation fit for purpose?

The success (and ultimately the business impact) of AI initiatives in manufacturing depends first of all on three enabling pillars that data teams need in order to identify data-driven pathways toward value creation:

  • A clear business value proposition
  • Domain expertise (process, industrial automation)
  • Relevant and well-managed data in sufficient quantity (the data foundation)

Good data management practices are generally in place for data from ERP systems due to the widely recognized impact of high-quality data in established business processes [1].

In emerging fields such as AI in manufacturing, however, data management may still be at an early stage, with data originating from systems primarily designed for process control and only secondarily for data collection. While data collection may be straightforward in industrial automation systems, the sheer amount of data should not be mistaken for the actual ability to extract useful knowledge.

Andrew Ng likewise calls for a more data-centric approach: “Instead of merely focusing on the quantity of data you collect, also consider the quality, make sure it clearly illustrates the concepts we need the AI to learn.” [2]

Without well-managed, relevant data, an AI initiative is much less likely to have a transformative impact, right from the start.

Tackling strategic gaps to unleash the full potential of AI

When planning an AI initiative, the lack of a solid data foundation becomes visible fairly quickly. However, there is a risk that this strategic gap is not recognized by relevant stakeholders outside the core project team.

Because project teams are expected to deliver impactful results in a short time, communication may become success-biased, focusing on quick wins. Fixing strategic gaps in the data foundation may seem less attractive, as it requires significant effort and pays off only in the mid or long term.

A constant focus on quick wins may lead to the misperception that simply building up some data analytics capabilities is sufficient to master the AI transformation. It may also leave organizations unaware of the true potential that data and AI hold for them. In the worst case, they may miss out on the opportunity completely.

Replacing shaky towers with solid pyramids

Ultimately, the task for an organization is to put in place one of the most fundamental concepts of computer science — the Information or Data Pyramid [3]. It enables the journey from raw to managed data (information), via knowledge built through analyzing the information, all the way to the top — to data-driven actions with measurable impact.

The Data Pyramid and its two strategic directions. Image by author.

A real-world example in the manufacturing industry is the aggregation of field maintenance data (information) to identify the statistically most important root-cause conditions of equipment failure (knowledge).
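The aggregation step in this example can be illustrated with a minimal sketch. It assumes field maintenance records have already been cleaned into (equipment, root cause) pairs; the record values and the function name `pareto_root_causes` are hypothetical and only serve to show how counting and ranking turns managed data (information) into a ranked list of failure causes (knowledge).

```python
from collections import Counter

# Hypothetical field maintenance records: (equipment_id, root_cause).
# In practice these would come from a maintenance management system.
records = [
    ("pump-01", "bearing wear"),
    ("pump-02", "seal leak"),
    ("pump-01", "bearing wear"),
    ("pump-03", "misalignment"),
    ("pump-04", "bearing wear"),
    ("pump-02", "seal leak"),
    ("pump-05", "bearing wear"),
    ("pump-03", "electrical fault"),
]


def pareto_root_causes(records):
    """Return (cause, count, cumulative_share) tuples, most frequent first."""
    counts = Counter(cause for _, cause in records)
    total = sum(counts.values())
    ranked = []
    cumulative = 0
    for cause, count in counts.most_common():
        cumulative += count
        ranked.append((cause, count, cumulative / total))
    return ranked


for cause, count, share in pareto_root_causes(records):
    print(f"{cause:18s} {count:2d}  cumulative share {share:.0%}")
```

A ranking like this is only as good as the records behind it: inconsistent cause labels or missing entries would distort the counts, which is why the managed-data layer matters.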

The obtained knowledge can then, for example, be fed back to the R&D teams, who create a new and improved design of the equipment (wisdom; a non-AI example). It may also be possible to combine this knowledge with industrial automation data to provide live decision support systems, e.g. to avoid unplanned downtime of equipment (an AI example).

The structure of a pyramid is wider at the bottom than at the top. The Information Pyramid therefore emphasizes the need for a rock-solid data foundation. The amount of (managed) data required to extract insights is significant and largely governed by the laws of statistics.
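To make the statistical point concrete, here is a rough sketch of how the uncertainty of an estimated failure rate shrinks as more units are observed. It uses the simple normal (Wald) approximation for a proportion, which is an assumption chosen for brevity and is known to be crude for small samples; the function name and the numbers are hypothetical.

```python
import math


def failure_rate_interval(failures, units, z=1.96):
    """Approximate 95% interval for a failure rate (normal approximation).

    This is a simplification: for small samples or rates near 0 or 1,
    other interval methods are more reliable.
    """
    p = failures / units
    half_width = z * math.sqrt(p * (1 - p) / units)
    # Clip to the valid range of a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Same observed failure rate (10%), increasing amounts of data.
for units in (20, 200, 2000):
    failures = units // 10
    low, high = failure_rate_interval(failures, units)
    print(f"units={units:5d}  interval=({low:.3f}, {high:.3f})")
```

The interval narrows roughly with the square root of the number of observations, so a tenfold narrower estimate needs about a hundred times more data.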

But how do you build the pyramid if it is not there yet? To facilitate this process, some organizations have started to introduce the concept of data products and the role of Data-Product Managers [4].

Top-down & bottom-up: Building the pyramid in two directions

While there might be a case for (unexpected) knowledge discovery from mining readily available data, in a strategic context the definition of the information pyramid should be thought of more as a top-down process with bottom-up validation.

Based on its strategy, an organization will be able to formulate the actionable insights (wisdom) that will give it a competitive edge in its markets. It can then formulate the knowledge it needs to build in order to sustain that competitive edge efficiently (= target knowledge).

But is it feasible? Bringing in data, business, and domain experts will validate the approach bottom-up. The target knowledge can be refined based on feasibility considerations, and eventually, the business strategy will be perfectly embedded in data structure and architecture.

Going through this analysis together with multi-disciplinary teams will show a way forward and highlight gaps and obstacles. It will help frame the full potential of AI for an organization and bridge the gap between technology, domain expertise, and business strategy.

Conclusion

Building the future information pyramid of an organization requires significant time, effort, and investment. But to truly embark on the AI journey and unlock its full business potential, there is no way around it.

Do not build future success on top of a shaky Jenga tower — rely on a rock-solid pyramid instead.

References:

[1] Richard Wang, Yang W. Lee, Leo L. Pipino, and Diane M. Strong (1998). Manage your information as a product. MIT Sloan Management Review, Summer 1998: 95–105.

[2] Andrew Ng (2021). AI Doesn’t Have to Be Too Complicated or Expensive for Your Business. Harvard Business Review, July 29, 2021.

[3] Russell L. Ackoff (1989). From data to wisdom. Journal of Applied Systems Analysis 15: 3–9.

[4] Thomas H. Davenport, Randy Bean, and Shail Jain (2022). Why Your Company Needs Data-Product Managers. Harvard Business Review, October 13, 2022.


Passionate about using data, scientific principles, AI & ML to boost manufacturing efficiency, creating value for industry and doing good for the planet.