
The Top 3 Data Architecture Trends (And How LLMs Will Influence Them)

Embracing the Next Era of Data Architecture: Unveiling the Top 3 Trends and the Influential Role of LLMs

I published an article last year on Data Architecture trends.

This was before Large Language Models (LLMs) became all the rage and influenced most industries. Gartner reports, "Venture capital firms have invested over $1.7 billion in generative AI solutions over the last three years." Without a doubt, LLMs will influence most areas of Data Architecture.

With that in mind, let’s explore three Architecture trends and how LLMs will influence them.


1. Cost Optimisation Using Co-Pilots

I’m a big fan of co-pilots that help the end user efficiently complete their tasks.

Being a regular user of Grammarly, I appreciate how it helps expedite the editing process of any written form of content. Similarly, co-pilots will take the main stage in most of our work, including Data Architecture.

A data architect’s daily in-tray includes data model design, setting standards, and implementing governance structures. Co-pilots such as Microsoft’s can help finish sentences in an email and create announcements based on spec documents. Similarly, a co-pilot for a Data Architect could complete entity-relationship diagrams (ERDs) from user requirements alone, once it understands your design constraints. Co-pilots can work alongside the architect and help expedite their daily process.

It should be no surprise that companies will start looking at ways to optimise their costs if productivity starts to skyrocket. Some estimates suggest that thousands, if not millions, of jobs will be impacted.

For example, management consultants have long helped organisations restructure and reduce overhead costs by finding efficiencies. Similarly, the implementation of co-pilots will see a reduction in human resources due to greater reliance on AI-led task completion. Candidate tasks include writing design documents, following approved patterns to create data architecture diagrams, creating data models and associated SQL queries, and auditing SQL against approved standards.

Co-pilots will lead to efficiencies and cost savings!


2. Context-Driven Analytics

We may have solved the storage problem with Cloud but we still need to solve the context problem.

Data in and of itself is just a series of text/numbers; the value is realised when you add context to it. And "data context" is a multi-billion dollar industry.

Data context includes business or technical metadata, governance or privacy needs, and accessibility or security requirements. Although this industry was expected to double by 2028, I wonder how much of that growth will be captured by LLMs. As an example, using semantic embeddings and vector databases, organisations will be able to quickly contextualise data without needing to implement extensive data-context tools. If I can detect anomalies using embeddings, do I need a comprehensive governance framework? This reinforces point 1: further cost optimisation due to LLMs.
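To make the embedding idea concrete, here is a minimal sketch of matching raw column names to business glossary terms by vector similarity. It is an illustration only: the `embed` function is a toy stand-in (character-trigram counts) for a real embedding model, the in-memory list stands in for a vector database, and the glossary terms and column names are made up for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: character-trigram counts.

    A real setup would call an embedding model here instead.
    """
    cleaned = text.lower().replace("_", " ")
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_term(column_name: str, glossary: list[str]) -> tuple[str, float]:
    """Return the glossary term most similar to the column name, with its score."""
    query = embed(column_name)
    scored = [(term, cosine(query, embed(term))) for term in glossary]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical business glossary and raw column names.
GLOSSARY = ["customer email address", "order total amount", "date of birth"]
COLUMNS = ["cust_email_addr", "ord_total_amt"]

matches = {col: nearest_term(col, GLOSSARY)[0] for col in COLUMNS}
```

In this sketch each cryptic column name is linked to the closest business term without anyone cataloguing it by hand. A real embedding model would capture meaning rather than spelling, which is where the saving over manual metadata entry would come from.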

Embedding (pun intended!) AI in the data pipelines, transformations and lineages can help build context. And this context can be relied upon to answer end-user questions for analytics or regulatory needs. For example, does this data contain personal information? If it does, filter it out from specific analytics use cases.
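The personal-information example can be sketched in a few lines. This is a simplified illustration under stated assumptions: the `ColumnContext` class, the `CONTEXT` entries and the helper names are hypothetical, and in practice the `contains_personal_data` tag might be produced by an LLM-based classifier rather than typed in by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnContext:
    """One entry in a hypothetical context layer."""
    name: str
    contains_personal_data: bool  # assumed to be supplied by a classifier


# Hypothetical context captured for a customer dataset.
CONTEXT = [
    ColumnContext("customer_id", False),
    ColumnContext("email", True),
    ColumnContext("order_total", False),
]


def allowed_columns(context, allow_personal_data: bool) -> list[str]:
    """List the columns a use case may see, based on the context tags."""
    return [
        col.name
        for col in context
        if allow_personal_data or not col.contains_personal_data
    ]


def project(rows, columns):
    """Keep only the permitted columns in each row."""
    return [{col: row[col] for col in columns} for row in rows]


rows = [{"customer_id": 1, "email": "a@example.com", "order_total": 42.5}]

# An analytics use case that is not cleared for personal data.
analytics_view = project(rows, allowed_columns(CONTEXT, allow_personal_data=False))
```

The filtering logic itself is trivial; the hard part is populating the tags reliably, and that is the step where LLMs could reduce manual effort.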

The image articulates how a context layer captures information like a traditional data catalogue would, except it would be using the power of LLMs and drastically reducing human intervention.

Context makes the data valuable; it can be achieved quicker using LLMs.


3. Launching Data Architecture Ecosystems

We are tired of siloed & disparate architectures.

These are architectures where governance tools don’t integrate with your data lake, the source system is not designed with analytics in mind, or multiple sources of truth exist.

The ecosystem needs to mirror the offering from consumer companies like Apple: a key product with various supporting composable products that are useful individually but collectively create a mind-blowing ecosystem. As an example, a data product marketplace (iPhone) displays information from the Data Observability framework (Watch) and is governed by a single access method (Face ID). Data Architecture will live in an ecosystem where integration is no longer a weakness. And this will be a game-changer.

An ecosystem will also reduce the risk of information redundancy across disparate sources (like your iMessages syncing across all your devices). There are already startups looking at revolutionising this using concepts such as OBT (One Big Table). Ecosystems also mean data definitions and standards are set once and propagated through each area, reducing the cost of replication.

For example, a customer transactional table captures information from the CRM system; by default, the CRM is designed to capture the mandatory fields required for analytics [1]. Once the data is transferred, it goes through a series of data quality checks to ensure it is fit for purpose [2]. Once it is transformed, the reconciliation information is captured to assure that no data has gone missing [3]. Before consumption, it is classified into personal data buckets and appropriate governance harnesses are set [4]. All these processes are important in themselves; however, they are significantly more powerful together: when the data is finally productised, you can visualise [1] – [4] for that dataset and, in turn, trust that data.
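One way to picture steps [1] – [4] travelling with the dataset is a small "trust record" that each stage writes to. The sketch below is illustrative only: `TrustRecord`, the field names and the specific checks are assumptions made for the example, not a reference to any particular tool.

```python
from dataclasses import dataclass, field

# Hypothetical standards, defined once for the whole ecosystem.
MANDATORY_FIELDS = {"customer_id", "email"}
CLASSIFICATION = {"customer_id": "non-personal", "email": "personal"}


@dataclass
class TrustRecord:
    """Collects the outcome of each stage for one dataset."""
    dataset: str
    checks: dict[str, bool] = field(default_factory=dict)

    def record(self, step: str, passed: bool) -> None:
        self.checks[step] = passed

    @property
    def trusted(self) -> bool:
        # Trusted only if every recorded stage passed.
        return bool(self.checks) and all(self.checks.values())


def build_trust_record(name, source_rows, transformed_rows) -> TrustRecord:
    rec = TrustRecord(name)
    # [1] The source captured the mandatory fields.
    rec.record("1_mandatory_fields",
               all(MANDATORY_FIELDS.issubset(row) for row in source_rows))
    # [2] A simple data quality rule: no missing customer IDs.
    rec.record("2_data_quality",
               all(row.get("customer_id") is not None for row in source_rows))
    # [3] Reconciliation: no rows lost during transformation.
    rec.record("3_reconciliation", len(source_rows) == len(transformed_rows))
    # [4] Every output column has a personal-data classification.
    rec.record("4_classification",
               all(col in CLASSIFICATION for row in transformed_rows for col in row))
    return rec


source = [{"customer_id": 1, "email": "a@example.com"},
          {"customer_id": 2, "email": "b@example.com"}]

good = build_trust_record("customer_transactions", source, list(source))
lossy = build_trust_record("customer_transactions", source, source[:1])
```

A marketplace listing could then display `good.checks` next to the dataset, so a consumer sees all four results in one place. In the `lossy` case, where a row went missing in transformation, the reconciliation step fails and the dataset is no longer marked as trusted.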


Conclusion

As if the Modern Data Stack didn’t have its own hype, we now have the GenAI hype to contend with. It will be interesting to see how these trends unfold over the next 12–18 months. I expect companies that have already invested in the foundations to capitalise on these trends and those that did not invest in data quality or governance to keep lagging behind.

The underlying ask for all these trends is good data. You can’t co-pilot, add context or have an effective data architecture without good data. It is one of the hardest things to achieve but, consequently, it delivers the biggest ROI.


