Innovation in the field of data is progressing rapidly.
Let’s take a quick look at the GenAI timeline: ChatGPT launched in November 2022 and became the world’s best-known application of generative AI in early 2023. Soon after, leading companies such as Salesforce (Marketing Cloud Growth) and Adobe (Firefly) integrated generative AI into mainstream applications – making it accessible to companies of all sizes. Tools like Midjourney advanced image generation, while discussions about agentic AI took center stage. Today, tools like ChatGPT are already part of everyday life for many private users.
That’s why I have compiled 12 terms that you will certainly encounter as a data engineer, data scientist or data analyst in 2025 and that are important to understand. Why are they relevant? What are the challenges? And how can you apply them in a small project?
Table of Contents
Terms 1–6 in part 1: Data Warehouse, Data Lake & Data Lakehouse; Cloud Platforms; Optimizing Data Storage; Big Data Technologies; ETL, ELT & Zero-ETL; Event-Driven Architecture
7 – Data Lineage & XAI
8 – Gen AI
9 – Agentic AI
10 – Inference Time Compute
11 – Near Infinite Memory
12 – Human-In-The-Loop Augmentation
Final Thoughts
In the first part, we looked at terms for the basics of understanding modern data systems (storage, management & processing of data). In part 2, we now move beyond infrastructure and dive into some terms related to Artificial Intelligence that use this data to drive innovation.
7 – Explainability of predictions and traceability of data: XAI & Data Lineage
As data and AI tools become increasingly important in our everyday lives, we also need to know how to track them and create transparency for decision-making processes and predictions:
Let’s imagine a scenario in a hospital: A deep learning model is used to predict the chances of success of an operation. A patient is categorized as ‘unsuitable’ for the operation. The problem for the medical team? There is no explanation of how the model arrived at this decision. The internal processes and calculations that led to the prediction remain hidden, and it is not clear which attributes – such as age, state of health or other parameters – were decisive for the assessment. Should the medical team trust the prediction anyway and cancel the operation? Or should they proceed as they see fit?
This lack of transparency can lead to uncertainty or even mistrust in AI-supported decisions. Why does this happen? Many deep learning models provide us with results and excellent predictions – much better than simple models can do. However, the models are ‘black boxes’ – we don’t know exactly how the models arrived at the results and what features they used to do so. While this lack of transparency hardly plays a role in everyday applications, such as distinguishing between cat and dog photos, the situation is different in critical areas: For example, in healthcare, financial decisions, criminology or recruitment processes, we need to be able to understand how and why a model arrives at certain results.
This is where Explainable AI (XAI) comes into play: techniques and methods that attempt to make the decision-making process of AI models understandable and comprehensible. Examples of this are SHAP (SHapley Additive ExPlanations) or LIME (Local Interpretable Model-agnostic Explanations). These tools can at least show us which features contributed most to a decision.
Data Lineage, on the other hand, helps us understand where data comes from, how it has been processed and how it is ultimately used. In a BI tool, for example, a report with incorrect figures could be used to check whether the problem occurred with the data source, the transformation or when loading the data.
Why are the terms important?
XAI: The more AI models we use in everyday life and as decision-making aids, the more we need to know how these models have achieved their results. Especially in areas such as finance and healthcare, but also in processes such as HR and social services.
Data Lineage: In the EU there is the GDPR, in California the CCPA. These regulations require companies to document the origin and use of data in a comprehensible manner. What does that mean in concrete terms? If companies have to comply with data protection laws, they must know at all times where their data comes from and how it was processed.
What are the challenges?
- Complexity of the data landscape (data lineage): In distributed systems and multi-cloud environments, it is difficult to fully track the data flow.
- Performance vs. transparency (XAI): Deep learning models often deliver more precise results, but their decision paths are difficult to trace. Simpler models, on the other hand, are usually easier to interpret but less accurate.
Small project idea to better understand the terms:
Use SHAP (SHapley Additive ExPlanations) to explain the decision logic of a machine learning model: Create a simple ML model with scikit-learn to predict house prices, for example. Then install the SHAP library in Python and visualize how the different features influence the price prediction.
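To see what SHAP actually computes before reaching for the library: a feature’s Shapley value is its average marginal contribution to the prediction over all subsets of the other features. The sketch below computes exact Shapley values by brute force for a tiny model (the dataset and feature names are invented); the `shap` package does this far more efficiently and adds the visualizations.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy "house price" data; the three feature names are invented.
X, y = make_regression(n_samples=200, n_features=3, random_state=0)
features = ["size_sqm", "rooms", "age_years"]
model = RandomForestRegressor(random_state=0).fit(X, y)

def shapley_values(model, X, x):
    """Exact Shapley values for one sample x. 'Absent' features are
    replaced by their dataset mean - a common approximation."""
    n = X.shape[1]
    baseline = X.mean(axis=0)

    def value(subset):
        z = baseline.copy()
        z[list(subset)] = x[list(subset)]
        return model.predict(z.reshape(1, -1))[0]

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for S in combinations(others, size):
                # Shapley weight: |S|! * (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

phi = shapley_values(model, X, X[0])
for name, contribution in zip(features, phi):
    print(f"{name}: {contribution:+.2f}")
```

The contributions plus the baseline prediction add up exactly to the model’s prediction for the sample – the “additive” in SHAP’s name.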
8 – Generative AI (Gen AI)
Since ChatGPT took off in January 2023, the term Gen AI has been on everyone’s lips. Generative AI refers to AI models that can generate new content from an input. Outputs can be texts, images, music or videos. There are now even fashion retailers that have created advertising images with generative AI (e.g. Calvin Klein, Zalando).
"We started OpenAI almost nine years ago because we believed that AGI was possible, and that it could be the most impactful technology in human history. We wanted to figure out how to build it and make it broadly beneficial; […]"
_Reference: Sam Altman, CEO of OpenAI_
Why is the term important?
Clearly, GenAI can greatly increase efficiency. The time required for tasks such as content creation, design or texts is reduced for companies. GenAI is also changing many areas of our working world. Tasks are being performed differently, jobs are changing and data is becoming even more important.
In Salesforce’s latest marketing automation tool, for example, users can enter a prompt in natural language, which generates an email layout – even if this does not always work reliably in reality.
What are the challenges?
- Copyrights and ethics: The models are trained with huge amounts of data that originate from us humans and try to generate the most realistic results possible based on this (e.g. also with texts by authors or images by well-known painters). One problem is that GenAI can imitate existing works. Who owns the result? A simple way to minimize this problem at least somewhat is to clearly label AI-generated content as such.
- Costs and energy: Large models require a very large amount of computing resources.
- Bias and misinformation: The models are trained with specific data. If the data already contains a bias (e.g. less data from one gender, less data from one country), these models can reproduce biases. For example, if an HR tool has been trained with more male than female data, it could favor male applicants in a job application. And of course, sometimes the models simply provide incorrect information.
Small project idea to better understand the term:
Create a simple chatbot that accesses the GPT-4 API and can answer a question. I have attached a step-by-step guide at the bottom of the page.
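A minimal sketch of such a chatbot, assuming the official `openai` Python SDK (`pip install openai`) and an API key in the `OPENAI_API_KEY` environment variable; the model name is just an example:

```python
import os

def build_messages(question, system_prompt="You are a helpful assistant."):
    """Assemble the chat history for a single-question bot."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question},
    ]

def ask(question):
    """Send one question to the chat completions endpoint."""
    from openai import OpenAI  # pip install openai
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    response = client.chat.completions.create(
        model="gpt-4o",  # any chat-capable model works here
        messages=build_messages(question),
    )
    return response.choices[0].message.content

# Usage (requires an API key and network access):
# print(ask("Explain data lineage in one sentence."))
```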
9 – Agentic AI / AI Agents
Agentic AI is currently a hotly debated topic and is based on generative AI. AI agents describe intelligent systems that can think, plan and act "autonomously":
"This is what AI was meant to be. […] And I am really excited about this. I think this is going to change companies forever. I think it’s going to change software forever. And I think it’ll change Salesforce forever."
_Reference: Marc Benioff, Salesforce CEO about Agents & Agentforce_
AI agents are, so to speak, a continuation of traditional chatbots and bots. These systems promise to solve complex problems by creating multi-step plans, learning from data, making decisions based on it and executing those decisions autonomously.
Multi-step plans mean that the AI thinks several steps ahead to achieve a goal.
Let’s imagine a quick example: An AI agent has the task of delivering a parcel. Instead of simply following the sequence of orders, the AI could first analyze the traffic situation, calculate the fastest route and then deliver the various parcels in this calculated sequence.
Why is the term important?
The ability to execute multi-step plans sets AI Agents apart from previous bots and chatbots and brings a new era of autonomous systems.
If AI agents can actually be deployed in businesses, companies can automate repetitive tasks, reducing costs and increasing efficiency. The economic benefit and competitive advantage would be substantial. As the Salesforce CEO says in the interview, this could change our corporate world tremendously.
What are the challenges?
- Logical consistency and (current) technological limitations: Current models struggle with consistent logical reasoning – especially when handling complex scenarios with multiple variables. And that’s exactly what they are meant for – or at least how they are advertised. The need for better models will therefore certainly increase in 2025.
- Ethics and acceptance: Autonomous systems can make decisions and solve their own tasks independently. How can we ensure that autonomous systems do not make decisions that violate ethical standards? As a society, we also need to define how quickly we want to integrate such changes into our everyday (working) lives without taking employees by surprise. Not everyone has the same technical know-how.
Small project idea to better understand the term:
Create a simple AI agent with Python: first define the agent’s task. For example, the agent should retrieve data from an API. Use Python to coordinate the API query, the filtering of results and the automatic emailing of the user. Then implement a simple decision logic: for example, if no result matches the filter criteria, the search radius is extended.
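The decision logic can be sketched in a few lines. The listings below and the `fetch_listings` function are stubs standing in for a real API call, and the emailing step is left out:

```python
# Hypothetical listings standing in for a real API response.
LISTINGS = [
    {"city": "Hamburg", "price": 450_000, "distance_km": 3},
    {"city": "Hamburg", "price": 380_000, "distance_km": 12},
    {"city": "Hamburg", "price": 290_000, "distance_km": 25},
]

def fetch_listings(max_distance_km):
    """Stub for the API query step of the agent."""
    return [item for item in LISTINGS if item["distance_km"] <= max_distance_km]

def agent_search(budget, radius_km=5, max_radius_km=50):
    """Decision logic: if nothing matches the filter, extend the radius."""
    while radius_km <= max_radius_km:
        matches = [item for item in fetch_listings(radius_km)
                   if item["price"] <= budget]
        if matches:
            return radius_km, matches  # next step would be emailing the user
        radius_km *= 2  # no match: the agent decides to widen the search
    return radius_km, []

radius, result = agent_search(budget=300_000)
print(f"Found {len(result)} listing(s) within {radius} km")
```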
10 – Inference Time Compute
Next, we focus on the efficiency and performance of using AI models: An AI model receives input data, makes a prediction or decision based on it and gives an output. This process requires computing time, which is referred to as inference time compute. Modern models such as AI agents go one step further by flexibly adapting their computing time to the complexity of the task.
Basically, it’s the same as with us humans: When we have to solve more complex problems, we invest more time. AI models use dynamic reasoning (adapting computing time according to task requirements) and chain reasoning (using multiple decision steps to solve complex problems).
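As a toy analogue of this principle (an iterative solver, not an LLM): the same algorithm spends more steps when the task demands more precision, just as a model with dynamic reasoning spends more compute on a harder prompt.

```python
def solve_sqrt(x, tolerance):
    """Newton's method for sqrt(x): iterate until 'good enough'.
    A tighter tolerance (a harder task) costs more iterations."""
    guess, steps = x, 0
    while abs(guess * guess - x) > tolerance:
        guess = (guess + x / guess) / 2
        steps += 1
    return guess, steps

easy = solve_sqrt(2, tolerance=1e-2)   # few steps suffice
hard = solve_sqrt(2, tolerance=1e-12)  # more "thinking" required
print(f"easy: {easy[1]} steps, hard: {hard[1]} steps")
```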
Why is the term important?
AI and models are becoming increasingly important in our everyday lives. The demand for dynamic AI systems (AI that adapts flexibly to requests and understands our requests) will increase. Inference time affects the performance of systems such as chatbots, autonomous vehicles and real-time translators. AI models that adapt their inference time to the complexity of the task and therefore "think" for different lengths of time will improve efficiency and accuracy.
What are the challenges?
- Performance vs. quality: Do you want a fast but less accurate or a slow but very accurate solution? Shorter inference times improve efficiency, but can compromise accuracy for complex tasks.
- Energy consumption: The longer the inference time, the more computing power is required. This in turn increases energy consumption.
11 – Near Infinite Memory
Near Infinite Memory is a concept that describes how technologies can store and process enormous amounts of data almost indefinitely.
For us users, it seems like infinite storage – but it is actually more of a combination of scalable cloud services, data-optimized storage solutions and intelligent data management systems.
Why is this term important?
The data we generate is growing exponentially due to the increasing use of IoT, AI and Big Data. As already described in terms 1–3, this creates ever greater demands on data architectures such as data lakehouses. AI models also require enormous amounts of data for training and validation. It is therefore important that storage solutions become more efficient.
What are the challenges?
- Energy consumption: Large storage solutions in cloud data centers consume immense amounts of energy.
- Security concerns and dependence on centralized services: Many near-infinite memory solutions are provided by cloud providers. This can create a dependency that brings financial and data protection risks.
Small project idea to better understand the term:
Develop a practical understanding of how different data types affect storage requirements and learn how to use storage space efficiently. Take a look at the project under the term "Optimizing Data Storage".
12 – Human-In-The-Loop Augmentation
AI is becoming increasingly important, as the previous terms have shown. However, with the increasing importance of AI, we should ensure that the human part is not lost in the process.
"We need to let people who are harmed by technology imagine the future that they want."
_Reference: Timnit Gebru, former Head of Department of Ethics in AI at Google_
Human-in-the-loop augmentation is the interface between computer science and psychology, so to speak. It describes the collaboration between us humans and artificial intelligence. The aim is to combine the strengths of both sides:
- A great strength of AI is that such models can efficiently process data in large quantities and discover patterns in it that are difficult for us to recognize.
- We humans, on the other hand, bring judgment, ethics, creativity and contextual understanding to the table without being pre-trained and have the ability to cope with unforeseen situations.
The goal must be for AI to serve us humans – and not the other way around.
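One common form of this collaboration can be sketched in a few lines: the model handles predictions it is confident about, while uncertain cases are routed to a person (the threshold value below is arbitrary):

```python
def classify_with_human_fallback(score, threshold=0.9):
    """Auto-accept confident predictions in either direction;
    defer uncertain ones to a human reviewer."""
    if score >= threshold or score <= 1 - threshold:
        return "auto", score >= threshold
    return "human", None  # a person makes the final call

print(classify_with_human_fallback(0.97))  # ('auto', True)
print(classify_with_human_fallback(0.55))  # ('human', None)
```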
Why is the term important?
AI can improve decision-making processes and minimize errors. In particular, AI can recognize patterns in data that are not visible to us, for example in the field of medicine or biology.
The MIT Center for Collective Intelligence published a study in Nature Human Behaviour in which they analyzed how well human-AI combinations perform compared to purely human or purely AI-controlled systems:
- In decision-making tasks, human-AI combinations often performed worse than AI systems alone (e.g. medical diagnoses / classification of deepfakes).
- In creative tasks, the interaction already works better. Here, human-AI teams outperformed both humans and AI alone.
However, the study shows that human-in-the-loop augmentation does not yet work perfectly.
_Reference: Humans and AI: Do they work better together or alone?_
What are the challenges?
- Lack of synergy and mistrust: There is still a lack of intuitive interfaces that let us humans interact effectively with AI tools. Another challenge is that AI systems are sometimes viewed critically or even rejected.
- (Current) technological limitations of AI: Current AI systems struggle to understand logical consistency and context. This can lead to erroneous or inaccurate results. For example, an AI diagnostic system could misjudge a rare case because it does not have enough data for such cases.
Final Thoughts
The terms in this article only show a selection of the innovations that we are currently seeing – the list could definitely be extended. For example, in the area of AI models, the size of the models will also play an important role: In addition to very large models (with up to 50 trillion parameters), individual very small models will probably also be developed that will only contain a few billion parameters. The advantage of these small models will be that they do not require huge data centers and GPUs, but can run on our laptops or even on our smartphones and perform very specific tasks.
Which terms do you think are super important? Let us know in the comments.
Where can you continue learning?
- Book – The Data Lakehouse for Dummies
- Medium – SQL and Data Modelling in Action: A Deep Dive into Data Lakehouses
- Gartner – Top 10 strategic technology trends
- AWS – Start your journey with AWS
- DataCamp Course – AWS Concepts
- Snowflake Blog – Avro vs. Parquet
- Medium – Why ETL-Zero? Understanding the Shift in Data Integration
- AWS Blog – What is event-driven architecture?
- Medium – Can you trust AI-Models Without Explainability? Introduction to XAI
- IBM Blog – Agentic AI: 4 reasons why it’s the next big thing in AI research
- MIT Management Sloan School – Study about Humans & AI
- Blog Sam Altman – Reflections about AI

All information in this article is based on the current status in January 2025.