Growing a Data Career in the Generative-AI Era

Raising awareness about learning three fundamental data concepts

As a data professional, I am just amazed by all the recent developments in the area of generative AI.

While some call it hype and are willing to quickly write it off as just another tech trend, others are convinced it is a game-changer.

Whichever camp you are in, it is hard to ignore the transformational possibilities generative AI can bring to the future of education and the workplace.

To back up this statement, it is enough to mention that Harvard University is introducing an AI chatbot into classrooms this fall (fall 2023) to approximate a one-to-one teacher-student ratio. The students will use the Harvard-developed chatbot to guide them to solutions rather than to provide them with straightforward answers.

For me, this is a clear indicator that Harvard is triggering a wave of change in how the new generations will learn and, consequently, work.

This means generative AI is not just a passing trend, and we need to start finding ways to adapt our working environments to it.

Despite my enthusiasm for generative AI, I have never felt such FOMO before.

In other words, although I have navigated through various data roles in the past 12 years and gained knowledge of machine learning concepts, I am not able to keep up with the new developments in the generative AI area.

The new terminology, the concept of prompt engineering, the development of new large language models, numerous apps and solutions built on top of them, new e-learning courses, and the sheer volume of posts on this topic – all of this is simply overwhelming.

Moreover, I can’t shake off the unsettling feeling that some of my data skills are now just, well, obsolete.

The idea that my business colleagues will replace my hard-earned query skills with a few keystrokes is scary.

However, on second thought, I have to admit that I don’t even mind the fact that some (but only some) of my skills will be replaced. Executing ad-hoc queries several times per week to answer the same repetitive business questions is something I never liked to do.

I am also aware that having "me" sit between the data stored in the data warehouse and the generation of business insights just slows down the decision-making process.

The other thing I am aware of is that this transition, i.e., my substitution, won’t happen overnight.

First of all, the current development environments need to be adapted, i.e., they need to be more "business-user friendly", and less "developer-friendly".

Second, the business users will need to gain a technical understanding of what is "under the hood". The freedom to generate analytical insights from natural text entries comes with the same issues data professionals deal with today.

Problems like slow insight generation, incorrect insight generation, enrichment of the insights without new inputs (new data sources), and the technical process of insight quality checks will still exist.

And someone will still need to handle and "fix" these problems for the business users.

In other words, generative AI won’t be able to easily replace fundamental data knowledge.


So, what do I mean by "fundamental" data knowledge?

Based on the problems listed above, it comes down to three core concepts:

1. Building Data Architecture

Argument: Technical knowledge and an understanding of how to design an appropriate data architecture in a specific industry are crucial.

Let me give you an example from the fintech industry.

In the fintech industry, there are strict regulations, such as the Payment Card Industry Data Security Standard (PCI DSS), that need to be considered when building a data platform. On top of these standards, there are sometimes market-specific ones.

For example, in Switzerland there are, among others, FINMA regulations that need to be taken into consideration to make your data platform, and consequently your data architecture, compliant.

Of course, regulations change over time, which means the data architecture has to change with them. And this poses a real challenge for generative AI.

Generative AI can support architecture design and development up to a specific level.

But it is not able to design customizable architectural solutions in industries where regulations are changing.

It does not possess the ability to apply specific architectural adaptations if it’s not trained on similar historical examples.

2. Data Quality Management

Argument: The saying "garbage in – garbage out" will always be valid, and everyone working in the data world knows exactly what the cost of poor data input quality is.

With generative AI solutions, the cost of poor output quality is even higher.

For example, let me refer to an article I recently read in the Guardian. It was about a lawyer who used ChatGPT to find examples of similar previous legal cases. He wanted to back up his argument about why his client’s lawsuit against an airline should not be dismissed.

I think you can already imagine how the story goes: when the airline’s lawyers checked the cited decisions and legal citations, they found out none of them ever existed. In short, ChatGPT was hallucinating.

My conclusion from this article: poor-quality outputs could cost you your business, whether through a complete project shutdown or the loss of your clients and reputation.

Hence, data professionals will be even busier managing the quality of the data inputs and outputs.
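To make the idea of an output quality check a bit more concrete, here is a minimal Python sketch. It assumes you have some trusted reference index (for example, a curated list of real case names) and flags every model-generated citation that cannot be found in it. The function name `check_citations`, the `CitationCheck` type, and the case names are all made up for illustration; a real check would query an authoritative source rather than an in-memory set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitationCheck:
    """Result of comparing generated citations with a trusted index."""
    verified: list
    unverified: list


def check_citations(generated, trusted_index):
    """Split generated citations into verified and unverified ones.

    Matching is a simple case-insensitive exact match, which is an
    assumption of this sketch, not a robust citation matcher.
    """
    normalized_index = {entry.strip().lower() for entry in trusted_index}
    verified, unverified = [], []
    for citation in generated:
        if citation.strip().lower() in normalized_index:
            verified.append(citation)
        else:
            unverified.append(citation)
    return CitationCheck(verified=verified, unverified=unverified)


# Hypothetical data: neither the index nor the model output is real.
trusted_cases = {"Doe v. Example Airlines", "Roe v. Sample Air"}
model_output = ["doe v. example airlines", "Smith v. Imaginary Airways"]

result = check_citations(model_output, trusted_cases)
print("Verified:", result.verified)
print("Needs manual review:", result.unverified)
```

Anything that lands in the unverified list is not necessarily a hallucination, but it is exactly the kind of output that a person should review before it reaches a client or a courtroom.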

3. Data Privacy and Security

Argument: As a data professional, you are aware of the concepts of SQL injections and database security.

With the developments in generative AI and the simple usage of prompts, data warehouse attacks and data breach scenarios are more likely than ever to happen.

The danger of prompt injections (e.g., the possibility that, with one text input, someone could drop the whole database or retrieve confidential records) needs to be placed at the centre of data security.

This means that data and IT professionals will continue to play crucial roles in protecting and securing the data.
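One way to picture this kind of protection is to limit what a database connection is allowed to do, no matter what SQL a model generates from a prompt. The sketch below uses Python's built-in `sqlite3` module and its `set_authorizer` hook purely so that the example is self-contained; in a real data warehouse, the equivalent would typically be a read-only role or restricted grants. The table, the data, and the helper `run_generated_sql` are hypothetical.

```python
import sqlite3

# Actions a plain read-only query needs. Everything else (DROP, DELETE,
# INSERT, UPDATE, PRAGMA, ...) is denied. Queries that call SQL functions
# would need additional actions allowed; this sketch keeps the list minimal.
ALLOWED_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ}


def read_only_authorizer(action, arg1, arg2, db_name, trigger_name):
    """Authorizer callback: permit reads, deny every other action."""
    if action in ALLOWED_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def run_generated_sql(conn, sql):
    """Run model-generated SQL; return the rows, or None if it was rejected."""
    try:
        return conn.execute(sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return None


# Set up a small in-memory database before locking the connection down.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT)")
conn.execute("INSERT INTO customers VALUES ('Alice')")
conn.commit()
conn.set_authorizer(read_only_authorizer)

allowed = run_generated_sql(conn, "SELECT name FROM customers")
blocked = run_generated_sql(conn, "DROP TABLE customers")

print("SELECT result:", allowed)
print("DROP result:", blocked)
```

The point of the design is that the restriction lives in the database layer rather than in the prompt: even if an injected instruction tricks the model into producing a destructive statement, the connection refuses to run it. This does not address the second risk mentioned above, the retrieval of confidential records, which still calls for access controls on the data itself.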


To summarize: data professionals with knowledge of foundational data concepts will stay in the workplace as "constants" that will manage data efficiently, identify issues, and optimize solutions to be compliant, secure, and reliable.

This is the part that generative AI will not be able to easily replace.

So, if you are a young professional seeking advice on how to grow your data career in the generative AI era, start by learning the above-listed core concepts.

Trust me on this: investing time and resources to acquire fundamental data knowledge will pay off long-term in your data career.

Generative AI will speed up your learning and boost your work performance in these areas, but it will only help you up to a certain level. The "important" work will still be up to you and your knowledge.

