A (Philosophical) Perspective on Skills Gaps in AI

What separates junior machine learning practitioners from senior solution architects in a fast-moving industry?

Mathieu Lemay
Towards Data Science

--

Frustrations are natural in machine learning, but also avoidable. Photo by Tim Gouw from Pexels.com.

Although there is no shortage of AI courses available, we often find ourselves at a loss as to why so many of our applicants are missing seemingly critical capabilities. This article is an anecdotal exploration of why that may be.

Recently, clients on a few of our projects asked about handing solutions off to internal teams that did not yet exist. “How do we train our team to own the solution that you built?” “How can we future-proof our team against changes in AI?” Variations on these questions were, for the most part, answered with recommendations or change management plans, but a key theme remained: “How can we hire the right team members in AI?”

That question, especially for an enterprise AI consultancy such as ours, is a key one. The ability to identify, recruit, train (and monetize) employees in a highly dynamic industry requires a tremendous amount of effort. Even riskier is the lack of uptake of ancillary skills required for successful project delivery, such as requirements management, client communication, and project tracking.

What we most often see blocking our employees and clients is the absence of a few discrete, specific skills: usually only two or three missing areas of expertise, yet enough to grind an entire project to a halt. This article explores how that manifests itself.

Background: Mental Models & T-Shaped Skills

To illustrate how missing skills can bring a project to an abrupt stop, and how to address them, two simple frameworks help navigate the question. They are:

  • Mental models, which are an abstraction of a system, concept, or pattern; and
  • T-shaped skills, which are an individual’s capacity to have general skills across multiple domains while being highly specialized in one or a few of them.

Mental Models

Mental models are independent representations intended to help us interact with constructs, concepts, and systems, of which there is no shortage in machine learning. Simply put, they help compartmentalize ideas.

From the research paper “Mental Models: An Interdisciplinary Synthesis of Theory and Methods”, Natalie Jones and her team push the explanation further:

Mental models are personal, internal representations of external reality that people use to interact with the world around them. They are constructed by individuals based on their unique life experiences, perceptions, and understandings of the world. Mental models are used to reason and make decisions and can be the basis of individual behaviors. They provide the mechanism through which new information is filtered and stored.

Here’s how you know you’re working with a proper mental model:

  • It is finite.
  • While finite, it is also complete in its representational objective.
  • It allows for black box constructs for simplifying ideas at the correct level of abstraction.

However, mental models (and, importantly, mental model stacking) have limitations. From N. Jones again:

Peoples’ ability to represent the world accurately, however, is always limited and unique to each individual. Mental models are therefore characterized as incomplete representations of reality. They are also regarded as inconsistent representations because they are context-dependant and may change according to the situation in which they are used. In essence, mental models have to be highly dynamical models to adapt to continually changing circumstances and to evolve over time through learning. Conceptualizing cognitive representations as dynamic, inaccurate models of complex systems acknowledges the limitations in peoples’ ability to conceive such complex systems.

As such, our ability to conceptualize, understand, analyze, investigate, prototype, build and deploy any level of machine learning-based automation requires conceptual fluidity across many knowledge domains, both on the technical and administrative sides of the project.

T-Shaped Skills

In modern society, the term “T-shaped skills” is a bit of a misnomer; the most useful people are multi-pronged, with depth of specialization in more than one domain.

An individual with T-shaped skills typically will have general knowledge across multiple related knowledge domains while having depth of specialization in a given topic or function.

With the rise of machine learning engineering (that is, the real-world, risk-aware application of scientific machine learning principles), the need for concurrent multidisciplinary capabilities is clear.

Another way of describing a T-shaped individual is someone who, in the context of a project or set of responsibilities, is capable of successfully addressing multiple required functions and is an expert in some of them. Dangerous across all of the work, and deadly on some of it.

These individuals typically distinguish themselves by having an overall grasp of their entire scope of work. Although they may not be experts in every aspect of their assigned tasks, they at least know how to compartmentalize the less comfortable ones into work items with clear input and output boundaries to the rest of the system; they therefore maintain visibility and capacity across the entire project.

Although they’ve never interacted with a duck before, they know how the duck should look and how the duck should quack, which is sufficient to not be blocked.
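To make that compartmentalization concrete, here is a minimal Python sketch; the Vectorizer component and its signature are hypothetical, not taken from any particular project. The point is that an unfamiliar piece of the system can be pinned down by its inputs and outputs long before anyone on the team has mastered its internals.

```python
from typing import List, Protocol


class Vectorizer(Protocol):
    """A black-box contract: the team only commits to inputs and outputs.

    Nobody needs to know yet whether this will be TF-IDF, a sentence
    transformer, or whatever library a client suggested last week.
    """

    def fit(self, documents: List[str]) -> None:
        ...

    def transform(self, documents: List[str]) -> List[List[float]]:
        ...


def build_features(vectorizer: Vectorizer, documents: List[str]) -> List[List[float]]:
    """Downstream code relies only on the boundary, not on the internals."""
    vectorizer.fit(documents)
    return vectorizer.transform(documents)
```

As long as any eventual implementation looks like a Vectorizer and quacks like a Vectorizer, the rest of the project can proceed around it.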

What’s Different in Machine Learning

Compared to a Cloud migration project or a SaaS build, machine learning tends to stack many concepts sequentially (as opposed to concurrently or in a tree), and it adds considerations specific to production-level machine learning deployments. Deployments depend on model types, model types depend on data science, and exploratory data analysis depends on project requirements.

In the paper “Quality Assurance Challenges for Machine Learning Software Applications During Software Development Life Cycle Phases” (Alamin2021), the authors draw a clear distinction between traditional software development and machine learning software applications. From the paper:

In traditional software development, first we gather requirements. We then design, develop, test, deploy and maintain the application. For ML systems, we still need to scope out the goal of the application, but instead of designing the algorithm we let the ML model learn the desired logic from data [1]. Such observations lead to the question of whether and how ML models can be adopted without disrupting the software development life cycle (SDLC) of the [machine learning software applications]. Ideally, ML workflow/pipeline and SDLC phases should go hand in hand to ensure proper quality assurance. However, as we noted above, such expectations can be unrealistic due to the inherent differences in how ML models are designed and how traditional software applications are developed.
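A minimal sketch of that difference, using scikit-learn and an invented spam-filter example (the task, messages, and labels are made up purely for illustration): in the traditional flow the developer writes the decision logic by hand, while in the ML flow a comparable decision rule is learned from labelled data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Traditional software: the developer designs the decision logic explicitly.
def is_spam_rule_based(message: str) -> bool:
    text = message.lower()
    return "free money" in text or "winner" in text

# ML software: a comparable decision rule is learned from (toy) labelled data.
messages = ["Free money, click now", "Meeting moved to 3pm",
            "You are a winner!!!", "Quarterly report attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(messages, labels)

print(is_spam_rule_based("Free money inside"))   # hand-written behaviour
print(model.predict(["Free money inside"])[0])   # learned behaviour
```

The hand-written rule can be reviewed and tested like any other function; the learned rule has to be validated against data, which is exactly where the SDLC and the ML workflow start to diverge.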

(Lwakatare2019), in her paper “A Taxonomy of Software Engineering Challenges for Machine Learning Systems: An Empirical Investigation”, goes even further:

[…] Although in academia much focus is given to theoretical breakthroughs of learning algorithms, empirical studies show that they constitute only a small part of the operational ML system [20].

As a consequence, several challenges are encountered in practice during development and maintenance of ML systems [6]. To address the problem, emerging evidence highlights the need to take into consideration and extend established software engineering (SE) principles, approaches and tools in development of ML systems [11,19].

So, we can describe an AI project as requiring slightly more mental models than a similarly sized project elsewhere, but the bigger difference is that AI software typically requires these mental models to be stacked sequentially, whereas Cloud and SaaS projects have more concurrency in their ideas and therefore fewer critical interdependencies between them.

The Perceived Cost of Missing Knowledge

Let’s take a simple project model, where a series of activities across 9 hypothetical domains needs to take place. These domains can represent a mixture of project management, requirements engineering, data science, machine learning, Cloud, and MLOps skills. Although quite simplified, most projects that we come across involve a similar sequential chain of expertise.

Illustration by the author.

However, as technologies emerge, you will sometimes be required to adopt new ones. Sometimes these are small changes (replacing Git LFS with DVC, for instance), and sometimes they are much, much larger (like committing to Kubernetes from a monolithic VM approach). Ideally, you and your team would be entirely competent at all of these tasks; realistically, given the rate of change in this industry, you’re likely familiar with most of them, with some being either new or not fully mastered.

In this instance, there is a single point of missing knowledge.

Illustration by the author.

This is, I would argue, a very normal operating condition. A client wants to try a new library; someone suggests a different database. It happens all the time. The module that requires a change can be mentally hot-swapped with no major consequences other than reading some API documentation or learning a different project management approach.

Where issues arise within MLOps is when, as often happens, two or more problem areas together cause a much larger perceived lack of knowledge within a project.

Illustration by the author.

Although there are only two problem areas, the perceived lack of effectiveness touches a majority of your assigned activities.
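As a toy illustration of that effect, here is a short Python sketch. The domain names and gap positions are hypothetical, and the assumption is the sequential-dependency one described earlier: each activity relies on the ones before it, so everything downstream of the first gap feels at risk.

```python
# Toy model: nine sequential project domains, two of which are unfamiliar.
# Assumption (hypothetical): each activity relies on the ones before it, so an
# activity feels "at risk" if it is unfamiliar itself or sits downstream of a gap.
domains = ["requirements", "EDA", "data science", "feature engineering",
           "model selection", "training", "evaluation", "deployment", "MLOps"]
missing = {"feature engineering", "deployment"}  # two hypothetical gaps

first_gap = min(i for i, d in enumerate(domains) if d in missing)
at_risk = domains[first_gap:]  # the first gap plus everything after it

print(f"{len(missing)} actual gaps, but {len(at_risk)} of {len(domains)} "
      f"activities feel at risk")
# -> 2 actual gaps, but 6 of 9 activities feel at risk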

A handful of missing concepts will manifest themselves as a frustrating inability to deliver or to be effective within a project.

Recommendations

To be clear, our entire team (myself included) is continuously and proactively learning new concepts and technologies to ensure that there are no unexpected topics or areas of knowledge that we are completely blind to. We usually work through the following sequence to uncover them:

  • Identify. As soon as you hear of a new technology or approach that will be used, make a note of its name, its description, and its context.
  • Isolate. Although it’s difficult to identify “unknown unknowns” in missing knowledge, we ask simple questions to resolve them: What is the context of this concept? How is it similar to what I already know? How is it different from what I already know?
  • Start small. “Hello World” examples are still a valid way to ensure that you have some effectiveness in using a particular tool (see the sketch after this list).
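As an example of that last step, here is the kind of minimal first experiment we mean. MLflow is used purely as a stand-in for whatever unfamiliar tool your project demands; the point is the size of the experiment, not the library.

```python
# A "Hello World" for an unfamiliar tool, using MLflow only as a stand-in:
# log one parameter and one metric, then confirm the run shows up as expected.
import mlflow

with mlflow.start_run(run_name="hello-world"):
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.5)

# Launching the tracking UI afterwards (`mlflow ui`) and finding the run is
# enough to establish a first, working mental model of the tool's boundaries.
```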

Instead of focusing on the totality of your skills as a measure of progress, you should look at the combination of your skills and how you apply them together to deliver project value and success.

--

Matt Lemay, P.Eng (matt@lemay.ai) is the co-founder of lemay.ai, an international enterprise AI consultancy, and of AuditMap.ai, an internal audit platform.