The problem with AI developer tools for enterprises (and what IKEA has to do with it)

Clemens Mewald
Towards Data Science
12 min read · Aug 18, 2020


Over a year ago I shared my thoughts on why most startups that focused on deep learning tools for enterprises would fail. That post got a lot of attention (with help from Peter Norvig posting a video about it), and more than one founder of said startups told me that their investors made them write a response. I apologize for any inconvenience caused, but more scrutiny in this space definitely helps.

Unfortunately, the landscape hasn’t changed much since then. The way I lovingly describe most AI developer stacks these days is that they are “like DIY craft kits, with the instructions and 70% of the parts missing”. Innovation takes time.

In an effort to explain how we got here, and to help guide where we need to go, I summarize three major challenges that are common in the early stages of the technology innovation lifecycle. Most importantly, we haven’t converged on a dominant design for ML Platforms, which leads to a proliferation of differently scoped and shaped systems with ill-defined interfaces. As a result, it has proven prohibitively hard to create appropriate form factors for AI developer tools targeted at enterprise users. Finally, and this is where IKEA comes in, the predominant way these tools are consumed by enterprises today tragically suffers from the IKEA effect.

Figure 1: ML Platforms usually come with lots of pieces missing and little to no assembly instructions (Image by author, logos from open source projects Spark, TensorFlow, Airflow, Kubernetes, and Docker)

“AI developer stacks today are like DIY craft kits, with the instructions and 70% of the parts missing.”

AI Developer Tools are lacking a Dominant Design

The emergence of new technologies is usually accompanied by subsequent phases of expansion and contraction in the number of possible solution designs. It is no longer contentious that AI will transform many industries, often becoming a strategic advantage and even creating new “AI first” business models and companies. As a result, all major Cloud vendors (and countless startups) are piling on resources to bring AI developer tools to a broader audience, most importantly big enterprises. All of these vendors broadly attempt to solve the same user needs, but with distinctly different approaches and outcomes, leading to a proliferation of different designs.

Dominant Design in ML APIs

This phenomenon exists at every level of the stack and usually progresses from the bottom up. As a PM on the Google Brain team in 2016, I remember trying to rationalize over 20 different high-level Python APIs that had emerged for TensorFlow within Google. Eventually, we converged on the Estimator and Layers APIs (which merged with Keras in TensorFlow 2.0).

Once one design’s user adoption surpasses its competitors’ by a significant enough margin, it becomes the standard (or “dominant design”) and other players in this space conform to it, e.g. see equivalent APIs in PyTorch. The need for convergence on a dominant design is especially evident with platform products where it is prohibitively expensive to maintain countless competing designs, as is the case with ML frameworks where Data Scientists, ML Engineers, ISVs, Educators, etc. can’t cope with hundreds of overlapping and incompatible APIs.

Dominant Design in ML Platforms

When we talk about “AI developer tools for enterprises” we are really talking about an emerging class of technology called “ML Platforms”. The fact that we are missing a dominant design for ML Platforms also means that there is no generally accepted definition, so I’ll just give you a very basic one: An ML Platform is a horizontal technology (i.e. not specific to a vertical use case) that provides all of the capabilities to cover the full lifecycle of ML applications. A graph from my previous blog post helps illustrate some of the different components of such a platform.

Figure 2: A simplified overview of the different components considered to be part of an ML Platform (Image by author)

At this point in time, there isn’t even broad agreement in the industry on the scope of ML Platforms, i.e. where they begin and end. E.g., some ML Platform products are entirely lacking capabilities from the Pre-Training (Data Prep) category.

There are several reasons why we haven’t reached a dominant design for ML Platforms. To name just the most important ones:

  • Underlying technologies haven’t matured. Many of the technologies leveraged within an ML Platform are themselves early in their lifecycle. It is hard to build an ML Platform that provides continuously updated ML models when the ML framework used for said models makes backwards incompatible changes to its checkpointing format. Imagine trying to build the UI for a web app without the backend APIs having been defined. You will inadvertently have to go back and change things as the APIs evolve, and in the ML space APIs evolve rapidly.
  • ML Platform creators don’t know what they don’t know. I have spent countless hours talking to engineering teams who had grand plans for building the canonical ML Platform. In most cases they had a mental model of only about 20% of what was required to build an ML Platform and, as a result, vastly underestimated the difficulty of what they were embarking on. A quick anecdote of how I used to make this point at Google might help: If a random Google engineer wants to build a driverless car and asks their Director for headcount, the typical response is “That’s crazy hard and we are already investing massive resources into this; go work at Waymo”. However, if a random Google engineer wants to build an ML Platform and asks their Director for headcount, the typical response is “Sounds great, here have two engineers”. Of course, we eventually got to the point where the effort of building an ML Platform was generally recognized as being more like that of building a driverless car (with some level of exaggeration to make my own job sound more important), and most of Alphabet started using TFX.
  • ML Platform consumers don’t know what they don’t know. Especially in the enterprise space, there are many companies who buy so-called “ML Platforms” without knowing the features they should be expecting or the questions they should be asking. It is almost impossible for enterprises to evaluate these offerings because they all sound the same but provide vastly different sets of features. The way that a customer once put this to me is that “there is an equivalence problem in the ML Platform space”, meaning that every product they get pitched sounds equivalent and they don’t know the differences until it’s too late. Below is a graph I used in a talk at MLSys (previously SysML) earlier this year to make this point.
Figure 3: A tongue-in-cheek illustration of the difference between the perceived and actual overlap of TFX, Kubeflow, and MLflow (taken from my talk at MLSys in March 2020) (Image by author)

AI Developer Tools have a Form Factor Problem

The previous section should already spell trouble for the attentive reader. If we haven’t converged on a dominant design, how can we agree on the appropriate form factors? First, let me explain what I mean by form factor in this context. Usually, this term is used to refer either to electronic components (e.g. motherboards) or, closer to how I use it, to different incarnations of how technology is packaged for users. E.g., the iPhone defined the predominant form factor for smartphones. I misappropriate this term to summarize everything you would consider when you talk about the “product surface”, “user experience”, or “developer experience”. What do developers actually interact with when we say they are using an ML Platform?

Right now, the form factor for most AI developer tools is like the Wild West of different API surfaces and services. Let me illustrate this with an example. To cover a minimal set of technologies you would need to train and deploy an ML model, you could:

  1. Use a data engineering product like Spark for wrangling data.
  2. Use a library like TensorFlow for training your ML model.
  3. Use Docker for packaging up those models.
  4. Use Kubernetes to orchestrate those Docker containers.

You could make the argument that there should be a separation of concerns; that Data Engineers should write the data pipelines, Data Scientists should train the models, and Software Engineers should write the deployment systems; and that an ML Platform couldn’t possibly provide all of these capabilities. But what I have seen time and again in enterprises is that this artificial separation of concerns (which is the result of technology boundaries drawn well before ML Platforms emerged) leads to significant slowdown, costly mistakes, and overall higher failure rates of ML projects.

We can’t assume that tools and processes created decades ago for Software Engineering are magically transferable to ML. E.g., you shouldn’t just check your ML model artifacts (which can be rather large in size and contain sensitive data) into version control systems intended for code. That’s why, at Databricks, we built the MLflow Model Registry to manage the versioning and deployment lifecycle of ML models. If you want to empower your Data Scientists or Software Engineers to manage the full ML lifecycle, these tools need to be accessible to a wide range of users, not just DevOps experts. At companies like Google it is not unusual for a single person to own the full lifecycle from data pipeline to model deployment. Others have come to the same realization and are creating a broad “ML Engineer” role for this very reason.
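To make the registry idea concrete, here is a minimal, hypothetical sketch of what a model registry does: it versions model artifacts and moves them through deployment stages. The class and method names below are my own illustration of the concept, not MLflow’s actual API:

```python
class ModelRegistry:
    """Toy registry: versions model artifacts and tracks deployment stages."""

    STAGES = {"None", "Staging", "Production", "Archived"}

    def __init__(self):
        self._models = {}  # name -> list of {"artifact": ..., "stage": ...}

    def register(self, name: str, artifact: bytes) -> int:
        """Store a new artifact; returns its (1-based) version number."""
        versions = self._models.setdefault(name, [])
        versions.append({"artifact": artifact, "stage": "None"})
        return len(versions)

    def transition(self, name: str, version: int, stage: str) -> None:
        """Move one version to a new stage, e.g. Staging -> Production."""
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self._models[name][version - 1]["stage"] = stage

    def production_artifact(self, name: str) -> bytes:
        """Fetch the artifact currently serving in Production."""
        for entry in self._models[name]:
            if entry["stage"] == "Production":
                return entry["artifact"]
        raise LookupError(f"no Production version of {name}")

# Usage: register two versions, then promote the second
registry = ModelRegistry()
registry.register("churn-model", b"weights-v1")
v2 = registry.register("churn-model", b"weights-v2")
registry.transition("churn-model", v2, "Production")
print(registry.production_artifact("churn-model"))  # b'weights-v2'
```

Note what this buys you over a code-oriented version control system: the unit of versioning is the artifact plus its deployment state, so “what is serving in Production right now?” is a first-class query rather than a convention buried in branch names.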

“We can’t assume that tools and processes created decades ago for Software Engineering are magically transferable to ML.”

Attempts at more consistent Form Factors

Some vendors, realizing that the target audience for solutions that require you to master Spark, TensorFlow, Docker, and Kubernetes is limited, have attempted to create different form factors that abstract away this complexity. However, most of them fail in painful ways. Let me provide two illustrative examples:

  • SQL ML: There are products, which shall remain unnamed in this post, that claim they “make machine learning as easy as writing SQL queries”. However, in order to do so, they let you register a snippet of Python code as a procedure, or they simply mirror the same Python APIs in SQL (e.g. to define layers of a neural network in your SQL query). Needless to say, just allowing someone to register Python code and call it from SQL isn’t really achieving anything new. In fact, it just makes everything harder (like debugging your Python code). And if you are using hardware accelerators (e.g. GPUs), you can see what I mean by violating the basic principles of levels of abstraction: Now your SQL query will throw errors that are specific to the hardware you run it on. Or, even worse, it will just silently fail and you have to go hunt for log files.
  • WYSIWYG/UI ML: Another class of products tries to provide no-code solutions for so-called citizen data scientists. Nice UI-based workflows are meant to guide users through the typical Data Science & ML model building steps. I have observed two common failure modes for this type of product: (1) At one or more steps in the workflow, commonly the modeling step, they require users to specify low-level parameters like L1 regularization. Requiring knowledge of what L1 regularization is, or how to pick a good value for it, misses the whole point of building a UI-based ML product. (2) In most cases, these tools only solve for the highest level of abstraction and don’t provide an “escape hatch” for when users reach their limits. As a result, many enterprises find UI-based ML tools falling short of solving real-life use cases. These two reasons result in a typical product/market mismatch, and most tools in this category don’t gain much traction beyond toy demos and POCs.
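A design that avoids both failure modes picks sensible defaults at the high level and leaves an explicit escape hatch to the level below. The sketch that follows is hypothetical (all names are mine, not taken from any product): a single entry point that exposes no low-level knobs by default, but accepts a user-supplied training callable for those who outgrow it:

```python
def auto_train(data, custom_model=None):
    """High-level entry point: no knobs like L1 regularization by default.

    `custom_model` is the escape hatch: a callable that takes the data and
    returns a fitted model, for users who outgrow the defaults without
    having to leave the platform.
    """
    if custom_model is not None:
        return custom_model(data)
    # Sensible default: predict the most common label (a toy baseline)
    labels = [label for _, label in data]
    majority = max(set(labels), key=labels.count)
    return lambda features: majority

data = [([1.0], "yes"), ([2.0], "no"), ([3.0], "yes")]

# Citizen data scientist: zero parameters to understand
default_model = auto_train(data)
print(default_model([5.0]))  # prints "yes"

# Power user: drops down one level, same entry point
expert_model = auto_train(data, custom_model=lambda d: (lambda f: "no"))
print(expert_model([5.0]))  # prints "no"
```

The key property is that the two levels of abstraction meet at a well-defined seam (the `custom_model` contract), instead of leaking L1 regularization into the no-code path or walling off the code path entirely.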

The underlying issues here are, of course, ill-defined boundaries (because of the missing dominant design) and the creation of new form factors without respecting levels of abstraction. Much of this has to do with the speed at which ML tools are evolving, usually driven by research findings and open source contributions to the detriment of strict engineering principles (which would slow progress down). This balance between flexibility and stability is a function of the stage in the lifecycle of a technology, and it means that there will likely be significant changes in these tools as they mature. To give just one example for which I take partial blame: There used to be a TPUEstimator in TensorFlow. The Estimator API is fairly high level, and the TPUEstimator didn’t even try to hide the fact that it was making assumptions about the hardware it ran on (TPUs); a clear violation of levels of abstraction. In newer versions of the API the hardware assignment happens at a lower API level (as a distribution strategy).
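The distribution-strategy fix can be read as a plain application of the strategy pattern: the high-level training loop stays hardware-agnostic, and the hardware choice is injected one level down. Below is a minimal conceptual sketch with my own names (TensorFlow’s real mechanism lives in `tf.distribute`); the “replicas” here are simulated in-process, not real devices:

```python
class SingleDeviceStrategy:
    """Run every batch on one (simulated) device."""
    def run(self, step, batches):
        return [step(batch) for batch in batches]

class MirroredStrategy:
    """Toy stand-in for data-parallel execution across N replicas."""
    def __init__(self, num_replicas=2):
        self.num_replicas = num_replicas

    def run(self, step, batches):
        # Round-robin batches across replicas, then gather the results
        results = []
        for i in range(0, len(batches), self.num_replicas):
            shard = batches[i:i + self.num_replicas]
            results.extend(step(batch) for batch in shard)
        return results

def train(strategy, batches):
    """High-level training loop: no hardware assumptions leak in here."""
    step = lambda batch: sum(batch)  # stand-in for one gradient step
    return strategy.run(step, batches)

batches = [[1, 2], [3, 4], [5, 6]]
# Changing hardware means swapping the strategy, not rewriting train()
assert train(SingleDeviceStrategy(), batches) == train(MirroredStrategy(), batches)
```

Contrast this with the TPUEstimator design, where the equivalent of `train()` itself knew it was running on TPUs; any change of hardware forced a change at the highest API level.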

Beware the IKEA effect when you pick an AI stack

Finally, with neither a dominant design nor an appropriate form factor, it is not surprising that many enterprises are struggling to adopt ML Platforms, let alone transform their companies into an “AI first” business model. Those who try often suffer from the IKEA effect.

AI developer tools are slowly starting to find widespread use. Engineers love building things and they love acquiring new skills. As a result, many engineers take way-too-low-level ML courses online. I usually tell anyone who wants to listen that no Data Scientist or ML Engineer these days needs to know how things like backprop actually work. Yet, many people take online courses that teach just that. (To be fair, I too learned how to write most of the popular ML algorithms from scratch. But, then again, I am a PM for AI developer tools.) All of these engineers, emboldened by their newly acquired knowledge about the nitty-gritty details of ML, then go out and try to apply them to their enterprise business problems. This is where the IKEA effect comes in.

The IKEA effect refers to the phenomenon that people attribute more value to products they helped create. It turns out that this effect applies broadly to all kinds of products (furniture, cake mixes, toys, etc.). What I am conjecturing is that the same effect is predominant in companies with a strong engineering culture. An engineering team that built their own ML Platform from the ground up, flawed as it may be, will attribute more value to it than if they just bought something out-of-the-box from a vendor. They give it a fancy name, write blog posts about it, and everyone gets promoted.

Of course, the same applies to any kind of new technology. However, no one these days would say “let’s build our own database from scratch”. The particular challenge with ML Platforms is that, because we lack a dominant design and common form factor, and people don’t know what they don’t know, it is far too easy to think that you can build something meaningful with just a few engineers. As the story goes, a software engineer goes and asks their Director for headcount to build an ML Platform…

“An engineering team that built their own ML Platform from the ground up, flawed as it may be, will attribute more value to it than if they just bought something out-of-the-box from a vendor.”

What will AI Developer Tools look like in 10 Years?

Finally, you may ask yourself: what will AI developer tools look like 10 years from now? If you’ve been around long enough you’ll probably say something like “OK, I got it Clemens, these are all common problems of any new technology. You could have written the same blog post about distributed data processing engines 20 years ago”. And I’d say “Thank you, that’s exactly my point”. If you think that, 10 years from now, millions of people will use low-level Python APIs to specify their exact model architecture (“I wonder if a skip layer or a convolution would help here?”) and fiddle around with hundreds of parameters, I’d bet that you are wrong. In fact, in an ideal world, the entire process of building “data driven applications” (which is really the broader category that ML models fall under) is just a common part of any software engineer’s job, without having to earn a PhD in AI or master Kubernetes.

At the risk of stating the obvious, here are my expectations of what will happen in the next couple of years to address the aforementioned challenges:

  • We will converge on a dominant design for ML Platforms. What do we think is “inside the box” when we talk about ML Platforms? Today, many vendors focus only on the ML training part, forgetting that most time in ML is spent with data wrangling. Most likely, one product will gain traction and lead the way in defining the category. Many other vendors will exit the market and others will conform to the dominant design.
  • There will be a few meaningful form factors for different target audiences. We don’t need to have a single form factor. In fact, I’d argue that it is desirable to have different layers of abstraction. Each layer needs to be well defined and abstractions shouldn’t leak between layers. I would argue that we haven’t seen a good example of the highest level of abstraction (e.g., SQL or UI-based ML).
  • Enterprise customers will realize that building their own ML Platforms is not their comparative advantage. Of course, every company can hire an engineering team and try to build their own ML Platform. However, with a dominant design in place, it will be more obvious how futile an effort this is, and that it is not (and should not be) the core competency of most companies. For most enterprises, value comes from applying ML Platforms to their business problems, not from building and maintaining their own.

As you may be able to guess, I have opinions on what a dominant design and good form factors for ML Platforms look like. If you are interested in solving these challenges, and defining the future of ML Platforms, I happen to be hiring an ML Platform Product Manager.

Clemens Mewald leads the product team for Data Science and Machine Learning at Databricks. Previously he spent four years on the Google Brain team building AI infrastructure for Alphabet, including TensorFlow and TensorFlow Extended (TFX).

