
Tesla AI Day: Optimus Bot Was Better Than Anyone Expected

But nowhere near solving the truly hard challenges of robotics

Today I bring you an analysis of Tesla’s autonomous humanoid robot, Optimus, unveiled on Friday at AI Day 2022. As I always try to do, this is a nuanced take that highlights the good – and the bad.

Last year’s AI Day was especially exciting because Musk revealed Tesla was working on Optimus, a robot intended to "eliminate dangerous, repetitive, and boring tasks," and capable of following orders expressed in natural language like, "pick up that bolt and attach it to the car with that wrench." (Remember Google’s PaLM-SayCan?)

Musk also promised they’d have a working prototype for 2022’s AI Day, which – given the unrealistic deadline – hyped some and reminded others of his tendency to overpromise and underdeliver. I was in the latter group.

Before we dive into it, let me remind you that AI Day is explicitly intended for recruiting purposes: The target audience isn’t journalists or investors, but engineers.

This means that most news outlets’ reviews of the event will be limited in describing the implications and timid in analyzing the shortcomings. I’ve read a lot of comments saying that what Tesla showed on Friday wasn’t impressive (e.g., because it’s years behind Boston Dynamics (BD) – more on this later). Well, I’m impressed – although only partially.

But I’m only impressed because I didn’t expect much.

Let’s start with the brighter side.

This article is a selection from The Algorithmic Bridge, an educational newsletter whose purpose is to bridge the gap between algorithms and people. It will help you understand the impact AI has on your life and develop the tools to better navigate the future.


5 key features that make Optimus stand out

Five minutes in, amidst cheers and applause, Musk gave way to the Tesla bot. The doors opened and a standing human-height shell-less robot with visible cables and motors entered the stage. It walked, turned, waved at the audience, and performed a little dance (a nod to last year’s show?).

As Musk later explained, this wasn’t Optimus, but a development platform called Bumble C built from off-the-shelf parts and used to design and test the robot’s systems and modules (they spent 6 months working on it). Optimus is the next-generation, production-ready evolution of Bumble C. Throughout the article I’ll refer to both simply as Optimus to avoid unnecessary confusion.

They then showed a few videos of Optimus doing more complex tasks than walking and waving: transporting boxes with its hands, grabbing and using a watering can, and holding and moving a seemingly heavy metallic object. Although robotics experts have described this as standard for humanoid robots, there are a few characteristics that make the Tesla bot worth keeping an eye on.

Optimus is heavily inspired by the human body

The world is made by humans for humans. That’s the strongest argument for building a humanoid robot instead of any other shape. The flexibility and versatility of the human body are unmatched. The design of Optimus’ joints and actuators reveals the intention to mimic our anatomy. For instance, the bot’s hands have five independently moving fingers and opposable thumbs, and Tesla claims they can grasp adaptively – with strength or precision – and can use tools.

But evolution is a continuous optimization mechanism that searches randomly, which results in trade-offs that intelligent design can avoid: Blindly following biology’s steps isn’t necessarily the best approach – the classic example is the plane, which flies nothing like a bird. Optimus is no different. The human body has many, many degrees of freedom, but there’s no need to go the last mile to implement such sophistication into a robot. Optimus has 28 degrees of freedom in the body plus 11 in each hand, which is sufficient for it to move the arms, the hands, and each finger independently, showing reasonable dexterity.
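A quick back-of-the-envelope tally, assuming (as Tesla’s slides suggested) that the 11 hand degrees of freedom are per hand:

```python
# Rough degree-of-freedom tally for Optimus, per Tesla's AI Day figures.
# Assumption: the 11 hand DoF are per hand, as Tesla's slides suggested.
body_dof = 28
dof_per_hand = 11
total_dof = body_dof + 2 * dof_per_hand
print(total_dof)  # 50 -- a fraction of the 200+ DoF Tesla quoted for the human body
```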

Efficiency-wise, biology isn’t the best master either. The human body isn’t as efficient as it could be (we spend way too much energy even while lying down), which is also a focus for Tesla engineers: If you want to design a robot intended for mass production at scale, cost and efficiency are priorities.

The bottom line: The key is finding a balance between a "true human form" robot, cost, and efficiency.

Optimus is powered by latest-gen AI

As Musk argued last year, Tesla isn’t a car company, but "arguably the largest robotics company in the world." Making self-driving cars is easier than making humanoid autonomous robots, but both belong to the category of real-world AI. Milan Kovac, director of engineering for Autopilot at Tesla, said that going from self-driving cars to Optimus is like "moving from a robot on wheels to legs."

This means that the AI that powers self-driving cars – even if incomplete and imperfect – can be "easily" transferred to Optimus. This is a feature that sets Optimus apart from its more agile and physically capable cousins. BD’s Atlas is years ahead in terms of athleticism but it isn’t AI-powered in the same way Optimus is. Preprogrammed robots aren’t well-suited for open-ended tasks because they can’t react to an ever-changing world. In this regard, Optimus is closer to Google’s PaLM-SayCan, which is also heavily reliant on the power of AI.
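To make the contrast concrete, here’s a minimal Python sketch of a SayCan-style selection loop: candidate skills are ranked by the product of a language model’s usefulness score and the robot’s own feasibility (affordance) estimate. The scoring functions below are toy placeholders I made up for illustration – the real system uses large pretrained models.

```python
# Toy sketch of SayCan-style skill selection. `llm_score` and
# `affordance_score` are made-up placeholders standing in for the
# large pretrained models the real system uses.

def llm_score(instruction: str, history: list[str], skill: str) -> float:
    """How useful the language model thinks `skill` is toward `instruction`."""
    if skill in history:        # toy heuristic: don't repeat completed skills
        return 0.0
    if skill == "done":
        return 0.3              # stopping is always somewhat plausible
    return 1.0 if skill.split()[0] in instruction else 0.1

def affordance_score(skill: str, state: dict) -> float:
    """How likely the robot is to succeed at `skill` in its current state."""
    return state.get(skill, 0.5)

def plan(instruction: str, skills: list[str], state: dict, max_steps: int = 5) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        # Rank every candidate skill by usefulness x feasibility.
        best = max(skills, key=lambda s: llm_score(instruction, history, s)
                                         * affordance_score(s, state))
        if best == "done":
            break
        history.append(best)    # a real robot would execute the skill here
    return history

skills = ["pick up bolt", "attach bolt", "find wrench", "done"]
state = {"pick up bolt": 0.9, "attach bolt": 0.6, "find wrench": 0.1, "done": 1.0}
print(plan("pick up that bolt and attach it to the car", skills, state))
# -> ['pick up bolt', 'attach bolt']
```

The point of the design is that neither score alone suffices: the language model knows what would be useful but not what the robot can physically do right now, and the affordance estimate knows what is feasible but not what the instruction asks for.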

Given Optimus’ broad scope, it’s hardly comparable to most other humanoid robots out there. It competes at the physical level (motor skills and dexterity) and at the cognitive level (how it perceives, plans, and navigates the world). As Musk said, other humanoid robots are "missing a brain, they don’t have the intelligence to navigate the world by themselves."

Optimus’ rate of progress is unprecedented

People keep repeating that BD is years ahead of Tesla, but testing Optimus against Atlas isn’t an apples-to-apples comparison. First, the goals are very different (BD is focused on "athletic intelligence" whereas Tesla wants a general-purpose robot). And second, Tesla engineers have made an Optimus prototype in barely a year whereas BD’s Atlas is 10+ years old.

One of my (failed) predictions was that Tesla wouldn’t have a working prototype ready for this year’s AI Day. It’s true that the robot is nowhere near ready for production and lacks some features that Musk promised (more on this later), but it works nonetheless. If the Optimus revelation was impressive for one reason, it’s this: In barely 6–8 months they managed to make a working robot with off-the-shelf parts. If we extrapolate this pace of development into the future, in five years we may be truly impressed.

Leveraging Tesla’s expertise and competencies

But how was this possible? How did the engineering team manage to pull off such a feat in less than a year?

Musk explained how Tesla’s vertically integrated nature would allow them to transfer and integrate the technology used for self-driving cars into a humanoid body. Tesla’s engineering expertise and infrastructure are well-suited for the challenge: a focus on autonomy, supercomputing, "neural nets to recognize the world," and, generally, real-world AI – sensors and actuators.

Tesla engineers have used structural analysis simulations (from car crashes) and damage-control methods to find the rupture thresholds for the joints and parts. They’ve leveraged their manufacturing knowledge to evaluate the best standard designs for the actuators (reduced to just six different designs), they’ve transferred the FSD hardware and software into the robot, and they’ve integrated trained neural networks into the bot’s vision modules. From robots on wheels to robots on legs.

Making Optimus scalable: Cost and efficiency

Musk emphasized that one key distinction between Optimus and other "impressive" humanoid robots is that the Tesla bot is designed to be produced at mass scale. This makes low cost and high efficiency top-priority requirements. Robots like BD’s Atlas are, as Musk puts it, "very expensive and made in low volume." Tesla wants Optimus to be an "extremely capable robot made in very high volume." And he added they want to build "millions of units" that would "cost much less than a car … probably less than $20,000."

BD’s Atlas is a research platform that costs a lot of money (I remember seeing $1 million somewhere but can’t find the source). BD’s robotic dog, Spot, which is a consumer product, costs $75,000. That’s almost 4X what Musk has promised for Optimus in a few years (I take this prediction with a massive grain of salt. Even leveraging Tesla’s technology and with scale production as the north star, I can’t see how they’ll manage to reduce the cost so much).

But Atlas isn’t intended for production (another argument to stop comparing them). In this regard, Optimus is closer to Agility Robotics’ Digit, a robot intended to perform factory work at scale. Agility Robotics, now backed by Amazon, has a head start of a few years over Tesla but lacks the latter’s optimized infrastructure. Optimus and Digit could become close competitors in the near future.

Optimus tested against the real world – and my predictions

Now let’s dive into the not-so-bright side: the limitations and deficiencies of Optimus, both those obvious after watching AI Day and those not so obvious. The fast turnaround achieved by Optimus’ engineering team is great but, in the absolute sense, it’s still insufficient.

I wrote above that I failed to predict that Tesla wouldn’t have a "working prototype" ready for 2022. In reality, I was only partially wrong: The humanoid robot I had in mind when I made my prediction is very different from what Optimus is as of now.

In an article I wrote last year entitled "Why Tesla Won’t Have an Autonomous Humanoid Robot in 2022" I laid out arguments that explain why building a truly (this nuance is important) autonomous humanoid robot is a daunting challenge. I also described which features Optimus would require to solve even the simplest real-world task.

In this section I’ll outline those features and explain why Optimus isn’t anywhere near ready for the real world (and won’t be soon), using what we know so far about it (and my commentary from last year’s article).

We live in a multimodal world

_"Humans can perceive colors, textures, flavors, odors, temperature, pressure… Our brain is multisensory. Our perceptual systems capture the multimodal nature of the world and our brain integrates it into a single representation of reality. When you eat an apple you can see its reddish tone, taste its sweetness, smell its fragrance, and feel its soft touch. All that is present at the same time._

Optimus isn’t meant to taste or smell, but, at the very least, it’ll need vision, tactile and haptic (pressure) sensors, proprioception – the ability to perceive the movement and position of limbs with respect to the rest of the body – and a representation of its body to know the extent to which it can take actions."

Tesla engineers mentioned how important it is to imbue in Optimus a sense of self-reference, proprioception, balance, and coordination. Most of the features I outlined seem to be present in the next-generation Optimus version.

Of course, quality and degree of skill matter a lot. Having some vague reference of where the hands are isn’t comparable to human proprioceptive skills. Optimus – the one that could walk – seemed quite fragile to perturbations. BD’s Atlas, by contrast, has proven remarkable at keeping its balance under adverse conditions (even with a human pushing it hard enough to make it take a few steps back).

The world fights to catch our attention

"Human perceptual systems bring on too much information. The brain uses attention to decide which events or objects get preference. Optimus will need to navigate the perceptual space the same way.

The combined power of multimodal perception and attention would give Optimus a very good sense of the complexity of the world while, at the same time, allowing it to make decisions based only on the most crucial and pressing information.

But how can Optimus learn which percepts require preference? How can it decide to look left or right to search for the rock? How can it decide to fix its attention on the rock, the hand, or the feet while walking back to the base? The neural mechanisms of attention are very intricate and not yet fully understood. How could Tesla design an artificial brain in such a way that attention to a myriad of distinct percepts is correctly assigned?"

No robot exists with anything near human-level attention mechanisms (transformer-based attention resembles human attention in name only). Yet, without attention – understood as a hierarchization of perception – we couldn’t function.
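For context, here’s a minimal numpy sketch of what transformer attention actually computes: each query simply takes a weighted average of the values, with weights given by query-key similarity. A useful operation, but not a mechanism for prioritizing percepts.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable softmax
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Transformer 'attention': a differentiable weighted average of the
    values V, with weights given by query-key similarity."""
    d_k = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d_k))   # (n_queries, n_keys)
    return weights @ V                          # (n_queries, d_v)

rng = np.random.default_rng(0)
Q, K, V = rng.standard_normal((3, 4, 8))        # three (4, 8) matrices
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```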

One could argue that robots don’t need that much multimodal information in order to function well enough, but that highly depends on the setting.

Self-driving cars are bound to the limits of roads and highways, but this can’t be extrapolated to factories or homes. The richer and more variable the immediate reality we perceive, the more we need attention to be able to perform any given task.

Planning, deciding, acting

"This is what a fraction of Optimus’ decision-making process could look like when searching for a gray basaltic rock on Mars surface [I used this example because I argued Musk intends to use Optimus for SpaceX Mars missions eventually]:

How many steps should I take, and in which direction? Is this rock gray, blue, or purple? Should I find a smaller rock? Maybe a larger one? Should I keep my eyes on the rock so it doesn’t fall from my hand, or should I keep them on my feet so I don’t trip over another rock? Should I go slowly so I use less energy, or should I go faster so I get back to the base sooner?…

Even the simplest order reveals the incredible number of choices we unconsciously make at all times. Something as simple as making a coffee – which we all do every morning – is considered a test of AGI-level intelligence. We humans evaluate the options at our disposal in terms of the possibility of success and the value they would provide. When goals are ambiguous and uncertain, the calculations become less precise and so we enter the realm of intuition. But can a robot have intuition?

And then there’s the question of how to execute the plan … The number of degrees of freedom for a humanoid robot is many orders of magnitude larger [than for a car]. A 3D environment, no boundaries in direction or magnitude to where it can walk, run, or jump, and a flexible body – the head, trunk, limbs, and fingers can move with respect to both the world and each other in innumerable combinations – all require a degree of engineering only evolution has accomplished."

Tesla engineers briefly showcased Optimus’ planning abilities but didn’t go into the details of decision-making and the difficulties of making functional real-world robots that can act in the world according to their goals while overcoming unpredictable challenges.

Even if Tesla’s autonomous cars manage to go from point A to point B without crashing, the technology is still unreliable in edge cases (we shouldn’t evaluate Optimus’ readiness by analyzing common situations, but extreme ones). This means self-driving cars aren’t yet completely solved.

The tech isn’t ready yet (even if Tesla has a fleet of millions driving on US roads): There’s too much variability in the world, and the same limitations that apply to cars apply to Optimus – multiplied by the increased complexity of the scenarios it’d face.

What makes us human – higher cognition

Beyond sensorimotor features, and the related processes of attention and decision-making, I also mentioned language, common sense, and causal reasoning as critical features for a humanoid robot.

"In the case of language, the necessity is obvious because we’d want to give spoken orders to Optimus without the need for explicit instructions … Experts have found important flaws in GPT-3 and the main reason is that it lacks contact with the real world – the great weakness of virtual AIs. It doesn’t have access to pragmatics and contextual information. If I say: "Go find a gray basaltic rock," Optimus would need to know what a rock is, what’s the meaning of gray, and how to differentiate basaltic from scoria rocks. Do we know how to imbue such language abilities in real-world AI? Not yet."

Even if Optimus is great at vision (courtesy of FSD hardware and software), it doesn’t have a natural language interface yet. Musk promised last year we could give the robot orders in natural language, but that doesn’t seem to be a priority for factory robots. PaLM-SayCan is notably ahead in this regard, given that it’s powered by the most performant large language model (LLM) in existence.

Still, not even Google’s robot can overcome the challenges I described last year: Pragmatics and context are beyond reach for virtual AIs, even when they power real-world robots. To successfully combine AI and robotics so that the overall system acquires a deep understanding of the world, some pieces are still missing.

"Causal reasoning is the ability to understand that some events contribute to producing other events. For instance, if there are clouds in the sky and it starts to rain, we know that clouds cause rain and not vice versa. If Optimus is looking for gray basaltic rocks, it’d be useful to know these rocks tend to be generated by volcanos. Instead of trying to find the rocks by searching the ground, it could look for the closest volcano on the horizon."

Again, I illustrated the necessity for causal reasoning using the Mars story, but it extrapolates to any real-world setting, be it a Tesla factory or your home. Of course, Optimus isn’t anywhere near achieving this – and neither is any other SOTA AI system.
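To make the asymmetry concrete, here’s a toy structural causal model of my own (not anything Tesla uses): conditioning on rain makes clouds near-certain, but intervening to force rain leaves the probability of clouds unchanged. Only a model of the mechanism, not correlations alone, captures that difference.

```python
import random

def sample(do_rain=None):
    """Toy structural causal model: clouds -> rain (illustration only)."""
    clouds = random.random() < 0.4               # clouds form on their own
    rain = clouds and random.random() < 0.7      # rain only follows clouds
    if do_rain is not None:                      # intervention: do(rain := v)
        rain = do_rain                           # ...clouds stay untouched
    return clouds, rain

random.seed(0)
N = 100_000
obs = [sample() for _ in range(N)]
rainy = [clouds for clouds, rain in obs if rain]
print(f"P(clouds | rain)     = {sum(rainy) / len(rainy):.2f}")        # 1.00
forced = [sample(do_rain=True) for _ in range(N)]
print(f"P(clouds | do(rain)) = {sum(c for c, _ in forced) / N:.2f}")  # ~0.40
```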

Turing Award winner Judea Pearl has been a long-time advocate of imbuing machines with an understanding of cause and effect to achieve human-level intelligence (although that isn’t necessarily Tesla’s goal with Optimus).

Commonsense reasoning is present in daily situations. We’re constantly applying knowledge that’s shared by all people. If we’re cooking, we know the pan is hot and we shouldn’t touch it. If it’s raining, we’d get wet outside unless we carry an umbrella. If a car is coming fast, we shouldn’t cross the road.

Professor Gary Marcus, deep learning pioneer Yann LeCun, and others have been very vocal about the need to teach AI common sense – knowledge commonly shared by people. Robots don’t have common sense. They don’t grow up in our culture and society, and they don’t enjoy the evolutionary endowment that provides us with an intuitive understanding of how the dynamics of the physical world work.

Conclusions

Calling Optimus an autonomous robot is a stretch in any sense of the word. Humans – who are truly autonomous – have developed that autonomy throughout millions of years of evolution. Making an autonomous general-purpose humanoid robot is probably the greatest challenge AI could face right now (besides AGI, although I suspect the two are largely intertwined).

Even if Tesla engineers aspire to build a biology-inspired, AI-powered robot, they won’t manage to create something that, as they boldly claimed, is "going to do everything a human does." Optimus isn’t comparable to humans at any level.

Regardless of the clear limitations (some specific to Optimus and some about robotics in general), I’m still impressed by what Tesla showed. Building a walking robot is hard when you only have half a year to pull it off. But we shouldn’t analyze these results only in relative terms. And in an absolute sense, what Tesla showed is, although more than expected, less than required.


Subscribe to The Algorithmic Bridge. Bridging the gap between algorithms and people. A newsletter about the AI that matters to your life.

You can also support my work on Medium directly and get unlimited access by becoming a member using my referral link here! 🙂

