
8 Hows of Augmented Reality

What will it take for AR to be widely accepted?

Building Blocks / Breakdown / Interpretation

This article examines the "Hows" that empower the "Whys" listed in my last post, 10 Whys of Augmented Reality. I highly recommend refreshing your memory with the visual scenarios from the previous post to better understand the concepts listed below.

Disclaimer: This is an interpretation inferred from my past experience working with products revolving around Extended Reality (XR).

Taking an outside-in approach, the obvious consumer requirements for an ideal AR device to be widely accepted are:

  • an ergonomic form factor (lightweight enough for prolonged daily use)
  • easy to use and to develop for
  • augmented content that blends seamlessly into the real environment

To get there, what do we need?

1. Augmented Content Staying in Place (Local SLAM & Sensors)

Even with the slightest movement of our head, an augmented floating panel should stay fixed in place, just as our couch stays put and we simply see a different perspective of the couch as we move around.

Let’s do an exercise: look first to your left, then to your right, and shut your eyes. With your eyes closed, move a little so you are away from your initial observation spot. Without opening your eyes, try to imagine the perspective of the space from the new spot. Finally, open them and compare. Close enough?

Our inner ears constantly estimate our motion; even with our eyes closed in a new space, we can roughly estimate how we have moved. When our eyes are open, our sense of motion is more accurate.

An IMU (gyroscope, accelerometer) substitutes for the inner ear, and two or more ultra-wide-angle, high-refresh-rate cameras substitute for our eyes. A lightweight, sensor-fusion-based SLAM algorithm that combines the input from these two sources replicates our brain's ability to estimate our motion through the space around us.

We don’t need to understand the entire space (a full map) around us; we just need to track our micro-movements. The farther we move, the more error we accumulate over time. This is where memory comes into play, a capability addressed in the next section.
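
To make the idea concrete, here is a minimal Python sketch (not any particular headset's tracking stack) of how the high-rate IMU and the slower camera estimates can be fused: the IMU dead-reckons between frames and drifts, while each camera update gently pulls the estimate back. The rates, noise levels, and blending factor are illustrative assumptions.

    import numpy as np

    # Sketch of IMU + camera pose fusion (a simple complementary filter),
    # illustrating the "ears + eyes" idea. Rates and constants are hypothetical.
    IMU_RATE_HZ = 1000      # gyro/accelerometer update rate
    CAM_RATE_HZ = 30        # visual tracking update rate
    ALPHA = 0.98            # trust placed in the IMU between camera updates

    def integrate_imu(position, velocity, accel_world, dt):
        """Dead-reckon position from accelerometer data (drifts over time)."""
        velocity = velocity + accel_world * dt
        position = position + velocity * dt
        return position, velocity

    def fuse_camera(position_imu, position_cam, alpha=ALPHA):
        """Blend the drifting IMU estimate with the slower camera estimate."""
        return alpha * position_imu + (1.0 - alpha) * position_cam

    # Toy loop: the headset is actually still, but accelerometer noise causes
    # drift that the (noise-free, for simplicity) camera estimate keeps correcting.
    rng = np.random.default_rng(0)
    pos, vel = np.zeros(3), np.zeros(3)
    dt = 1.0 / IMU_RATE_HZ
    for step in range(IMU_RATE_HZ):                      # simulate one second
        noisy_accel = rng.normal(0.0, 0.05, size=3)      # m/s^2 sensor noise
        pos, vel = integrate_imu(pos, vel, noisy_accel, dt)
        if step % (IMU_RATE_HZ // CAM_RATE_HZ) == 0:     # a camera frame arrives
            pos = fuse_camera(pos, position_cam=np.zeros(3))

    print("drift after 1 s with fusion:", np.linalg.norm(pos), "m")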

2. Associate Content with the Environment (Global SLAM)

We also need a robust, computationally expensive SLAM that maps each individual space we are in, so that our local tracking can periodically be associated with that larger mental map.

To minimize compute on our lightweight wearable device and extend battery life, we can correct ourselves by cross-checking against a sparse (not dense) map stored and processed elsewhere (remotely), much like the map in a shopping mall that resides in the mall; we don't need to carry it back home.

The vision sensors can periodically send data to an external device to determine where we are and correct our local understanding of the space.

Think of a lightweight local SLAM (on the wearable device) versus a global SLAM (like Google Maps) running elsewhere, secured on-premises in the spaces we visit. To address privacy, global maps are maintained by the space owners (e.g. malls, stores, private homes, or offices) that AR devices switch between and connect to, just as we access Wi-Fi at Starbucks.
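
A hedged sketch of what that periodic cross-check might look like from the device's side, assuming a hypothetical relocalization endpoint exposed by the space owner (the URL, payload fields, and blending weight below are all made up for illustration):

    import numpy as np
    import requests

    # Hypothetical endpoint exposed by the space owner (the "mall map").
    MAP_SERVICE_URL = "https://maps.example-mall.local/relocalize"

    def periodic_relocalization(keyframe_descriptors, local_pose):
        """Send sparse features (not raw images) and blend in the corrected pose.

        local_pose: np.array([x, y, z, yaw]) from the on-device local SLAM.
        """
        payload = {
            "descriptors": keyframe_descriptors.tolist(),   # sparse map query
            "local_pose": local_pose.tolist(),
        }
        resp = requests.post(MAP_SERVICE_URL, json=payload, timeout=0.2)
        resp.raise_for_status()
        global_pose = np.array(resp.json()["corrected_pose"])
        # Blend gently so anchored content drifts back into place instead of jumping.
        return 0.9 * local_pose + 0.1 * global_pose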

3. Perceive the World in 3 Dimensions (Depth Sensing)

We naturally sense depth: how far away something is, how big or small it is. Let’s do another exercise.

Close both eyes, then open one eye while holding your thumb right in front of your nose. Observe the thumb's size and orientation, and how flat, almost one-dimensional, it looks. Then open the other eye and notice the difference. What do you see?

By combining the parallax from our left and right eyes, we sense depth, which lets us see the world in three dimensions. AR devices need the same capability to estimate the angle and occlusion of objects (what's in front, what's behind, and how far away something is) in order to interact intuitively with our three-dimensional world. Think of floating panels glued to your walls or sitting on top of your kitchen countertop.

Full 3D reconstruction of the environment around us is computationally expensive and can be offloaded elsewhere. Without this capability AR devices can still display augmented content, just with a compromised sense of believability, which places depth sensing somewhere between must-have and nice-to-have.
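
Under the hood the geometry is simple triangulation. Here is a small sketch of the textbook relation between stereo disparity and depth; the focal length and baseline below are assumed values, not a specific device's:

    # Minimal sketch of stereo depth: the same point lands on different pixel
    # columns in the left and right cameras, and that disparity gives distance.
    def depth_from_disparity(focal_px, baseline_m, disparity_px):
        """depth (m) = focal length (px) * baseline (m) / disparity (px)"""
        if disparity_px <= 0:
            return float("inf")   # no measurable parallax: point is "at infinity"
        return focal_px * baseline_m / disparity_px

    # A thumb at your nose shifts a lot between the two views (large disparity);
    # the couch across the room shifts only a little (small disparity).
    print(depth_from_disparity(focal_px=700, baseline_m=0.064, disparity_px=150))  # ~0.30 m
    print(depth_from_disparity(focal_px=700, baseline_m=0.064, disparity_px=15))   # ~3.0 m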

4. Interact with the Augmented Content (Input Tracking)

While there is no limit to the input technology itself, ranging from eye-gaze tracking and a touchpad on the device to voice input and handheld controllers, it is only natural to use what most of us are fortunately equipped with: two hands!

To track our flexible, far-reaching hands seamlessly, we can reuse the wide-field-of-view cameras already used for SLAM. For real-time usage, the computation can be simplified to the two items below (a rough gesture sketch follows the list):

  • Depth-based occlusion maps (for rendering and physics collisions), without fully tracking all 20 hand skeleton joints.
  • Tracking the five fingertips for contact points and detecting gestures across multiple frames.
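
As a sketch of how little is needed once the problem is reduced to fingertips, here is a hypothetical multi-frame pinch detector; the thresholds and data layout are assumptions for illustration, not a real SDK's API:

    import numpy as np

    # Track only five fingertip points (in metres, headset coordinates) and
    # report a pinch when thumb and index tips stay close for several frames.
    PINCH_DIST_M = 0.02     # thumb-index distance that counts as "touching"
    PINCH_FRAMES = 5        # frames the contact must persist to count as a gesture

    class PinchDetector:
        def __init__(self):
            self.consecutive = 0

        def update(self, fingertips):
            """fingertips: dict of 3D points, e.g. {'thumb': ..., 'index': ...}."""
            dist = np.linalg.norm(np.array(fingertips["thumb"]) -
                                  np.array(fingertips["index"]))
            self.consecutive = self.consecutive + 1 if dist < PINCH_DIST_M else 0
            return self.consecutive >= PINCH_FRAMES  # True once the pinch is held

    detector = PinchDetector()
    for frame in range(10):
        tips = {"thumb": [0.10, 0.0, 0.40], "index": [0.11, 0.0, 0.40]}  # 1 cm apart
        if detector.update(tips):
            print(f"pinch detected at frame {frame}")
            break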

5. Development and Rendering Engines (Unity & Unreal)

The two leading mobile development platforms are Android (Java and Kotlin via the SDK, C/C++ via the NDK) and iOS (Objective-C and Swift). AR is a visual medium where app logic is married to 3-dimensional rendering, and this is where game engines provide the ability to write logic that takes physics and lighting into account. They also provide an abstraction that hides the details of the underlying low-level graphics APIs (OpenGL, DirectX, Vulkan) for the Graphics Processing Unit (GPU) and the operating-system calls on the CPU (Linux, macOS, Windows). A developer can skip the nitty-gritty details of the vertices, triangles, and mini-programs that model light, darkness, transparency, and color (shaders) required to render a computer-generated image, and focus purely on the application experience.

The more compute we put on a wearable AR device, the heavier, more power-hungry, and hotter it becomes. By offloading compute anywhere and everywhere over robust Wi-Fi 6 and 5G connectivity, we can reduce the AR device to an encoder that sends sensory information elsewhere for processing, and a decoder that decodes the content returned for display.

Imagine the freedom for developers to deploy AR apps to any existing app store and run them on any capable device, with the result displayed on the AR device wirelessly.
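
Conceptually, the device-side loop shrinks to encode, send, receive, decode. The sketch below stands in for that split, with plain function calls in place of the real video codec and wireless link (all names here are illustrative):

    import json

    def device_encode(pose, hand_input):
        """The wearable only packages sensor state; no heavy rendering on-device."""
        return json.dumps({"pose": pose, "hands": hand_input}).encode()

    def remote_render(packet: bytes) -> bytes:
        """Runs on a phone/PC/edge server: the full game-engine render from the pose."""
        state = json.loads(packet)
        frame = {"frame": f"panel rendered for pose {state['pose']}"}
        return json.dumps(frame).encode()        # stand-in for an encoded video frame

    def device_decode_and_display(frame_packet: bytes):
        """The wearable decodes the returned frame and lights up the display."""
        print(json.loads(frame_packet)["frame"])

    # One iteration of the loop the headset would run every frame (e.g. 60-90 Hz).
    packet = device_encode(pose=[0.0, 1.6, 0.0, 0.0], hand_input={"pinch": False})
    device_decode_and_display(remote_render(packet))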

6. Non-App Logic Components as a Remote Service (Microservices)

In the previous section we essentially addressed the front end (like the page rendered by a browser); what about the back end (storage, algorithms)? With the popularity of containerized microservices running in orchestration systems like Kubernetes, or simply in Docker on remote machines, the ecosystem can keep innovating on SLAM, AI, and other computer vision algorithms as microservices that submit only the required results to the rendering engine, which stays focused on user-experience app logic.
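
As a hedged example of such a service, here is a minimal Flask sketch of a plane-detection microservice that a rendering engine could query; the route, payload, and canned response are assumptions for illustration, not an existing API:

    from flask import Flask, jsonify, request

    # Hypothetical back-end microservice: a plane-detection endpoint a rendering
    # engine could call to decide where a floating panel may be anchored.
    # In practice this would run in Docker/Kubernetes next to SLAM and AI services.
    app = Flask(__name__)

    @app.route("/detect-planes", methods=["POST"])
    def detect_planes():
        depth = request.get_json()      # e.g. a downsampled depth map from the headset
        # A real service would run a vision model here; we return a canned result.
        planes = [{"center": [0.0, 1.0, 2.0], "normal": [0, 0, -1], "label": "wall"}]
        return jsonify({"planes": planes, "frames_received": len(depth or [])})

    if __name__ == "__main__":
        app.run(port=8080)   # the engine only ever consumes the JSON results

Each such service ships as its own container, so SLAM, AI, and vision teams can iterate independently behind stable interfaces.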

Hypothetical scenario: imagine the pace of AR innovation if all the industry players had the freedom to innovate on and evolve the components they are already good at building. Display, CPU, GPU, OS, VPU, wireless, and other vendors could all leverage their existing ecosystems and markets to contribute to a common AR consumer product.

7. Augmented Display Technologies

This is the hardest part of AR: cost, power consumption, display resolution, brightness, and field of view! It would take another lengthy article to explain the current state of waveguides (holographic, diffractive, polarized, and reflective). Here is a well-documented Medium article by Kore on this topic:

https://medium.com/hackernoon/fundamentals-of-display-technologies-for-augmented-and-virtual-reality-c88e4b9b0895

Note: There is more innovation happening in this space, e.g. Ostendo, nreal, nueyes.

8. Enhanced AR with 3rd Party Technologies

A significant chunk of the ideal AR experience can be enabled by working with other technologies that cannot run on the device itself.

  • AI: From GANs that can generate dynamic imagery in real time (visualizing our imagination, deepfakes, and virtual avatar agents) to networks that learn from our actions to adapt, recommend, and optimize the augmented content for our day-to-day tasks.
  • IoT: Smart spaces with sensors continuously generate data that can be analyzed elsewhere, and the contextualized information can be augmented into our vision to help us make decisions faster (predictive maintenance, traffic stoplights, autonomous guided vehicles, and robotic arms).
  • Cloud Gaming: The same streaming approach can be extended to consuming OTA (over-the-air) content directly on the AR device.

Conclusion

You might notice a pattern in the above approach: 5G (URLLC, private 5G in enterprise spaces), Wi-Fi 6 (multi-user in small personal spaces), IoT (smart cities, factories, workplaces, and homes), distributed computing (remote processing from edge to cloud), algorithmic capabilities delivered as scalable microservices, and rendering engines for application experiences (Unity and Unreal) all work together to deliver an ideal AR experience. The technology is already converging to answer "What will it take for AR to be widely accepted?", but again, this is just one of many perspectives on "How do we get there?"

I'll leave you with my (2020) DIY early prototype: Raspberry Pi 4 + lightweight T265 SLAM + Pepper's Ghost effect for the AR display + Wi-Fi + power bank + remote processing on Windows 10 Unity + WSL Docker microservices, hoping to unlock the possibilities summarized in my last post, 10 Whys of Augmented Reality.

