The Magic Zoom Green Screen

A harbinger of the wave of change

Anthony Chaudhary
Towards Data Science

--

It’s 2021, and it’s now possible to easily get my background automatically blurred on a Zoom call, without a green screen.

This reduces distraction — no one can see my messy workspace! It improves equality by removing bias around backgrounds, and it makes it easier to focus on the people I’m meeting.

Me on a call. More examples from Zoom.

This is a vast improvement over the green screen I painted in the basement. Even with that, custom lighting (soft boxes), and more, I remember it would take hours of Adobe After Effects work to get a single video clip “keyed” out.⁰ What a pain! Now anyone can do it on a mobile phone? Wow!

This is not an isolated case. I can speak into my phone and be understood extremely well by Google. Advanced driver assist systems, like those in our Toyota RAV4 or a Tesla, easily track lanes, and soon much more.

How is this possible? What’s behind all of this?

You may have already heard of machine learning — often grouped under artificial intelligence — where a computer application is automatically created from a training dataset.

This is not only about machine learning though.

It’s about training data. Training data is what comes before (and after) machine learning.

What is Training Data really?

At its core, training data is an easier way to encode human understanding into a machine. Combined with machine learning, the cost to “copy” narrow human understanding approaches zero.

Photo by Andrew Wulf

Wait, hold the phone — did you just say “it will be nearly free to copy narrow human understanding”?¹

Yes.

I’ll cover all that but first, let’s talk about what understanding means.

I use the word `understanding` to differentiate between something that can be written down in a book, and something that, until now, could only exist in the human mind.

Now machines too can form `understandings`. This is what unlocks previously impossible use cases.

Narrow Understanding — Division of Labour

This technology is very much here and being used today — that’s why I say “narrow” understanding.

Keep in mind that understanding here does not mean human-level intelligence. A computer can “understand” what an orange is at the grocery store, well enough to automate the checkout. This does not mean the computer knows anything about anything else. It’s a narrow understanding.
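To make the orange example concrete, here is a deliberately toy sketch (not any real checkout system): the “understanding” lives entirely in a handful of human-labeled examples, and a trivial nearest-neighbor rule repeats that labeling on new inputs. The features (weight in grams, peel roughness from 0 to 1) are invented for illustration.

```python
def nearest_label(training_data, x):
    """Return the label of the closest training example (1-nearest-neighbor)."""
    def dist(a, b):
        # Squared Euclidean distance between two feature tuples.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(training_data, key=lambda example: dist(example[0], x))[1]

# Human understanding, encoded once as labeled examples:
# each entry is ((weight_grams, peel_roughness), label).
training_data = [
    ((130, 0.8), "orange"),
    ((140, 0.7), "orange"),
    ((120, 0.1), "apple"),
    ((110, 0.2), "apple"),
]

# New, never-before-seen items get labeled by generalizing from the data.
print(nearest_label(training_data, (135, 0.75)))  # -> orange
print(nearest_label(training_data, (115, 0.15)))  # -> apple
```

Real systems replace the nearest-neighbor rule with far more capable models, but the division of labor is the same: humans supply the labeled examples, the algorithm supplies the repetition.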

As an analogy, consider Adam Smith’s division of labour — it’s basically the same thing, at least in the productivity sense.

What’s the impact?

The impact of this over the next 50 years will be as dramatic as the impact of computers. We are at the very start of a very steep slope.


Previously Impossible Use Cases Become Possible

Specifically, the impact of being able to copy human understanding at virtually no marginal cost is that previously impossible use cases become possible.

For example, here is automatic cancer detection from a smartphone³, something that was previously only available with a trip to a dermatologist.

Example of a “Dermatologist in your pocket”. From public domain paper.

Green-screen free background removal, advanced driver assist features, incredible sports analytics, a “dentist in your pocket”, the list is endless.

As a thought experiment: imagine sitting and watching a video call. You are asked to draw, in real time, pixel-perfect boundaries between background and foreground. Sounds pretty tough, right?

But — we can pass our understanding of what a “background” is to a machine, in a more relaxed setting, and it can repeat it.

This is fundamentally different from prior automation approaches, which were based on rule-driven logic.

Scarce Resources will Become more Abundant

Second, previously scarce resources, like a radiologist’s time, become abundant, making what was once a luxury as common as checking your social media feed. Imagine anyone with a smartphone having access to a version of the absolute best medical experts in the world.⁴

More generally, the frequency with which existing knowledge is used can increase by orders of magnitude. Real-time, green-screen-free background removal was previously impossible at any cost; expert review of medical imaging was possible, but scarce. Now that existing knowledge can be applied orders of magnitude more often. And this isn’t limited to any one domain — it cuts across sectors.

Photo by Joonyeop Baek

Second Order Effects of Increased Frequency

Consider that a complete manual visual inspection of the Golden Gate Bridge may only happen once a year, or once a decade. Soon, an analysis of a similar level will be able to happen every few seconds. Having access to this level of data will unlock second order effects — for example, tracking the rate of decay, how weather events affect it, etc.

The list of impactful, new use cases of training data is growing rapidly. Human supervised training data is the way to make the impossible possible.

The machine learning side has its own challenges, but in a sense, its main goal is simply to accurately reflect the training data in similar situations.²

In other words, the control, the real human supervision is the training data, the machine learning is the repetition.

What’s the catch?

Training data is not a cure-all. Some of the use cases I mention here will take years to come to fruition.

Now say you have a use case that’s a great candidate for training data (which is pretty much anything involving expert or domain-specific knowledge) — there are a few costs to do the initial building of it, and there are maintenance costs.

The biggest problem is the initial and ongoing human supervision — also known as data annotation or data labeling.

The work ranges from administrative concerns, like collecting, sending, and receiving data, through philosophical concerns, like what the Schema (Ontology) should be, to the task management of actually getting experts to transfer their knowledge into annotation tools like Diffgram.
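As a sketch of what a schema and a single annotation record might look like in practice — the field names below are illustrative assumptions for this article, not Diffgram’s actual format or API:

```python
# A hypothetical minimal schema: the shared vocabulary the experts agree on
# before any labeling starts.
schema = {
    "labels": ["background", "person"],
    "instance_types": ["box", "polygon"],
}

# One hypothetical annotation record: an expert marking a person in a frame.
annotation = {
    "file": "call_frame_001.png",
    "label": "person",
    "instance_type": "box",
    "coordinates": {"x_min": 40, "y_min": 25, "x_max": 300, "y_max": 480},
}

# A basic validation step: every annotation must conform to the schema.
assert annotation["label"] in schema["labels"]
assert annotation["instance_type"] in schema["instance_types"]
```

Settling the schema up front matters because every downstream annotation, and eventually the model itself, is constrained by that vocabulary.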

There will always be a need for some level of human supervision as time goes on — because if someone really, truly solves it, they have essentially invented Artificial General Intelligence (or maybe even Super-Intelligence).

That said, many trends — such as machine learning improving, per-use costs decreasing, automations like userscripts, etc. — point to it becoming easier and easier to go from dataset to working system, with less and less data.

More on Userscript Data Annotation Automation.

Our open source annotation and data science project Diffgram is our first step along a long journey. We are actively working to advance and grow this area, and to make the technology more accessible.

For example, the new import wizard makes it so anyone who can fill out a form can do the previously “software engineer only” step of connecting an existing machine learning application to Diffgram for further human supervision.

The ritual of education and on-the-job training is as old as time. Yet we have very few rituals around training machines. That’s why we at Diffgram build products around this new ritual of training machines: of humans supervising machines, of teaching machines to understand the world.

Anthony

Appendix

This article contains forward-looking statements and opinions.

⁰ Note that professional-level green screen work includes concepts beyond video-call-style removal (e.g. color matching, orientation, quality, etc.). Over time I suspect this pro-level work will be automated in a similar way.

¹ Technically this is the “higher dimensional space”, which we are unable to reasonably represent visually. Many machine learning models have hundreds of dimensions, while we can only reasonably graph four: space (x, y, z) and time (t).

Does a higher dimensional space == understanding? My opinion is that it’s as close as we can currently get. What is understanding, if not something that we can’t represent in 4 dimensions?

² Many parts of machine learning are incredibly difficult. This is not to take anything away there. Simply that, as methods improve and the many aspects of training and validation become more standardized and automated, the control will shift to the subject matter experts in many domains — the same way the definitions of “programming” and “computer user” have shifted over time.

³ As a counterpoint, papers like this are skeptical about the sample size of some of the “skin cancer detection from a mobile phone” results. While it’s true that some of this is very new and has a lot of hoops to go through, it’s equally clear that the fundamental technology exists, is working its way into useful applications, and that many trends are likely to converge to overcome issues and hiccups. Recent Forbes walkthrough.
