How to make measure theory usable for your problem?

A (deep) dive into the Lebesgue measure & probability distributions

Leonard Schuler

Published in

Towards Data Science

18 min read · Oct 31, 2020

One evening a data scientist, an engineer, and a mathematician meet in a bar.

They complain to each other about their problems at work.

The engineer says, “I have this huge oddly-shaped container for which I need to know the volume. Of course, I wanted to do the obvious thing and fill the entire container with water and then do some weighing to get the volume. Sadly this is a tad expensive to set up because of all the sealing and the expensive materials involved. That is why my boss rejected the idea. If only he would deny himself his yearly bonus… Anyway now I have to fill up the entire container with boxes and approximate the result.”

“Wouldn’t that be very inaccurate though? I mean, how will you fill up all the nooks?” asks the mathematician, uncomfortable with the idea of such practicalities.

“Maybe, but it should be good enough. If not I can try modeling it using a computer where I can approximate the volume of the container much more precisely.”

“And how does a computer do that?” interjects the mathematician.

“With the computer, I can place much smaller boxes into the container and can approximate the container as closely as I want,” the engineer replies.

Unconvinced the mathematician asks, “But how do you know that your approximation in such a way is truly getting better when you decrease the volume of the boxes in the container? If you repeat that approach forever will you really get the true volume of the container? This sounds very much like a transition from finite approximation to infinite approximation. Is that even allowed in this situation?”

“Well, technically the computer does not work with infinity. To entertain your thought though. If I understand what you are saying correctly, you are asking: Does my approximation approach the true volume of the container if I repeatedly reduce the size of the boxes I put into the container and always fill the container as much as I can, correct?”

“Almost,” the mathematician clarifies. “What happens if you do that forever or better said (countably) infinitely many times. Does the combined volume of the boxes approach the true volume of the container? I mean think about it. Maybe you have round sections in the container. No box will ever truly capture the entire volume there. Sure, you can always make the box smaller and fit more boxes in. But it does not matter how many times you decrease the size of your boxes, they will still never capture the round shape completely. So what happens if you would truly do that infinitely many times? Can infinitely small boxes capture round shapes and other weird nooks in your container and give a correct value for the container’s volume?”

“That sounds so mathy,” the engineer smiles. “I believe that it will work. Just visually speaking. But you are the mathematician. You tell me.”

“Hm, you are right. This is right up my alley. Let me get back to you on that. I am sure measure theory can help answer that question somehow.”

Leaving the mathematician to his thoughts, the data scientist starts talking, “You guys won’t believe what I did today. Do you remember in stats class where we always had to calculate the probability of the height of a person being between two values? Guess what. I actually had to do that today.”

“No way. I always thought those examples were so forced,” the engineer exclaims.

“Exactly what I thought. Our statistics professors were on to something when they picked that example. So today I redid those calculations and it was so nostalgic. I even pulled out my old standard normal cdf lookup sheet to determine the probability.”

“Nice. I would have totally done the same thing,” the engineer agrees.

“Oh, I remember that,” the mathematician says. “Isn’t it weird that we can just integrate a density function like that and get a valid probability measure out of it? I mean, why is that even possible?”

“You always with your questions,” the engineer smiles.

“Why wouldn’t it be?” the data scientist continues. “I mean the integral over the entire density is 1 and partial integrations always return values between 0 and 1. That seems like a pretty open and shut case to me.”

“For simple applications like that I agree,” the mathematician says. “But don’t you use it in many more ways than just that? Let’s take for example the expected value of the product of two independent random variables.”

“But that is simply the product of the individual expected values,” the data scientist interjects.

“Right. The proof uses Fubini-Tonelli to split the expected value over the product of the two random variables into expected values of the individual random variables. But as a requirement, the two random variables have to follow a true measure-theoretic probability measure. But why does the integral of a density function define a proper measure-theoretic probability measure?”

“Hm… weird,” the data scientist says. “I have never thought about that before. Wouldn’t that problem apply to almost all results of probability theory?”

“Yeah exactly.”

“Isn’t it strange that such an important detail was never mentioned in any of my lectures?” the data scientist muses.

“Tell you what,” the engineer says to the mathematician. “Why don’t you take a day and find some answers to those questions? Then you can tell us if boxes really can approximate round shapes and why integrals over density functions really result in proper probability measures.”

“This is a great idea,” the data scientist says. “I am very interested in the answer. Let’s give him a further incentive. If you can explain it to us in an understandable manner we will pay for your drinks the next time.”

Delighted by this challenge the mathematician leaves his friends and returns home.

After sitting down at his desk, the mathematician first checks whether his hunch is right and measure theory can help solve his two questions.

To do that he opens his old dusty measure theory book¹ and starts reading.

After some time, the mathematician finds two infinite approximation schemes that positively answer the question of whether the engineer’s container can be approximated by infinitely many boxes.

The first scheme resembles the approach the engineer proposed. Under this approach, one needs to find a way to add more and more simple, non-overlapping objects into the container so that the entire container is filled after (countably) infinitely many repetitions. Measure theory then guarantees that the infinite sum of those individual volumes is equal to the volume of the entire container. That means it is possible to calculate the volume of the container exactly if the engineer can find an infinite strategy to do so.

Alternatively, the engineer could also encase the entire container in a box and start chipping away small shapes. If he again finds a way to reveal the container after removing things infinitely many times, then he can calculate the volume of the container exactly by subtracting the infinitely many removed volumes from the volume of the encasing box.
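Both schemes can be tried out numerically. The following is a minimal sketch, not the engineer's actual procedure: it assumes the container is the unit ball (chosen only because its true volume 4π/3 is known) and uses a hypothetical helper `ball_volume_bounds` that cuts the cube [-1, 1]³ into n³ small boxes. Small boxes lying completely inside the ball are added up (first scheme), and small boxes lying completely outside the ball are chipped away from the encasing cube (second scheme).

```python
import math

def ball_volume_bounds(n):
    """Return (inner, outer) box estimates of the unit ball's volume.

    The cube [-1, 1]^3 is cut into n^3 small boxes of side h = 2 / n.
    inner: total volume of the small boxes lying entirely inside the ball.
    outer: volume of the cube minus the small boxes lying entirely outside.
    """
    h = 2.0 / n
    far, near = [], []  # per axis: squared distance of farthest / nearest point to 0
    for i in range(n):
        lo, hi = -1.0 + i * h, -1.0 + (i + 1) * h
        far.append(max(lo * lo, hi * hi))
        near.append(0.0 if lo <= 0.0 <= hi else min(lo * lo, hi * hi))

    inside = outside = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if far[i] + far[j] + far[k] <= 1.0:
                    inside += 1      # even the farthest corner is in the ball
                elif near[i] + near[j] + near[k] > 1.0:
                    outside += 1     # even the nearest point misses the ball
    box = h ** 3
    return inside * box, 8.0 - outside * box

true_volume = 4.0 / 3.0 * math.pi
for n in (8, 16, 32):
    inner, outer = ball_volume_bounds(n)
    print(n, round(inner, 4), round(outer, 4), round(true_volume, 4))
```

The inner estimate stays below the true volume and the outer estimate stays above it, and the gap between them shrinks as the boxes get smaller. Whether the gap really closes in the limit is exactly the mathematician's question.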

The approaches only work though if a box measure is a proper measure under the definition of measure theory.

Sadly it is generally not possible to prove the properties of a measure-theoretic measure directly. A proper measure-theoretic measure must be able to measure a huge, infinitely large collection of sets. Additionally, it has to fulfill certain properties. Among others, it has to support the two schemes above.

Fortunately, measure theory provides a way to extend a simple measure to a proper measure-theoretic measure in a reasonable way that fulfills all those properties and many more.

The following list describes what needs to be proven for a simple candidate measure to make it into a proper measure-theoretic measure.

1. A set (without any properties) is needed.
2. The candidate measure needs to be defined only on sets that are subsets of the set in 1.
3. The candidate measure must always be greater than or equal to zero, but it can be infinite.
4. The empty set needs to be measurable and its measure must be equal to zero.
5. If two sets can be measured by the candidate measure, their intersection needs to be measurable by the candidate measure as well.
6. If we remove one measurable set from another measurable set, the new set must be writable as a union of finitely many other measurable sets that are disjoint from each other. Note that the new set itself does not need to be measurable by our candidate measure.
7. If two measurable sets are disjoint, i.e. there is no point in both sets, and the union of the two sets has a measure defined by the candidate measure, then the measure of the union must be the same as adding the measurements of the two individual sets together.
8. The set in 1. must be writable as a (countably) infinite union of measurable sets whose individual measurements are not infinite.
9. The measurement of any measurable set that is a subset of a (countably) infinite union of individually measurable sets needs to be smaller than or equal to the infinite sum of the individual measurements.

To get the benefits of measure theory for the box volume the mathematician starts to work his way through the checklist.

As a first step, he defines his box measure mathematically.

A box in three dimensions can be written as a set:

B := [w₁, w₂] × [h₁, h₂] × [d₁, d₂]

where w₁ ≤ w₂, h₁ ≤ h₂, d₁ ≤ d₂ are all real numbers.

To calculate the volume of the box, one simply needs to multiply the width, height, and depth of the box.

width := w := w₂ - w₁
height := h := h₂ - h₁
depth := d := d₂ - d₁

The volume measure is denoted by λ to honor the mathematician Lebesgue.

As such our mathematician will use λ for his box measure as well.

He comes up with the following definition for the box volume measure.

λ(B) := w · h · d = (w₂ - w₁) · (h₂ - h₁) · (d₂ - d₁)

He then proceeds to check the items on the list for his box measure.

1. The basic set is the set of real numbers in three dimensions.

2. Every box is by definition a subset of that set.

3. Every box has a volume greater than or equal to zero.

4. His definition is currently not able to measure the empty set. Item 4. requires the empty set to have a box volume of zero, so he adds λ({}) := 0 to his definition.
This makes sense as the empty set can be seen as measuring the volume of nothing.

5. He needs to show that the intersection of two boxes is a box as well.

If the two boxes do not overlap their intersection is the empty set, which he can now measure.

If two boxes overlap, it is not too difficult to visualize that they define a new box.

A bit more mathematically: for any two overlapping intervals, the following is true.

[a₁, a₂] ⋂ [b₁, b₂] = [max{a₁, b₁}, min{a₂, b₂}]

Applying this to the width, height, and depth separately shows that the intersection of two overlapping boxes is indeed a new box.
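The coordinate-wise rule translates directly into code. The following is a small sketch under the assumption that a box is stored as three (lower, upper) pairs; the names `intersect` and `volume` are made up for this illustration.

```python
def intersect(box_a, box_b):
    """Intersect two boxes given as ((w1, w2), (h1, h2), (d1, d2)).

    Returns the intersection box, or None if the boxes do not overlap
    (the intersection is then the empty set).
    """
    result = []
    for (a1, a2), (b1, b2) in zip(box_a, box_b):
        lo, hi = max(a1, b1), min(a2, b2)
        if lo > hi:
            return None
        result.append((lo, hi))
    return tuple(result)

def volume(box):
    """Box volume: width * height * depth; the empty set has volume 0."""
    if box is None:
        return 0
    (w1, w2), (h1, h2), (d1, d2) = box
    return (w2 - w1) * (h2 - h1) * (d2 - d1)

overlap = intersect(((1, 3), (1, 3), (1, 3)), ((2, 4), (2, 4), (2, 4)))
print(overlap, volume(overlap))   # ((2, 3), (2, 3), (2, 3)) 1
print(intersect(((0, 1), (0, 1), (0, 1)), ((2, 3), (2, 3), (2, 3))))  # None
```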

6. Here he needs to show that if he removes one box from another box, the remaining set can be written as a union of finitely many non-overlapping boxes.

Sadly, this item exposes a problem with his definition.
A one-dimensional example already shows his mistake.

[1,3] \ [2,4] = [1,2)

The same happens along every side of a box: the leftover pieces are open on one side. They are not valid boxes under his definition because he defined boxes to be closed boxes.

Fortunately, he comes up with a simple solution. He just defines a box to be left open and right closed

B := (w₁, w₂] × (h₁, h₂] × (d₁, d₂]

and keeps the definition of his box measure the same.

With this definition, the problem from before is solved:

(1,3] \ (2,4] = (1,2]

In three dimensions, removing a box from another box does not always leave a single box, but the leftover can be cut into finitely many disjoint boxes:

(1,3]³ \ (2,4]³ = (1,2] × (1,3] × (1,3] ⊍ (2,3] × (1,2] × (1,3] ⊍ (2,3] × (2,3] × (1,2]

After formally testing the various combinations of how a box can be removed from another box and seeing that the new objects can all be written as a finite union of disjoint boxes, he has proved that 6. is true.
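The case analysis can also be delegated to a few lines of code. This is only a sketch of one possible decomposition strategy (slice off slabs axis by axis), and the helper names `difference` and `volume` are made up for this illustration. A tuple ((w₁, w₂), (h₁, h₂), (d₁, d₂)) stands for the half-open box (w₁, w₂] × (h₁, h₂] × (d₁, d₂].

```python
def difference(box_a, box_b):
    """Write box_a \\ box_b as a list of disjoint half-open boxes."""
    cut = [(max(a1, b1), min(a2, b2)) for (a1, a2), (b1, b2) in zip(box_a, box_b)]
    if any(lo >= hi for lo, hi in cut):
        return [box_a]                      # no overlap: nothing is removed
    pieces, rest = [], list(box_a)
    for axis, (lo, hi) in enumerate(cut):
        a1, a2 = rest[axis]
        if a1 < lo:                         # slab of box_a below the overlap
            pieces.append(tuple(rest[:axis] + [(a1, lo)] + rest[axis + 1:]))
        if hi < a2:                         # slab of box_a above the overlap
            pieces.append(tuple(rest[:axis] + [(hi, a2)] + rest[axis + 1:]))
        rest[axis] = (lo, hi)               # keep only the overlapping range
    return pieces

def volume(box):
    """Box volume: width * height * depth."""
    (w1, w2), (h1, h2), (d1, d2) = box
    return (w2 - w1) * (h2 - h1) * (d2 - d1)

pieces = difference(((1, 3), (1, 3), (1, 3)), ((2, 4), (2, 4), (2, 4)))
for piece in pieces:
    print(piece, volume(piece))
print(sum(volume(piece) for piece in pieces))   # 7 = 8 - 1
```

For (1,3]³ \ (2,4]³ the sketch returns three disjoint boxes with volumes 4, 2, and 1. Their sum is 7, the volume of the big cube minus the volume of the removed overlap (2,3]³.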

Unfortunately, the mathematician now has to recheck the previous 5 items.
Luckily the same arguments are true for the new box definition as well.

7. He needs to show that if the union of two disjoint boxes is a box itself, its volume is the same as the sum of the two individual box volumes.

That is easy. Merging two disjoint boxes can only produce a valid box if they can be fused at one face without any overlap. That means two of the three dimensions (width, height, depth) stay fixed. Assume without loss of generality that the boxes share the same width w and height h, and write d₁ and d₂ for the depths of the two boxes. For two such boxes he gets (⊍ means disjoint union):

λ(B₁ ⊍ B₂) = w · h · (d₁ + d₂) = w · h · d₁ + w · h · d₂ = λ(B₁) + λ(B₂)

This checks out as well.

8. Here he has to prove that the entire three-dimensional space can be written as a (countable) infinite union of boxes of finite volume.

Luckily the boxes do not have to be disjoint and may contain one another. He can therefore simply start with a box around zero and just let the box grow evenly in height, width, and depth.

Formally:

$$\mathbb{R}^3 = \bigcup\limits_{n=1}^{\infty} (-n, n]^3$$

and

$$\lambda((-n,n]^3) = (n - (-n))^3 = (2n)^3 < \infty$$

9. The last step is the key step to allow the box measure to work with infinity.

The mathematician has to show that if he has a box that is covered by infinitely many boxes in some way, then the volume of the covered box is smaller or as large as the sum of the volumes of the covering boxes.

Formally this can be written like this. Let

$B, B_1, B_2, \ldots \subset \mathbb{R}^3$

be boxes with

$B \subset \bigcup\limits_{n=1}^{\infty} B_n$.

Then he has to show that:

$$\lambda(B) \le \sum \limits_{n=1}^{\infty} \lambda(B_n)$$
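A concrete countable cover helps to see what the inequality says. The following is only an illustration under an assumed cover, not part of the proof: the unit box B = (0,1]³ is covered by the slabs Bₙ = (2⁻ⁿ, 2⁻ⁿ⁺¹] × (0,1] × (0,1] for n = 1, 2, …, and the hypothetical helper `partial_sum` adds up the volumes of the first few slabs exactly, using fractions.

```python
from fractions import Fraction

def slab_volume(n):
    """Volume of B_n = (2**-n, 2**-(n-1)] x (0, 1] x (0, 1]."""
    return (Fraction(1, 2 ** (n - 1)) - Fraction(1, 2 ** n)) * 1 * 1

def partial_sum(count):
    """Sum of the volumes of the first `count` slabs."""
    return sum(slab_volume(n) for n in range(1, count + 1))

for count in (1, 2, 5, 10):
    print(count, partial_sum(count))   # equals 1 - 2**-count
```

Every finite partial sum stays strictly below λ(B) = 1, and only the infinite sum reaches it. No finite selection of these slabs covers B, which is why the step from finitely many to infinitely many covering boxes needs an extra argument.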

To do that he has to establish some more properties about the box measure.

(i)
He has to show that if a box is contained within another box, then the volume of the outer box is at least as big as the volume of the inner box.
This follows directly from the definition of the box volume.

(ii)
Using measure theory he can then conclude from (i) together with 1. to 7. that if the box B is covered by only finitely many boxes, then the volume of B is smaller than or equal to the sum of the volumes of those finitely many boxes. This conclusion is not limited to the box measure: showing (i) in addition to 1. to 7. secures the property for any other basic measure candidate.

The proof of that is heavily centered around 5. and 6.
Using those two properties, measure theorists can remove boxes and intersect boxes and get a representation of the original box as a finite disjoint union of boxes. Using 7., they can then calculate the box volume as a sum of those individual box volumes. Applying (i), they then show that each box from this decomposition lies in a (potentially) bigger box from the cover, from which they can conclude that the sum of the volumes of the covering boxes has to be at least as big. A bit technical, but luckily others have done that part already.

(iii)
Now that he knows that the claim is true for finitely many boxes, the mathematician tries to reduce the cover from infinitely many boxes to finitely many boxes.

To do that he has to use a common trick from topology. He shrinks the inner box slightly to make it compact. At the same time, he slightly expands the covering boxes to make them open. He then has a cover of a compact box consisting of an infinite number of open boxes. Due to the compactness of the inner box, he then knows from the (topological) definition of compactness that there is a way to choose finitely many boxes from those infinitely many boxes that still cover the compact inner box.

Due to continuity of the box volume he can control how close the compact inner box is to the true inner box and how close the outer open boxes are to the covering boxes. He can thereby reduce the problem from infinitely many covering boxes to finitely many covering boxes for which he knows (see (ii)) the claim to be true.

Written down more formally. Let

B := (a, b] := ( a₁, b₁ ] × ( a₂, b₂ ] × ( a₃, b₃ ]

and similarly

Bₙ := (aₙ, bₙ] := ( (aₙ)₁, (bₙ)₁ ] × ( (aₙ)₂, (bₙ)₂ ] × ( (aₙ)₃, (bₙ)₃ ]

With that he wants to choose δ and δ(n) such that:

$[a + \delta, b] \overset{!}{\subset} (a, b] \subset \bigcup\limits_{n=1}^{\infty} (a_n, b_n] \overset{!}{\subset} \bigcup\limits_{n=1}^{\infty} (a_n, b_n + \delta(n))$

He chooses

$\delta = \left( d, d, d \right), \,\delta(n) = \left( d_n, d_n, d_n \right)$

to be three-dimensional vectors with equal components greater than zero.

Due to the definition of compactness he then gets:

$[a + \delta, b] \subset \bigcup\limits_{k \in K} (a_k, b_k + \delta(k))$

where K is a finite subset of the natural numbers.

The box measure only allows calculating the volumes for boxes that are left open and right closed. But that is fine as (a + δ, b] ⊂ [a + δ, b], and

$\bigcup\limits_{k \in K} (a_k, b_k + \delta(k)) \subset \bigcup\limits_{k \in K} (a_k, b_k + \delta(k)]$

and therefore

$(a + \delta, b] \subset \bigcup\limits_{k \in K} (a_k, b_k + \delta(k)]$

As the functions

$$d \in \mathbb{R} \mapsto \lambda\left((a + \delta(d), b ]\right)$$

and

$$d \in \mathbb{R} \mapsto \lambda\left((a_k, b_k + \delta(k, d)]\right)$$

are continuous in d, he can make the volumes arbitrarily close to

$\lambda\left((a, b]\right)$ and $\lambda\left((a_k, b_k]\right)$

by controlling d and dₙ.

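Overall, choosing d and the dₙ such that λ((a, b]) ≤ λ((a + δ, b]) + ε/2 and λ((aₙ, bₙ + δ(n)]) ≤ λ((aₙ, bₙ]) + ε · 2⁻ⁿ⁻¹, he arrives at the following for any ε > 0:

$$\lambda((a, b]) \le \frac{\varepsilon}{2} + \lambda((a + \delta, b]) \le \frac{\varepsilon}{2} + \sum\limits_{k \in K} \lambda((a_k, b_k + \delta(k)]) \le \frac{\varepsilon}{2} + \sum\limits_{k \in K} \left( \varepsilon \, 2^{-k-1} + \lambda((a_k, b_k]) \right) \le \varepsilon + \sum\limits_{n=1}^{\infty} \lambda((a_n, b_n])$$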

Now he lets ε approach zero and he is done with proving the checklist.

As the box volume measure fulfills all the required properties it can be uniquely extended to a volume measure of all “Borel-measurable” sets.
Practically, that means the volume of any real-life object can be measured exactly using a clever arrangement of infinitely many boxes, as outlined by the two schemes above.

As a bonus with the help of some simple linear algebra, it can be shown that the volume behaves nicely under invertible matrix transformations.
Multiplying every point of a set with an invertible matrix results in an object whose volume is the same as multiplying the original volume with the absolute value of the determinant of the matrix. Using that, it is now very simple to calculate the volume of any transformed object that was flipped, rotated, or stretched in any dimension, as long as the original volume is known. That is why the volume of a ball grows cubically with the radius, for example: scaling every point by the radius r corresponds to the matrix r · I, whose determinant is r³.
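The determinant rule can be checked on a small example. The sketch below rests on several assumptions that are not from the text: the matrix `A` is an arbitrary integer example, the helper names are made up, and the volume of the image of the unit box (0,1]³ is estimated by counting the centres of small boxes of side 1/n that fall into the image. Membership is tested by solving A·u = x with Cramer's rule, using exact fractions.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def image_volume(matrix, n):
    """Estimate the volume of the image of (0, 1]^3 under an integer matrix.

    The bounding box of the image is cut into small boxes of side 1/n.
    A small box is counted if its centre x satisfies x = matrix @ u
    with every coordinate of u in (0, 1].
    """
    d = det3(matrix)
    h = Fraction(1, n)
    lows = [sum(min(0, v) for v in row) for row in matrix]
    highs = [sum(max(0, v) for v in row) for row in matrix]
    axes = [[lo + (i + Fraction(1, 2)) * h for i in range((hi - lo) * n)]
            for lo, hi in zip(lows, highs)]
    count = 0
    for x0 in axes[0]:
        for x1 in axes[1]:
            for x2 in axes[2]:
                x = (x0, x1, x2)
                inside = True
                for col in range(3):
                    # Cramer's rule: replace one column of the matrix by x
                    replaced = [[x[r] if c == col else matrix[r][c]
                                 for c in range(3)] for r in range(3)]
                    u = det3(replaced) / d
                    if not 0 < u <= 1:
                        inside = False
                        break
                count += inside
    return count * h ** 3

A = [[2, 1, 0],
     [0, 1, 1],
     [1, 0, 3]]
estimate = image_volume(A, 4)
print(estimate, abs(det3(A)))
```

The unit box has volume 1, so the counted volume of its image should agree with |det A|. For an integer matrix and a grid of side 1/n the centre count happens to be exact; for general matrices it is only an approximation that improves as n grows.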

There are many more properties that measure theory guarantees and that can be picked up from any measure theory book after skipping the first 50–70 pages as they mostly cover the necessary proofs to allow this process of checking some properties.

Onwards towards the data scientist. The mathematician wants to check that the integral over a continuous density function truly defines a probability measure under the notion of measure theory. To do that he will walk through the checklist once again. The second iteration will be much faster as the last property of the checklist is almost identical to the proof of the box measure.

He first defines the basic measure that he wants to elevate to the measure theory world.

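Let

$f: \mathbb{R} \to [0, \infty)$

be a continuous density function.

The candidate probability measure is then defined as

$\mathbf{P}: \{ (a, b] : a, b \in \mathbb{R}, \, a \le b \} \to [0, 1]$

with

$\mathbf{P}((a,b]) := \int_a^b \! f(x) \, \mathrm{d}x$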
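To connect this definition with the data scientist's lookup sheet, here is a minimal numerical sketch. It assumes the standard normal density as the example f (the text only requires some continuous density) and approximates the integral with a simple midpoint rule; the function names are made up for this illustration.

```python
import math

def density(x):
    """Standard normal density, used here as the continuous density f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def prob(a, b, steps=2000):
    """Approximate P((a, b]) = integral of f from a to b (midpoint rule)."""
    width = (b - a) / steps
    return sum(density(a + (i + 0.5) * width) for i in range(steps)) * width

def normal_cdf(x):
    """Standard normal cdf via the error function, for comparison."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(prob(-1.0, 1.0), normal_cdf(1.0) - normal_cdf(-1.0))  # both about 0.6827
print(prob(-10.0, 10.0))                                    # close to 1
```

The numerically integrated value matches the difference of the cdf values that a lookup sheet would give, and the mass over a wide interval is close to 1.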

This time the mathematician directly defines the basic objects to be left open and right closed in anticipation of item 6.

And here he goes again.

1. The basic set is the set of all real numbers.

2. The sets are all subsets of the real numbers.

3. Due to the mean value theorem for integrals

$\mathbf{P}((a,b]) = \int_a^b \! f(x) \, \mathrm{d}x = \underbrace{f(x_0)}_{\ge 0} \cdot \underbrace{(b-a)}_{\ge 0} \ge 0$

for an x₀ ∈ (a,b) as f is continuous.

4. From the definition of the integral it follows that P({ }) = P((a,a]) = 0.

5. As before (a,b] ∩ (c,d] = (max(a,c), min(b,d)] which is in the domain of P.

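6. As anticipated, (a,b] \ (c,d] can always be written as a disjoint union of at most two intervals of the form (x,y], all of which are in the domain of P.
To see that, there are four easy cases to check: the two intervals do not overlap, (c,d] cuts off one end of (a,b], (c,d] lies strictly inside (a,b], or (c,d] covers (a,b] completely.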
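The cases can be spelled out in a few lines. This is a sketch with a made-up helper `interval_difference`; each returned pair (x, y) stands for the half-open interval (x, y], and a < b and c < d are assumed.

```python
def interval_difference(a, b, c, d):
    """Write (a, b] \\ (c, d] as a list of disjoint intervals (x, y]."""
    lo, hi = max(a, c), min(b, d)
    if lo >= hi:
        return [(a, b)]          # no overlap: nothing is removed
    pieces = []
    if a < lo:
        pieces.append((a, lo))   # part of (a, b] to the left of (c, d]
    if hi < b:
        pieces.append((hi, b))   # part of (a, b] to the right of (c, d]
    return pieces

print(interval_difference(0, 1, 2, 3))  # [(0, 1)]
print(interval_difference(0, 2, 1, 3))  # [(0, 1)]
print(interval_difference(0, 4, 1, 2))  # [(0, 1), (2, 4)]
print(interval_difference(1, 2, 0, 3))  # []
```

The third call shows the case where one interval is not enough: removing (1,2] from the middle of (0,4] leaves the two disjoint intervals (0,1] and (2,4].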

7. The integral is additive over adjacent intervals, therefore (⊍ means disjoint union)

$\int_{(a,b] \dot{\cup} (b,d]} \! f(x) \, \mathrm{d}x = \int_a^d \! f(x) \, \mathrm{d}x = \int_a^b \! f(x) \, \mathrm{d}x + \int_b^d \! f(x) \, \mathrm{d}x$

8. As before, the real numbers can be written as a union

$\mathbb{R} = \bigcup\limits_{n=1}^{\infty} (-n, n]$

and

$\int_{-n}^n \! f(x) \, \mathrm{d}x = f(x_n) (n - (-n)) = f(x_n) \cdot 2n < \infty$

for an xₙ ∈ (-n, n).

9. The proof is analogous to the proof for the Lebesgue measure.

(i)
The integral is monotone as f is greater than or equal to zero.
From measure theory, we then know that the claim is true for finite sums.

The fundamental theorem of calculus states that

$F(x) := \int_a^x \! f(t) \, \mathrm{d}t \text{ and } \tilde{F}(x) := \int_x^b \! f(t) \, \mathrm{d}t$

are differentiable and therefore continuous in any interval [a,b].

From that, it follows that the intervals (a,b] and (aₙ, bₙ] can be shrunk and expanded in a controlled fashion, to create a compact set and an infinite cover of open sets for which a finite cover must exist. This way the infinite claim can be reduced to a problem with finitely many summands.

The claim then follows using the same calculation as before.

As f is a density function we know that

$\int_{-\infty}^\infty \! f(x) \, \mathrm{d}x = 1$

P can therefore be extended to a probability measure under the measure theory umbrella. All theorems from probability theory and measure theory are therefore usable for probability measures gained from density functions.

After the mathematician has come up with the answers to his questions, the data scientist, the engineer, and the mathematician meet again in the bar and order their drinks.

“Did you find any answers to the questions you had yesterday?” the engineer asks the mathematician.

“Yes, I did. I created a simple checklist of properties that need to be checked for a candidate measure to become a proper measure under the measure theory framework,” the mathematician replies.

“Nice,” the data scientist says. “And I assume you went through that checklist to solve the two questions?”

“Exactly,” the mathematician agrees enthusiastically. “I first started with a simple box volume measure. Going through the checklist I had to refine my definition twice but was, in the end, able to prove the properties from the list. So it is indeed possible to exactly calculate the volume of the container if you can infinitely refine the boxes you put into the container.”

“Nice,” the engineer replies. “So infinities aren’t an issue anymore?”

“Countable infinities aren’t. Uncountable infinities are another issue though. By the way, I also found another way to solve the container volume problem.”

“I am always open for practicable ideas,” the engineer exclaims.

“Well this one is practicable,” the mathematician continues. “Instead of filling the container with boxes, you can also put the container in a box and fill the space between the container and the outer box with small boxes. In the end, you can simply subtract the individual box volumes from the outer box volume.”

“Wait that sounds like a great idea! Today my boss vetoed the idea of filling the container with boxes. The materials on the inside are too valuable apparently. Your idea solves that problem. Why didn’t I think of that!? Let me go over to my boss and pitch him the idea. Thanks!” and with that, the engineer leaves the bar.

“I haven’t seen him this excited in a long time,” the data scientist smiles. “So what did you find out about the distribution function? Did you reuse your checklist?”

“Yes,” the mathematician explains. “Even better I could copy the most challenging proof almost exactly from the box measure to the distribution function. That made it very simple to go through the checklist as well. Here take a look.”

The mathematician hands over his notes to the data scientist, who starts looking them over.

“Wait a moment,” the data scientist says. “This is so intuitive. Reducing the infinite summands to finitely many using compactness is so clever. Wait a moment. Maybe I can use that trick to solve the problem that bugged me the whole day at work? I have to go and check something. Great work, you did very well. See you later. I’ve got to go.”

“Wait,” the mathematician says, but it is too late. Both his friends have already left the bar. And the barkeeper is approaching his table. Why did he have to leave his wallet at home?

[1] Achim Klenke, Wahrscheinlichkeitstheorie (2020), Springer
