Camera Intrinsic Matrix with Example in Python

Part 3 of the comprehensive tutorial series on image formation and camera calibration in Python

Neeraj Krishna
Towards Data Science


Introduction

In the previous article, we learned about camera extrinsics, which describe how to view the world from the camera's perspective. In this article, we'll see how the camera forms an image and learn about its intrinsic parameters.

Projection of a point

The fundamental idea of image formation is to capture the projection of a point onto the image plane of the camera. The pixels in the image correspond to projections on the image plane. Remember, the image plane is like a film that captures the light rays bouncing off points in the scene. Let's see how this works:

Projection of a point

In the above figure, the camera center is located at the origin 𝑂, and the image plane is at a distance 𝑓 from the origin along the negative Z-axis. 𝑓 is called the focal length and is usually known for a camera. The projection of the point 𝑃 onto the image plane is 𝑃′. The coordinates of 𝑃 are (𝑥, 𝑦, 𝑧), and the coordinates of 𝑃′ are (𝑥′, 𝑦′, 𝑓). Our goal is to find the coordinates of 𝑃′.

From the figure,
△OMP and △OO′P′ are similar triangles.
⟹ x′/x = y′/y = f/z
⟹ x′ = xf/z and y′ = yf/z

We've found the coordinates of 𝑃′. From the above equation, we can see that as the point 𝑃 moves away from the camera, its 𝑧 coordinate increases, and its projection gets smaller. So, the farther an object is from the camera, the smaller it appears in the image.

To get the pixels in the image, we simply take the projection coordinates, discard the last dimension and plot the points.

For example, we have found the coordinates of 𝑃′ as (𝑥𝑓/𝑧, 𝑦𝑓/𝑧, 𝑓). Its image coordinates will be (𝑥𝑓/𝑧, 𝑦𝑓/𝑧). Let's represent the image coordinates as (𝑢, 𝑣). Then:

(u, v) = (xf/z, yf/z)
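As a quick sanity check, here's a minimal Python sketch of this formula (the point and focal length are made-up values):

# Pinhole projection: (u, v) = (x*f/z, y*f/z)
def project(point, f):
    x, y, z = point
    return (x * f / z, y * f / z)

print(project((2.0, 4.0, 10.0), f=2.0))  # (0.4, 0.8)
print(project((2.0, 4.0, 20.0), f=2.0))  # (0.2, 0.4) - farther away, smaller projection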

Alright, we’ve seen how an image is formed by the camera. So are we done with camera intrinsics? No. In the real world, things don’t go as expected, and other parameters affect the formation of an image.

Let’s look at each of them.

Parameters affecting image formation

Scale

When you buy a camera, its focal length is given in the description, usually in millimetres, but image coordinates are measured in different units, such as pixels. So we need to incorporate a scaling factor that normalizes the units.

(u, v) = (𝛼x/z, 𝛼y/z)

Here, you can think of 𝛼 as the scaled focal length or the conversion factor.

Rectangular Pixels

Ideally, we assume the pixels are square, but in the real world, they can be rectangular with different heights and widths. Because of this, we need to incorporate separate scaling factors for each of the dimensions.

(u, v) = (𝛼x/z, 𝛽y/z)

Here 𝛼 is the scaling factor for the width dimension and 𝛽 is the scaling factor for the height dimension.

Offset

The perpendicular line from the camera center to the image plane is called the optical axis. The point where this axis intersects the image plane is called the optical center. Usually, the optical center and the origin of the image plane coincide, but in the real world they may be offset from each other, as shown in the figure below:

Optical center and Origin may not coincide

So we incorporate an offset in the equation to account for this:

(u, v) = (𝛼x/z + x0, 𝛽y/z + y0)

Here (𝑥0, 𝑦0) is the offset.

Skew

So far we've portrayed the image plane as a rectangle, with the width and height directions perpendicular to each other. But in the real world, the image plane might be skewed and resemble a parallelogram, as shown in the figure below:

Ideal Image Plane vs Skewed Image Plane

So how do we deal with this? We've assumed the axes are perpendicular to each other, while in reality they're at an angle. If you think about it, it's a change of basis problem: given a point 𝑃 wrt the standard orthonormal axes, we need to express it wrt the skewed axes. Let (𝑥, 𝑦) be the coordinates of 𝑃 wrt the standard orthonormal basis and let (𝑥′, 𝑦′) be its coordinates wrt the skewed basis. Our goal is to find (𝑥′, 𝑦′).

From the above figure,
cos(90−θ) = y/y′
⟹ sinθ = y/y′
⟹ y = y′sinθ
⟹ y′ = y/sinθ
also,
sin(90−θ) = (x − x′)/y′
⟹ cosθ = (x − x′)/y′
⟹ y′cosθ = x − x′
⟹ x′ = x − y′cosθ
but, y′ = y/sinθ
⟹ x′ = x − (y/sinθ)cosθ
⟹ x′ = x − y·cotθ
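As a quick numeric check of these formulas (with made-up values; at θ = 90° the skewed basis should reduce to the standard one):

import numpy as np

def to_skewed_basis(x, y, theta):
    # x′ = x − y·cotθ,  y′ = y / sinθ
    return x - y / np.tan(theta), y / np.sin(theta)

print(to_skewed_basis(1.0, 1.0, np.pi / 2))  # ≈ (1.0, 1.0): no skew
print(to_skewed_basis(1.0, 1.0, np.pi / 4))  # (0.0, ≈1.414): sheared axes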

Now that we've found (𝑥′, 𝑦′), let's incorporate them into the equation. We just have to replace the old coordinates with these new coordinates.

u = 𝛼(x − y·cotθ)/z + x0
v = 𝛽(y/sinθ)/z + y0
⟹ u = 𝛼x/z − (𝛼y/z)cotθ + x0
⟹ v = 𝛽y/(z·sinθ) + y0

The Camera Intrinsic Matrix

Finally, after accounting for the parameters that affect image formation, the image coordinates are given as:

(u, v) = (𝛼x/z − (𝛼y/z)cotθ + x0, 𝛽y/(z·sinθ) + y0)

We can represent this as a matrix multiplication using homogeneous coordinates:
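⎡zu⎤   ⎡𝛼  −𝛼·cotθ  x0  0⎤   ⎡x⎤
⎢zv⎥ = ⎢0   𝛽/sinθ  y0  0⎥ · ⎢y⎥
⎣ z⎦   ⎣0      0     1  0⎦   ⎢z⎥
                             ⎣1⎦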

The above matrix is called the camera intrinsic matrix, and it’s represented by 𝜅. Given the coordinates of a point in the world wrt the camera, we can multiply it with the camera intrinsic matrix to get the homogeneous coordinates of the point in the image.

Here,
𝑃′ - Homogeneous coordinates of the point in the image
𝜅 - Camera Intrinsic Matrix
𝑃𝑐 - Homogeneous Coordinates of the point in the world wrt camera

To convert from homogeneous coordinates, we simply divide by the last element:
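(u, v) = (zu/z, zv/z)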

Here, (𝑢, 𝑣) represents the Euclidean coordinates of the point in the image or the pixel location.

If you observe, the last column of the camera intrinsic matrix is all zeros; it doesn't contribute anything, so we can remove it and simplify the matrix to:
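    ⎡𝛼  −𝛼·cotθ  x0⎤
𝜅 = ⎢0   𝛽/sinθ  y0⎥
    ⎣0      0     1⎦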

And now, the matrix equation can be re-written as:
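⎡zu⎤       ⎡x⎤
⎢zv⎥ = 𝜅 · ⎢y⎥
⎣ z⎦       ⎣z⎦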

Here, it’s not required to represent the point coordinates in their homogeneous form.
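With f = 𝛼, s = −𝛼·cotθ, af = 𝛽/sinθ, and (cx, cy) = (x0, y0), the same matrix can be written in the more common notation as:

    ⎡f   s  cx⎤
𝜅 = ⎢0  af  cy⎥
    ⎣0   0   1⎦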

Here,      
𝑓 - focal length
𝑠 - skew factor
𝑐𝑥,𝑐𝑦 - offset
𝑎 - aspect ratio

As you can see, there are five degrees of freedom in the camera intrinsic matrix.
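To make this concrete, here's a minimal NumPy sketch (independent of the repository's helper functions; the parameter values are made up) that builds the simplified intrinsic matrix from its five parameters and projects a point:

import numpy as np

def intrinsic_matrix(f, s, a, cx, cy):
    # Simplified 3x3 intrinsic matrix with five degrees of freedom
    return np.array([
        [f, s,     cx],
        [0, a * f, cy],
        [0, 0,      1],
    ], dtype=float)

K = intrinsic_matrix(f=2, s=0, a=1, cx=0, cy=0)
P_c = np.array([1.0, 2.0, 5.0])  # (x, y, z) wrt the camera

p = K @ P_c          # homogeneous image coordinates (zu, zv, z)
u, v = p[:2] / p[2]  # divide by the last element
print(u, v)          # 0.4 0.8, i.e. (x*f/z, y*f/z)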

Example

All this theory might be a little confusing, so let's do a hands-on example to clear things up.

Setting up

The GitHub repository with all the code can be found here: https://github.com/wingedrasengan927/Image-formation-and-camera-calibration

Assuming you’ve not set up the environment previously, you can do it now by running the following commands:

# create a virtual environment in anaconda
conda create -n camera-calibration-python python=3.6 anaconda
conda activate camera-calibration-python
# clone the repository and install dependencies
git clone https://github.com/wingedrasengan927/Image-formation-and-camera-calibration.git
cd Image-formation-and-camera-calibration
pip install -r requirements.txt

Note: This assumes you have Anaconda installed.

There are two main libraries we’ll be using:

  • pytransform3d: This library has great functions for visualizations and transformations in 3D space.
  • ipympl: This is a game-changer. It makes matplotlib plots interactive, allowing us to pan, zoom, and rotate in real time within the notebook, which is helpful when working with 3D plots.

Example Intuition

In this example, we'll consider a simple setup where the camera is located at the origin and the image plane is above it along the +ve Z-axis (we'll be working with a left-handed coordinate system). Next, we plot some points such that they all lie on the same plane, parallel to the image plane and above it. This makes it easy to visualize the projections and image formation. Next, we create the camera intrinsic matrix and use it to project the points onto the image plane and form the image. Finally, we transform the image plane using this matrix. A detailed step-by-step explanation is provided below.

The full notebook is available in the GitHub repository. Let's go through it step by step:

  • First, we import the necessary libraries. The utils.py file contains all the necessary helper functions. The magic command %matplotlib widget enables the ipympl backend, which lets us interact with the plots.
  • Next, we define parameters for the image plane and the points. Here I've taken 6 points at an elevation of z=5, distributed uniformly within the XY limits (-5, 5). The image plane is at an elevation of z=2. Finally, we plot all of them.
  • Next, we create camera intrinsic matrices which allow us to project the points onto the image plane and form an image. Here we have created four matrices with different parameters to illustrate the effect of each of them.
  • Alright, we have tinkered with the parameters and seen their effect on the image. Now, is it possible to visualize what the image plane will look like in each case? The camera intrinsic matrix is a change of basis matrix, and its function is to sample the points from the image plane. We saw earlier that taking the inverse of a change of basis matrix gives us a transformation matrix. So let's take the inverse of the camera intrinsic matrix and apply the result to the image plane. However, we have to remove the focal length from the scenario, as it deals with the elevation of the image plane and we want the elevation to stay constant; we just want to see the effects of the other parameters on the image plane. A sketch of this step follows the list below.
  • The figure above illustrates the image plane when the skew parameter s is set to 2; the corresponding image is shown in the top-right corner of the previous figure. Notice how the image and the image plane are oriented in opposite directions.
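Here's a minimal sketch of that last step (the corner coordinates and the helper-free NumPy code are my own illustration; the notebook uses the repository's plotting utilities):

import numpy as np

# Intrinsic matrix with skew s = 2 and focal length removed (f = 1, a = 1)
K = np.array([
    [1, 2, 0],
    [0, 1, 0],
    [0, 0, 1],
], dtype=float)

# Corners of a square image plane at elevation z = 2 (made-up values)
corners = np.array([
    [-5, -5, 2],
    [ 5, -5, 2],
    [ 5,  5, 2],
    [-5,  5, 2],
], dtype=float)

# The inverse of a change-of-basis matrix is a transformation matrix,
# so applying K^-1 shows how the image plane itself is deformed
transformed = corners @ np.linalg.inv(K).T
print(transformed)  # the square is sheared into a parallelogram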

Conclusion

I hope you've enjoyed the article. I encourage you to play with the notebook, tinker with the parameters, and see the effect they have on the image. If you have any doubts or questions, please let me know in the comments below.

Image Credits

All the images and figures in this article, unless their source is explicitly mentioned in the caption, are by the author.
