Change of Basis

If you think it’s easy, you’re probably doing it wrong.

Jeremy Cowles
Towards Data Science

--

If you’re struggling with coordinate system conversion, enjoy this sunset for a second, knowing your answer is just below.

I’ve recently been working the USD Unity SDK and one topic that has come up a few times is basis conversion from right-handed systems (like OpenGL/USD) to left-handed systems (like Unity/DirectX). If you need to convert a matrix or vector from a left handed coordinate system to a right handed coordinate system, you know my pain; and in the spirit of pragmatism, I’ll present the solution first and then we can dive into the details. I’ll also present one weird trick and who knows, maybe we can just multiply Z by -1.

TL;DR

Below is the fully general change of basis formula:

B = P * A * inverse(P)

The erudite reader will identify this change of basis formula as a similarity transform. It can be applied to a matrix A in a right-handed coordinate system to produce the equivalent matrix B in a left-handed coordinate system. The matrix P is the change of basis transformation. So what is the change of basis matrix exactly? If your intent is to convert values from the DirectX to OpenGL default coordinate systems, which differ by a scale of -1 on the Z axis, then the change of basis matrix is the transformation between these systems:

P = [[ 1  0  0  0]
[ 0 1 0 0]
[ 0 0 -1 0]
[ 0 0 0 1]]

This matrix has a special property in that it is involutory (it doesn’t change under inversion). Also note that the change of basis formula with this change of basis matrix only results in six non-identity multiplies. To make the solution fully concrete, here is the implementation in C# using the Unity Engine API:

Matrix4x4 ChangeBasis(Matrix4x4 input) {
var basisChange = Matrix4x4.identity;
// Invert the forward vector.
basisChange[2, 2] = -1;
// Note the inverse of this matrix is the matrix itself.
var basisChangeInverse = basisChange;
// This multiply could be simplified to
// -1 * elements [2,6,8,9,11,14], which can be simplified
// further; see Eberly's proof for details.
return basisChange * input * basisChangeInverse;
}
Vector3 ChangeBasis(Vector3 point) {
UnityEngine.Matrix4x4 mat = UnityEngine.Matrix4x4.identity;
mat[2, 2] = -1;
return mat.MultiplyPoint3x4(point);
}

So there you have a fully general and symmetrical solution for vectors, transforms, cameras, and lights. Note that changing the basis of an entire asset means converting the points, transforms, normals, tangents and all secondary data (e.g. directional vectors which are parameters to shaders). This is no small task. If the asset is animated, you may also need to do this every frame.

You may ask yourself, “but can’t I avoid all these multiplies by using one weird trick?” Well yes, but it’s actually two tricks and it comes at a cost; but before diving into tricks, let’s first agree that the change of basis is the correct general solution.

Quest for Fire

Anyone who has needed to interchange data between DirectX and OpenGL has felt the pain of needing to change basis between the two coordinate systems. Before finding the fully general solution, I spent way too long flailing around with the wrong solution, in which I attempted to simply invert the Z coordinate of all values. This *appears* to do the right thing when inspecting positions, however it fails to account for rotations and sheering. To understand why this is incorrect, it helps to look at the effect of a proper change of basis on a matrix of ones:

[[ 1  0  0  0]      [[1 1 1 1]    [[1  0  0  0]     [[ 1  1 -1  1]
[ 0 1 0 0] x [1 1 1 1] x [0 1 0 0] = [ 1 1 -1 1]
[ 0 0 -1 0] [1 1 1 1] [0 0 -1 0] [-1 -1 1 -1]
[ 0 0 0 1]] [1 1 1 1]] [0 0 0 1]] [ 1 1 -1 1]]

Oh hai, a lot more than just the Z translation is inverted! We now see the six multiplies I claimed above and clearly, this is not equivalent to multiplying Z by -1, but why is this solution correct?

Perhaps you’ve also stumbled across this document by David Eberly. While logical, seemingly correct, and only requiring four matrix multiplies in three special cases, it was difficult for me to see how it generalized and I didn’t want to blindly apply the formula without really understanding what I was doing.

From linear algebra, I knew that what I really wanted was to perform a change of basis, so I set out to find some mathematical foundations for that instead. What I ultimately found was the text book definition of change of basis.

A Similarity Transformation

On Wikipedia, you will find the formula above, but accompanied by a far more inscrutable description than Eberly’s. The underpinnings of this transformation are actually simple to compute and understand, but difficult to explain, so here I’m going to direct you to the 3Blue1Brown video on change of basis. He arrives at the similarity transformation 11 minutes into the video.

Ok, now that we all agree the change of basis formula is fully general and fully correct, let’s get weird.

One Weird Trick: Partial Change of Basis

I didn’t like that this transformation required two matrix multiplies (modulo optimization) per transform and had to be applied to every point, transform and normal (etc) in the asset; and again, this can be particularly slow for animation — so can’t we avoid it? The knowledge of the internet will tell you, “no, it can’t be avoided, suck it up”, but this isn’t completely true, assuming you’re comfortable with a few caveats.

Let’s think about how the change of basis transformation manifests in our data. Assume we have a chain of transforms applied to a vertex position. In a real asset, we have tons of other data, but this limited view of the world is enough to illustrate the idea. Our goal is to transform a point from object space to world space. Assuming T0..T2 are 4x4 matrix transforms, V is a 4x1 vector position to be transformed, our matrix multiplication looks like this when applying the fully general change of basis:

V' = P * T0 * inverse(P)
* P * T1 * inverse(P)
* P * T2 * inverse(P)
* P * V

All those matrix multiplies look extremely redundant, can’t we operate in an intermediate space and only apply the conversion when we’re done? Well yes, but we’d like to avoid having to write this code explicitly in every shader in the system. Instead, we leverage the fact that we know there is a contiguous subgraph in a consistent space (i.e. a single output conversation step should suffice).

V' = P * T0
* T1
* T2 * V

That’s a huge reduction in work! Only the root transform needs to be converted and all other data can remain in its native coordinate space. This works because T0..T2 and V are in the same coordinate system, so rotations work as expected. Additionally, the final output transformation P converts the resulting point correctly, since by definition it cannot be a rotation (it’s just a position). Finally the camera is in the space which P outputs, so the resulting scene appears correct. So what’s the catch?

If we apply this trick while including a camera transform, the camera must either be fully converted or we’ve left the final vertex value in a different coordinate space than where it started when viewing it through this camera. However, to keep all the scene data in a consistent coordinate space, we can instead complete the partial change of basis transformation via the camera’s view transform:

C' = C * inverse(P)

Now, when we go to render our left-handed camera and data in a right-handed coordinate system (for example), the typical camera space transformation will result in the correct final similarity transformation as follows:

CameraSpaceV = (view) * (model) * (vertex)
= C * T0 * T1 * T2 * V
= inverse(P) * C * P * T0 * T1 * T2 * V
= P * C * inverse(P) * T0 * T1 * T2 * V

And now we have two weird tricks :)

The net result has reduced an O(N) number of matrix multiplies on the incoming asset data to O(1).

Caveat Lector

There are actually a few caveats worth mentioning:

  1. On import, this leaves the raw matrix and vertex data in the other coordinate system. Anyone inspecting this data by hand may be confused by what they are seeing versus the values of the data.
  2. The root transform will contain a negative scale. Naive matrix decomposition algorithms will fail to handle this correctly and will convert the scale into a rotation. It is possible to detect this change of basis scale and handle it correctly when decomposing a matrix.
  3. If you are operating on matrix data which gets applied directly to the view transform, this trick will require additional fixes.

Conclusion

Above all, I hope the fully general solution is helpful to anyone scouring the internet for a quick solution. Secondarily, I hope that pointing out the fact that ALL data must be converted exposes the corollary that it is actually fraught with peril and the caveats of the “one weird trick” / partial change of basis may actually be preferable in some cases, not only for performance, but for the sake of correctness.

--

--

Director of the Machine Learning Artistry Lab at Unity Technologies. Formerly Google, Pixar, + 10 years of random startups.