Hands-on Tutorials
How to render a 3D mesh and convert it to a 2D image using PyTorch3D
A hands-on guide with Python code to render 3D .obj files (polygonal meshes) using PyTorch3D API

3D understanding plays a critical role in numerous applications ranging from self-driving cars and autonomous robots to virtual reality and augmented reality. Over the past year, PyTorch3D has become an increasingly popular open-source framework for 3D deep learning with Python. Thankfully, the folks behind the PyTorch3D library have done the legwork of implementing several common 3D operators, loss functions, and a differentiable rendering API, making PyTorch3D even more accessible and easier to get started (or play) with [1]. Some of the key PyTorch3D components include:
- Data structures for storing and manipulating triangle meshes
- Efficient operations on triangle meshes
- Differentiable mesh rendering API
Rendering is an essential building block in a computer graphics pipeline that converts 3D representations – be they meshes (.obj) or point clouds (.ply) – into 2D images.
In this post, we’ll build background knowledge on how to render a 3D .obj file from various viewpoints to create 2D images. We’ll also build a basic 3D rendering pipeline using PyTorch3D in Python, with the components shown below.

This post assumes only a basic knowledge of 3D file representation so hopefully it’ll be accessible for everyone 🙂 However, if you’d like to read more about 3D reconstruction, then check out this fabulous, up-to-date resource list [2] or course notes from Stanford CS231A [3] and CS468 [4] classes.
At the end of this article, you will know how to:
- Load a 3D mesh using .obj and .mtl files
- Create a renderer
- Render the mesh
- Optional: Use batch properties to render the mesh efficiently from different viewpoints
Just want the code? The entire code is available in this GitHub repository [5].
Ready? Let’s start! 🎉
Step # 1: Import libraries and initialize parameters
We start by importing prerequisite libraries, such as torch or numpy, and a variety of utility functions and structures from the pytorch3d library.

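A rough sketch of what the import block might look like is shown below; the exact set of imports is an assumption (it depends on your PyTorch3D version), not a copy of the repository code:

import torch
import numpy as np
import matplotlib.pyplot as plt

# I/O and data structures for triangle meshes
from pytorch3d.io import load_obj
from pytorch3d.structures import Meshes

# Rendering components: cameras, lights, rasterizer, and shaders
from pytorch3d.renderer import (
    look_at_view_transform,
    FoVPerspectiveCameras,
    PointLights,
    RasterizationSettings,
    MeshRasterizer,
    MeshRenderer,
    SoftPhongShader,
    TexturesUV,
)

# Helper class that loads hyperparameters from a JSON file
from utils import Params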
The imports also include the class Params from utils.py, which loads the important hyperparameters from a configuration file. Generally, it is good practice to write all your parameters in a single file and load them from that file. This lets you keep track of the hyperparameters you are testing and review which ones led to the best performance. In our case, the hyperparameters are stored in params_demo.json:

Don’t worry if some of those hyperparameters don’t make sense yet; I’ll explain them later in this tutorial!
Loading the hyperparameters is done via:
params = Params("params_demo.json")
# Access the elevation parameter
print(params.elevation)
Once your params object is initialized, you can also update it with another .json file using the params.update("your_other_params.json") method.
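The Params class itself lives in utils.py, which isn’t reproduced here; a minimal sketch of how such a helper might work (the implementation details are an assumption) is:

import json

class Params:
    """Loads hyperparameters from a JSON file and exposes them as attributes."""

    def __init__(self, json_path):
        self.update(json_path)

    def update(self, json_path):
        # Read the JSON file and merge its key-value pairs into this object
        with open(json_path) as f:
            self.__dict__.update(json.load(f))

With this pattern, params.elevation simply looks up the "elevation" key loaded from params_demo.json.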
Okay, so now that we imported libraries and declared parameters, we can load the mesh. 🎉
Step # 2: Load the 3D mesh
There are a couple of ways to represent 3D data, such as point clouds, meshes, or voxels [6]. In this tutorial we’ll focus on 3D meshes although the same procedure in PyTorch3D is applicable to point clouds too [7].
Information about a 3D textured mesh is typically stored in the following files:
- an .obj file, which stores the vertices and faces
- an .mtl file, which stores material properties
- a .jpg or .png texture image
In this tutorial, we will be working with a 3D capsule object, which is stored in the data/capsule folder. The example files were obtained from a public repository hosted here [8]. To visualize the mesh we are working with, we can use Blender:

PyTorch3D contains several functions to load .obj files, such as load_obj or load_objs_as_meshes. We will use the first one and load the .obj file with the following syntax:
verts, faces, aux = load_obj(filename)
Here, verts is a (V, 3) tensor of vertices, faces.verts_idx is an (F, 3) tensor of indices for each face corner, and aux stores auxiliary information about the mesh, such as uv coordinates, material colors, or textures. We then pass those verts, faces.verts_idx, and aux structures into the Meshes constructor, which creates an object named capsule_mesh:

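Putting this step together, the loading code could look roughly like the sketch below; the file path, device handling, and texture construction are assumptions based on the files described above, not the repository code verbatim:

import torch
from pytorch3d.io import load_obj
from pytorch3d.structures import Meshes
from pytorch3d.renderer import TexturesUV

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load vertices, faces, and auxiliary data (uv coordinates, materials, textures)
verts, faces, aux = load_obj("data/capsule/capsule.obj", device=device)

# If the .mtl file references a texture image, wrap it in a TexturesUV object
textures = None
if aux.texture_images:
    texture_image = list(aux.texture_images.values())[0][None, ...]  # (1, H, W, 3)
    textures = TexturesUV(
        maps=texture_image,
        faces_uvs=faces.textures_idx[None, ...],
        verts_uvs=aux.verts_uvs[None, ...],
    )

# Create the Meshes object from the loaded data
capsule_mesh = Meshes(verts=[verts], faces=[faces.verts_idx], textures=textures)

print(f"We have {capsule_mesh.num_verts_per_mesh().item()} vertices "
      f"and {capsule_mesh.num_faces_per_mesh().item()} faces.")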
Lastly, we check the number of faces and vertices in the capsule mesh. This returns:
We have 5252 vertices and 10200 faces.
which is what we would expect by checking the .obj file structure.
If you’d like to learn more, the official documentation for the Meshes object can be found here [9].
Step # 3: Create a renderer
This is probably the most important step. Now that we have successfully read our capsule mesh, we need to create a renderer using the MeshRenderer class. Looking at the MeshRenderer documentation [10], we see that it consists of 2 components:
- rasterizer
- shader
So let’s break down this task into 2 steps and, at the end, put them together.
Step # 3a: Create a rasterizer
Rasterization refers to taking an image representation described in polygons or triangles (.obj file) and converting it into a raster image described in pixels (.png or .jpg file).
Our rasterizer is created using a class called MeshRasterizer, which itself has several subcomponents, such as the cameras and raster_settings arguments. Basically, cameras is responsible for transforming 3D coordinates from world space to screen space. To initialize the camera, we need 3 important arguments: 1) distance, 2) azimuth angle, and 3) elevation angle. If that sounds like a lot, don’t worry; I’ll go through them step by step.
Distance refers to the distance between the camera and the object.
Elevation angle refers to the angle between the vector from the object to the camera and the horizontal plane y = 0 (the xz plane). Elevation basically tells us from how high we are looking at the object.
For the azimuth angle, the vector from the object to the camera is projected onto the horizontal plane y = 0. The azimuth angle is then the angle between this projected vector and a reference vector at (0, 0, 1) on the reference plane (the horizontal plane). The azimuth angle takes values in the interval from 0º to 360º. It basically tells us from which side (e.g. left side, right side, front view, back view, etc.) we are looking at the object. See more info here [11], [12].
In our params_demo.json file (Fig 2), we declared that the distance is 3, the elevation is 0, and the azimuth angle is 90, so if we render this mesh, we should be looking at it directly from a distance of 3 units.
Regarding raster settings, the most important parameter is the size of the resulting 2D image. The smaller the size, the more pixelated the image will appear.
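A sketch of the camera and rasterizer setup, using the distance, elevation, azimuth, and image size values discussed above; the blur_radius and faces_per_pixel values are assumptions (the common defaults), and in practice these numbers would come from the params object:

import torch
from pytorch3d.renderer import (
    look_at_view_transform,
    FoVPerspectiveCameras,
    RasterizationSettings,
    MeshRasterizer,
)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Camera 3 units away, at elevation 0 and azimuth 90 (the values from params_demo.json)
R, T = look_at_view_transform(dist=3.0, elev=0.0, azim=90.0)
cameras = FoVPerspectiveCameras(device=device, R=R, T=T)

# Rasterization settings: the rendered 2D image will be 256 x 256 pixels
raster_settings = RasterizationSettings(
    image_size=256,
    blur_radius=0.0,
    faces_per_pixel=1,
)

rasterizer = MeshRasterizer(cameras=cameras, raster_settings=raster_settings)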

Step # 3b: Create a shader
PyTorch3D offers multiple types of shaders, including SoftPhongShader or HardPhongShader. Here we’ll use a predefined SoftPhongShader and pass in the camera and the device on which to initialize the default parameters.
Last but not least, we combine both the rasterizer and the shader:

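A sketch of that combination might look like this; the point light and its position are an assumption (the actual configuration may place it differently), and device, cameras, and rasterizer come from the previous snippets:

from pytorch3d.renderer import PointLights, SoftPhongShader, MeshRenderer

# A single point light placed in front of the object (position is an assumption)
lights = PointLights(device=device, location=[[0.0, 0.0, 3.0]])

# The shader computes per-pixel colors from the rasterizer's output
shader = SoftPhongShader(device=device, cameras=cameras, lights=lights)

# The renderer is simply the rasterizer followed by the shader
renderer = MeshRenderer(rasterizer=rasterizer, shader=shader)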
Step # 4: Render the mesh
This is a very easy step because we only have to call the renderer on our Meshes object. Let’s render our capsule mesh and plot the results:
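A minimal sketch of this step, continuing from the capsule_mesh and renderer created above:

import matplotlib.pyplot as plt

# Render the mesh; the output has shape (1, H, W, 4), i.e. one RGBA image per mesh
images = renderer(capsule_mesh)

plt.figure(figsize=(8, 8))
plt.imshow(images[0, ..., :3].cpu().numpy())  # drop the alpha channel for plotting
plt.axis("off")
plt.show()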


The rendered result looks pretty much the same as the Blender visualization in Fig 5, which is a good sign! 🙂
Optional: Step # 5: Using batch properties
Using batch properties may potentially be useful if you want to render the mesh from multiple viewpoints. Before we dive into the code, it’s worth understanding how the current batching implementation works. The current implementation relies on a single argument, the batch size. This batch size then divides both the elevation and azimuth angle space into n equal increments. So, if your batch size is 4, then your elevation and azimuth angle spaces are torch.linspace(0, 360, 4), which is tensor([0, 120, 240, 360]). In each batch, the index moves along both the elevation and azimuth angle lists and stops once all elements are exhausted. As a result, we only get 4 rendered pictures: a) with both elev. and azimuth = 0, b) with both elev. and azimuth = 120, c) with both elev. and azimuth = 240, and d) with both elev. and azimuth = 360.
This is analogous to the Python map() function where you pass two iterable arguments – you also don’t get the results from all pair-wise combinations of those two parameters. So if you expect to get all pair-wise combinations of elevations and azimuth angles, then something like a list comprehension is one way to go.
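For instance, a plain list comprehension over the two value lists gives all pair-wise combinations, whereas element-wise pairing (what the batching above does) gives only 4:

# Element-wise pairing vs. all pair-wise combinations of viewpoints
elev_values = [0, 120, 240, 360]
azim_values = [0, 120, 240, 360]

elementwise = list(zip(elev_values, azim_values))              # 4 viewpoints
pairwise = [(e, a) for e in elev_values for a in azim_values]  # 16 viewpoints

print(len(elementwise), len(pairwise))  # 4 16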
Alright, alright, alright, back to batch properties… We set the batch size to 4, which represents the number of viewpoints from which we want to render the mesh. We use this batch size to extend our meshes, elevation vectors, and azimuth angle vectors. After the images are rendered, the resulting tensor has shape [4, 256, 256, 4].
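A sketch of the batched rendering, assuming the capsule_mesh, renderer, and device from the previous snippets:

import torch
from pytorch3d.renderer import look_at_view_transform, FoVPerspectiveCameras

batch_size = 4

# One copy of the mesh per viewpoint
meshes = capsule_mesh.extend(batch_size)

# Element-wise elevation/azimuth pairs, as described above
elev = torch.linspace(0, 360, batch_size)
azim = torch.linspace(0, 360, batch_size)

# A batch of 4 cameras, all at a distance of 3 units
R, T = look_at_view_transform(dist=3.0, elev=elev, azim=azim)
cameras = FoVPerspectiveCameras(device=device, R=R, T=T)

# Render all viewpoints in a single forward pass -> shape [4, 256, 256, 4]
images = renderer(meshes, cameras=cameras)
print(images.shape)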


Congratulations! 🎉 You have now understood the ins and outs of how to render a 3D mesh from a single and multiple viewpoints.
Here’s what we’ve gone through:
- We’ve seen the installation of PyTorch3D
- We’ve loaded the mesh and textures from .obj and .mtl files
- We’ve created a renderer to render the mesh
- We’ve utilized PyTorch3D batching features to extend the mesh and render it from multiple viewpoints in a single forward pass
I’ve helped you to explore some basic PyTorch3D properties but please don’t stop here. It’s just the beginning and it’s up to you where this journey will take you!
Before you go
Like the tutorial and have any comments or suggestions? Anything I may have missed? I’d love to hear your thoughts 🙂 Send me a message or follow me on GitHub!
References
[2] https://github.com/timzhang642/3D-Machine-Learning
[3] https://web.stanford.edu/class/cs231a/syllabus.html
[4] https://graphics.stanford.edu/courses/cs468-17-spring/schedule.html
[5] https://github.com/adelekuzmiakova/pytorch3d-renderer
[6] https://towardsdatascience.com/how-to-represent-3d-data-66a0f6376afb
[7] https://github.com/facebookresearch/pytorch3d/blob/master/docs/tutorials/render_colored_points.ipynb
[8] http://paulbourke.net/dataformats/obj/minobj.html
[9] https://github.com/facebookresearch/pytorch3d/blob/master/pytorch3d/structures/meshes.py
[11] https://pvpmc.sandia.gov/modeling-steps/1-weather-design-inputs/sun-position/
[12] https://www.celestis.com/resources/faq/what-are-the-azimuth-and-elevation-of-a-satellite/