AR/VR/3D

Using AI to Generate 3D Models, Fast!

How to use NVIDIA’s NeRF code to create 3D models

Andrew Blance
Towards Data Science
8 min read · Jul 27, 2022


A floating glass cube covered in grass. Photo by DeepMind on Unsplash

Generating 3D models can be time-consuming, or require an enormous set of reference images. One way around this is a neural radiance field (NeRF), a neural-network-based method of generating new views of a scene. A NeRF takes a small set of 2D images you have taken of an object or scene and uses them to (effectively) build a 3D representation. It does this by learning to transition between the images you already have. This jumping (or interpolating, to use the slightly fancier term) lets you generate images of the object from entirely new perspectives!
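To make the idea a little more concrete, here is a toy sketch of the kind of function a NeRF learns: a small network that maps a 3D position and a viewing direction to a colour and a density, which a renderer then integrates along camera rays. This is written in PyTorch purely for illustration (instant-ngp itself is C++/CUDA), and everything in it, from the layer sizes to the name TinyNeRF, is my own simplification rather than NVIDIA's implementation.

```python
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Toy NeRF: (x, y, z, viewing direction) -> (RGB colour, density)."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        # 3 position coords + 3 view-direction coords = 6 inputs
        # (real NeRFs also apply a positional encoding to the inputs first)
        self.mlp = nn.Sequential(
            nn.Linear(6, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 3 colour channels + 1 density
        )

    def forward(self, position: torch.Tensor, direction: torch.Tensor):
        out = self.mlp(torch.cat([position, direction], dim=-1))
        rgb = torch.sigmoid(out[..., :3])    # colour in [0, 1]
        density = torch.relu(out[..., 3:])   # non-negative density
        return rgb, density

# Query the field at one random point, looking straight down the z-axis
model = TinyNeRF()
rgb, density = model(torch.rand(1, 3), torch.tensor([[0.0, 0.0, -1.0]]))
```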

Sounds great, right? From a small set of images, you can make a 3D model! This has benefits over standard photogrammetry, which requires an enormous library of images (you need footage of every angle) to generate anything. However, we did promise at the start that NeRFs were fast, which, until recently, was not the case. Previously, NeRFs took a very long time to learn how to transform your set of images into something 3D.

This is no longer the case. NVIDIA has released its Instant NeRF software, instant-ngp, which uses heavily GPU-optimised code to run the necessary calculations. This has reduced the time needed to create a model from days to seconds! NVIDIA makes a lot of exciting claims about the usability and speed of the instant-ngp software, and the results and examples they provide are extremely impressive:

A NeRF NVIDIA has made of a cool robotics lab. Gif by https://github.com/NVlabs/instant-ngp

It's hard not to be impressed by this, I think — it looks great! I wanted to check out how easy it was to apply this to my own images and generate my own NeRF models. So I decided to install and use the software myself. In this article, I will go through my attempts and detail the models I made! Let's go!

The Pipeline

So what are we gonna do?

  • We need reference footage. Let's go record something we want to 3D-ify!
  • We will take this footage and convert it into still images. This process also tries to work out the angles we were recording from.
  • We pass this into instant-ngp. This then trains an AI to understand the spaces between the images we have generated. This is effectively the same as making a 3D model.
  • Finally, we want to create a video of our creation! Inside the software NVIDIA has made, we will draw a path for the camera to take around the model, and then render the video.

I won't go into the nitty-gritty of how this all works (feel free to ask questions though!), but I will put links to a lot of resources I have found useful. Instead, I’ll focus on the videos I made, and little parts of the journey I stumbled on.

My Attempts (aka Scruffy-Looking Nerf Herding)

I won’t lie, I found this hard to install. While the instructions are clear, I feel there is less wiggle room around the particular software versions you need than the Requirements section of the repo implies. Using CUDA 11.7 or VS2022 seemed impossible for me, and I think it was switching back to CUDA 11.6 and VS2019 that got the installation process moving again. I also hit a lot of errors like "CUDA_ARCHITECTURES is empty for target", which are caused by CUDA not wanting to play nicely with Visual Studio. I really recommend this video and this repo for further help in setting everything up!
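For reference, once the right CUDA and Visual Studio versions were in place, the build itself is the standard CMake two-step from the instant-ngp README. The exact commands below are the ones documented there at the time of writing; treat them as a sketch and check the current README, since they may have changed.

```bash
# Clone the repo with its submodules, then configure and build with CMake
# (as documented in the instant-ngp README at the time of writing).
git clone --recursive https://github.com/NVlabs/instant-ngp
cd instant-ngp
cmake . -B build
cmake --build build --config RelWithDebInfo -j
```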

Aside from that, the process works smoothly. Python scripts are supplied to guide you through converting the video you shoot into images, and through the subsequent steps of turning those images into a model and then into a video.
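In practice, the whole pipeline boils down to a couple of commands. The flags below are the ones documented in the instant-ngp repo at the time of writing, and the file and folder names are just placeholders, so double-check against the current README before copying anything.

```bash
# Steps 1-2: extract frames from the video and estimate camera poses with COLMAP.
# This writes the images plus a transforms.json describing the camera positions.
python scripts/colmap2nerf.py --video_in my_footage.mp4 --video_fps 2 \
    --run_colmap --aabb_scale 16

# Steps 3-4: launch the instant-ngp testbed on that folder to train the NeRF,
# then set up the camera path and render from inside the GUI.
./build/testbed --scene path/to/my_scene
```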

Attempt 1: Lego Car

Initially, I tried to NeRF-ify a small Lego car in the office. I think my photography skills were lacking, as I was not able to create anything sensible at all, just a weird 3D smudge. Why though? Well, let's look at one of the examples NVIDIA provides us. Note the camera placement:

“Camera” placement for a default NeRF that NVIDIA provides, of a digger. Photo by https://github.com/NVlabs/instant-ngp

A setup that will train well has its “cameras” positioned like the ones above. These cameras are the positions and angles the software believes you were shooting from, and ideally they form a nice circle around the subject. With my first Lego car it was nothing like this, just a squished semi-circle.
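Those estimated camera poses live in the transforms.json file that the colmap2nerf.py step produces: each frame has a 4x4 transform_matrix whose last column holds the camera position. A quick sanity check like the sketch below (my own helper, not part of the repo) would have shown me the squished semi-circle before I spent any time training.

```python
import json
import numpy as np

# Load the camera poses estimated by the COLMAP step (one entry per extracted frame).
with open("transforms.json") as f:
    transforms = json.load(f)

# The translation part (last column) of each 4x4 matrix is the camera position.
positions = np.array(
    [np.array(frame["transform_matrix"])[:3, 3] for frame in transforms["frames"]]
)

print(f"{len(positions)} cameras registered")
print("spread along x/y/z:", positions.max(axis=0) - positions.min(axis=0))
# A healthy capture circles the object, so the spread should be similar in the
# two horizontal directions; one tiny axis suggests a squished camera path.
```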

Attempt 2: Lego Car (but bigger this time!)

Trying to learn from my first attempt, I found a table that I could walk all the way around, plus a larger Lego car, and I made sure to shoot for longer than before. In the end, I had 1 minute of smooth footage from every angle. I trained the model for less than 30 seconds. After 4 hours of rendering at 720p, this is the video I made:

My second NeRF — a Lego Technic car!

Attempt 3: The Plant

OK, so Attempt 2 was better; at least it technically worked. However, there's a weird fog and it's not super sharp. For my next attempt, I tried to shoot from further back as well (I assumed the fog is caused by the AI being “confused” about what is there), paid more attention to the aabb_scale parameter (a measure of how large the scene is), and then trained it for a few minutes. After rendering, the video looks like this:
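For what it's worth, aabb_scale can be set when you run colmap2nerf.py (the --aabb_scale flag shown earlier) or tweaked afterwards, since it is just a top-level field in transforms.json. Here is a tiny sketch of doing that, assuming the file layout the script produced for me:

```python
import json

# aabb_scale controls how large a region around the subject the NeRF tries to
# model (powers of two: 1 for a small object, up to 16+ for a whole room).
with open("transforms.json") as f:
    transforms = json.load(f)

transforms["aabb_scale"] = 4  # smaller box: less background "fog" to model

with open("transforms.json", "w") as f:
    json.dump(transforms, f, indent=2)
```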

A NeRF I made of a plant on my living room table.

So much better! What’s incredibly impressive here is how accurately it has managed to render the crochet plant pot and the grooves in the wood, and how it has captured the intricacy of the leaves. Look at that little swoop the camera does through the leaves!

Attempt 4: Outside the Flat

So, we’re getting better and better! However, I want a model of somewhere outside, so let's try that. I shot just under 2 minutes of footage outside my flat and began processing it. This one was particularly heavy to train and render. My guess is that my value of aabb_scale was quite high (8), so the rendering “rays” had to travel quite far (i.e., there was simply more stuff to render). I had to switch to 480p and drop the render frame rate from 30 fps to 10 fps. It goes to show that your choice of settings really does affect your render time. After 8 hours of rendering, I had made this:

A NeRF I made of outside my flat.

I think Attempt 3 is still my favourite. I could probably make Attempt 4 a bit better, but iterating through versions and trying different render and training settings is hard when the render times are this long. Even setting up the camera path for the render is difficult, as the program becomes sluggish for me at this scale.

Still, what an amazing output: from just a minute or two of video, I have a detailed, lifelike 3D model!

Pros and Cons

I think the most impressive thing is that, given 1–2 minutes of footage, someone entirely untrained in photogrammetry (me) can create a workable 3D model. It requires some tech know-how, but once you get everything installed it is slick and simple to use. Transforming video into images works well, with Python scripts supplied to do this. Once the images are made, feeding them into the AI happens smoothly.

However (and it’s hard to fault NVIDIA for this one), I feel I should bring it up: this thing requires a pretty beefy GPU. I have a T500 in my laptop, and it pushes it to its absolute limit. Training takes longer than the advertised 5 seconds, and trying to render at 1080p causes the program to crash (I was dynamically rendering down near 135×74). This is still a huge improvement over previous NeRF implementations, which took days, but I don't imagine everyone has a 3090 to use for projects like these, so it's worth briefly mentioning. The low performance made it hard to use the program, especially when I was trying to “fly” the camera around to set up my rendered videos. Still, it's hard not to be impressed with the output of the process.

The other issue I faced was finding render.py (which, as you might guess, is vitally important for rendering the video). Very strangely, it isn't in the repo, though it is heavily referred to in most of the promotional articles and other pieces of documentation. I had to dig it out from here.

Finally, I would also love to be able to extract these as a .obj — maybe that is already possible though?

A gif of a fox — I didn't make this one, NVIDIA did. Good job, eh? Gif by https://github.com/NVlabs/instant-ngp

Final Thoughts and What’s Next

What this makes me think about is DALL-E, the image-generating AI. This has become incredibly popular, partly because of how easily accessible it is. It’s given people an incredibly cool example of what AI models can do, and the limitations they have. It's entered pop culture (or at least it features heavily on my Twitter), with people making their own weird DALL-E images and sharing them. I could imagine something similar happening with this technology too. The viral potential of a website that could let absolutely anyone upload a video and create a 3D model you could share with your friends is enormous. It's almost inevitable that someone will make this eventually.

Personally, I’m looking forward to experimenting with this more. I want to be able to generate super realistic models and then dump them into AR/VR. From there you could host meetings in that space — wouldn't that be fun? Since you only need the camera on your phone, most of the hardware needed to do this is already in the hands of the users.

Overall, I’m really impressed. Being able to take 1 minute of video on your phone and turn it into a model you can step through is amazing. Yes, it takes a while to render and it’s a little difficult to install, but the results are great. After a couple of attempts, I was already getting really cool output! I'm looking forward to experimenting more!

Andrew Blance

Please subscribe to me here on Medium to keep track of any exciting future articles! You can also get in touch here: LinkedIn | Twitter
