Burning GPU while training a DL model? These commands can cool it down.

Justin Ho
Towards Data Science
Jul 20, 2017

We often use GeForce GPUs to train deep learning models for personal research, but the GPU temperature can climb to 84°C when the card is running under full load! That’s not only burning the GPU, but also burning our hearts!

This article was inspired by a friend from Zhihu (the Chinese version of Quora), who raised the GPU fan speed to cool down his GPU. The default NVIDIA setting limits the GPU fan speed to 70% or less, so we have to change the fan speed manually to cool the card down.

PS: please note that this tutorial applies to Ubuntu with GTX GPUs.

Chinese version:

I also wrote this article in Chinese. Here is the Chinese version: 深度学习训练时GPU温度过高?几个命令,为你的GPU迅速降温。

If you have a monitor attached

Step 1 : open the /etc/X11/xorg.conf file for editing:

sudo nano /etc/X11/xorg.conf

Step 2 : add the following settings to the “Device” section. The “Coolbits” value of 4 is what enables manual fan control:

Section "Device"
    Identifier "Device0"
    Driver "nvidia"
    VendorName "NVIDIA"
    Option "Coolbits" "4"
EndSection
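
Before rebooting, it can be worth confirming that the option actually landed in the file. The following is a minimal sketch rather than part of the original steps: has_coolbits is a hypothetical helper name, and it only does a text match on an uncommented Option "Coolbits" line. It does not validate the rest of xorg.conf.

```shell
# has_coolbits FILE VALUE
# Hypothetical helper: succeeds if FILE contains an uncommented line of the
# form   Option "Coolbits" "VALUE"   (a plain text match, not a real parser).
has_coolbits() {
    local file="$1"
    local value="$2"
    grep -Eiq "^[[:space:]]*Option[[:space:]]+\"Coolbits\"[[:space:]]+\"${value}\"" "$file"
}
```

For example: has_coolbits /etc/X11/xorg.conf 4 && echo "Coolbits is set"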

Step 3 : reboot your machine.

Step 4 : enter this command:

nvidia-settings -a "[gpu:0]/GPUFanControlState=1" -a "[fan:0]/GPUTargetFanSpeed=100"

“GPUFanControlState=1” turns on manual fan control, “[fan:0]” selects which GPU fan you want to set, and “GPUTargetFanSpeed=100” sets the speed to 100%. That will be very noisy, so you may prefer 80%.
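
To avoid typing the attribute strings by hand, the command from Step 4 can be wrapped in a small function. This is only a sketch: fan_speed_cmd and set_fan_speed are hypothetical names, and the 0 to 100 range check is an added safeguard rather than something nvidia-settings requires.

```shell
# fan_speed_cmd GPU_INDEX FAN_INDEX PERCENT
# Hypothetical helper: prints the nvidia-settings command from Step 4 for the
# given GPU, fan, and target speed, without running it.
fan_speed_cmd() {
    local gpu="$1"
    local fan="$2"
    local pct="$3"
    # Added safeguard: accept only whole numbers from 0 to 100.
    case "$pct" in
        ''|*[!0-9]*)
            echo "fan speed must be an integer from 0 to 100" >&2
            return 1 ;;
    esac
    if [ "$pct" -gt 100 ]; then
        echo "fan speed must be an integer from 0 to 100" >&2
        return 1
    fi
    printf 'nvidia-settings -a "[gpu:%s]/GPUFanControlState=1" -a "[fan:%s]/GPUTargetFanSpeed=%s"\n' \
        "$gpu" "$fan" "$pct"
}

# set_fan_speed GPU_INDEX FAN_INDEX PERCENT
# Validates the arguments, then runs the command for real. This needs a
# running X session with Coolbits enabled.
set_fan_speed() {
    fan_speed_cmd "$@" > /dev/null || return 1
    nvidia-settings -a "[gpu:$1]/GPUFanControlState=1" -a "[fan:$2]/GPUTargetFanSpeed=$3"
}
```

For example, set_fan_speed 0 0 80 applies the quieter 80% setting, while fan_speed_cmd 0 0 80 only prints the command so you can inspect it first.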

If you don’t have a monitor attached

We often connect to the machine over ssh, so it usually has no monitor attached. The method above only works when a monitor is attached, so we have to fool the OS into thinking there is one.

The solution comes from this article: “fan speed without X (headless) : powermizer drops card to p8”, whose author provides a script to change the fan speed. Here are the full steps:

Step 1 : clone this repo into the /opt directory:

cd /opt
sudo git clone https://github.com/boris-dimitrov/set_gpu_fans_public

The repo contains several files; the key script is “cool_gpu”.

Step 2 : rename the directory from “set_gpu_fans_public” to “set-gpu-fans”; the mismatch is a small naming slip by the author.

sudo mv set_gpu_fans_public set-gpu-fans

Step 3 : cd to the set-gpu-fans directory, and enter these commands:

cd /opt/set-gpu-fans
sudo tcsh
./cool_gpu >& controller.log &
tail -f controller.log

These commands start the cooling script in the background, and tail -f lets you follow its progress in controller.log.
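
The >& redirection used in Step 3 is tcsh syntax for sending both standard output and standard error to one file. For readers who prefer a bash or sh session, the same pattern can be sketched as below. run_logged is a hypothetical helper, and whether cool_gpu behaves identically when launched this way is an assumption, not something tested in this article.

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Starts COMMAND in the background with stdout and stderr both written to
# LOGFILE (the sh/bash spelling of tcsh's ">&"), then prints the PID of the
# background process.
run_logged() {
    local log="$1"
    shift
    "$@" > "$log" 2>&1 &
    echo "$!"
}
```

From a root shell in /opt/set-gpu-fans this would be used as run_logged controller.log ./cool_gpu, followed by tail -f controller.log as before.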

Test time

Before testing whether the script can cool the GPU, let’s check the current temperature.
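
One way to read the temperature from the command line is the query mode of nvidia-smi. The sketch below is illustrative: gpu_status and format_status are hypothetical names, and the choice of fields (temperature, fan speed, performance state) is an assumption about what is useful to watch here.

```shell
# gpu_status: prints one CSV line per GPU with the temperature (in °C),
# the fan speed, and the performance state.
gpu_status() {
    nvidia-smi --query-gpu=temperature.gpu,fan.speed,pstate --format=csv,noheader
}

# format_status: hypothetical helper that reads such CSV lines on stdin and
# prints them in a labelled form.
format_status() {
    local temp fan pstate
    while IFS=',' read -r temp fan pstate; do
        # Strip the space that follows each comma in the CSV output.
        fan="${fan# }"
        pstate="${pstate# }"
        echo "temp=${temp}C fan=${fan} pstate=${pstate}"
    done
}
```

Running gpu_status | format_status gives a one-line summary per GPU; watch -n 2 nvidia-smi is a simple way to keep the full table on screen while a model trains.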

It’s about 37°C right now. OK, let’s run a model and see how it goes.

The temperature rises during training. Once it becomes stable, you can see the final temperature.

WOW, we did it! We cooled the GPU from 84°C to 67°C! That’s really awesome!

There’s one thing you should know: the current GPU power state is P2, which means the card is running at high performance (the highest state is P0 and the lowest is P12). The cooling script works well, and it didn’t reduce performance.
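
Since the point is to cool the card without losing speed, it helps to check the power state while the script runs. This is a minimal sketch, assuming the state is read with the pstate query field of nvidia-smi. pstate_at_least and check_pstate are hypothetical helpers, and using P2 as the threshold simply mirrors the state observed above.

```shell
# pstate_at_least STATE LIMIT
# Hypothetical helper: succeeds if STATE (for example "P2") is at least as
# fast as LIMIT. Lower numbers mean higher performance. Returns 2 on
# malformed input.
pstate_at_least() {
    local state="${1#P}"
    local limit="${2#P}"
    case "$state" in ''|*[!0-9]*) return 2 ;; esac
    case "$limit" in ''|*[!0-9]*) return 2 ;; esac
    [ "$state" -le "$limit" ]
}

# check_pstate LIMIT
# Prints a warning for every GPU whose current state is slower than LIMIT.
check_pstate() {
    local limit="$1"
    local state
    nvidia-smi --query-gpu=pstate --format=csv,noheader | while read -r state; do
        pstate_at_least "$state" "$limit" || echo "warning: GPU is in $state (slower than $limit)"
    done
}
```

Calling check_pstate P2 during training prints nothing while every GPU stays at P2 or faster.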

ATTENTION! The original version of the cooling script will decrease GPU performance!

The script we used above is an improved version of the original one, which is here: Set fan speed without an X server.

Many people who use the original script run into a performance drop: it lowers the power state from P2 to P8, or even P12, so I strongly suggest not using it. Still, we owe some respect to the original script’s author, because we couldn’t change the fan speed without his work.

If you found this article helpful, please click the heart and share it with your friends! Thanks for reading.
