[SOLVED] GPU gets very hot after enabling IOMMU on fresh PVE install

Sandbo

Well-Known Member
Jul 4, 2019
I recently put together a new build for running PVE, with these key components:
  • AMD 2700X
  • Gigabyte X570S UD
  • Vega 56
  • PVE 7.0-11
After the installation, before setting up any VM, I installed lm-sensors and checked the temperature of various components.
The GPU showed 44C and was only warm to the touch. This temperature was stable and never rose by more than 1-2 degrees over an hour.
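For anyone who wants to take the same reading without lm-sensors, here is a minimal sketch that reads the kernel's hwmon files directly. It assumes the host driver (amdgpu here) is still loaded and exposes a `temp1_input` file; `print_hwmon_temps` is a made-up helper name, not part of any package.

```shell
# print_hwmon_temps is a hypothetical helper, not part of lm-sensors.
# $1: hwmon class directory (defaults to /sys/class/hwmon; overridable for testing)
print_hwmon_temps() {
    for dir in "${1:-/sys/class/hwmon}"/hwmon*; do
        # Skip sensors that do not expose a name or a first temperature input.
        [ -r "$dir/name" ] && [ -r "$dir/temp1_input" ] || continue
        # temp1_input is reported in millidegrees Celsius.
        echo "$(cat "$dir/name"): $(( $(cat "$dir/temp1_input") / 1000 ))C"
    done
}
```

Calling `print_hwmon_temps` with no argument on the host lists one line per sensor chip. Once the GPU is handed over to vfio-pci, its entry should disappear from this list, which is why the temperature can no longer be read from the host.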

I then followed the Wiki to enable IOMMU, hoping to pass the GPU through to a Windows/Linux guest.
After finishing the preparation, but before creating any VM, I found that the GPU had become rather hot to the touch.
As the GPU is no longer driven by the Proxmox host, I cannot check the actual temperature either.

Is it normal for the GPU to get this hot when no driver is bound to it?
I am now trying to install a Windows guest to see if that helps, and will update this post later.
Update:
It doesn't solve the problem, even though the GPU is now partly detected in Windows. GPU-Z shows 77C with NO load at all.
After installing the AMD driver, the control panel cannot be opened. Initially Device Manager showed Code 43, but re-enabling the GPU made the error disappear.

Solved:
Turns out I missed one trouble-shooting entry on the wiki page:
video=efifb:off
My error mentioned BAR 0 rather than BAR 3, though. That seems to make the GPU run under high load for no reason.
After adding this to GRUB, the passthrough works completely and AMD's control panel now opens normally.
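In case it helps future readers, here is a rough sketch of how the parameter can be added. It assumes the host boots via GRUB with the kernel command line in `/etc/default/grub` (hosts booting via systemd-boot keep it elsewhere). `add_kernel_param` is a hypothetical helper written for this post, not a Proxmox or GRUB tool.

```shell
# add_kernel_param is a hypothetical helper, not part of Proxmox or GRUB.
# $1: kernel parameter to append, e.g. video=efifb:off
# $2: GRUB defaults file (defaults to /etc/default/grub; overridable for testing)
add_kernel_param() {
    param="$1"
    file="${2:-/etc/default/grub}"
    # Do nothing if the parameter is already on the command line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qF -- "$param"; then
        return 0
    fi
    # Append the parameter inside the existing double quotes (GNU sed -i).
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"|" "$file"
}

# Typical use, as root on the host:
#   add_kernel_param video=efifb:off
#   update-grub
#   reboot
```

Editing the file by hand works just as well. Either way, the change only takes effect after `update-grub` and a reboot.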
 
Marked as solved, but for your and future readers' information: the IOMMU part is irrelevant here. Adding the 'vfio-pci' stanzas, i.e. loading the vfio-pci driver on the host (PVE), means the amdgpu driver cannot load. On some hardware, that AMD driver is what controls the fans and power-saving modes of the GPU, so the GPU can get stuck in a high-power, low-fan state at boot, which can lead to high temperatures.

Once the VM loads with a valid driver, power and fan control is restored (although control is assumed by the guest OS in the VM), and the temperature can normalize.
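To see which driver currently owns the GPU on the host, one option is to look at the `driver` symlink in sysfs. The sketch below assumes the usual sysfs layout; `pci_driver` is a hypothetical helper name and the PCI address is only an example (`lspci -D` lists the real ones).

```shell
# pci_driver is a hypothetical helper: print the kernel driver bound to a PCI device.
# $1: full PCI address such as 0000:0a:00.0 (example only; check with `lspci -D`)
# $2: sysfs root (defaults to /sys; overridable for testing)
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink target's last path component is the driver name.
        basename "$(readlink "$link")"
    else
        # No driver symlink means no driver is bound to the device.
        echo "none"
    fi
}
```

With the passthrough setup described above, `pci_driver 0000:0a:00.0` would be expected to print `vfio-pci` rather than `amdgpu`. `lspci -nnk` reports the same information under "Kernel driver in use".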
 