VM keeps freezing unexpectedly, but only the VM with GPU passthrough

Zerojam

Active Member
Sep 13, 2018
I have been working on GPU passthrough for a month now.

I still have no luck getting it to work properly, even after following TechHut's step-by-step tutorial.

The VM keeps freezing unexpectedly, and this happens only on the VM that has the GPU passed through.

Code:
for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done

Running the above command returns the following:

IOMMU group 0 00:00.0 Host bridge [0600]: Intel Corporation Device [8086:4c53] (rev 01)
IOMMU group 10 00:1d.0 PCI bridge [0604]: Intel Corporation Device [8086:43b0] (rev 11)
IOMMU group 11 00:1f.0 ISA bridge [0601]: Intel Corporation Device [8086:4387] (rev 11)
IOMMU group 11 00:1f.3 Audio device [0403]: Intel Corporation Device [8086:43c8] (rev 11)
IOMMU group 11 00:1f.4 SMBus [0c05]: Intel Corporation Device [8086:43a3] (rev 11)
IOMMU group 11 00:1f.5 Serial bus controller [0c80]: Intel Corporation Device [8086:43a4] (rev 11)
IOMMU group 12 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2488] (rev a1)
IOMMU group 12 01:00.1 Audio device [0403]: NVIDIA Corporation GA104 High Definition Audio Controller [10de:228b] (rev a1)

IOMMU group 13 02:00.0 Non-Volatile memory controller [0108]: Kingston Technology Company, Inc. Device [2646:500f] (rev 03)
IOMMU group 14 04:00.0 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller [10ec:8125] (rev 05)
IOMMU group 1 00:01.0 PCI bridge [0604]: Intel Corporation Device [8086:4c01] (rev 01)
IOMMU group 2 00:14.0 USB controller [0c03]: Intel Corporation Device [8086:43ed] (rev 11)
IOMMU group 2 00:14.2 RAM memory [0500]: Intel Corporation Device [8086:43ef] (rev 11)
IOMMU group 3 00:14.3 Network controller [0280]: Intel Corporation Device [8086:43f0] (rev 11)
IOMMU group 4 00:15.0 Serial bus controller [0c80]: Intel Corporation Device [8086:43e8] (rev 11)
IOMMU group 5 00:16.0 Communication controller [0780]: Intel Corporation Device [8086:43e0] (rev 11)
IOMMU group 6 00:17.0 SATA controller [0106]: Intel Corporation Device [8086:43d2] (rev 11)
IOMMU group 7 00:1b.0 PCI bridge [0604]: Intel Corporation Device [8086:43c4] (rev 11)
IOMMU group 8 00:1c.0 PCI bridge [0604]: Intel Corporation Device [8086:43bc] (rev 11)
IOMMU group 9 00:1c.7 PCI bridge [0604]: Intel Corporation Device [8086:43bf] (rev 11)

Now I am clueless, since the graphics card does not share its IOMMU group with any other device apart from its own audio function.
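Whether the GPU really has its IOMMU group to itself can also be checked mechanically. The following is only a minimal sketch and not part of the tutorial: it assumes the standard sysfs layout under /sys/kernel/iommu_groups, expects the device address with the PCI domain prefix that sysfs uses (0000:01:00.0 for the 01:00.0 in the listing), and treats a group as isolated when every member is a function of the same slot. The function names and the SYSFS_ROOT override are hypothetical helpers introduced here for illustration.

```shell
# List every PCI address that shares an IOMMU group with the given device.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
iommu_group_members() {
    local dev="$1" root="${SYSFS_ROOT:-/sys}" d g
    for d in "$root"/kernel/iommu_groups/*/devices/*; do
        # Skip entries until the last path component is our device.
        [ "${d##*/}" = "$dev" ] || continue
        # Strip "/devices/<address>" to get the group directory.
        g="${d%/devices/*}"
        ls "$g/devices"
        return 0
    done
    echo "device $dev not found in any IOMMU group" >&2
    return 1
}

# Return 0 only if every member of the group is a function of the same
# slot (same domain:bus:device, differing only after the final dot).
# Return 1 if a foreign device shares the group, 2 if the lookup failed.
iommu_group_is_isolated() {
    local dev="$1" slot="${1%.*}" members m
    members="$(iommu_group_members "$dev")" || return 2
    for m in $members; do
        [ "${m%.*}" = "$slot" ] || return 1
    done
    return 0
}
```

After sourcing the snippet on the host, iommu_group_is_isolated 0000:01:00.0 && echo isolated should print "isolated" for the listing above, because group 12 contains only 01:00.0 and 01:00.1.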

However, I noticed that Proxmox does not show the full name of the Nvidia graphics card
(e.g. VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070]).
Does this have an impact on performance?


Secondly, I also noticed that the GPU passthrough VM is somehow more stable when it is stored on the drive where Proxmox is installed (HDD) than on the other drive (NVMe SSD).



PC build as follows:
  1. Intel(R) Core(TM) i5-11400
  2. Nvidia RTX 3070
  3. Proxmox is installed on an 8TB HDD
  4. VMs are installed on a 512GB M.2 NVMe SSD
 

"Now I am clueless, since the graphics card does not share its IOMMU group with any other device apart from its own audio function."

IOMMU groups cannot be shared between VMs and/or the Proxmox host. It is normal that all functions (VGA and audio) of your GPU are in the same group. That they are in a separate group from everything else is also good, and even required for passthrough.

"However, I noticed that Proxmox does not show the full name of the Nvidia graphics card
(e.g. VGA compatible controller [0300]: NVIDIA Corporation GA104 [GeForce RTX 3070]).
Does this have an impact on performance?"

No, this is normal and of no consequence.

"Secondly, I also noticed that the GPU passthrough VM is somehow more stable when it is stored on the drive where Proxmox is installed (HDD) than on the other drive (NVMe SSD)."

Probably a coincidence, or it works for longer only because it starts more slowly.

"PC build as follows:
  1. Intel(R) Core(TM) i5-11400
  2. Nvidia RTX 3070
  3. Proxmox is installed on an 8TB HDD
  4. VMs are installed on a 512GB M.2 NVMe SSD"

You have only a single GPU in your system, and Proxmox is probably using it to display boot messages before the VM starts. That is most likely your problem, and you need this work-around to make sure that nothing touches your GPU before the VM starts.

More information would help, such as the errors you are getting (after applying the work-around) in the Proxmox Syslog (journalctl) when starting the VM, and which version of Proxmox you are running (pveversion -v). Also, what is the output of cat /proc/cmdline and lspci -nnks 01:00 just after restarting Proxmox but before starting the VM?
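The diagnostics requested above can be gathered in one go. This is a rough sketch rather than an official Proxmox tool: the helper names run_section and collect_passthrough_info, the GPU_ADDR variable, and the 200-line journal cut-off are assumptions made for illustration, while the commands themselves are the ones named in the post (journalctl is limited to the current boot with -b).

```shell
# Run one diagnostic command under a header. A missing or failing command
# is noted instead of aborting, so the rest of the report is still collected.
run_section() {
    printf '== %s ==\n' "$*"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command exited with status $?)"
    else
        echo "(command not found: $1)"
    fi
    echo
}

# Collect the requested information. GPU_ADDR (hypothetical variable)
# defaults to the 01:00 address from the lspci listing in this thread.
collect_passthrough_info() {
    local gpu="${GPU_ADDR:-01:00}"
    run_section pveversion -v
    run_section cat /proc/cmdline
    # -k also shows which kernel driver is bound to each GPU function.
    run_section lspci -nnks "$gpu"
    # Current-boot log, limited to the last 200 lines (an arbitrary cut-off).
    run_section journalctl -b --no-pager -n 200
}
```

Run it as root right after rebooting the host and before starting the VM, for example collect_passthrough_info > passthrough-report.txt, and then once more after the VM has frozen to capture the errors from the journal.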
 

Great advice! You're a god. I have read many of your posts, and all of them have been very helpful.

I simply plugged the HDMI cable into the Intel(R) UHD Graphics port and reinstalled Proxmox, which leaves the 3070 entirely to the VM instance.

Thanks so much for your suggestion, that simply rocks!
 
