Passthrough RTX 5090: "Unable to change power state from D3cold to D0" and CPU soft lockup BUG after guest shutdown

tytanick

Guys, I have no idea how to solve this. I have already lost about two weeks on it.
I already fixed one side of the problem by switching to the OVMF BIOS, but here is the second half: it mostly shows up when a guest shuts down.

I am passing through RTX 5090 and RTX PRO 6000 cards to Linux VMs, and this works fine (which suggests there are no riser issues).
Then, sometimes on guest shutdown, I get the errors below.
After that, only a reboot of the whole host helps. It is as if the GPU goes to sleep or into some other state and cannot wake up or be seen anymore.
Or maybe there is some issue with vfio?


I am getting two errors when this happens:

Errors:
"Unable to change power state from D3cold to D0, device inaccessible"
and
a CPU soft lockup BUG

Setup:
Proxmox 8.4
Motherboard: ASRock GENOA2D24G-2L+
CPU: 2x AMD EPYC 9654 96-Core Processor
GPUs: 5x RTX PRO 6000 Blackwell and 6x RTX 5090

I really need to get this sorted, and if anyone can help, I can pay for your time, guys.
Besides the configs below, I also tried:
Writing 0 into /sys/bus/pci/devices/0000:$id.0/d3cold_allowed for each GPU at host boot (see the snippet below)
Disabling PCIe ASPM in the BIOS so that the link always stays at PCIe 5.0 x16
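
For reference, this is roughly how I apply the d3cold toggle at host boot; a minimal sketch only, the loop over vendor ID 0x10de and the script name are illustrative:
Code:
#!/bin/bash
# disable-d3cold.sh - illustrative sketch: forbid D3cold on every NVIDIA PCI function
for dev in /sys/bus/pci/devices/*; do
    if [ "$(cat "$dev/vendor")" = "0x10de" ]; then   # 0x10de = NVIDIA
        echo 0 > "$dev/d3cold_allowed"
    fi
done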

Full errors:
[490431.407576] tap12850079i0: entered promiscuous mode
[490431.440282] wanbr: port 4(tap12850079i0) entered blocking state
[490431.440291] wanbr: port 4(tap12850079i0) entered disabled state

[490431.440354] tap12850079i0: entered allmulticast mode
[490431.440608] wanbr: port 4(tap12850079i0) entered blocking state
[490431.440611] wanbr: port 4(tap12850079i0) entered forwarding state

[490431.805032] vfio-pci 0000:81:00.0: Unable to change power state from D3cold to D0, device inaccessible
[490431.818384] vfio-pci 0000:81:00.0: Unable to change power state from D3cold to D0, device inaccessible
[490431.821151] vfio-pci 0000:81:00.0: Unable to change power state from D3cold to D0, device inaccessible

[490432.869619] pcieport 0000:80:01.1: Data Link Layer Link Active not set in 1000 msec
[490432.874478] wanbr: port 4(tap12850079i0) entered disabled state
[490432.875371] tap12850079i0 (unregistering): left allmulticast mode

[Screenshot: CleanShot 2025-07-13 at 10.31.50@2x.png]

[Screenshot: CleanShot 2025-07-13 at 10.35.12@2x.png]

And then I also see a CPU soft lockup:

[Screenshot: CleanShot 2025-07-13 at 17.32.30@2x Duży.jpeg]


pveversion
Code:
pve-manager/8.4.1/2a5fa54a8503f96d (running kernel: 6.8.12-11-pve)

uname
Code:
Linux cee93dbc-5f39-4690-bc2c-ec4d4cbf4ea2 6.8.12-11-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-11 (2025-05-22T09:39Z) x86_64 GNU/Linux

cat /etc/default/grub | grep LINUX_DEF
Code:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 vfio-pci.ids=10de:22e8,10de:2bb1 hugepagesz=1G hugepages=1300 default_hugepagesz=1G initcall_blacklist=sysfb_init"

cat /etc/modprobe.d/vfio.conf
Code:
options vfio_iommu_type1 allow_unsafe_interrupts=1
options kvm ignore_msrs=1 report_ignored_msrs=0
options vfio-pci ids=10de:22e8,10de:2bb1 disable_vga=1 disable_idle_d3=1

cat /etc/modprobe.d/blacklist-gpu.conf
Code:
blacklist radeon
blacklist nouveau
blacklist nvidia
# Additional NVIDIA related blacklists
blacklist snd_hda_intel
blacklist amd76x_edac
blacklist vga16fb
blacklist rivafb
blacklist nvidiafb
blacklist rivatv

blacklist microcode

lspci -vv | grep NVID
Code:
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2bb1 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation Device 204b
01:00.1 Audio device: NVIDIA Corporation Device 22e8 (rev a1)
    Subsystem: NVIDIA Corporation Device 0000
24:00.0 VGA compatible controller: NVIDIA Corporation Device 2bb1 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation Device 204b
24:00.1 Audio device: NVIDIA Corporation Device 22e8 (rev a1)
    Subsystem: NVIDIA Corporation Device 0000
41:00.0 VGA compatible controller: NVIDIA Corporation Device 2bb1 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation Device 204b
41:00.1 Audio device: NVIDIA Corporation Device 22e8 (rev a1)
    Subsystem: NVIDIA Corporation Device 0000
61:00.0 VGA compatible controller: NVIDIA Corporation Device 2bb1 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation Device 204b
61:00.1 Audio device: NVIDIA Corporation Device 22e8 (rev a1)
    Subsystem: NVIDIA Corporation Device 0000
81:00.0 VGA compatible controller: NVIDIA Corporation Device 2bb1 (rev a1) (prog-if 00 [VGA controller])
    Subsystem: NVIDIA Corporation Device 204b
81:00.1 Audio device: NVIDIA Corporation Device 22e8 (rev a1)
    Subsystem: NVIDIA Corporation Device 0000

lspci -nnk | grep -iA3 nvidia
Code:
    01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2bb1] (rev a1)
    Subsystem: NVIDIA Corporation Device [10de:204b]
    Kernel driver in use: vfio-pci
    Kernel modules: nvidiafb, nouveau
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22e8] (rev a1)
    Subsystem: NVIDIA Corporation Device [10de:0000]
    Kernel driver in use: vfio-pci
    Kernel modules: snd_hda_intel


iommu
Code:
root@cee93dbc-5f39-4690-bc2c-ec4d4cbf4ea2:/# dmesg | grep -i iommu
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-11-pve root=UUID=072efd2c-22ba-443d-b869-2606bf91cea7 ro quiet quiet amd_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 vfio-pci.ids=10de:22e8,10de:2bb1 skew_tick=1 hugepagesz=1G hugepages=1495 default_hugepagesz=1G hugepagesz=1G hugepages=1495 default_hugepagesz=1G
[    1.236207] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-6.8.12-11-pve root=UUID=072efd2c-22ba-443d-b869-2606bf91cea7 ro quiet quiet amd_iommu=on iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 vfio-pci.ids=10de:22e8,10de:2bb1 skew_tick=1 hugepagesz=1G hugepages=1495 default_hugepagesz=1G hugepagesz=1G hugepages=1495 default_hugepagesz=1G
[    6.949405] iommu: Default domain type: Passthrough (set via kernel command line)
[    7.042236] pci 0000:60:00.2: AMD-Vi: IOMMU performance counters supported
[    7.051849] pci 0000:60:00.3: Adding to iommu group 0
[    7.051923] pci 0000:60:01.0: Adding to iommu group 1
[    7.051965] pci 0000:60:01.1: Adding to iommu group 2
[    7.052036] pci 0000:60:02.0: Adding to iommu group 3
[    7.052106] pci 0000:60:03.0: Adding to iommu group 4
[    7.052177] pci 0000:60:04.0: Adding to iommu group 5
[    7.052252] pci 0000:60:05.0: Adding to iommu group 6
[    7.052296] pci 0000:60:05.3: Adding to iommu group 7
[    7.052339] pci 0000:60:05.4: Adding to iommu group 8
[    7.052476] pci 0000:60:07.0: Adding to iommu group 9
[    7.052517] pci 0000:60:07.1: Adding to iommu group 9
[    7.052557] pci 0000:60:07.2: Adding to iommu group 9
...

Above 4G decoding: I do not have that setting directly in the BIOS, but I guess it works (the BARs show up as large 64-bit prefetchable regions, so I guess the support is there)
Code:
root@cee93dbc-5f39-4690-bc2c-ec4d4cbf4ea2:/# lspci -vvv | grep -i prefetchable
    Prefetchable memory behind bridge: 18c000000000-18e001ffffff [size=131104M] [32-bit]
    Prefetchable memory behind bridge: 18e002100000-18e0021fffff [size=1M] [32-bit]
    Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff [disabled] [64-bit]
    Region 0: Memory at f0000000 (32-bit, non-prefetchable) [size=64M]
    Region 1: Memory at 18c000000000 (64-bit, prefetchable) [size=128G]
    Region 3: Memory at 18e000000000 (64-bit, prefetchable) [size=32M]
    Region 0: Memory at f4080000 (32-bit, non-prefetchable) [disabled] [size=16K]
    Region 0: Memory at 18e002180000 (64-bit, prefetchable) [disabled] [size=512K]
    Region 2: Memory at 18e002100000 (64-bit, prefetchable) [disabled] [size=512K]
    Region 0: Memory at f4300000 (64-bit, non-prefetchable) [size=1M]
    Region 2: Memory at f4200000 (32-bit, non-prefetchable) [size=1M]
    Region 5: Memory at f4400000 (32-bit, non-prefetchable) [size=8K]
    Region 5: Memory at f4501000 (32-bit, non-prefetchable) [size=2K]

My command for checking the link state of each GPU through its riser, along with its link speed and PCIe generation
Code:
lspci | grep VGA | awk '{print $1}' | while read id; do
    info=$(lspci -vvv -s "$id")
    lnkcap=$(echo "$info" | grep -m1 "LnkCap:" | grep -oP 'Speed \K[0-9]+')
    capw=$(echo "$info" | grep -m1 "LnkCap:" | grep -oP 'Width x\K[0-9]+')
    lnksta=$(echo "$info" | grep -m1 "LnkSta:" | grep -oP 'Speed \K[0-9]+')
    staw=$(echo "$info" | grep -m1 "LnkSta:" | grep -oP 'Width x\K[0-9]+')
    case $lnksta in
        2)  gen="PCIe 1.0" ;;
        5)  gen="PCIe 2.0" ;;
        8)  gen="PCIe 3.0" ;;
        16) gen="PCIe 4.0" ;;
        32) gen="PCIe 5.0" ;;
        64) gen="PCIe 6.0" ;;
        *)  gen="" ;;
    esac
    [ -z "$gen" ] || [ -z "$lnksta" ] || [ -z "$lnkcap" ] && continue
    color="\033[32m"
    [ "$lnksta" -lt "$lnkcap" ] || [ "$staw" -lt "$capw" ] && color="\033[31m"
    echo -e "\033[36mBUS ID: $id\033[0m"
    echo -e "${color}$gen (${lnksta} GT/s)\033[0m"
    echo -e "LnkSta/LnkCap: ${color}Speed ${lnksta}GT/s, Width x${staw} / Speed ${lnkcap}GT/s, Width x${capw}\033[0m"
    echo ""
done
result:
BUS ID: 01:00.0
PCIe 5.0 (32 GT/s)
LnkSta/LnkCap: Speed 32GT/s, Width x16 / Speed 32GT/s, Width x16
BUS ID: 24:00.0
PCIe 5.0 (32 GT/s)
LnkSta/LnkCap: Speed 32GT/s, Width x16 / Speed 32GT/s, Width x16
BUS ID: 41:00.0
PCIe 5.0 (32 GT/s)
LnkSta/LnkCap: Speed 32GT/s, Width x16 / Speed 32GT/s, Width x16
BUS ID: 61:00.0
PCIe 5.0 (32 GT/s)
LnkSta/LnkCap: Speed 32GT/s, Width x16 / Speed 32GT/s, Width x16
BUS ID: 81:00.0
PCIe 5.0 (32 GT/s)
LnkSta/LnkCap: Speed 32GT/s, Width x16 / Speed 32GT/s, Width x16


After this issue happens, I can no longer read the device info:
[Screenshot: CleanShot 2025-07-13 at 10.35.12@2x.png]
 
This usually happens within 1-48 hours after shutdown of a guest VM, with both Linux and Windows VMs.
When I was testing it myself I could not reproduce it, but the users are doing something, or there is some exception going on here.
And when that happens, I am unable to reset that GPU: when I unbind vfio-pci, remove the PCI device and rescan, it does not show up again. Only a reboot solves it.
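
For clarity, the reset attempt I mean is roughly this, using the GPU at 0000:81:00.0 from the log above as the example:
Code:
echo 0000:81:00.0 > /sys/bus/pci/drivers/vfio-pci/unbind   # release the card from vfio-pci
echo 1 > /sys/bus/pci/devices/0000:81:00.0/remove          # drop it from the PCI tree
echo 1 > /sys/bus/pci/rescan                               # rescan - the card never comes back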
 
This is gonna seem odd to you, since the following utility is supposedly for the 5060 and 5060 Ti, but using Windows, try to run this NVIDIA utility on your card and see whether its firmware is up to date. The utility is for the 5000 series. I had to update my 5070 Ti for it to start working correctly. Hopefully it works for you.

PS: I don't know if this is strictly necessary or not, I did not test it, but I presume yes, at least to get the Proxmox CLI back. I also have the NVIDIA drivers installed in Proxmox so that the card can initialise and idle correctly. Then, using a hookscript, I unload them in the pre-start phase so vfio can do its thing, and I unbind vfio and reload the NVIDIA drivers in the post-stop phase so I get the Proxmox CLI back.
With so many cards I don't think you are after this, but I would still update the firmware and install the NVIDIA drivers on Proxmox.
I would also blacklist both the nouveau and nvidiafb kernel modules that I see listed for your cards.
 
There are no NVIDIA drivers on Proxmox, since I am using passthrough. No drivers at all, and they are all blacklisted so that only vfio binds to those GPUs.
Well, the RTX PRO 6000 cards do not have a newer BIOS, as they were only just released.
And for the 5090s I will take a look, but I don't think that is the issue, as all the GPUs show this problem and I have a few brands and two different models (all Blackwell).
 
Install the NVIDIA driver, let it initialise the GPU(s), then unbind the driver from the card you want to use and start the VM. Proxmox will bind vfio for you. What do you have to lose?
I had your problem in the past, and not anymore. The NVIDIA firmware update is literally a requirement: without it, in my case, turning off the VM crashed the host so hard it rebooted by itself after about 20 seconds. The firmware made both the 5060 Ti (I tried two models, MSI and Gigabyte) and the 5070 Ti work fine with passthrough, without shenanigans, and I've read somewhere, probably on the Level1 forums, that it also fixed a 5080. That's when I learnt the updater is not only for the 5060 and 5060 Ti.

I'm not saying this is a definitive fix for your case, but it seems similar to what I've experienced with the 5000 series.
 
Did you have exactly a CPU lockup in your case? What exactly was happening, can you tell me more?
EDIT: Sure, I will take a look tomorrow. I am wondering whether I can do that from a Windows VM xD - will see.
 
I had your "Unable to change power state from D3cold to D0, device inaccessible" when I was lucky, and an "automatic" hard motherboard reset most of the time. To make it bearable, I had to unload the NVIDIA driver inside the VM before shutdown. All fixed after the firmware update. My previous adventure.
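
For reference, that in-guest workaround was basically unloading the driver stack before poweroff, along these lines (the module list and service name are illustrative and depend on your driver version):
Code:
# inside the guest, before shutting it down
systemctl stop nvidia-persistenced 2>/dev/null            # stop anything holding the device open
modprobe -r nvidia_uvm nvidia_drm nvidia_modeset nvidia   # unload the NVIDIA modules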
 
@adolfotregosa Still, do you think loading the NVIDIA driver in Proxmox makes sense / is it needed?
I am wondering... actually, if I am right, without loading the NVIDIA driver I think the GPUs draw around 100 W when idle instead of 10 W. I need to double-check that.
 
I'm positive it is needed if you want to control the idle power when the GPU is not in use by any VM, but 10 W idle? :D On a 5090? Lol, nope :D It should be around 30 W. A bit less than 10 W is what the 5060 Ti used, the 5070 Ti about 18 W; the 5090 should be around 30 W idle. So yes, loading the NVIDIA driver when the GPUs are not in use is a requirement!

Please do share your results!
 
@adolfotregosa well, I have made some nice progress already!
Before the firmware upgrade, when I tried to pass through more than one GPU (so 2 GPUs) to a Linux VM using SeaBIOS, I had a 100% failure rate, ending in the D3cold/D0 error + CPU soft lockup! Now I flashed those two, tried SeaBIOS again, and it works: it boots, it shows both GPUs under nvidia-smi, and there is no crash!
I think that might work after all! I will flash all the other Blackwells and wait for my clients to crash them, since I can't manage to crash them myself xD
 
I would personally use OVMF + Q35, rombar + PCI-Express, and the CPU type set to host. These have been my miracle settings so far; roughly the commands shown below.

EDIT: Lol, I did tell you it would seem odd, didn't I? You never know until you try it!
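
A minimal sketch of what I mean via qm, using a hypothetical VM ID 200 and the first GPU as the example address; adjust both to your own setup:
Code:
qm set 200 --bios ovmf --machine q35 --cpu host
qm set 200 --hostpci0 0000:01:00,pcie=1,rombar=1

Passing 0000:01:00 without the function number maps all functions of the card (GPU + audio) at once.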
 
OH !! ok ok :D Glad I could help.
Where did you read that binding the NVIDIA driver when the GPU is released, and unbinding it when it is about to be passed through, is a requirement?

I thought this was only needed for older GPUs, and that the reset script won't work on newer cards. Can you send me a link about this?

I have flashed some servers and will now be testing whether this happens again.
What motherboard do you have? Is it by any chance the ASRock GENOA2D24G-2L+?
I will wait 3-4 more days to see whether I get a crash or not.

Did you also get the CPU soft lockup in dmesg when you were passing 2 or more GPUs to one VM on SeaBIOS?
 
Oh man, nothing like that. This is just a hobby for me: 100% consumer gear, a single-GPU setup. I wanted an all-in-one solution: a router, a NAS, and "unlimited" OS flexibility, so I can switch between them at will. My GPU is mainly for actual image output for my "Desktop VM" of the moment.

As for your question:
"Where did you read that binding the NVIDIA driver when the GPU is released, and unbinding it when it's going to be passed through, is a requirement?"
If you don't unbind the NVIDIA driver, how could Proxmox automatically bind the vfio-pci driver for you when starting the VM?
My reason for having the NVIDIA driver on Proxmox is to get low idle power consumption when no VM is using the GPU, or when I shut down my "Desktop VM", to save some watts. It's really not that much, but hey, if I can save something, I'll take it. And of course, to recover the Proxmox CLI when I shut down any VM that uses the GPU, I load the NVIDIA driver again.

EDIT: I don't know, I just like the fact that the card is initialised properly. For the newer AMD GPUs like the RX 9070 XT, from what I read, it is a requirement that the amdgpu driver properly initialises the GPU before you unbind it so vfio-pci can take over; if that is not done, it seems you will not get an image.
 
@adolfotregosa oh, I get it, I thought it was a requirement for stability. For low idle power, yeah, that is fine. I will look into switching between nvidia and vfio, but honestly the first thing that came to my mind is to fire up a VM with the NVIDIA driver instead of messing around in Proxmox.
Do you have some ready-made solution that unbinds nvidia and binds vfio automatically when a new VM needs some GPUs?
I will look into making some scripts, but maybe we can use the reset script that was needed to reset the GPU state after VM shutdown on older GPUs.
 
Have a look at the attachment:

My notes tell me:

mkdir /var/lib/vz/snippets
nano /var/lib/vz/snippets/200.pl
chmod +x /var/lib/vz/snippets/200.pl

Then as example:
qm set 200 --hookscript local:snippets/200.pl

or if you don't need it anymore:
qm set 200 --delete hookscript

or edit the 200.conf and remove or add the hookscript line.
Example:
nano /etc/pve/qemu-server/200.conf
I would then add/remove:
hookscript: local:snippets/200.pl

In your case, I think you would try:
echo '0000:01:00.0' > /sys/bus/pci/drivers/nvidia/unbind
or
echo '0000:01:00.0' > /sys/bus/pci/drivers/nvidia/bind

since you have more than one card, rather than rmmod'ing the whole driver. Replace 0000:01:00.0 with the correct address.

Keeping the VM running is also an option if every last bit of power consumption is not a concern; idle-power-wise it will be a lot better than just leaving the card sitting in Proxmox with no driver. A rough sketch of the whole hookscript idea follows below.
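
If it helps, here is a simplified bash sketch of the idea; my actual hookscript is the attached 200.pl, so this is not the exact file, and the GPU address and script name are examples only:
Code:
#!/bin/bash
# /var/lib/vz/snippets/gpu-hook.sh - simplified sketch, not the attached 200.pl
# Proxmox calls a hookscript with two arguments: the VM ID and the phase.
VMID="$1"
PHASE="$2"
GPU="0000:01:00.0"   # example address, replace with your card

case "$PHASE" in
    pre-start)
        # release the card from the nvidia driver so Proxmox can bind vfio-pci
        echo "$GPU" > /sys/bus/pci/drivers/nvidia/unbind
        ;;
    post-stop)
        # give the card back to the nvidia driver for proper idle power and CLI output
        echo "$GPU" > /sys/bus/pci/drivers/vfio-pci/unbind
        echo "$GPU" > /sys/bus/pci/drivers/nvidia/bind
        ;;
esac
exit 0

Wire it up the same way as the 200.pl example above (qm set <vmid> --hookscript local:snippets/<name>, and don't forget the chmod +x).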
 
