[SOLVED] PCI Passthrough NVME: Unable to change power state

Bene124

Hey guys,

I have a 3-node Proxmox cluster consisting of three Minisforum MS-01 machines, each with 3x 2 TB Samsung 990 EVO Plus NVMe SSDs. For performance testing I set up an OKD cluster with three worker VMs, one on each Proxmox node, directly attaching one Samsung SSD to each VM.

This worked on two of the VMs without any problems. The third VM cannot attach any of the three Samsung SSDs in its host; for testing purposes I attached two of the NVMe SSDs, one as a raw device and one via a resource mapping.

When trying to launch this VM, I get the following dmesg log on the Proxmox host:

Code:
...
[  562.662538] vfio-pci 0000:59:00.0: Unable to change power state from D0 to D3hot, device inaccessible
[  562.872420] vfio-pci 0000:5a:00.0: Unable to change power state from D0 to D3hot, device inaccessible
[  563.367593] tap105i0: entered promiscuous mode
[  563.399735] OCPVnet: port 2(fwpr105p0) entered blocking state
[  563.399739] OCPVnet: port 2(fwpr105p0) entered disabled state
...

The two other VMs on separate Proxmox hosts are running without any problems. What I have checked already:

* All SSDs have the same Samsung firmware version
* All Proxmox hosts run identical versions
* All VMs run an identical image
* All hosts have the same BIOS firmware (Secure Boot disabled)
* All hosts have power management set to "mobile S0 only" in the BIOS
* All hosts have ASPM disabled in the BIOS
* The affected Proxmox host was booted with/without pcie_aspm=off and with/without intel_iommu=on, with no change
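For anyone debugging the same "Unable to change power state" message, this is roughly how the device can be inspected from the host (the 0000:59:00.0 address is taken from the dmesg excerpt above; adjust it for your own topology, and the reset_method attribute only exists on reasonably recent kernels):

Code:
# Show the PCI power management capability of the drive (supported D-states, PME flags)
lspci -vvv -s 0000:59:00.0 | grep -A2 'Power Management'
# Show which reset methods the kernel considers usable for this function
cat /sys/bus/pci/devices/0000:59:00.0/reset_method
# Confirm the device is bound to vfio-pci before the VM starts
lspci -nnk -s 0000:59:00.0 | grep 'Kernel driver in use'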

What is going on here?
 
OK, I cannot find out what exactly the difference between these Samsung SSDs actually is. For anybody else: I bought a Sandisk SSD, exchanged it, and it works... Problem solved...
 
OK, I cannot find out what exactly the difference between these Samsung SSDs actually is. For anybody else: I bought a Sandisk SSD, exchanged it, and it works... Problem solved...
Thank you for reporting back on this and marking the thread as solved. Can you tell us the model/type of the Samsung drive which apparently does not reset properly?
 
I am having a similar problem passing through an NVMe drive, and to no great surprise, it is the same brand/model as the OP's. I can use the drive on the host with no problem, but it fails to pass through to any VM, even VMs that already have another NVMe passed through and working fine.

So there seem to be some compatibility problems with this specific make/model of NVMe drive, at least for some units.


The specific unit in my case:
Samsung 990 EVO Plus (4TB)
144d:a80d
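If you want to check whether your own unit is the same model, the vendor:device ID can be read straight off the host (a small sketch, assuming a stock lspci):

Code:
# Look for controllers matching the Samsung 990 EVO Plus ID mentioned above
lspci -nn -d 144d:a80d
# Or list every NVMe-class controller together with its vendor:device ID
lspci -nn | grep -i 'Non-Volatile memory controller'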
 
Another datapoint from me here - the Samsung 990 EVO Plus definitely doesn't work in passthrough, even on a server platform:

Code:
[ 1293.609288] vfio-pci 0000:46:00.0: resetting
[ 1293.671772] vfio-pci 0000:46:00.0: reset done
[ 1293.673440] vfio-pci 0000:46:00.0: Unable to change power state from D0 to D3hot, device inaccessible
<....>
[ 1295.564307] vfio-pci 0000:46:00.0: resetting
[ 1295.564825] vfio-pci 0000:46:00.0: reset done
[ 1295.641873] vfio-pci 0000:46:00.0: resetting
[ 1295.645018] vfio-pci 0000:46:00.0: reset done
[ 1299.765847] {3}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 514
[ 1299.765892] {3}[Hardware Error]: It has been corrected by h/w and requires no further action
[ 1299.765921] {3}[Hardware Error]: event severity: corrected
[ 1299.765942] {3}[Hardware Error]:  Error 0, type: corrected
[ 1299.765962] {3}[Hardware Error]:  fru_text: PcieError
[ 1299.765982] {3}[Hardware Error]:   section_type: PCIe error
[ 1299.766002] {3}[Hardware Error]:   port_type: 0, PCIe end point
[ 1299.766023] {3}[Hardware Error]:   version: 0.2
[ 1299.766041] {3}[Hardware Error]:   command: 0x0000, status: 0x0011
[ 1299.766064] {3}[Hardware Error]:   device_id: 0000:46:00.0
[ 1299.766085] {3}[Hardware Error]:   slot: 0
[ 1299.766101] {3}[Hardware Error]:   secondary_bus: 0x00
[ 1299.766120] {3}[Hardware Error]:   vendor_id: 0x144d, device_id: 0xa80d
[ 1299.766143] {3}[Hardware Error]:   class_code: 010802
[ 1299.766734] {3}[Hardware Error]:   bridge: secondary_status: 0x0000, control: 0x0000
[ 1299.767326] {3}[Hardware Error]:   aer_cor_status: 0x00002000, aer_cor_mask: 0x00001000
[ 1299.767902] {3}[Hardware Error]:   aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00100000
[ 1299.768479] {3}[Hardware Error]:   aer_uncor_severity: 0x004f6030
[ 1299.769044] {3}[Hardware Error]:   TLP Header: 00000000 00000000 00000000 00000000
[ 1299.778165] vfio-pci 0000:46:00.0: AER: aer_status: 0x00002000, aer_mask: 0x00001000
[ 1299.778767] vfio-pci 0000:46:00.0:    [13] NonFatalErr
[ 1299.779359] vfio-pci 0000:46:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID

The 990 EVO Plus fails in both PCIe v4 and v5 mode, which I tested in both native and bifurcated slots. There are zero issues with the 970 Pro, 980 Pro, or 990 Pro, though (some, depending on the firmware, require PCIe hot-plug to be disabled).
 
Finally found this thread via Google; I have been pulling my hair out here.

I have 4x 990 EVO Plus 4TB drives and am having this exact issue; no matter what I do, the devices will not pass through to any VM I have tried.

Why is this marked "solved"?
 
BEHOLD ye weary traveler, whoever finds this thread in a future pit of despair. A confirmed fix exists for Samsung 990 Evo Plus 4TB!

Code:
echo -e 'options vfio-pci ids=144d:a80d disable_idle_d3=1\nsoftdep nvme pre: vfio-pci' > /etc/modprobe.d/vfio.conf


disable_idle_d3 fixed the issue! @leesteken
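A small note on applying this on a stock Proxmox install (a sketch, assuming the vfio modules are loaded from the initramfs): the modprobe.d line above only takes effect after the initramfs is rebuilt and the host rebooted, and the parameter can be checked afterwards via sysfs:

Code:
# Rebuild the initramfs so the new vfio-pci options are picked up at boot
update-initramfs -u -k all
# After a reboot, verify the option is active (the parameter lives under
# vfio_pci or vfio_pci_core depending on kernel version, hence the glob)
cat /sys/module/vfio_pci*/parameters/disable_idle_d3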
 
To elaborate on the above: I was getting stuck because on my Asus Pro WS W880-ACE SE the 4x NVMe drives would show up fine in Proxmox but refused to PCIe/direct passthrough to a TrueNAS VM.

I was just about to return the drives when I found this thread.

Before:

- The TrueNAS VM shell would show the controllers
- ls /dev would NOT contain any nvme entries (only the fabrics one)
- Adding disable_idle_d3 was _the_ fix, and now I have 8x nvme entries in /dev (2 per drive; see the quick check below)
- TrueNAS automatically sees the drives now and they are usable as normal (so far... will update if I run into future issues)
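For reference, the "2 per drive" is the usual pattern: one character device for the controller plus one block device per namespace. A quick sanity check from the guest shell, assuming default device naming and nvme-cli being available:

Code:
# nvmeX is the controller (character device), nvmeXn1 the first namespace (block device)
ls -l /dev/nvme*
# Optional: nvme-cli overview of controllers and namespaces
nvme list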

Thanks to all and especially @drunk.bass
 
Why is this marked "solved"?
OP marked it as solved because, as they said, they replaced the SSD with a Sandisk one ;)

disable_idle_d3 fixed the issue! @leesteken
I will caution you before using "disable_idle_d3" - it is a bit of a band-aid that may cause other issues as well. Some devices reasonably expect D3 support, especially if it was advertised/probed as supported earlier during UEFI init or boot. The vfio-pci driver normally loads quite late; the nvme driver, which enables D3, is nowadays usually loaded during the initramfs boot stage.

- TrueNAS automatically sees the drives now/they are usable as normal (so far... will update if I run into future issues)
Most likely it will continue working for SSDs. However, looking at what "disable_idle_d3" actually does, it doesn't seem like a good idea. It disables queuing of power-management transitions for any device bound to the vfio-pci driver; it's a bit of a nuclear option, as it skips calling the main PM entry point in the kernel. This will cause not just SSDs but every single device bound to vfio-pci to be prevented from entering lower power states, including GPUs. In some [rare] cases this may even cause crashes and trigger over-temperature protection, while for sure using a ton of power for no reason ;)
In addition, since the kernel doesn't request the D3 transition, the drive may enter D3 on its own, which some do. I didn't test that for the 990 EVO Plus, but earlier Samsung SSDs do that to work around broken ASPM on some Windows laptops.

Disabling D3 for all devices is a bit like disabling ASPM for the whole system (pcie_aspm=off): it's a last resort until the firmware is fixed, and the consequences of such an action should be understood before applying it blindly.
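If you do decide to enable the option anyway, it is worth knowing which other passthrough devices it will also affect; a small sketch to list everything currently bound to vfio-pci (and therefore kept out of D3 by the global setting):

Code:
# List every PCI function currently bound to the vfio-pci driver
for d in /sys/bus/pci/drivers/vfio-pci/0000:*; do
    [ -e "$d" ] || continue
    lspci -nns "${d##*/}"
done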
 