PCI Passthrough NVME: Unable to change power state

Bene124

New Member
Jun 26, 2023
Hey guys,

I have a 3-node Proxmox cluster consisting of 3 Minisforum MS-01 machines, each with 3x 2TB Samsung 990 EVO Plus NVMe SSDs. For performance testing I set up an OKD cluster with 3 worker nodes, one on each Proxmox node, and directly attached one Samsung SSD per VM.

This worked on 2 VMs without any problems. The third VM cannot attach any of the 3 Samsung SSDs. For testing purposes I attached two of the NVMe SSDs, one as a raw device and one with a mapping.

[Attached screenshot: 1746798026247.png]

When I try to start this VM, dmesg on the Proxmox host shows the following:

Code:
...
[  562.662538] vfio-pci 0000:59:00.0: Unable to change power state from D0 to D3hot, device inaccessible
[  562.872420] vfio-pci 0000:5a:00.0: Unable to change power state from D0 to D3hot, device inaccessible
[  563.367593] tap105i0: entered promiscuous mode
[  563.399735] OCPVnet: port 2(fwpr105p0) entered blocking state
[  563.399739] OCPVnet: port 2(fwpr105p0) entered disabled state
...
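To pull the affected devices out of a longer dmesg dump, something like the following rough Python sketch could help. The regular expression is written only against the two vfio-pci lines shown above, and the function name `failed_power_transitions` is made up for illustration, so other kernels or drivers may word the message differently.

```python
import re

# Pattern derived from the vfio-pci lines in the log above (an assumption:
# other kernel versions may phrase this message differently).
POWER_ERR = re.compile(
    r"vfio-pci (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Unable to change power state from (?P<src>\w+) to (?P<dst>\w+)"
)


def failed_power_transitions(dmesg_text):
    """Return (pci_address, from_state, to_state) for every matching line."""
    return [
        (m.group("addr"), m.group("src"), m.group("dst"))
        for m in POWER_ERR.finditer(dmesg_text)
    ]
```

Passing it the text of `dmesg` from each host gives a list that is easy to compare between the working hosts and the failing one.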

The two other VMs, on separate Proxmox hosts, are running without any problems. What I have already checked:

* All SSDs have the same Samsung firmware version
* All Proxmox hosts are running identical versions
* All VMs are running an identical image
* All hosts have the same BIOS firmware (Secure Boot disabled)
* All hosts have power management set to mobile S0 only in the BIOS
* All hosts have ASPM disabled in the BIOS
* The affected Proxmox host was tested with and without pcie_aspm=off and with and without intel_iommu=on; neither made any difference
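For the last point, one way to confirm that a parameter actually reached the running kernel is to check the contents of `/proc/cmdline` on each host. Below is a minimal Python sketch; the helper names `parse_cmdline` and `check_params` are hypothetical, and the parser assumes no quoted values containing spaces.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def check_params(cmdline, expected):
    """Return {name: bool} telling whether each expected name=value is present."""
    params = parse_cmdline(cmdline)
    return {name: params.get(name) == value for name, value in expected.items()}
```

On a host this could be called as `check_params(open("/proc/cmdline").read(), {"pcie_aspm": "off", "intel_iommu": "on"})`, and the result compared across all three nodes.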

What is going on here?