[SOLVED] PCI Passthrough NVME: Unable to change power state

Bene124

Jun 26, 2023
Hey guys,

I have a 3-node Proxmox cluster consisting of three Minisforum MS-01 machines, each with 3x 2TB Samsung 990 EVO Plus NVMe SSDs. For performance testing I set up an OKD cluster with 3 worker nodes, one VM on each Proxmox node, and directly attached one Samsung SSD to each VM.

This worked on two of the VMs without any problems. The third VM cannot attach any of the three Samsung SSDs. For testing purposes I attached two of the NVMe SSDs to it, one as a raw device and one via a mapping.


When I try to start this VM, the Proxmox host logs the following in dmesg:

Code:
...
[  562.662538] vfio-pci 0000:59:00.0: Unable to change power state from D0 to D3hot, device inaccessible
[  562.872420] vfio-pci 0000:5a:00.0: Unable to change power state from D0 to D3hot, device inaccessible
[  563.367593] tap105i0: entered promiscuous mode
[  563.399735] OCPVnet: port 2(fwpr105p0) entered blocking state
[  563.399739] OCPVnet: port 2(fwpr105p0) entered disabled state
...
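When several devices are passed through, it can help to pull out exactly which PCI functions failed the power-state transition. The following is a minimal Python sketch, assuming the message format shown in the log above; `failed_power_transitions` is a hypothetical helper, not part of any Proxmox or kernel tool.

```python
import re

# Matches the vfio-pci message from the log above, e.g.
# "vfio-pci 0000:59:00.0: Unable to change power state from D0 to D3hot, device inaccessible"
POWER_FAIL = re.compile(
    r"vfio-pci (?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"Unable to change power state from (?P<src>D\w+) to (?P<dst>D\w+)",
    re.IGNORECASE,
)


def failed_power_transitions(log_text):
    """Return (pci_address, from_state, to_state) for each matching dmesg line."""
    results = []
    for line in log_text.splitlines():
        match = POWER_FAIL.search(line)
        if match:
            results.append((match["addr"], match["src"], match["dst"]))
    return results


if __name__ == "__main__":
    # Sample lines copied from the log in this post; real dmesg output can be used instead.
    sample = (
        "[  562.662538] vfio-pci 0000:59:00.0: Unable to change power state from D0 to D3hot, device inaccessible\n"
        "[  562.872420] vfio-pci 0000:5a:00.0: Unable to change power state from D0 to D3hot, device inaccessible\n"
        "[  563.367593] tap105i0: entered promiscuous mode\n"
    )
    for addr, src, dst in failed_power_transitions(sample):
        print(f"{addr}: {src} -> {dst} failed")
```

Feeding real `dmesg` output (which usually needs root) into the function instead of the sample gives the list of affected addresses, which can then be matched against `lspci -nn` to see which physical drives they are.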

The two other VMs, each on a separate Proxmox host, run without any problems. What I have already checked:

* All SSDs run the same Samsung firmware version
* All Proxmox hosts run identical versions
* All VMs run an identical image
* All hosts have the same BIOS firmware (Secure Boot disabled)
* All hosts have power management set to "mobile S0 only" in the BIOS
* All hosts have ASPM disabled in the BIOS
* The affected Proxmox host was tested with and without pcie_aspm=off and with and without intel_iommu=on, with no change in behavior
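To confirm which of these kernel parameters were actually in effect on a given boot, rather than only configured in the bootloader, one can read `/proc/cmdline`. Below is a minimal Python sketch, assuming simple unquoted `key=value` tokens; `parse_cmdline` and `passthrough_flags` are hypothetical helper names.

```python
from pathlib import Path


def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to None.

    Assumes unquoted tokens; quoted values containing spaces are not handled.
    Later duplicates of the same key overwrite earlier ones.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def passthrough_flags(cmdline):
    """Report whether the two parameters from the checklist are present."""
    params = parse_cmdline(cmdline)
    iommu = params.get("intel_iommu") or ""
    aspm = params.get("pcie_aspm") or ""
    return {
        # intel_iommu accepts comma-separated options, e.g. "on,igfx_off"
        "intel_iommu=on": "on" in iommu.split(","),
        "pcie_aspm=off": aspm == "off",
    }


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")
    if cmdline_file.exists():
        print(passthrough_flags(cmdline_file.read_text()))
```

This only shows what was passed on the command line; it does not tell whether the IOMMU or ASPM state actually changed as a result.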

What is going on here?
 
OK, I could not find out what exactly is different about these Samsung SSDs. For anybody else: I bought a SanDisk SSD, swapped it in, and it works. Problem solved.
 
Thank you for reporting back on this and marking the thread as solved. Can you tell us the model/type of the Samsung drive, which apparently does not reset properly?
 
I am having a similar problem passing through an NVMe drive, and, not surprisingly, it is the same brand/model as the OP's. I can use the drive on the host with no problem, but it fails to pass through to any VM, even VMs that already have another NVMe drive passed through and working fine.

So there seem to be compatibility problems with this specific make/model of NVMe drive, at least for some units.


The specific unit in my case:
Samsung 990 EVO Plus (4TB)
PCI ID (vendor:device): 144d:a80d
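To see how many drives with this ID a host has, which driver currently owns each one (`nvme` on the host, `vfio-pci` when handed to a VM), and which reset methods the kernel offers for it, a small Python sketch could read the standard sysfs attributes. It assumes the usual `/sys/bus/pci/devices` layout; `reset_method` is only present on recent kernels, and `find_devices` is a hypothetical helper. A listed reset method does not prove the drive actually resets cleanly.

```python
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci/devices")


def read_attr(dev, name):
    """Read one sysfs attribute file, or return None if it does not exist."""
    path = dev / name
    return path.read_text().strip() if path.exists() else None


def find_devices(vendor, device, root=SYSFS_PCI):
    """Return (address, driver, reset_methods) for each PCI function matching vendor:device.

    vendor/device are bare hex strings such as "144d" and "a80d";
    sysfs stores them with a "0x" prefix, e.g. "0x144d".
    """
    matches = []
    if not root.is_dir():
        return matches
    for dev in sorted(root.iterdir()):
        if read_attr(dev, "vendor") != f"0x{vendor.lower()}":
            continue
        if read_attr(dev, "device") != f"0x{device.lower()}":
            continue
        # "driver" is a symlink to the bound driver, e.g. .../drivers/nvme or .../drivers/vfio-pci
        link = dev / "driver"
        driver = link.resolve().name if link.exists() else None
        # "reset_method" lists the reset mechanisms the kernel may use (recent kernels only)
        matches.append((dev.name, driver, read_attr(dev, "reset_method")))
    return matches


if __name__ == "__main__":
    for addr, driver, resets in find_devices("144d", "a80d"):
        print(f"{addr}: driver={driver} reset_method={resets}")
```

Running it on an affected and an unaffected host and comparing the output is one way to spot differences between units with the same ID.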