Hey guys,
I have a 3-node Proxmox cluster consisting of three Minisforum MS-01 machines, each with 3x 2TB Samsung 990 EVO Plus NVMe SSDs. For performance testing I set up an OKD cluster with three worker nodes, one VM per Proxmox node, and passed one Samsung SSD directly through to each VM.
This worked on two of the VMs without any problems. The third VM cannot attach any of the three Samsung SSDs; for testing purposes I attached two of the NVMe SSDs to it, one as a raw PCI device and one via a resource mapping.
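For reference, the passthrough entries in the VM config look roughly like this (a sketch only; the second PCI address and the mapping name are placeholders, not necessarily the exact values from my config):
Code:
# excerpt from /etc/pve/qemu-server/105.conf
hostpci0: 0000:59:00.0,pcie=1
hostpci1: mapping=nvme-passthrough,pcie=1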

When trying to launch this VM on the Proxmox host, I get the following dmesg log:
Code:
...
[ 562.662538] vfio-pci 0000:59:00.0: Unable to change power state from D0 to D3hot, device inaccessible
[ 562.872420] vfio-pci 0000:5a:00.0: Unable to change power state from D0 to D3hot, device inaccessible
[ 563.367593] tap105i0: entered promiscuous mode
[ 563.399735] OCPVnet: port 2(fwpr105p0) entered blocking state
[ 563.399739] OCPVnet: port 2(fwpr105p0) entered disabled state
...
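For completeness, these are the commands I use on the host to check driver binding and IOMMU grouping for the two affected devices (addresses taken from the log above):
Code:
# which kernel driver is bound to each NVMe device
lspci -nnk -s 0000:59:00.0
lspci -nnk -s 0000:5a:00.0
# which IOMMU group each device ended up in
find /sys/kernel/iommu_groups/ -type l | grep -E '59:00\.0|5a:00\.0'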
The two other VMs on separate Proxmox hosts are running without any problems. What I have checked already:
* All SSDs have the same Samsung firmware version
* All Proxmox hosts run identical versions
* All VMs run an identical image
* All hosts have the same BIOS firmware (Secure Boot disabled)
* All hosts have power management set to mobile S0 only in the BIOS
* All hosts have ASPM disabled in the BIOS
* The affected Proxmox host was booted with/without pcie_aspm=off and with/without intel_iommu=on, with no change (kernel command line edited as sketched below)
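This is roughly how I toggled those kernel parameters on the affected host (assuming a GRUB-booted install; on a systemd-boot/ZFS install the parameters would go into /etc/kernel/cmdline and be applied with proxmox-boot-tool refresh instead):
Code:
# /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off intel_iommu=on"
# apply and reboot
update-grub
reboot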
What is going on here?