I hope it is OK to chime in with a problem I am currently experiencing with Kioxia enterprise NVMe drives in an AMD AM5 B650 system (ASRock B650D4U) running the newest kernel on Proxmox 9:
root@pve0:~# uname --all
Linux pve0 6.14.11-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.14.11-2 (2025-09-12T09:46Z) x86_64 GNU/Linux
After only a few hours (sometimes just minutes) of use, one or both of the drives in the ZFS mirrors are reported as "controller is down; will reset":
Sep 26 03:24:49 pve0 kernel: nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Sep 26 03:24:49 pve0 kernel: nvme nvme1: Does your device have a faulty power saving mode enabled?
Sep 26 03:24:49 pve0 kernel: nvme nvme1: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
Sep 26 03:24:49 pve0 kernel: nvme nvme2: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Sep 26 03:24:49 pve0 kernel: nvme nvme2: Does your device have a faulty power saving mode enabled?
Sep 26 03:24:49 pve0 kernel: nvme nvme2: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off" and report a bug
Sep 26 03:24:49 pve0 kernel: nvme 0000:04:00.0: enabling device (0000 -> 0002)
Sep 26 03:24:49 pve0 kernel: nvme 0000:07:00.0: enabling device (0000 -> 0002)
Sep 26 03:24:49 pve0 kernel: nvme nvme1: Disabling device after reset failure: -19
Sep 26 03:24:49 pve0 kernel: nvme nvme2: Disabling device after reset failure: -19
After a hard power cycle, the drives show up fine until the next error. The drives affected are these:
/dev/nvme1n1 /dev/ng1n1 22P0A0xxxxxx KCD61VUL3T20 0x1 254.86 GB / 3.20 TB 512 B + 0 B 0106
/dev/nvme2n1 /dev/ng2n1 22P0A0xxxxxx KCD61VUL3T20 0x1 252.50 GB / 3.20 TB 512 B + 0 B 0106
I have applied all the countermeasures recommended in the kernel message, to no avail:
cat /etc/kernel/cmdline
root=ZFS=rpool/ROOT/pve-1 boot=zfs module_blacklist=amdgpu nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off
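One thing worth ruling out is that the parameters in /etc/kernel/cmdline never reached the running kernel. The following is only a minimal sketch of such a check: the function name check_cmdline is made up for illustration, and it assumes the three parameters from the kernel message are the ones of interest. It compares them against /proc/cmdline, which reflects what the running kernel was actually booted with.

```shell
#!/bin/sh
# check_cmdline is a hypothetical helper, not part of any Proxmox tooling.
# It reports whether each expected parameter appears on a kernel command line.
# Usage: check_cmdline [file]   (defaults to /proc/cmdline, the running kernel)
check_cmdline() {
    file="${1:-/proc/cmdline}"
    missing=0
    for param in nvme_core.default_ps_max_latency_us=0 pcie_aspm=off pcie_port_pm=off; do
        # Split the command line into one token per line and match exactly.
        if tr ' ' '\n' < "$file" | grep -qxF -- "$param"; then
            echo "active:  $param"
        else
            echo "MISSING: $param"
            missing=1
        fi
    done
    return "$missing"
}

check_cmdline || echo "Some parameters are not active on the running kernel."
```

If any parameter shows up as missing: on Proxmox installations that boot through proxmox-boot-tool, changes to /etc/kernel/cmdline should only take effect after running proxmox-boot-tool refresh and rebooting. The effective APST setting can also be read from /sys/module/nvme_core/parameters/default_ps_max_latency_us.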
The boot ZFS mirror (a pair of Intel Optane drives) is not affected. Since one of them is attached with the same type of cable, I doubt the cabling is the cause.
Is there anything I can do? "Get new drives" is not an option I am very happy with, and I don't understand why the same type of drive works fine in two other systems with the same CPU but a different motherboard (ASRock DeskMeet X600).