NVMe Issue: Unable to change power state from D3cold to D0, device inaccessible

In my opinion there is a bug in kernels later than 6.8.4-2.

I can't start my Win11 VM with NVMe passthrough.

Code:
Jun 22 11:37:51 proxmox kernel: vfio-pci 0000:02:00.0: timed out waiting for pending transaction; performing function level reset anyway
Jun 22 11:37:51 proxmox kernel: vfio-pci 0000:02:00.0: Unable to change power state from D0 to D3hot, device inaccessible

Maybe someone could help?
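Not a fix, but since the log shows vfio-pci falling back to a function level reset after a timed-out transaction, it may be worth checking which reset methods that controller actually supports (the 0000:02:00.0 address is taken from the log above, adjust as needed):
Code:
# does the device advertise FLR in its PCIe capabilities (FLReset+)?
lspci -vvv -s 0000:02:00.0 | grep -i flreset
# which reset methods the kernel will try for it (newer kernels only)
cat /sys/bus/pci/devices/0000:02:00.0/reset_method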
 
Did anyone get to the bottom of this?

I have experienced this on both kernels:
6.5.13-5-pve
6.8.8-2-pve

I thought this was due to undervolting my CPU (and it could be), but I'm pretty sure I reset those settings and still got the error. Currently I have the voltage settings applied, and no amount of rebooting seems to clear the error; I have just reset my BIOS.

Other times I can reboot over and over and continue to get it.

It only appears to have started following an upgrade, and I had not upgraded in roughly four months.
 
Code:
Jul 28 17:00:45 pve kernel: pcieport 0000:02:0b.0: Unable to change power state from D3hot to D0, device inaccessible
Jul 28 17:00:45 pve kernel: pcieport 0000:02:0a.0: Unable to change power state from D3hot to D0, device inaccessible
Jul 28 17:00:45 pve kernel: pcieport 0000:02:09.0: Unable to change power state from D3hot to D0, device inaccessible
Jul 28 17:00:45 pve kernel: pcieport 0000:02:08.0: Unable to change power state from D3hot to D0, device inaccessible
Jul 28 17:01:46 pve kernel: nvme nvme2: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Jul 28 17:01:46 pve kernel: nvme nvme2: Does your device have a faulty power saving mode enabled?
Jul 28 17:01:46 pve kernel: nvme nvme2: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jul 28 17:01:46 pve kernel: nvme 0000:09:00.0: Unable to change power state from D3cold to D0, device inaccessible

I have the same problem. I have 4 Intel P4510 4T drives on a PCIe switch card, and they suddenly lose power shortly after being powered on. I have configured "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" in /etc/default/grub, and it still doesn't work.

I have tried 6.8.8-3-pve and 6.5.13-5-pve; my filesystem is ZFS with raidz1.
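For what it's worth, on a Proxmox host that boots from ZFS the kernel command line is often taken from /etc/kernel/cmdline (systemd-boot) rather than /etc/default/grub, so parameters added only to GRUB may never be applied. A rough sketch of both paths plus a check that they actually took effect after a reboot:
Code:
# GRUB-booted hosts: append the parameters to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
update-grub

# ZFS / systemd-boot hosts: append them to the single line in /etc/kernel/cmdline, then:
proxmox-boot-tool refresh

# after rebooting, confirm they are active
cat /proc/cmdline
cat /sys/module/nvme_core/parameters/default_ps_max_latency_us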
 
Same here:

Setup:
B650D4U
Ryzen 9750X3D
Memory 128GB
2* Samsung 990 Pro (with Heatsink)

Lost the NVMe after 24 hours. I think I had upgraded to 6.8.8-3-pve shortly before.
Changed the NVMe 3 times.

Tried everything in this thread. Any other ideas?
 
So far I think this is due to a power supply issue, as I noticed all of my drive power lights were off before the error appeared. So I swapped the DIY power cable running to the hard drive bay back to the original one, and no problem has occurred yet. I guess this is also why the developers pay so little attention to this: the syslog messages aren't really the problem, just the aftermath.
 
I was sick of the weird behavior, so I have had to change my setup dramatically: got rid of my Hyper card, consolidated drives, and got rid of my ZFS SLOG. No problem on the new kernel yet. Such a pain.
 
My setup has the 2* Samsung 990 Pro (with heatsink) as M.2 drives directly on the board (B650D4U), no PCIe riser, so it can't be a cable problem. Maybe I'll try changing the PSU in case there is a problem with any cable.

Also tried downgrading the kernel, BIOS, etc. No luck.
 
What is your error log? Did you try
Code:
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
first?
 
I feel the same way about having to rework the whole setup. I spent two months working on these painful hardware problems. Why is human technology so backward?
 
Hi.
Following logs, from two different NVMe drives in the same slot:
Code:
Jul 26 09:31:38 proxmox kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Jul 26 09:31:38 proxmox kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Jul 26 09:31:38 proxmox kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jul 26 09:31:38 proxmox kernel: nvme 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 26 09:31:38 proxmox kernel: nvme nvme0: Disabling device after reset failure: -19

Jul 27 13:09:24 proxmox kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Jul 27 13:09:24 proxmox kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Jul 27 13:09:24 proxmox kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jul 27 13:09:24 proxmox kernel: nvme 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 27 13:09:24 proxmox kernel: nvme nvme0: Disabling device after reset failure: -19

Yes, tried nvme_core.default_ps_max_latency_us=0 pcie_aspm=off first.
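Since the parameter was already set, one extra thing that can be checked is whether APST is actually disabled on the controller afterwards; a small sketch with nvme-cli, assuming the drive shows up as /dev/nvme0:
Code:
# feature 0x0c = Autonomous Power State Transition; with
# nvme_core.default_ps_max_latency_us=0 the APST Enable bit should be 0
nvme get-feature /dev/nvme0 -f 0x0c -H
# list the power states and APST attributes the drive itself advertises
nvme id-ctrl /dev/nvme0 | grep -E 'apsta|^ps '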
 
A couple of things I tried with varied success (it looks like I was experiencing multiple issues, though):

BIOS reset
PCIe power saving disabled in the BIOS (ASPM?)
kernel pinning (I thought selecting a kernel manually was pinning it - it isn't; see the sketch below)
disabling any power saving, powertop etc.
990 firmware update - I had to install Windows to check this
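Two of those points, sketched for anyone following the thread (the kernel version is only an example of a previously working one):
Code:
# actually pin a known-good kernel; having it listed in the boot menu is not pinning
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.5.13-5-pve
# check the currently installed 990 Pro firmware revision without booting Windows
nvme list

Updating the firmware itself from Linux depends on whether fwupd carries an image for the model, so Windows or Samsung's own tools may still be needed for that step.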
 
Aaaand I'm back to
Code:
kvm: ../hw/pci/pci.c:1633: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.

On the latest kernel
 

I think you may have missed some logs or some operation before those entries. As I said, those logs are just the result; the device had already been disconnected before they appeared.
 
Honestly, I don't know what is going on. It's a mix of errors.

At one point I was unable to boot with my boot NVMe in the Hyper card, no matter the kernel (an I/O error or something).
 
Sadly, the painful errors have appeared again :(
So far, I have tried the following methods:
1. `nvme_core.default_ps_max_latency_us=0 pcie_aspm=off`
2. enable VMD support in the BIOS
3. disable ASPM in the BIOS
4. change the PCIe cable
5. update the PVE kernel

They work occasionally, but fail again after the next restart, so I am constantly confused.
I found more reports in various places, such as OpenZFS: https://github.com/openzfs/zfs/discussions/14793 , and other Linux kernel threads.
I have no idea now. I will now try `pcie_port_pm=off pcie_aspm=off` in GRUB; no error so far.

Looking forward to help from the professionals.
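In case it helps the next person, a rough way to check whether ASPM is actually off on the links once those parameters are set, and to try to recover a dropped device without a full reboot (no guarantee this works when the device is stuck in D3cold):
Code:
# every LnkCtl line should read "ASPM Disabled" with pcie_aspm=off
lspci -vv 2>/dev/null | grep -i 'aspm'
# if a drive has fallen off the bus, a rescan sometimes brings it back
echo 1 > /sys/bus/pci/rescan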
 
Hi, I have a B650D4U with the latest BIOS and BMC. I downgraded to the previous version and have had no failure/error so far.
Maybe it helps you investigate the error.