NVMe Issue: Unable to change power state from D3cold to D0, device inaccessible

I think there is a bug in kernels later than 6.8.4-2.

I can't start a Win11 VM with NVMe passthrough.

Code:
Jun 22 11:37:51 proxmox kernel: vfio-pci 0000:02:00.0: timed out waiting for pending transaction; performing function level reset anyway
Jun 22 11:37:51 proxmox kernel: vfio-pci 0000:02:00.0: Unable to change power state from D0 to D3hot, device inaccessible

Maybe someone could help?
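For anyone comparing notes, a quick way to see what reset support and power state the device reports before the VM start (a sketch; the PCI address 02:00.0 is taken from the log above, adjust it to your device):
Code:
# Does the device support Function Level Reset, and what D-state is it in?
lspci -vv -s 02:00.0 | grep -Ei 'flreset|status: d'
# Recent kernels also expose the current power state in sysfs, if present
cat /sys/bus/pci/devices/0000:02:00.0/power_state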
 
Did anyone get to the bottom of this?

I have experienced it on both kernels:
6.5.13-5-pve
6.8.8-2-pve

I thought this was due to undervolting my CPU (and it could be), but I'm pretty sure I have reset the settings and still got this. Currently I have the undervolt applied and no amount of rebooting appears to clear the error. I just reset my BIOS.

Other times I can reboot again and again and continue to get it.

It only appears to have started following an upgrade. I had not upgraded in roughly four months.
 
Code:
Jul 28 17:00:45 pve kernel: pcieport 0000:02:0b.0: Unable to change power state from D3hot to D0, device inaccessible
Jul 28 17:00:45 pve kernel: pcieport 0000:02:0a.0: Unable to change power state from D3hot to D0, device inaccessible
Jul 28 17:00:45 pve kernel: pcieport 0000:02:09.0: Unable to change power state from D3hot to D0, device inaccessible
Jul 28 17:00:45 pve kernel: pcieport 0000:02:08.0: Unable to change power state from D3hot to D0, device inaccessible
Jul 28 17:01:46 pve kernel: nvme nvme2: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Jul 28 17:01:46 pve kernel: nvme nvme2: Does your device have a faulty power saving mode enabled?
Jul 28 17:01:46 pve kernel: nvme nvme2: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jul 28 17:01:46 pve kernel: nvme 0000:09:00.0: Unable to change power state from D3cold to D0, device inaccessible

I have the same problem. I have four Intel P4510 4T drives on a PCIe switch card, and they suddenly lose power shortly after being powered on. I have configured "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" in /etc/default/grub, but it still doesn't work.

I have tried 6.8.8-3-pve and 6.5.13-5-pve; my filesystem is ZFS with RAIDZ1.
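Side note for anyone else trying this on a ZFS root: a PVE host installed on ZFS with UEFI typically boots via systemd-boot rather than GRUB, in which case edits to /etc/default/grub never reach the kernel. A minimal sketch of how to verify and apply the flags either way:
Code:
# Did the flags actually make it onto the running kernel's command line?
cat /proc/cmdline

# GRUB boot: append the flags to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then
update-grub

# systemd-boot (common on ZFS-root UEFI installs): append them to /etc/kernel/cmdline, then
proxmox-boot-tool refresh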
 
Same here.

Setup:
B650D4U
Ryzen 9 7950X3D
128 GB memory
2x Samsung 990 Pro (with heatsink)

Lost the NVMe after 24 hours. I think I had upgraded to 6.8.8-3-pve shortly before.
Changed the NVMe three times.

Tried everything in this thread. Any other ideas?
 
So far I think this is due to a power supply issue, as I noticed all my drive power lights were off before the error appeared. I swapped the DIY power cable that links to the hard drive bay back to the original one, and no problem has occurred yet. I guess this is also why the developers pay so little attention to this; the syslog isn't really the problem.
 
I was sick of the weird behavior, so I have had to change my setup dramatically: got rid of my Hyper card, consolidated drives, and got rid of my ZFS SLOG. No problem on the new kernel yet. Such a pain.
 
My setup has 2x Samsung 990 Pro (with heatsink) as M.2 drives directly on the board (B650D4U), no PCIe riser, so it cannot be a cable problem. Maybe I will try changing the PSU in case there is a problem with any cable.

Also tried downgrading the kernel, BIOS, etc. No luck.
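Before swapping the PSU, it might also be worth reading out the 990 Pro firmware revision, since Samsung has shipped firmware updates for that model; a quick sketch (device name is an example):
Code:
# Firmware revision is shown in the "FW Rev" column
nvme list
# Or via smartmontools
smartctl -i /dev/nvme0 | grep -i firmware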
 
What is your error log? And did you try
Code:
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
first?
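Something like this pulls the relevant kernel messages (a sketch; adjust the grep pattern to your setup):
Code:
# Current boot
journalctl -k -b | grep -Ei 'nvme|pcieport|vfio'
# Previous boot, if the box was power-cycled after the failure
journalctl -k -b -1 | grep -Ei 'nvme|pcieport|vfio'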
 
I feel the same way. I spent two months working on these painful hardware problems. Why is human technology so backward?
 
Hi.
The following logs are from two different NVMe drives in the same slot:
Code:
Jul 26 09:31:38 proxmox kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Jul 26 09:31:38 proxmox kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Jul 26 09:31:38 proxmox kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jul 26 09:31:38 proxmox kernel: nvme 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 26 09:31:38 proxmox kernel: nvme nvme0: Disabling device after reset failure: -19

Jul 27 13:09:24 proxmox kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Jul 27 13:09:24 proxmox kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Jul 27 13:09:24 proxmox kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jul 27 13:09:24 proxmox kernel: nvme 0000:0a:00.0: Unable to change power state from D3cold to D0, device inaccessible
Jul 27 13:09:24 proxmox kernel: nvme nvme0: Disabling device after reset failure: -19

Yes, I tried nvme_core.default_ps_max_latency_us=0 pcie_aspm=off first.
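For completeness, this is how the parameter can be verified as actually live (standard sysfs path for the nvme_core module):
Code:
cat /proc/cmdline
# Should print 0 if the option took effect
cat /sys/module/nvme_core/parameters/default_ps_max_latency_us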
 
A couple of things I tried with varied success (it looks like I was experiencing multiple issues, though):

BIOS reset
Disabling PCIe power saving in the BIOS (ASPM)
Kernel pinning (I thought adding a kernel manually was pinning; it isn't; see the sketch after this list)
Disabling any power saving, powertop etc.
990 firmware update (I had to install Windows to check this)
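For the pinning part, proxmox-boot-tool is what actually pins a kernel on PVE (the version string is just an example):
Code:
# List kernels known to the boot tool
proxmox-boot-tool kernel list
# Pin a specific version so it stays the boot default across upgrades
proxmox-boot-tool kernel pin 6.5.13-5-pve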
 
Aaaand I'm back to
Code:
kvm: ../hw/pci/pci.c:1633: pci_irq_handler: Assertion `0 <= irq_num && irq_num < PCI_NUM_PINS' failed.

On the latest kernel
 

I think you may have missed some logs or operations before this. As I said, these log lines are just the result; the device was already disconnected before this point.
 
Honestly, I don't know what is going on. A mix of errors.

At one point I was unable to boot with my boot NVMe in the Hyper card, no matter the kernel (an I/O error or something).
 
Sadly, the painful errors have appeared again. :(
So far, I have tried the following methods:
1. `nvme_core.default_ps_max_latency_us=0 pcie_aspm=off`
2. Enable VMD support in the BIOS
3. Disable ASPM in the BIOS
4. Change the PCIe cable
5. Update the PVE kernel

They work occasionally, but fail again after the next restart, so I am constantly confused.
I found more reports around various projects, such as OpenZFS: https://github.com/openzfs/zfs/discussions/14793 , and elsewhere in the Linux kernel community.
I have no idea now. I will try `pcie_port_pm=off pcie_aspm=off` in GRUB next; no errors so far.

Looking forward to the help of professionals.
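In case anyone wants to compare notes: whether ASPM is really off after booting with these flags can be checked from userspace (a sketch; lspci needs root to show the link control registers):
Code:
# Kernel-wide ASPM policy
cat /sys/module/pcie_aspm/parameters/policy
# Per-link status; expect "ASPM Disabled" everywhere with pcie_aspm=off
lspci -vv | grep 'ASPM'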
 
I have a B650D4U with the latest BIOS and BMC. I downgraded to the previous version and have had no failures/errors so far.
Maybe it helps you investigate the error.
 
