NVME passthrough - can't change power state from D3hot to D0 (config space inaccessible)

daemonix

Member
Dec 12, 2021
32
3
13
42
Since installing a truenas core with 6 NVME passthrough (zfs pool created in truenas), everything was ok. 6+months (but no reboot)
I had to reboot the server (not just the VM) and now I cant boot the VM with the attached NVMEs.

Any ideas?


I though at some point that proxmox zfs tries to interfere on boot so i did:

Code:
systemctl disable zfs-import-cache
systemctl disable zfs-import-scan
systemctl disable zfs-mount

systemctl list-unit-files | grep zfs
zfs-import-cache.service               disabled        enabled
zfs-import-scan.service                disabled        disabled
zfs-import.service                     masked          enabled
zfs-import@.service                    disabled        enabled
zfs-load-key.service                   masked          enabled
zfs-mount.service                      disabled        enabled
zfs-scrub@.service                     static          -
zfs-share.service                      enabled         enabled
zfs-volume-wait.service                enabled         enabled
zfs-zed.service                        enabled         enabled
zfs-import.target                      enabled         enabled
zfs-volumes.target                     enabled         enabled
zfs.target                             enabled         enabled
zfs-scrub-monthly@.timer               disabled        enabled
zfs-scrub-weekly@.timer                disabled        enabled


Config:
Code:
balloon: 0
boot: order=ide0;ide2;net0
cores: 4
cpu: host
hostpci0: 0000:04:00
hostpci1: 0000:05:00
hostpci2: 0000:06:00
hostpci3: 0000:07:00
hostpci4: 0000:08:00
hostpci5: 0000:09:00
ide0: Disk8T0:164/vm-164-disk-0.qcow2,size=32G
ide2: none,media=cdrom
memory: 48000
meta: creation-qemu=7.0.0,ctime=1689955542
name: ucltruenas
net0: virtio=76:BC:ED:33:39:B4,bridge=vmbr1
numa: 1
ostype: other
scsihw: virtio-scsi-pci
smbios1: uuid=9778cb3a-ff6d-5558-6aad-70d5747ca471
sockets: 2
vmgenid: 69e8d2aa-2ff7-5d56-5434-e43b80e3fdec

grub:
"GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"


Code:
[  190.927003] pcieport 0000:02:05.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[  190.930684] pcieport 0000:02:05.0: bridge window [io  0x1000-0x0fff] to [bus 08] add_size 1000
[  190.930691] pcieport 0000:02:05.0: BAR 13: no space for [io  size 0x1000]
[  190.930693] pcieport 0000:02:05.0: BAR 13: failed to assign [io  size 0x1000]
[  190.930694] pcieport 0000:02:05.0: BAR 13: no space for [io  size 0x1000]
[  190.930695] pcieport 0000:02:05.0: BAR 13: failed to assign [io  size 0x1000]
[  190.930698] pci 0000:08:00.0: BAR 0: assigned [mem 0xf4100000-0xf413ffff 64bit]
[  190.932408] pci 0000:08:00.0: BAR 4: assigned [mem 0xf4140000-0xf417ffff 64bit]
[  190.934115] pci 0000:08:00.0: BAR 6: assigned [mem 0xf4180000-0xf41bffff pref]
[  190.934118] pcieport 0000:02:05.0: PCI bridge to [bus 08]
[  190.934850] pcieport 0000:02:05.0:   bridge window [mem 0xf4100000-0xf41fffff]
[  190.935340] pcieport 0000:02:05.0:   bridge window [mem 0xf5100000-0xf52fffff 64bit pref]
[  190.937343] nvme nvme2: pci function 0000:08:00.0
[  190.938039] nvme 0000:08:00.0: enabling device (0000 -> 0002)
[  190.977895] nvme nvme2: 127/0/0 default/read/poll queues
[  190.993683]  nvme2n1: p1 p2
[  192.318164] vfio-pci 0000:09:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[  192.320595] pcieport 0000:02:06.0: pciehp: Slot(0-6): Link Down
[  192.484916] clocksource: timekeeping watchdog on CPU123: hpet wd-wd read-back delay of 246050ns
[  192.484937] clocksource: wd-tsc-wd read-back delay of 243047ns, clock-skew test skipped!
[  192.736191] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x12e8 (issued 2000 msec ago)
[  193.988867] clocksource: timekeeping watchdog on CPU126: hpet wd-wd read-back delay of 246400ns
[  193.988894] clocksource: wd-tsc-wd read-back delay of 244095ns, clock-skew test skipped!
[  194.244006] vfio-pci 0000:09:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[  194.244187] pci 0000:09:00.0: Removing from iommu group 84
[  194.252153] pcieport 0000:02:06.0: pciehp: Timeout on hotplug command 0x13f8 (issued 186956 msec ago)
[  194.252765] pcieport 0000:02:06.0: pciehp: Slot(0-6): Card present
[  194.726855] device tap164i0 entered promiscuous mode
[  194.738469] vmbr1: port 1(tap164i0) entered blocking state
[  194.738476] vmbr1: port 1(tap164i0) entered disabled state
[  194.738962] vmbr1: port 1(tap164i0) entered blocking state
[  194.738964] vmbr1: port 1(tap164i0) entered forwarding state
[  194.738987] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr1: link becomes ready
[  196.272094] pcieport 0000:02:06.0: pciehp: Timeout on hotplug command 0x13e8 (issued 2020 msec ago)
[  196.413036] pci 0000:09:00.0: [1344:51c0] type 00 class 0x010802
[  196.416962] pci 0000:09:00.0: reg 0x10: [mem 0x00000000-0x0003ffff 64bit]
[  196.421620] pci 0000:09:00.0: reg 0x20: [mem 0x00000000-0x0003ffff 64bit]
[  196.422846] pci 0000:09:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[  196.424317] pci 0000:09:00.0: Max Payload Size set to 512 (was 128, max 512)
[  196.439279] pci 0000:09:00.0: PME# supported from D0 D1 D3hot
[  196.459967] pci 0000:09:00.0: Adding to iommu group 84
[  196.462579] pcieport 0000:02:06.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[  196.466259] pcieport 0000:02:06.0: bridge window [io  0x1000-0x0fff] to [bus 09] add_size 1000
[  196.466265] pcieport 0000:02:06.0: BAR 13: no space for [io  size 0x1000]
[  196.466267] pcieport 0000:02:06.0: BAR 13: failed to assign [io  size 0x1000]
[  196.466268] pcieport 0000:02:06.0: BAR 13: no space for [io  size 0x1000]
[  196.466269] pcieport 0000:02:06.0: BAR 13: failed to assign [io  size 0x1000]
[  196.466272] pci 0000:09:00.0: BAR 0: assigned [mem 0xf4000000-0xf403ffff 64bit]
[  196.467975] pci 0000:09:00.0: BAR 4: assigned [mem 0xf4040000-0xf407ffff 64bit]
[  196.469691] pci 0000:09:00.0: BAR 6: assigned [mem 0xf4080000-0xf40bffff pref]
[  196.469694] pcieport 0000:02:06.0: PCI bridge to [bus 09]
[  196.470426] pcieport 0000:02:06.0:   bridge window [mem 0xf4000000-0xf40fffff]
[  196.470916] pcieport 0000:02:06.0:   bridge window [mem 0xf5300000-0xf54fffff 64bit pref]
[  196.472884] nvme nvme3: pci function 0000:09:00.0
[  196.473616] nvme 0000:09:00.0: enabling device (0000 -> 0002)
[  196.512931] nvme nvme3: 127/0/0 default/read/poll queues
[  196.529097]  nvme3n1: p1 p2
[  198.092038] pcieport 0000:02:06.0: pciehp: Timeout on hotplug command 0x12e8 (issued 1820 msec ago)
[  198.690791] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x19@0x300
[  198.691033] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x27@0x920
[  198.691278] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x26@0x9c0
[  199.114602] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x19@0x300
[  199.114847] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x27@0x920
[  199.115096] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x26@0x9c0
[  199.485505] vmbr1: port 1(tap164i0) entered disabled state
[  330.030345] vfio-pci 0000:08:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[  330.032580] pcieport 0000:02:05.0: pciehp: Slot(0-5): Link Down
[  331.935885] vfio-pci 0000:08:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[  331.936059] pci 0000:08:00.0: Removing from iommu group 83
[  331.956272] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x11e8 (issued 139224 msec ago)
[  331.957145] pcieport 0000:02:05.0: pciehp: Slot(0-5): Card present
[  333.976326] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x13e8 (issued 2020 msec ago)
[  334.117418] pci 0000:08:00.0: [1344:51c0] type 00 class 0x010802
[  334.121345] pci 0000:08:00.0: reg 0x10: [mem 0x00000000-0x0003ffff 64bit]
[  334.126000] pci 0000:08:00.0: reg 0x20: [mem 0x00000000-0x0003ffff 64bit]
[  334.127226] pci 0000:08:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[  334.128698] pci 0000:08:00.0: Max Payload Size set to 512 (was 128, max 512)
[  334.143659] pci 0000:08:00.0: PME# supported from D0 D1 D3hot
[  334.164444] pci 0000:08:00.0: Adding to iommu group 83
[  334.166959] pcieport 0000:02:05.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[  334.170643] pcieport 0000:02:05.0: bridge window [io  0x1000-0x0fff] to [bus 08] add_size 1000
[  334.170650] pcieport 0000:02:05.0: BAR 13: no space for [io  size 0x1000]
[  334.170652] pcieport 0000:02:05.0: BAR 13: failed to assign [io  size 0x1000]
[  334.170653] pcieport 0000:02:05.0: BAR 13: no space for [io  size 0x1000]
[  334.170654] pcieport 0000:02:05.0: BAR 13: failed to assign [io  size 0x1000]
[  334.170658] pci 0000:08:00.0: BAR 0: assigned [mem 0xf4100000-0xf413ffff 64bit]
[  334.172363] pci 0000:08:00.0: BAR 4: assigned [mem 0xf4140000-0xf417ffff 64bit]
[  334.174072] pci 0000:08:00.0: BAR 6: assigned [mem 0xf4180000-0xf41bffff pref]
[  334.174075] pcieport 0000:02:05.0: PCI bridge to [bus 08]
[  334.174806] pcieport 0000:02:05.0:   bridge window [mem 0xf4100000-0xf41fffff]
[  334.175296] pcieport 0000:02:05.0:   bridge window [mem 0xf5100000-0xf52fffff 64bit pref]
[  334.177298] nvme nvme1: pci function 0000:08:00.0
[  334.177996] nvme 0000:08:00.0: enabling device (0000 -> 0002)
[  334.220204] nvme nvme1: 127/0/0 default/read/poll queues
[  334.237017]  nvme1n1: p1 p2
[  335.796180] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x12e8 (issued 1820 msec ago)

I tried a couple of times so maybe this second log is already dead:
Code:
[   79.533603] vfio-pci 0000:07:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[   79.535330] pcieport 0000:02:04.0: pciehp: Slot(0-4): Link Down
[   80.284136] vfio-pci 0000:07:00.0: timed out waiting for pending transaction; performing function level reset anyway
[   81.532090] vfio-pci 0000:07:00.0: not ready 1023ms after FLR; waiting
[   82.588056] vfio-pci 0000:07:00.0: not ready 2047ms after FLR; waiting
[   84.892150] vfio-pci 0000:07:00.0: not ready 4095ms after FLR; waiting
[   89.243877] vfio-pci 0000:07:00.0: not ready 8191ms after FLR; waiting
[   97.691632] vfio-pci 0000:07:00.0: not ready 16383ms after FLR; waiting
[  114.331200] vfio-pci 0000:07:00.0: not ready 32767ms after FLR; waiting
[  149.146240] vfio-pci 0000:07:00.0: not ready 65535ms after FLR; giving up
[  149.154174] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x13f8 (issued 141128 msec ago)
[  151.174121] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x03e0 (issued 2020 msec ago)
[  152.506070] vfio-pci 0000:07:00.0: not ready 1023ms after bus reset; waiting
[  153.562091] vfio-pci 0000:07:00.0: not ready 2047ms after bus reset; waiting
[  155.801981] vfio-pci 0000:07:00.0: not ready 4095ms after bus reset; waiting
[  160.153992] vfio-pci 0000:07:00.0: not ready 8191ms after bus reset; waiting
[  168.601641] vfio-pci 0000:07:00.0: not ready 16383ms after bus reset; waiting
[  186.009203] vfio-pci 0000:07:00.0: not ready 32767ms after bus reset; waiting
[  220.824284] vfio-pci 0000:07:00.0: not ready 65535ms after bus reset; giving up
[  220.844289] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x03e0 (issued 71692 msec ago)
[  222.168321] vfio-pci 0000:07:00.0: not ready 1023ms after bus reset; waiting
[  223.224211] vfio-pci 0000:07:00.0: not ready 2047ms after bus reset; waiting
[  225.432174] vfio-pci 0000:07:00.0: not ready 4095ms after bus reset; waiting
[  229.784044] vfio-pci 0000:07:00.0: not ready 8191ms after bus reset; waiting
[  238.231807] vfio-pci 0000:07:00.0: not ready 16383ms after bus reset; waiting
[  245.400141] INFO: task irq/59-pciehp:1664 blocked for more than 120 seconds.
[  245.400994]       Tainted: P           O      5.15.158-2-pve #1
[  245.401399] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  245.401793] task:irq/59-pciehp   state:D stack:    0 pid: 1664 ppid:     2 flags:0x00004000
[  245.401800] Call Trace:
[  245.401804]  <TASK>
[  245.401809]  __schedule+0x34e/0x1740
[  245.401821]  ? srso_alias_return_thunk+0x5/0x7f
[  245.401827]  ? srso_alias_return_thunk+0x5/0x7f
[  245.401828]  ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[  245.401834]  schedule+0x69/0x110
[  245.401836]  schedule_preempt_disabled+0xe/0x20
[  245.401839]  __mutex_lock.constprop.0+0x255/0x480
[  245.401843]  __mutex_lock_slowpath+0x13/0x20
[  245.401846]  mutex_lock+0x38/0x50
[  245.401848]  device_release_driver+0x1f/0x40
[  245.401855]  pci_stop_bus_device+0x74/0xa0
[  245.401862]  pci_stop_and_remove_bus_device+0x13/0x30
[  245.401864]  pciehp_unconfigure_device+0x92/0x150
[  245.401872]  pciehp_disable_slot+0x6c/0x100
[  245.401875]  pciehp_handle_presence_or_link_change+0x22a/0x340
[  245.401877]  ? srso_alias_return_thunk+0x5/0x7f
[  245.401879]  pciehp_ist+0x19a/0x1b0
[  245.401882]  ? irq_forced_thread_fn+0x90/0x90
[  245.401889]  irq_thread_fn+0x28/0x70
[  245.401892]  irq_thread+0xde/0x1b0
[  245.401895]  ? irq_thread_fn+0x70/0x70
[  245.401898]  ? irq_thread_check_affinity+0x100/0x100
[  245.401901]  kthread+0x12a/0x150
[  245.401905]  ? set_kthread_struct+0x50/0x50
[  245.401907]  ret_from_fork+0x22/0x30
[  245.401915]  </TASK>
[  255.639346] vfio-pci 0000:07:00.0: not ready 32767ms after bus reset; waiting
[  290.454384] vfio-pci 0000:07:00.0: not ready 65535ms after bus reset; giving up
[  290.456313] vfio-pci 0000:07:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[  290.457400] pci 0000:07:00.0: Removing from iommu group 82
[  290.499751] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x13e8 (issued 69656 msec ago)
[  290.500378] pcieport 0000:02:04.0: pciehp: Slot(0-4): Card present
[  290.500381] pcieport 0000:02:04.0: pciehp: Slot(0-4): Link Up
[  292.534371] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x13e8 (issued 2036 msec ago)
[  292.675367] pci 0000:07:00.0: [1344:51c0] type 00 class 0x010802
[  292.679292] pci 0000:07:00.0: reg 0x10: [mem 0x00000000-0x0003ffff 64bit]
[  292.683949] pci 0000:07:00.0: reg 0x20: [mem 0x00000000-0x0003ffff 64bit]
[  292.685175] pci 0000:07:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[  292.686647] pci 0000:07:00.0: Max Payload Size set to 512 (was 128, max 512)
[  292.701608] pci 0000:07:00.0: PME# supported from D0 D1 D3hot
[  292.722320] pci 0000:07:00.0: Adding to iommu group 82
[  292.725153] pcieport 0000:02:04.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[  292.729326] pcieport 0000:02:04.0: bridge window [io  0x1000-0x0fff] to [bus 07] add_size 1000
[  292.729338] pcieport 0000:02:04.0: BAR 13: no space for [io  size 0x1000]
[  292.729341] pcieport 0000:02:04.0: BAR 13: failed to assign [io  size 0x1000]
[  292.729344] pcieport 0000:02:04.0: BAR 13: no space for [io  size 0x1000]
[  292.729345] pcieport 0000:02:04.0: BAR 13: failed to assign [io  size 0x1000]
[  292.729351] pci 0000:07:00.0: BAR 0: assigned [mem 0xf4200000-0xf423ffff 64bit]
[  292.731040] pci 0000:07:00.0: BAR 4: assigned [mem 0xf4240000-0xf427ffff 64bit]
[  292.732756] pci 0000:07:00.0: BAR 6: assigned [mem 0xf4280000-0xf42bffff pref]
[  292.732761] pcieport 0000:02:04.0: PCI bridge to [bus 07]
[  292.733491] pcieport 0000:02:04.0:   bridge window [mem 0xf4200000-0xf42fffff]
[  292.733981] pcieport 0000:02:04.0:   bridge window [mem 0xf4f00000-0xf50fffff 64bit pref]
[  292.736102] nvme nvme1: pci function 0000:07:00.0
[  292.736683] nvme 0000:07:00.0: enabling device (0000 -> 0002)
[  292.849474] nvme nvme1: 127/0/0 default/read/poll queues
[  292.873346]  nvme1n1: p1 p2
[  294.144318] vfio-pci 0000:08:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[  294.147677] pcieport 0000:02:05.0: pciehp: Slot(0-5): Link Down
[  294.562254] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x12e8 (issued 2028 msec ago)
[  294.870254] vfio-pci 0000:08:00.0: timed out waiting for pending transaction; performing function level reset anyway
[  296.118251] vfio-pci 0000:08:00.0: not ready 1023ms after FLR; waiting
[  297.174284] vfio-pci 0000:08:00.0: not ready 2047ms after FLR; waiting
[  299.414197] vfio-pci 0000:08:00.0: not ready 4095ms after FLR; waiting
 

About

The Proxmox community has been around for many years and offers help and support for Proxmox VE, Proxmox Backup Server, and Proxmox Mail Gateway.
We think our community is one of the best thanks to people like you!

Get your subscription!

The Proxmox team works very hard to make sure you are running the best software and getting stable updates and security enhancements, as well as quick enterprise support. Tens of thousands of happy customers have a Proxmox subscription. Get yours easily in our online shop.

Buy now!