Since installing a truenas core with 6 NVME passthrough (zfs pool created in truenas), everything was ok. 6+months (but no reboot)
I had to reboot the server (not just the VM) and now I cant boot the VM with the attached NVMEs.
Any ideas?
I though at some point that proxmox zfs tries to interfere on boot so i did:
Config:
grub:
"GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
I tried a couple of times so maybe this second log is already dead:
I had to reboot the server (not just the VM) and now I cant boot the VM with the attached NVMEs.
Any ideas?
I though at some point that proxmox zfs tries to interfere on boot so i did:
Code:
systemctl disable zfs-import-cache
systemctl disable zfs-import-scan
systemctl disable zfs-mount
systemctl list-unit-files | grep zfs
zfs-import-cache.service disabled enabled
zfs-import-scan.service disabled disabled
zfs-import.service masked enabled
zfs-import@.service disabled enabled
zfs-load-key.service masked enabled
zfs-mount.service disabled enabled
zfs-scrub@.service static -
zfs-share.service enabled enabled
zfs-volume-wait.service enabled enabled
zfs-zed.service enabled enabled
zfs-import.target enabled enabled
zfs-volumes.target enabled enabled
zfs.target enabled enabled
zfs-scrub-monthly@.timer disabled enabled
zfs-scrub-weekly@.timer disabled enabled
Config:
Code:
balloon: 0
boot: order=ide0;ide2;net0
cores: 4
cpu: host
hostpci0: 0000:04:00
hostpci1: 0000:05:00
hostpci2: 0000:06:00
hostpci3: 0000:07:00
hostpci4: 0000:08:00
hostpci5: 0000:09:00
ide0: Disk8T0:164/vm-164-disk-0.qcow2,size=32G
ide2: none,media=cdrom
memory: 48000
meta: creation-qemu=7.0.0,ctime=1689955542
name: ucltruenas
net0: virtio=76:BC:ED:33:39:B4,bridge=vmbr1
numa: 1
ostype: other
scsihw: virtio-scsi-pci
smbios1: uuid=9778cb3a-ff6d-5558-6aad-70d5747ca471
sockets: 2
vmgenid: 69e8d2aa-2ff7-5d56-5434-e43b80e3fdec
grub:
"GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
Code:
[ 190.927003] pcieport 0000:02:05.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[ 190.930684] pcieport 0000:02:05.0: bridge window [io 0x1000-0x0fff] to [bus 08] add_size 1000
[ 190.930691] pcieport 0000:02:05.0: BAR 13: no space for [io size 0x1000]
[ 190.930693] pcieport 0000:02:05.0: BAR 13: failed to assign [io size 0x1000]
[ 190.930694] pcieport 0000:02:05.0: BAR 13: no space for [io size 0x1000]
[ 190.930695] pcieport 0000:02:05.0: BAR 13: failed to assign [io size 0x1000]
[ 190.930698] pci 0000:08:00.0: BAR 0: assigned [mem 0xf4100000-0xf413ffff 64bit]
[ 190.932408] pci 0000:08:00.0: BAR 4: assigned [mem 0xf4140000-0xf417ffff 64bit]
[ 190.934115] pci 0000:08:00.0: BAR 6: assigned [mem 0xf4180000-0xf41bffff pref]
[ 190.934118] pcieport 0000:02:05.0: PCI bridge to [bus 08]
[ 190.934850] pcieport 0000:02:05.0: bridge window [mem 0xf4100000-0xf41fffff]
[ 190.935340] pcieport 0000:02:05.0: bridge window [mem 0xf5100000-0xf52fffff 64bit pref]
[ 190.937343] nvme nvme2: pci function 0000:08:00.0
[ 190.938039] nvme 0000:08:00.0: enabling device (0000 -> 0002)
[ 190.977895] nvme nvme2: 127/0/0 default/read/poll queues
[ 190.993683] nvme2n1: p1 p2
[ 192.318164] vfio-pci 0000:09:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 192.320595] pcieport 0000:02:06.0: pciehp: Slot(0-6): Link Down
[ 192.484916] clocksource: timekeeping watchdog on CPU123: hpet wd-wd read-back delay of 246050ns
[ 192.484937] clocksource: wd-tsc-wd read-back delay of 243047ns, clock-skew test skipped!
[ 192.736191] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x12e8 (issued 2000 msec ago)
[ 193.988867] clocksource: timekeeping watchdog on CPU126: hpet wd-wd read-back delay of 246400ns
[ 193.988894] clocksource: wd-tsc-wd read-back delay of 244095ns, clock-skew test skipped!
[ 194.244006] vfio-pci 0000:09:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 194.244187] pci 0000:09:00.0: Removing from iommu group 84
[ 194.252153] pcieport 0000:02:06.0: pciehp: Timeout on hotplug command 0x13f8 (issued 186956 msec ago)
[ 194.252765] pcieport 0000:02:06.0: pciehp: Slot(0-6): Card present
[ 194.726855] device tap164i0 entered promiscuous mode
[ 194.738469] vmbr1: port 1(tap164i0) entered blocking state
[ 194.738476] vmbr1: port 1(tap164i0) entered disabled state
[ 194.738962] vmbr1: port 1(tap164i0) entered blocking state
[ 194.738964] vmbr1: port 1(tap164i0) entered forwarding state
[ 194.738987] IPv6: ADDRCONF(NETDEV_CHANGE): vmbr1: link becomes ready
[ 196.272094] pcieport 0000:02:06.0: pciehp: Timeout on hotplug command 0x13e8 (issued 2020 msec ago)
[ 196.413036] pci 0000:09:00.0: [1344:51c0] type 00 class 0x010802
[ 196.416962] pci 0000:09:00.0: reg 0x10: [mem 0x00000000-0x0003ffff 64bit]
[ 196.421620] pci 0000:09:00.0: reg 0x20: [mem 0x00000000-0x0003ffff 64bit]
[ 196.422846] pci 0000:09:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[ 196.424317] pci 0000:09:00.0: Max Payload Size set to 512 (was 128, max 512)
[ 196.439279] pci 0000:09:00.0: PME# supported from D0 D1 D3hot
[ 196.459967] pci 0000:09:00.0: Adding to iommu group 84
[ 196.462579] pcieport 0000:02:06.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[ 196.466259] pcieport 0000:02:06.0: bridge window [io 0x1000-0x0fff] to [bus 09] add_size 1000
[ 196.466265] pcieport 0000:02:06.0: BAR 13: no space for [io size 0x1000]
[ 196.466267] pcieport 0000:02:06.0: BAR 13: failed to assign [io size 0x1000]
[ 196.466268] pcieport 0000:02:06.0: BAR 13: no space for [io size 0x1000]
[ 196.466269] pcieport 0000:02:06.0: BAR 13: failed to assign [io size 0x1000]
[ 196.466272] pci 0000:09:00.0: BAR 0: assigned [mem 0xf4000000-0xf403ffff 64bit]
[ 196.467975] pci 0000:09:00.0: BAR 4: assigned [mem 0xf4040000-0xf407ffff 64bit]
[ 196.469691] pci 0000:09:00.0: BAR 6: assigned [mem 0xf4080000-0xf40bffff pref]
[ 196.469694] pcieport 0000:02:06.0: PCI bridge to [bus 09]
[ 196.470426] pcieport 0000:02:06.0: bridge window [mem 0xf4000000-0xf40fffff]
[ 196.470916] pcieport 0000:02:06.0: bridge window [mem 0xf5300000-0xf54fffff 64bit pref]
[ 196.472884] nvme nvme3: pci function 0000:09:00.0
[ 196.473616] nvme 0000:09:00.0: enabling device (0000 -> 0002)
[ 196.512931] nvme nvme3: 127/0/0 default/read/poll queues
[ 196.529097] nvme3n1: p1 p2
[ 198.092038] pcieport 0000:02:06.0: pciehp: Timeout on hotplug command 0x12e8 (issued 1820 msec ago)
[ 198.690791] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x19@0x300
[ 198.691033] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x27@0x920
[ 198.691278] vfio-pci 0000:04:00.0: vfio_ecap_init: hiding ecap 0x26@0x9c0
[ 199.114602] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x19@0x300
[ 199.114847] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x27@0x920
[ 199.115096] vfio-pci 0000:05:00.0: vfio_ecap_init: hiding ecap 0x26@0x9c0
[ 199.485505] vmbr1: port 1(tap164i0) entered disabled state
[ 330.030345] vfio-pci 0000:08:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 330.032580] pcieport 0000:02:05.0: pciehp: Slot(0-5): Link Down
[ 331.935885] vfio-pci 0000:08:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 331.936059] pci 0000:08:00.0: Removing from iommu group 83
[ 331.956272] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x11e8 (issued 139224 msec ago)
[ 331.957145] pcieport 0000:02:05.0: pciehp: Slot(0-5): Card present
[ 333.976326] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x13e8 (issued 2020 msec ago)
[ 334.117418] pci 0000:08:00.0: [1344:51c0] type 00 class 0x010802
[ 334.121345] pci 0000:08:00.0: reg 0x10: [mem 0x00000000-0x0003ffff 64bit]
[ 334.126000] pci 0000:08:00.0: reg 0x20: [mem 0x00000000-0x0003ffff 64bit]
[ 334.127226] pci 0000:08:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[ 334.128698] pci 0000:08:00.0: Max Payload Size set to 512 (was 128, max 512)
[ 334.143659] pci 0000:08:00.0: PME# supported from D0 D1 D3hot
[ 334.164444] pci 0000:08:00.0: Adding to iommu group 83
[ 334.166959] pcieport 0000:02:05.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[ 334.170643] pcieport 0000:02:05.0: bridge window [io 0x1000-0x0fff] to [bus 08] add_size 1000
[ 334.170650] pcieport 0000:02:05.0: BAR 13: no space for [io size 0x1000]
[ 334.170652] pcieport 0000:02:05.0: BAR 13: failed to assign [io size 0x1000]
[ 334.170653] pcieport 0000:02:05.0: BAR 13: no space for [io size 0x1000]
[ 334.170654] pcieport 0000:02:05.0: BAR 13: failed to assign [io size 0x1000]
[ 334.170658] pci 0000:08:00.0: BAR 0: assigned [mem 0xf4100000-0xf413ffff 64bit]
[ 334.172363] pci 0000:08:00.0: BAR 4: assigned [mem 0xf4140000-0xf417ffff 64bit]
[ 334.174072] pci 0000:08:00.0: BAR 6: assigned [mem 0xf4180000-0xf41bffff pref]
[ 334.174075] pcieport 0000:02:05.0: PCI bridge to [bus 08]
[ 334.174806] pcieport 0000:02:05.0: bridge window [mem 0xf4100000-0xf41fffff]
[ 334.175296] pcieport 0000:02:05.0: bridge window [mem 0xf5100000-0xf52fffff 64bit pref]
[ 334.177298] nvme nvme1: pci function 0000:08:00.0
[ 334.177996] nvme 0000:08:00.0: enabling device (0000 -> 0002)
[ 334.220204] nvme nvme1: 127/0/0 default/read/poll queues
[ 334.237017] nvme1n1: p1 p2
[ 335.796180] pcieport 0000:02:05.0: pciehp: Timeout on hotplug command 0x12e8 (issued 1820 msec ago)
I tried a couple of times so maybe this second log is already dead:
Code:
[ 79.533603] vfio-pci 0000:07:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 79.535330] pcieport 0000:02:04.0: pciehp: Slot(0-4): Link Down
[ 80.284136] vfio-pci 0000:07:00.0: timed out waiting for pending transaction; performing function level reset anyway
[ 81.532090] vfio-pci 0000:07:00.0: not ready 1023ms after FLR; waiting
[ 82.588056] vfio-pci 0000:07:00.0: not ready 2047ms after FLR; waiting
[ 84.892150] vfio-pci 0000:07:00.0: not ready 4095ms after FLR; waiting
[ 89.243877] vfio-pci 0000:07:00.0: not ready 8191ms after FLR; waiting
[ 97.691632] vfio-pci 0000:07:00.0: not ready 16383ms after FLR; waiting
[ 114.331200] vfio-pci 0000:07:00.0: not ready 32767ms after FLR; waiting
[ 149.146240] vfio-pci 0000:07:00.0: not ready 65535ms after FLR; giving up
[ 149.154174] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x13f8 (issued 141128 msec ago)
[ 151.174121] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x03e0 (issued 2020 msec ago)
[ 152.506070] vfio-pci 0000:07:00.0: not ready 1023ms after bus reset; waiting
[ 153.562091] vfio-pci 0000:07:00.0: not ready 2047ms after bus reset; waiting
[ 155.801981] vfio-pci 0000:07:00.0: not ready 4095ms after bus reset; waiting
[ 160.153992] vfio-pci 0000:07:00.0: not ready 8191ms after bus reset; waiting
[ 168.601641] vfio-pci 0000:07:00.0: not ready 16383ms after bus reset; waiting
[ 186.009203] vfio-pci 0000:07:00.0: not ready 32767ms after bus reset; waiting
[ 220.824284] vfio-pci 0000:07:00.0: not ready 65535ms after bus reset; giving up
[ 220.844289] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x03e0 (issued 71692 msec ago)
[ 222.168321] vfio-pci 0000:07:00.0: not ready 1023ms after bus reset; waiting
[ 223.224211] vfio-pci 0000:07:00.0: not ready 2047ms after bus reset; waiting
[ 225.432174] vfio-pci 0000:07:00.0: not ready 4095ms after bus reset; waiting
[ 229.784044] vfio-pci 0000:07:00.0: not ready 8191ms after bus reset; waiting
[ 238.231807] vfio-pci 0000:07:00.0: not ready 16383ms after bus reset; waiting
[ 245.400141] INFO: task irq/59-pciehp:1664 blocked for more than 120 seconds.
[ 245.400994] Tainted: P O 5.15.158-2-pve #1
[ 245.401399] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 245.401793] task:irq/59-pciehp state:D stack: 0 pid: 1664 ppid: 2 flags:0x00004000
[ 245.401800] Call Trace:
[ 245.401804] <TASK>
[ 245.401809] __schedule+0x34e/0x1740
[ 245.401821] ? srso_alias_return_thunk+0x5/0x7f
[ 245.401827] ? srso_alias_return_thunk+0x5/0x7f
[ 245.401828] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 245.401834] schedule+0x69/0x110
[ 245.401836] schedule_preempt_disabled+0xe/0x20
[ 245.401839] __mutex_lock.constprop.0+0x255/0x480
[ 245.401843] __mutex_lock_slowpath+0x13/0x20
[ 245.401846] mutex_lock+0x38/0x50
[ 245.401848] device_release_driver+0x1f/0x40
[ 245.401855] pci_stop_bus_device+0x74/0xa0
[ 245.401862] pci_stop_and_remove_bus_device+0x13/0x30
[ 245.401864] pciehp_unconfigure_device+0x92/0x150
[ 245.401872] pciehp_disable_slot+0x6c/0x100
[ 245.401875] pciehp_handle_presence_or_link_change+0x22a/0x340
[ 245.401877] ? srso_alias_return_thunk+0x5/0x7f
[ 245.401879] pciehp_ist+0x19a/0x1b0
[ 245.401882] ? irq_forced_thread_fn+0x90/0x90
[ 245.401889] irq_thread_fn+0x28/0x70
[ 245.401892] irq_thread+0xde/0x1b0
[ 245.401895] ? irq_thread_fn+0x70/0x70
[ 245.401898] ? irq_thread_check_affinity+0x100/0x100
[ 245.401901] kthread+0x12a/0x150
[ 245.401905] ? set_kthread_struct+0x50/0x50
[ 245.401907] ret_from_fork+0x22/0x30
[ 245.401915] </TASK>
[ 255.639346] vfio-pci 0000:07:00.0: not ready 32767ms after bus reset; waiting
[ 290.454384] vfio-pci 0000:07:00.0: not ready 65535ms after bus reset; giving up
[ 290.456313] vfio-pci 0000:07:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 290.457400] pci 0000:07:00.0: Removing from iommu group 82
[ 290.499751] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x13e8 (issued 69656 msec ago)
[ 290.500378] pcieport 0000:02:04.0: pciehp: Slot(0-4): Card present
[ 290.500381] pcieport 0000:02:04.0: pciehp: Slot(0-4): Link Up
[ 292.534371] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x13e8 (issued 2036 msec ago)
[ 292.675367] pci 0000:07:00.0: [1344:51c0] type 00 class 0x010802
[ 292.679292] pci 0000:07:00.0: reg 0x10: [mem 0x00000000-0x0003ffff 64bit]
[ 292.683949] pci 0000:07:00.0: reg 0x20: [mem 0x00000000-0x0003ffff 64bit]
[ 292.685175] pci 0000:07:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
[ 292.686647] pci 0000:07:00.0: Max Payload Size set to 512 (was 128, max 512)
[ 292.701608] pci 0000:07:00.0: PME# supported from D0 D1 D3hot
[ 292.722320] pci 0000:07:00.0: Adding to iommu group 82
[ 292.725153] pcieport 0000:02:04.0: ASPM: current common clock configuration is inconsistent, reconfiguring
[ 292.729326] pcieport 0000:02:04.0: bridge window [io 0x1000-0x0fff] to [bus 07] add_size 1000
[ 292.729338] pcieport 0000:02:04.0: BAR 13: no space for [io size 0x1000]
[ 292.729341] pcieport 0000:02:04.0: BAR 13: failed to assign [io size 0x1000]
[ 292.729344] pcieport 0000:02:04.0: BAR 13: no space for [io size 0x1000]
[ 292.729345] pcieport 0000:02:04.0: BAR 13: failed to assign [io size 0x1000]
[ 292.729351] pci 0000:07:00.0: BAR 0: assigned [mem 0xf4200000-0xf423ffff 64bit]
[ 292.731040] pci 0000:07:00.0: BAR 4: assigned [mem 0xf4240000-0xf427ffff 64bit]
[ 292.732756] pci 0000:07:00.0: BAR 6: assigned [mem 0xf4280000-0xf42bffff pref]
[ 292.732761] pcieport 0000:02:04.0: PCI bridge to [bus 07]
[ 292.733491] pcieport 0000:02:04.0: bridge window [mem 0xf4200000-0xf42fffff]
[ 292.733981] pcieport 0000:02:04.0: bridge window [mem 0xf4f00000-0xf50fffff 64bit pref]
[ 292.736102] nvme nvme1: pci function 0000:07:00.0
[ 292.736683] nvme 0000:07:00.0: enabling device (0000 -> 0002)
[ 292.849474] nvme nvme1: 127/0/0 default/read/poll queues
[ 292.873346] nvme1n1: p1 p2
[ 294.144318] vfio-pci 0000:08:00.0: can't change power state from D3hot to D0 (config space inaccessible)
[ 294.147677] pcieport 0000:02:05.0: pciehp: Slot(0-5): Link Down
[ 294.562254] pcieport 0000:02:04.0: pciehp: Timeout on hotplug command 0x12e8 (issued 2028 msec ago)
[ 294.870254] vfio-pci 0000:08:00.0: timed out waiting for pending transaction; performing function level reset anyway
[ 296.118251] vfio-pci 0000:08:00.0: not ready 1023ms after FLR; waiting
[ 297.174284] vfio-pci 0000:08:00.0: not ready 2047ms after FLR; waiting
[ 299.414197] vfio-pci 0000:08:00.0: not ready 4095ms after FLR; waiting