What tool is generating these graphs please?
I use Home Assistant and log power consumption via a Zigbee-based power switch.
Have you tried without that watchdog (`nmi_watchdog`) enabled? AFAIK the NMI itself sometimes generates a high number of interrupts and can therefore impact server performance on its own. With that limited J3455, the outcome may be your actual issue.
Not yet. If it's a definite fix, I'd be happy to give that a try, but I'd rather find out what changed between 6.8 / 6.11 and 6.14 to cause the issue, given the rock-solid stability of the older kernels.
I don't see any harm in trying, and afterwards you can still try to discover "what changed".
The kernel patch is available and working well. Give it a try with your 9070 XT.
Thanks so much!
CPU(s) 48 x Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz (2 Sockets)
Kernel Version Linux 6.14.0-2-pve (2025-04-10T17:57Z)
Boot Mode EFI
Manager Version pve-manager/8.4.1/2a5fa54a8503f96d
After upgrading to kernel 6.14, I started catching the following errors in the logs on several machines after a week or two. At the same time, there are no errors inside the VM (its configuration follows the trace).
[1996513.635917] ------------[ cut here ]------------
[1996513.635946] WARNING: CPU: 24 PID: 128499 at arch/x86/kvm/vmx/vmx.c:5247 handle_exception_nmi+0x503/0xb70 [kvm_intel]
[1996513.635993] Modules linked in: ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_multiport xt_conntrack xt_tcpudp iptable_filter ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel nvme_fabrics nvme_keyring nvme_core nvme_auth nf_tables nfnetlink_cttimeout bonding tls softdog sunrpc binfmt_misc openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 psample nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl intel_cstate pcspkr acpi_power_meter mei_me ipmi_si acpi_ipmi mgag200 mei ioatdma i2c_algo_bit hpilo intel_pch_thermal ipmi_devintf ipmi_msghandler joydev acpi_tad input_leds mac_hid vhost_net vhost
[1996513.636060] vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio ses enclosure uas usb_storage hid_generic usbmouse usbkbd usbhid hid ixgbe smartpqi xhci_pci xfrm_algo dca scsi_transport_sas ehci_pci mdio xhci_hcd ehci_hcd lpc_ich wmi
[1996513.636286] CPU: 24 UID: 0 PID: 128499 Comm: CPU 1/KVM Tainted: P O 6.14.0-2-pve #1
[1996513.636305] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[1996513.636865] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 02/21/2025
[1996513.637405] RIP: 0010:handle_exception_nmi+0x503/0xb70 [kvm_intel]
[1996513.637944] Code: 00 01 e8 70 4f d7 ca 0f 0b 48 8b 55 d0 e9 b1 fb ff ff 44 89 c6 31 c9 45 31 c0 4c 89 e2 48 89 df e8 22 76 ea ff e9 6a fe ff ff <0f> 0b 4c 8b a3 d0 23 00 00 41 83 3c 24 30 0f 85 f5 02 00 00 48 89
[1996513.638964] RSP: 0018:ffff96f67c9778c8 EFLAGS: 00010246
[1996513.639458] RAX: ffffffffc13fdbd0 RBX: ffff8bfda80e4800 RCX: 0000000000000000
[1996513.639980] RDX: ffff8bfdaf0d3000 RSI: 0000000000000000 RDI: ffff8bfda80e4800
[1996513.640493] RBP: ffff96f67c977900 R08: 0000000000000000 R09: 0000000000000000
[1996513.641007] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000080000314
[1996513.641514] R13: 0000000080000314 R14: 0000000080000300 R15: 0000000000000000
[1996513.642033] FS: 00007b92ffdff6c0(0000) GS:ffff8bfb3fc00000(0000) knlGS:ffff9cc351680000
[1996513.642537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1996513.643056] CR2: 00000812b8185020 CR3: 00000001c1598003 CR4: 00000000007726f0
[1996513.643580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1996513.644145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1996513.644632] PKRU: 55555554
[1996513.645139] Call Trace:
[1996513.645624] <TASK>
[1996513.646128] ? show_regs+0x6c/0x80
[1996513.646600] ? __warn+0x8d/0x150
[1996513.647090] ? handle_exception_nmi+0x503/0xb70 [kvm_intel]
[1996513.647568] ? report_bug+0x182/0x1b0
[1996513.648052] ? handle_bug+0x6e/0xb0
[1996513.648512] ? exc_invalid_op+0x18/0x80
[1996513.648967] ? asm_exc_invalid_op+0x1b/0x20
[1996513.649398] ? __pfx_handle_exception_nmi+0x10/0x10 [kvm_intel]
[1996513.649857] ? handle_exception_nmi+0x503/0xb70 [kvm_intel]
[1996513.650278] ? vmx_vcpu_enter_exit+0x14f/0x450 [kvm_intel]
[1996513.650687] vmx_handle_exit+0x1f4/0x8b0 [kvm_intel]
[1996513.651119] vcpu_enter_guest+0x4e8/0x1640 [kvm]
[1996513.651621] kvm_arch_vcpu_ioctl_run+0x35d/0x750 [kvm]
[1996513.652147] kvm_vcpu_ioctl+0x2c2/0xaa0 [kvm]
[1996513.652608] __x64_sys_ioctl+0xa4/0xe0
[1996513.653007] x64_sys_call+0xb45/0x2540
[1996513.653382] do_syscall_64+0x7e/0x170
[1996513.653732] ? kvm_set_msi+0xad/0xc0 [kvm]
[1996513.654198] ? kvm_send_userspace_msi+0x75/0xb0 [kvm]
[1996513.654599] ? kvm_vm_ioctl+0xe81/0x1aa0 [kvm]
[1996513.655020] ? kvm_arch_vcpu_ioctl_run+0x226/0x750 [kvm]
[1996513.655449] ? kvm_vcpu_ioctl+0x23e/0xaa0 [kvm]
[1996513.655851] ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
[1996513.656163] ? syscall_exit_to_user_mode+0x38/0x1d0
[1996513.656460] ? do_syscall_64+0x8a/0x170
[1996513.656746] ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
[1996513.657055] ? syscall_exit_to_user_mode+0x38/0x1d0
[1996513.657343] ? do_syscall_64+0x8a/0x170
[1996513.657609] ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
[1996513.657906] ? syscall_exit_to_user_mode+0x38/0x1d0
[1996513.658180] ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
[1996513.658439] ? syscall_exit_to_user_mode+0x38/0x1d0
[1996513.658690] ? do_syscall_64+0x8a/0x170
[1996513.658967] ? clear_bhb_loop+0x15/0x70
[1996513.659213] ? clear_bhb_loop+0x15/0x70
[1996513.659438] ? clear_bhb_loop+0x15/0x70
[1996513.659658] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[1996513.659909] RIP: 0033:0x7bae90c67d1b
[1996513.660142] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[1996513.660623] RSP: 002b:00007b92ffdf9ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1996513.660892] RAX: ffffffffffffffda RBX: 0000557a2ecc26e0 RCX: 00007bae90c67d1b
[1996513.661147] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000005e
[1996513.661390] RBP: 000000000000ae80 R08: 0000000000000000 R09: 0000000000000000
[1996513.661631] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[1996513.661902] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[1996513.662155] </TASK>
[1996513.662393] ---[ end trace 0000000000000000 ]---
[1996513.662639] kvm_intel: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
[1996513.662926] kvm: #VE 899406080, spte[4] = 0x80000002e33c3907, spte[3] = 0x800000031bf63907, spte[2] = 0x860000378d200bf3
agent: 1,freeze-fs-on-backup=0
balloon: 0
boot: order=scsi0;ide2;net0
cores: 16
cpu: host
ide2: none,media=cdrom
machine: q35
memory: 112640
meta: creation-qemu=8.0.2,ctime=1690795457
name: ******
net0: virtio=96:C2:2C:****,bridge=vmbr0,tag=300
numa: 0
onboot: 1
ostype: l26
protection: 1
scsi0: md-thinstorage:vm-206-disk-0,cache=none,format=raw,iothread=1,size=3840G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=a6dfc75d-fe4f-4fc8-81e2-bb422cfa6fd3
sockets: 1
vmgenid: 3c5cb61d-4be5-468b-80b2-7f02b27705bd
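Since the warning shows up on several machines, it can help to reduce each host's kernel log to the distinct WARNING signatures it contains and compare them. The following is only a minimal sketch: the function name `warning_signatures` is made up for illustration, and the regular expression assumes the `WARNING: CPU: ... PID: ... at file:line function+offset` layout seen in the trace above.

```python
import re
from collections import Counter

# Assumed layout, taken from the trace above:
#   WARNING: CPU: 24 PID: 128499 at arch/x86/kvm/vmx/vmx.c:5247 handle_exception_nmi+0x503/0xb70 [kvm_intel]
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<location>\S+) (?P<symbol>[^\s+]+)\+"
)


def warning_signatures(log_text):
    """Count (source location, function) pairs over all WARNING lines in log_text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match:
            counts[(match.group("location"), match.group("symbol"))] += 1
    return counts
```

Feeding it the saved output of `journalctl -k` or `dmesg` from each host gives one counter per machine, which makes it easy to see whether they all hit `handle_exception_nmi` at the same source line.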
Disabling the `nmi_watchdog` seems to have returned things to stability after a few days of testing. I admittedly do not have deep knowledge of the workings of the kernel, but that appears to be what I needed to disable to prevent the igb driver instability that I described in my first post.
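The post does not say how the watchdog was turned off. The usual knobs are the `kernel.nmi_watchdog` sysctl at runtime or the `nmi_watchdog=0` kernel command-line parameter. To confirm the setting actually took effect, one can look at `/proc/sys/kernel/nmi_watchdog` and at the NMI row of `/proc/interrupts`. The following is a rough sketch under those assumptions: the helper names are invented for illustration, and the `nmi_watchdog` file may be absent on kernels built without the hard lockup detector.

```python
from pathlib import Path


def watchdog_enabled(state_text):
    """Interpret the contents of /proc/sys/kernel/nmi_watchdog ("1" means enabled)."""
    return state_text.strip() == "1"


def nmi_count(interrupts_text):
    """Sum the per-CPU counters on the NMI row of /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            total = 0
            for field in fields[1:]:
                if not field.isdigit():
                    break  # the trailing description starts here
                total += int(field)
            return total
    return None  # no NMI row in the input


def read_host_state(proc=Path("/proc")):
    """Read both values from a live host (hypothetical convenience wrapper)."""
    enabled = watchdog_enabled((proc / "sys/kernel/nmi_watchdog").read_text())
    return enabled, nmi_count((proc / "interrupts").read_text())
```

Calling `read_host_state()` twice a few minutes apart shows whether the NMI total is still climbing. With the watchdog disabled it should stay roughly flat unless something else (for example `perf`) is generating NMIs.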
I just installed 6.14.4-1-pve, rebooted, and noticed these errors in the log:
May 10 01:45:33 prox kernel: ata4.00: status: { DRDY }
May 10 01:45:56 prox kernel: ata4.00: exception Emask 0x10 SAct 0x3 SErr 0x4050000 action 0xe frozen
May 10 01:45:56 prox kernel: ata4.00: irq_stat 0x00000040, connection status changed
May 10 01:45:56 prox kernel: ata4: SError: { PHYRdyChg CommWake DevExch }
May 10 01:45:56 prox kernel: ata4.00: failed command: WRITE FPDMA QUEUED
May 10 01:45:56 prox kernel: ata4.00: cmd 61/08:00:38:f6:5b/00:00:27:00:00/40 tag 0 ncq dma 4096 out
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
00:00.0 Host bridge: Intel Corporation Sky Lake-E DMI3 Registers (rev 04)
00:04.0 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.1 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.2 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.3 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.4 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.5 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.6 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.7 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:05.0 System peripheral: Intel Corporation Sky Lake-E MM/Vt-d Configuration Registers (rev 04)
00:05.2 System peripheral: Intel Corporation Sky Lake-E RAS (rev 04)
00:05.4 PIC: Intel Corporation Sky Lake-E IOAPIC (rev 04)
00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00:08.1 Performance counters: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00:08.2 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
00:16.0 Communication controller: Intel Corporation 200 Series PCH CSME HECI #1
00:17.0 SATA controller: Intel Corporation 200 Series PCH SATA controller [AHCI mode]
00:1b.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #17 (rev f0)
00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #1 (rev f0)
00:1c.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #5 (rev f0)
00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation X299 Chipset LPC/eSPI Controller
00:1f.2 Memory controller: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller
00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio
00:1f.4 SMBus: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V
01:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01)
03:00.0 USB controller: ASMedia Technology Inc. ASM2142/ASM3142 USB 3.1 Host Controller
16:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04)
16:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04)
16:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04)
16:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 04)
16:08.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:09.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:09.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0f.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0f.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1e.0 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.1 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.2 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.4 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.5 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.6 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
17:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
64:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04)
64:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04)
64:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04)
64:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 04)
64:08.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:09.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.1 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.2 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.3 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.4 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.5 System peripheral: Intel Corporation Sky Lake-E LM Channel 1 (rev 04)
64:0a.6 System peripheral: Intel Corporation Sky Lake-E LMS Channel 1 (rev 04)
64:0a.7 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 1 (rev 04)
64:0b.0 System peripheral: Intel Corporation Sky Lake-E DECS Channel 2 (rev 04)
64:0b.1 System peripheral: Intel Corporation Sky Lake-E LM Channel 2 (rev 04)
64:0b.2 System peripheral: Intel Corporation Sky Lake-E LMS Channel 2 (rev 04)
64:0b.3 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 2 (rev 04)
64:0c.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0c.1 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0c.2 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0c.3 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0c.4 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0c.5 System peripheral: Intel Corporation Sky Lake-E LM Channel 1 (rev 04)
64:0c.6 System peripheral: Intel Corporation Sky Lake-E LMS Channel 1 (rev 04)
64:0c.7 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 1 (rev 04)
64:0d.0 System peripheral: Intel Corporation Sky Lake-E DECS Channel 2 (rev 04)
64:0d.1 System peripheral: Intel Corporation Sky Lake-E LM Channel 2 (rev 04)
64:0d.2 System peripheral: Intel Corporation Sky Lake-E LMS Channel 2 (rev 04)
64:0d.3 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 2 (rev 04)
65:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01)
66:01.0 PCI bridge: Intel Corporation Device 4fa4
66:04.0 PCI bridge: Intel Corporation Device 4fa4
67:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A310] (rev 05)
68:00.0 Audio device: Intel Corporation DG2 Audio Controller
b2:03.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port D (rev 04)
b2:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04)
b2:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04)
b2:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 04)
b2:12.0 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 04)
b2:12.1 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 04)
b2:12.2 System peripheral: Intel Corporation Sky Lake-E M3KTI Registers (rev 04)
b2:15.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b2:16.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b2:16.4 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b2:17.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b3:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
May 10 14:17:55 pve-nas1 kernel: nvidia-vgpu-mgr[2201]: segfault at 130 ip 0000783fa399d42a sp 00007ffdfa156fa0 error 4 in libnvidia-vgpu.so.570.133.10[f742a,783fa3929000+21a000] likely on CPU 14 (core 14, socket 0)
May 10 14:17:55 pve-nas1 kernel: Code: 55 48 89 e5 41 56 41 55 41 54 53 44 8b 66 3c 45 85 e4 74 74 44 8b 97 1c 2a 00 00 49 89 fd 48 89 f3 45 85 d2 74 72 48 8b 47 60 <44> 8b 88 30 01 00 00 45 85 c9 74 62 44 89 e2 45 31 e4 48 c7 43 10
May 10 14:18:04 pve-nas1 kernel: [nvidia-vgpu-vfio] 00000015-0000-0000-0000-000000000100: No mdev vendor driver request callback support, blocked until released by user
May 10 14:18:20 pve-nas1 kernel: nvidia-vgpu-vfio 00000015-0000-0000-0000-000000000100: Removing from iommu group 88
May 10 14:18:20 pve-nas1 kernel: [nvidia-vgpu-vfio] 00000015-0000-0000-0000-000000000100: Failed to post delete device event, 0x56
May 10 14:18:20 pve-nas1 kernel: [nvidia-vgpu-vfio] 00000015-0000-0000-0000-000000000100: vGPU destroy failed: 0xfffffffb
May 10 14:18:20 pve-nas1 kernel: [nvidia-vgpu-vfio] 00000015-0000-0000-0000-000000000100: Failed to destroy vGPU device, ret: -5
May 10 14:18:28 pve-nas1 kernel: watchdog: watchdog0: watchdog did not stop!
May 10 14:18:28 pve-nas1 systemd-shutdown[1]: Using hardware watchdog 'Software Watchdog', version 0, device /dev/watchdog0
May 10 14:18:28 pve-nas1 systemd-shutdown[1]: Watchdog running with a hardware timeout of 10min.
May 10 14:18:28 pve-nas1 systemd-shutdown[1]: Syncing filesystems and block devices.
May 10 14:18:33 pve-nas1 systemd-shutdown[1]: Sending SIGTERM to remaining processes...
May 10 14:18:33 pve-nas1 systemd-journald[1252]: Received SIGTERM from PID 1 (systemd-shutdow).
May 10 14:20:38 pve-nas1 kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:20/0000:20:01.1/0000:21:00.1/sound/card0/input5
May 10 14:20:38 pve-nas1 kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:20/0000:20:01.1/0000:21:00.1/sound/card0/input6
May 10 14:20:38 pve-nas1 kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:20/0000:20:01.1/0000:21:00.1/sound/card0/input7
May 10 14:20:38 pve-nas1 kernel: ast 0000:ab:00.0: [drm] *ERROR* Link training failed
May 10 14:20:38 pve-nas1 kernel: ------------[ cut here ]------------
May 10 14:20:38 pve-nas1 kernel: ast 0000:ab:00.0: [drm] drm_WARN_ON(!__ast_dp_wait_enable(ast, enabled))
May 10 14:20:38 pve-nas1 kernel: WARNING: CPU: 0 PID: 470 at drivers/gpu/drm/ast/ast_dp.c:221 ast_dp_set_enable+0xea/0x110 [ast]
May 10 14:20:38 pve-nas1 kernel: Modules linked in: amd_atl intel_rapl_msr intel_rapl_common amd64_edac snd_hda_codec_hdmi edac_mce_amd ipmi_ssif kvm_amd kvm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec polyval_clmul>
May 10 14:20:38 pve-nas1 kernel: CPU: 0 UID: 0 PID: 470 Comm: kworker/0:13 Tainted: P O 6.14.0-2-pve #1
May 10 14:20:38 pve-nas1 kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
May 10 14:20:38 pve-nas1 kernel: Hardware name: GENOAD8UD-2T/X550/GENOAD8UD-2T/X550, BIOS 11.01 01/23/2025
May 10 14:20:38 pve-nas1 kernel: Workqueue: events work_for_cpu_fn
May 10 14:20:38 pve-nas1 kernel: RIP: 0010:ast_dp_set_enable+0xea/0x110 [ast]
May 10 14:20:38 pve-nas1 kernel: Code: 48 8b 5f 50 48 85 db 75 03 48 8b 1f e8 0f fc ff d6 48 c7 c1 70 a0 5f c1 48 89 da 48 c7 c7 11 a4 5f c1 48 89 c6 e8 a6 06 59 d6 <0f> 0b 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff
May 10 14:20:38 pve-nas1 kernel: RSP: 0018:ff6418f1812c7750 EFLAGS: 00010246
May 10 14:20:38 pve-nas1 kernel: RAX: 0000000000000000 RBX: ff3a51d982e96b60 RCX: 0000000000000000
May 10 14:20:38 pve-nas1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
May 10 14:20:38 pve-nas1 kernel: RBP: ff6418f1812c7778 R08: 0000000000000000 R09: 0000000000000000
May 10 14:20:38 pve-nas1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000003e8
May 10 14:20:38 pve-nas1 kernel: R13: 0000000000000010 R14: ff3a51d98e124000 R15: ff6418f1814403d4
May 10 14:20:38 pve-nas1 kernel: FS: 0000000000000000(0000) GS:ff3a5208cc800000(0000) knlGS:0000000000000000
May 10 14:20:38 pve-nas1 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 10 14:20:38 pve-nas1 kernel: CR2: 00006456ffb7601c CR3: 000000012df4e003 CR4: 0000000000f71ef0
May 10 14:20:38 pve-nas1 kernel: PKRU: 55555554
May 10 14:20:38 pve-nas1 kernel: Call Trace:
May 10 14:20:38 pve-nas1 kernel: <TASK>
May 10 14:20:38 pve-nas1 kernel: ? show_regs+0x6c/0x80
May 10 14:20:38 pve-nas1 kernel: ? __warn+0x8d/0x150
I also get this drm failure because it seems to be trying to enable drm on the BMC video chipset. I don't recall these on earlier kernels.
Do you have the “firmware-ast” package installed?
Not that I know of. Also, don't drivers tend to be a kernel thing?
root@pve-nas1:~# apt -y install firmware-ast
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package firmware-ast is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source
I also have some issues with 6.14 after upgrading on different machines.
A NUC that I have and a workstation with a Threadripper became unstable.
Machines become unresponsive or my workstation simply shuts down. There is no indicator of the problem, but I have not dug deep because I'm not sure where to look.
A reboot works and everything runs fine again.
Reverted to 6.11 for now.
That sounds similar to issues I have with any Fedora 41/42 VMs when they try to use any 6.14 kernel. It seems to be linked to balloon memory: it's not reacting fast enough to requests for more RAM, and the OOM watchdog is killing processes trying to keep the server "running". Eventually it either locks up the VM or reboots it. The only fixes I've found so far are to either disable ballooning, thus fixing the amount of allocated RAM, or revert to 6.13.
I know this is not quite the same, as this is for the host running 6.14, but perhaps something similar is also happening here. I've tried 3 releases of 6.14 for Fedora so far.
Strangely, my host running this 6.14 has so far been OK.
My NUC had an unrelated problem: a kernel panic in the network card driver.
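For the ballooning workaround mentioned above, a quick way to see which VMs already have ballooning switched off is to read their config files. In Proxmox VE a `balloon: 0` line disables the balloon device, as in the VM config posted earlier in this thread. The following is a minimal sketch, with made-up helper names, that assumes the plain `key: value` layout of those files and stops at the first snapshot section.

```python
def parse_vm_config(text):
    """Parse 'key: value' lines of a Proxmox-style VM config into a dict."""
    config = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and description comments
        if line.startswith("["):
            break  # snapshot sections follow the current config; stop there
        key, sep, value = line.partition(":")
        if sep:
            config[key.strip()] = value.strip()
    return config


def ballooning_disabled(config):
    """True when the config carries 'balloon: 0', i.e. the balloon device is off."""
    return config.get("balloon") == "0"
```

On a host this could be pointed at the files under `/etc/pve/qemu-server/`. A VM that reports `False` can be changed with `qm set <vmid> --balloon 0`, which then pins its allocated RAM to the `memory` value.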