Opt-in Linux 6.14 Kernel for Proxmox VE 8 available on test & no-subscription

Tried the new kernel but had to revert back to 6.8 as I couldn't get my network up.
Something to do with ASPM, I think.
igb kept throwing 'PCIe link lost' on both interfaces (I350-AM2 on a MZ32-AR0).
Latest BIOS and all.
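In case it helps anyone else who needs to roll back: pinning the known-good kernel and ruling out ASPM are both quick to try. A rough sketch (the 6.8 version string is an example, check `proxmox-boot-tool kernel list` for yours):

Code:
# List installed kernels and pin the known-good 6.8 one
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.8.12-10-pve

# To test whether ASPM is the culprit, add pcie_aspm=off to
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then apply and reboot
update-grub

# Check the ASPM state of the NICs afterwards
lspci -vvv | grep -i aspm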
 
I am having serious issues with the igb driver on 6.14 that I did not experience on 6.8 or 6.11. I tested on the first release of 6.14 and the subsequent -2 patch release.

The computer is a Compulab Fitlet2 (Celeron J3455) with 4x Intel i211 adapters. The only VM is OPNsense with no PCIe passthrough; all interfaces are virtual via Linux bridges. I also have a few containers running on the machine.

When the driver goes crazy (log below), I have to reboot to regain network access. enp2s0 is connected via vmbr1 to my 5G modem/router.

[325745.360971] igb 0000:02:00.0 enp2s0: NETDEV WATCHDOG: CPU: 2: transmit queue 0 timed out 5294 ms
[325745.361226] igb 0000:02:00.0 enp2s0: Reset adapter
[325745.361307] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325745.423312] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Down
[325746.321262] igb 0000:02:00.0 enp2s0: Reset adapter
[325746.450856] vmbr1: port 1(enp2s0) entered disabled state
[325748.369268] igb 0000:02:00.0 enp2s0: Reset adapter
[325751.184343] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325751.497017] vmbr1: port 1(enp2s0) entered blocking state
[325751.497041] vmbr1: port 1(enp2s0) entered forwarding state
[325756.752522] igb 0000:02:00.0 enp2s0: NETDEV WATCHDOG: CPU: 2: transmit queue 0 timed out 5143 ms
[325756.752675] igb 0000:02:00.0 enp2s0: Reset adapter
[325756.815508] vmbr1: port 1(enp2s0) entered disabled state
[325760.371106] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325760.371532] vmbr1: port 1(enp2s0) entered blocking state
[325760.371545] vmbr1: port 1(enp2s0) entered forwarding state
[325765.536236] igb 0000:02:00.0 enp2s0: NETDEV WATCHDOG: CPU: 2: transmit queue 0 timed out 5002 ms
[325765.536320] igb 0000:02:00.0 enp2s0: Reset adapter
[325765.592824] vmbr1: port 1(enp2s0) entered disabled state
[325768.746817] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325768.747121] vmbr1: port 1(enp2s0) entered blocking state
[325768.747134] vmbr1: port 1(enp2s0) entered forwarding state
[325774.671975] igb 0000:02:00.0 enp2s0: NETDEV WATCHDOG: CPU: 2: transmit queue 0 timed out 5793 ms
[325774.672169] igb 0000:02:00.0 enp2s0: Reset adapter
[325774.734935] vmbr1: port 1(enp2s0) entered disabled state
[325777.637432] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325777.952029] vmbr1: port 1(enp2s0) entered blocking state
[325777.952053] vmbr1: port 1(enp2s0) entered forwarding state
[325783.375668] igb 0000:02:00.0 enp2s0: NETDEV WATCHDOG: CPU: 2: transmit queue 0 timed out 5221 ms
[325783.375858] igb 0000:02:00.0 enp2s0: Reset adapter
[325783.437719] vmbr1: port 1(enp2s0) entered disabled state
[325787.017102] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325787.017370] vmbr1: port 1(enp2s0) entered blocking state
[325787.017380] vmbr1: port 1(enp2s0) entered forwarding state
[325792.591385] igb 0000:02:00.0 enp2s0: NETDEV WATCHDOG: CPU: 0: transmit queue 0 timed out 5299 ms
[325792.591698] igb 0000:02:00.0 enp2s0: Reset adapter
[325792.654211] vmbr1: port 1(enp2s0) entered disabled state
[325795.716892] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325796.031499] vmbr1: port 1(enp2s0) entered blocking state
[325796.031524] vmbr1: port 1(enp2s0) entered forwarding state
[325802.383040] igb 0000:02:00.0 enp2s0: NETDEV WATCHDOG: CPU: 1: transmit queue 0 timed out 5459 ms
[325802.383229] igb 0000:02:00.0 enp2s0: Reset adapter
[325802.383434] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325802.436055] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Down
[325803.343272] igb 0000:02:00.0 enp2s0: Reset adapter
[325803.471441] vmbr1: port 1(enp2s0) entered disabled state
[325805.328207] igb 0000:02:00.0 enp2s0: Reset adapter
[325808.371371] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325808.371717] vmbr1: port 1(enp2s0) entered blocking state
[325808.371751] vmbr1: port 1(enp2s0) entered forwarding state
[325815.310544] igb 0000:02:00.0 enp2s0: NETDEV WATCHDOG: CPU: 3: transmit queue 0 timed out 5353 ms
[325815.310623] igb 0000:02:00.0 enp2s0: Reset adapter
[325815.372292] vmbr1: port 1(enp2s0) entered disabled state
[325818.702051] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325818.702454] vmbr1: port 1(enp2s0) entered blocking state
[325818.702467] vmbr1: port 1(enp2s0) entered forwarding state
[325826.318291] igb 0000:02:00.0 enp2s0: NETDEV WATCHDOG: CPU: 3: transmit queue 0 timed out 5380 ms
[325826.318584] igb 0000:02:00.0 enp2s0: Reset adapter
[325826.378359] vmbr1: port 1(enp2s0) entered disabled state
[325829.635726] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325829.635990] vmbr1: port 1(enp2s0) entered blocking state
[325829.635999] vmbr1: port 1(enp2s0) entered forwarding state
[325837.325780] igb 0000:02:00.0 enp2s0: NETDEV WATCHDOG: CPU: 2: transmit queue 0 timed out 5163 ms
[325837.325901] igb 0000:02:00.0 enp2s0: Reset adapter
[325837.388776] vmbr1: port 1(enp2s0) entered disabled state
[325840.699271] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325840.699542] vmbr1: port 1(enp2s0) entered blocking state
[325840.699551] vmbr1: port 1(enp2s0) entered forwarding state
[325847.373520] igb 0000:02:00.0 enp2s0: NETDEV WATCHDOG: CPU: 2: transmit queue 1 timed out 5302 ms
[325847.373656] igb 0000:02:00.0 enp2s0: Reset adapter
[325847.373706] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325847.435051] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Down
[325848.333899] igb 0000:02:00.0 enp2s0: Reset adapter
[325848.461716] vmbr1: port 1(enp2s0) entered disabled state
[325850.317860] igb 0000:02:00.0 enp2s0: Reset adapter
[325853.118961] igb 0000:02:00.0 enp2s0: igb: enp2s0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
[325853.429611] vmbr1: port 1(enp2s0) entered blocking state
[325853.429634] vmbr1: port 1(enp2s0) entered forwarding state
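In case it helps correlate with other reports, the exact driver and firmware on the affected port can be dumped with ethtool (interface name from the log above):

Code:
# Show driver name, version and NIC firmware for the affected interface
ethtool -i enp2s0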

Kernel command line: quiet nmi_watchdog=1 intel_iommu=on

proxmox-ve: 8.4.0 (running kernel: 6.8.12-10-pve)
pve-manager: 8.4.1 (running version: 8.4.1/2a5fa54a8503f96d)
proxmox-kernel-helper: 8.1.1
proxmox-kernel-6.8.12-10-pve-signed: 6.8.12-10
proxmox-kernel-6.8: 6.8.12-10
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
intel-microcode: 3.20250211.1~deb12u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.1
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.0
libpve-cluster-perl: 8.1.0
libpve-common-perl: 8.3.1
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.2
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.6
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
openvswitch-switch: residual config
proxmox-backup-client: 3.4.1-1
proxmox-backup-file-restore: 3.4.1-1
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.1
proxmox-mail-forward: 0.3.2
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.3.10
pve-cluster: 8.1.0
pve-container: 5.2.6
pve-docs: 8.4.0
pve-edk2-firmware: 4.2025.02-3
pve-esxi-import-tools: 0.7.3
pve-firewall: 5.1.1
pve-firmware: 3.15-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.2
pve-qemu-kvm: 9.2.0-5
pve-xtermjs: 5.5.0-2
qemu-server: 8.3.12
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.7-pve2
 
nmi_watchdog
Have you tried without that watchdog enabled? AFAIK the NMI itself sometimes generates a high number of interrupts and will therefore impact server performance on its own - with that limited J3455, the outcome may be your actual issue.
 
Have you tried without that watchdog enabled? AFAIK the NMI itself sometimes generates a high number of interrupts and will therefore impact server performance on its own - with that limited J3455, the outcome may be your actual issue.
Not yet; if it's a definite fix I'd be happy to give it a try. I'd rather find out what changed between 6.8/6.11 and 6.14 to cause the issue... given the rock-solid stability of the older kernels.
 
Not yet; if it's a definite fix I'd be happy to give it a try. I'd rather find out what changed between 6.8/6.11 and 6.14 to cause the issue
Don't see any harm in trying - and then you can still try discovering "what changed".
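For what it's worth, the watchdog can be toggled at runtime first, before committing to a boot-parameter change; a quick sketch (standard Debian/GRUB paths assumed):

Code:
# Disable the NMI watchdog at runtime, no reboot needed
sysctl kernel.nmi_watchdog=0

# To make it persistent, change nmi_watchdog=1 to nmi_watchdog=0 in
# GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, then:
update-grub

# Verify after reboot
cat /proc/sys/kernel/nmi_watchdog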
 
The kernel patch is available and working well. Give it a try with your 9070 XT.

Thank you very much.

I have confirmed that the RX 9070 XT works fine with the patched pve 6.14.0-2 kernel.

We haven't been able to test everything yet, but the workloads that used to fail now pass consistently, so I don't expect any problems.
 
Hello
CPU(s): 48 x Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz (2 sockets)
Kernel version: Linux 6.14.0-2-pve (2025-04-10T17:57Z)
Boot mode: EFI
Manager version: pve-manager/8.4.1/2a5fa54a8503f96d
After upgrading to kernel 6.14, several machines started logging the following errors after a week or two:
[1996513.635917] ------------[ cut here ]------------
[1996513.635946] WARNING: CPU: 24 PID: 128499 at arch/x86/kvm/vmx/vmx.c:5247 handle_exception_nmi+0x503/0xb70 [kvm_intel]
[1996513.635993] Modules linked in: ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_multiport xt_conntrack xt_tcpudp iptable_filter ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel nvme_fabrics nvme_keyring nvme_core nvme_auth nf_tables nfnetlink_cttimeout bonding tls softdog sunrpc binfmt_misc openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 psample nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common skx_edac_common nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl intel_cstate pcspkr acpi_power_meter mei_me ipmi_si acpi_ipmi mgag200 mei ioatdma i2c_algo_bit hpilo intel_pch_thermal ipmi_devintf ipmi_msghandler joydev acpi_tad input_leds mac_hid vhost_net vhost
[1996513.636060] vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio ses enclosure uas usb_storage hid_generic usbmouse usbkbd usbhid hid ixgbe smartpqi xhci_pci xfrm_algo dca scsi_transport_sas ehci_pci mdio xhci_hcd ehci_hcd lpc_ich wmi
[1996513.636286] CPU: 24 UID: 0 PID: 128499 Comm: CPU 1/KVM Tainted: P O 6.14.0-2-pve #1
[1996513.636305] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[1996513.636865] Hardware name: HPE ProLiant DL360 Gen10/ProLiant DL360 Gen10, BIOS U32 02/21/2025
[1996513.637405] RIP: 0010:handle_exception_nmi+0x503/0xb70 [kvm_intel]
[1996513.637944] Code: 00 01 e8 70 4f d7 ca 0f 0b 48 8b 55 d0 e9 b1 fb ff ff 44 89 c6 31 c9 45 31 c0 4c 89 e2 48 89 df e8 22 76 ea ff e9 6a fe ff ff <0f> 0b 4c 8b a3 d0 23 00 00 41 83 3c 24 30 0f 85 f5 02 00 00 48 89
[1996513.638964] RSP: 0018:ffff96f67c9778c8 EFLAGS: 00010246
[1996513.639458] RAX: ffffffffc13fdbd0 RBX: ffff8bfda80e4800 RCX: 0000000000000000
[1996513.639980] RDX: ffff8bfdaf0d3000 RSI: 0000000000000000 RDI: ffff8bfda80e4800
[1996513.640493] RBP: ffff96f67c977900 R08: 0000000000000000 R09: 0000000000000000
[1996513.641007] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000080000314
[1996513.641514] R13: 0000000080000314 R14: 0000000080000300 R15: 0000000000000000
[1996513.642033] FS: 00007b92ffdff6c0(0000) GS:ffff8bfb3fc00000(0000) knlGS:ffff9cc351680000
[1996513.642537] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1996513.643056] CR2: 00000812b8185020 CR3: 00000001c1598003 CR4: 00000000007726f0
[1996513.643580] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[1996513.644145] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[1996513.644632] PKRU: 55555554
[1996513.645139] Call Trace:
[1996513.645624] <TASK>
[1996513.646128] ? show_regs+0x6c/0x80
[1996513.646600] ? __warn+0x8d/0x150
[1996513.647090] ? handle_exception_nmi+0x503/0xb70 [kvm_intel]
[1996513.647568] ? report_bug+0x182/0x1b0
[1996513.648052] ? handle_bug+0x6e/0xb0
[1996513.648512] ? exc_invalid_op+0x18/0x80
[1996513.648967] ? asm_exc_invalid_op+0x1b/0x20
[1996513.649398] ? __pfx_handle_exception_nmi+0x10/0x10 [kvm_intel]
[1996513.649857] ? handle_exception_nmi+0x503/0xb70 [kvm_intel]
[1996513.650278] ? vmx_vcpu_enter_exit+0x14f/0x450 [kvm_intel]
[1996513.650687] vmx_handle_exit+0x1f4/0x8b0 [kvm_intel]
[1996513.651119] vcpu_enter_guest+0x4e8/0x1640 [kvm]
[1996513.651621] kvm_arch_vcpu_ioctl_run+0x35d/0x750 [kvm]
[1996513.652147] kvm_vcpu_ioctl+0x2c2/0xaa0 [kvm]
[1996513.652608] __x64_sys_ioctl+0xa4/0xe0
[1996513.653007] x64_sys_call+0xb45/0x2540
[1996513.653382] do_syscall_64+0x7e/0x170
[1996513.653732] ? kvm_set_msi+0xad/0xc0 [kvm]
[1996513.654198] ? kvm_send_userspace_msi+0x75/0xb0 [kvm]
[1996513.654599] ? kvm_vm_ioctl+0xe81/0x1aa0 [kvm]
[1996513.655020] ? kvm_arch_vcpu_ioctl_run+0x226/0x750 [kvm]
[1996513.655449] ? kvm_vcpu_ioctl+0x23e/0xaa0 [kvm]
[1996513.655851] ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
[1996513.656163] ? syscall_exit_to_user_mode+0x38/0x1d0
[1996513.656460] ? do_syscall_64+0x8a/0x170
[1996513.656746] ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
[1996513.657055] ? syscall_exit_to_user_mode+0x38/0x1d0
[1996513.657343] ? do_syscall_64+0x8a/0x170
[1996513.657609] ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
[1996513.657906] ? syscall_exit_to_user_mode+0x38/0x1d0
[1996513.658180] ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
[1996513.658439] ? syscall_exit_to_user_mode+0x38/0x1d0
[1996513.658690] ? do_syscall_64+0x8a/0x170
[1996513.658967] ? clear_bhb_loop+0x15/0x70
[1996513.659213] ? clear_bhb_loop+0x15/0x70
[1996513.659438] ? clear_bhb_loop+0x15/0x70
[1996513.659658] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[1996513.659909] RIP: 0033:0x7bae90c67d1b
[1996513.660142] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
[1996513.660623] RSP: 002b:00007b92ffdf9ee0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[1996513.660892] RAX: ffffffffffffffda RBX: 0000557a2ecc26e0 RCX: 00007bae90c67d1b
[1996513.661147] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000005e
[1996513.661390] RBP: 000000000000ae80 R08: 0000000000000000 R09: 0000000000000000
[1996513.661631] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[1996513.661902] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000
[1996513.662155] </TASK>
[1996513.662393] ---[ end trace 0000000000000000 ]---
[1996513.662639] kvm_intel: set kvm_intel.dump_invalid_vmcs=1 to dump internal KVM state.
[1996513.662926] kvm: #VE 899406080, spte[4] = 0x80000002e33c3907, spte[3] = 0x800000031bf63907, spte[2] = 0x860000378d200bf3
At the same time, there are no errors inside the VM. The VM config:
agent: 1,freeze-fs-on-backup=0
balloon: 0
boot: order=scsi0;ide2;net0
cores: 16
cpu: host
ide2: none,media=cdrom
machine: q35
memory: 112640
meta: creation-qemu=8.0.2,ctime=1690795457
name: ******
net0: virtio=96:C2:2C:****,bridge=vmbr0,tag=300
numa: 0
onboot: 1
ostype: l26
protection: 1
scsi0: md-thinstorage:vm-206-disk-0,cache=none,format=raw,iothread=1,size=3840G,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=a6dfc75d-fe4f-4fc8-81e2-bb422cfa6fd3
sockets: 1
vmgenid: 3c5cb61d-4be5-468b-80b2-7f02b27705bd
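If it recurs, the trace itself points at a debug knob worth flipping beforehand; a sketch of enabling it (module parameter as named in the log above):

Code:
# Enable at runtime, takes effect for subsequent warnings
echo 1 > /sys/module/kvm_intel/parameters/dump_invalid_vmcs

# Or persistently across reboots
echo "options kvm_intel dump_invalid_vmcs=1" > /etc/modprobe.d/kvm-intel-debug.conf
update-initramfs -u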
 
Don't see any harm in trying - and then you can still try discovering "what changed".
Disabling `nmi_watchdog` seems to have returned things to stability after a few days of testing. I admittedly do not have deep knowledge of the kernel's inner workings, but disabling it appears to have prevented the igb driver instability I described in my first post.
 
I may have hit a bug in the latest upgrade to 6.14.4-1-pve. I just installed it and rebooted, and noticed these errors in the log:

Code:
May 10 01:45:33 prox kernel: ata4.00: status: { DRDY }
May 10 01:45:56 prox kernel: ata4.00: exception Emask 0x10 SAct 0x3 SErr 0x4050000 action 0xe frozen
May 10 01:45:56 prox kernel: ata4.00: irq_stat 0x00000040, connection status changed
May 10 01:45:56 prox kernel: ata4: SError: { PHYRdyChg CommWake DevExch }
May 10 01:45:56 prox kernel: ata4.00: failed command: WRITE FPDMA QUEUED
May 10 01:45:56 prox kernel: ata4.00: cmd 61/08:00:38:f6:5b/00:00:27:00:00/40 tag 0 ncq dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)


I know this indicates a hardware error and I will investigate it further, but it just popped up after updating and the server has not been touched. The SMART data shows nothing wrong with the drive; it is a brand new SATA SSD with 0% wear that was working perfectly fine up until the reboot after updating.

I found one post here saying it is basically either a bad cable or an NCQ bug (if it is not the drive going bad). I do not think it is the drive or the cable, or the error would have been present before. Either way, I will perform more checks/tests later (probably tomorrow).
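If the NCQ angle looks plausible, it can be ruled out from software alone before swapping any hardware; a sketch (ata4 taken from the log above, the device node is an assumption):

Code:
# Check the drive's SMART health and error log first
smartctl -a /dev/sdX

# Rule out NCQ for that port: add this to the kernel command line and reboot
libata.force=4:noncq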

Hardware:
CPU: i7-7820X
Motherboard: Gigabyte X299 UD4
lspci output:

Code:
00:00.0 Host bridge: Intel Corporation Sky Lake-E DMI3 Registers (rev 04)
00:04.0 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.1 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.2 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.3 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.4 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.5 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.6 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:04.7 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 04)
00:05.0 System peripheral: Intel Corporation Sky Lake-E MM/Vt-d Configuration Registers (rev 04)
00:05.2 System peripheral: Intel Corporation Sky Lake-E RAS (rev 04)
00:05.4 PIC: Intel Corporation Sky Lake-E IOAPIC (rev 04)
00:08.0 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00:08.1 Performance counters: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00:08.2 System peripheral: Intel Corporation Sky Lake-E Ubox Registers (rev 04)
00:14.0 USB controller: Intel Corporation 200 Series/Z370 Chipset Family USB 3.0 xHCI Controller
00:16.0 Communication controller: Intel Corporation 200 Series PCH CSME HECI #1
00:17.0 SATA controller: Intel Corporation 200 Series PCH SATA controller [AHCI mode]
00:1b.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #17 (rev f0)
00:1c.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #1 (rev f0)
00:1c.4 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #5 (rev f0)
00:1d.0 PCI bridge: Intel Corporation 200 Series PCH PCI Express Root Port #9 (rev f0)
00:1f.0 ISA bridge: Intel Corporation X299 Chipset LPC/eSPI Controller
00:1f.2 Memory controller: Intel Corporation 200 Series/Z370 Chipset Family Power Management Controller
00:1f.3 Audio device: Intel Corporation 200 Series PCH HD Audio
00:1f.4 SMBus: Intel Corporation 200 Series/Z370 Chipset Family SMBus Controller
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-V
01:00.0 Non-Volatile memory controller: Sandisk Corp WD Blue SN550 NVMe SSD (rev 01)
03:00.0 USB controller: ASMedia Technology Inc. ASM2142/ASM3142 USB 3.1 Host Controller
16:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04)
16:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04)
16:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04)
16:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 04)
16:08.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:08.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:09.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:09.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.4 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.5 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.6 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0e.7 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0f.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:0f.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.0 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.1 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.2 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1d.3 System peripheral: Intel Corporation Sky Lake-E CHA Registers (rev 04)
16:1e.0 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.1 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.2 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.3 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.4 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.5 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
16:1e.6 System peripheral: Intel Corporation Sky Lake-E PCU Registers (rev 04)
17:00.0 3D controller: NVIDIA Corporation GP104GL [Tesla P4] (rev a1)
64:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04)
64:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04)
64:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04)
64:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 04)
64:08.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:09.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.1 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.2 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.3 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.4 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0a.5 System peripheral: Intel Corporation Sky Lake-E LM Channel 1 (rev 04)
64:0a.6 System peripheral: Intel Corporation Sky Lake-E LMS Channel 1 (rev 04)
64:0a.7 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 1 (rev 04)
64:0b.0 System peripheral: Intel Corporation Sky Lake-E DECS Channel 2 (rev 04)
64:0b.1 System peripheral: Intel Corporation Sky Lake-E LM Channel 2 (rev 04)
64:0b.2 System peripheral: Intel Corporation Sky Lake-E LMS Channel 2 (rev 04)
64:0b.3 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 2 (rev 04)
64:0c.0 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0c.1 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0c.2 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0c.3 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0c.4 System peripheral: Intel Corporation Sky Lake-E Integrated Memory Controller (rev 04)
64:0c.5 System peripheral: Intel Corporation Sky Lake-E LM Channel 1 (rev 04)
64:0c.6 System peripheral: Intel Corporation Sky Lake-E LMS Channel 1 (rev 04)
64:0c.7 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 1 (rev 04)
64:0d.0 System peripheral: Intel Corporation Sky Lake-E DECS Channel 2 (rev 04)
64:0d.1 System peripheral: Intel Corporation Sky Lake-E LM Channel 2 (rev 04)
64:0d.2 System peripheral: Intel Corporation Sky Lake-E LMS Channel 2 (rev 04)
64:0d.3 System peripheral: Intel Corporation Sky Lake-E LMDP Channel 2 (rev 04)
65:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01)
66:01.0 PCI bridge: Intel Corporation Device 4fa4
66:04.0 PCI bridge: Intel Corporation Device 4fa4
67:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A310] (rev 05)
68:00.0 Audio device: Intel Corporation DG2 Audio Controller
b2:03.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port D (rev 04)
b2:05.0 System peripheral: Intel Corporation Sky Lake-E VT-d (rev 04)
b2:05.2 System peripheral: Intel Corporation Sky Lake-E RAS Configuration Registers (rev 04)
b2:05.4 PIC: Intel Corporation Sky Lake-E IOxAPIC Configuration Registers (rev 04)
b2:12.0 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 04)
b2:12.1 Performance counters: Intel Corporation Sky Lake-E M3KTI Registers (rev 04)
b2:12.2 System peripheral: Intel Corporation Sky Lake-E M3KTI Registers (rev 04)
b2:15.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b2:16.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b2:16.4 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b2:17.0 System peripheral: Intel Corporation Sky Lake-E M2PCI Registers (rev 04)
b3:00.0 Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
 
I'm having issues with a kernel panic caused by patched NVIDIA GRID drivers. So far I have tried 550.144 and 570.133; next up I will try 553.230, and if that doesn't work I will fall back to the production kernel and retest. These are the last items in the log before I see the ASCII Tux on the console.


Code:
May 10 14:17:55 pve-nas1 kernel: nvidia-vgpu-mgr[2201]: segfault at 130 ip 0000783fa399d42a sp 00007ffdfa156fa0 error 4 in libnvidia-vgpu.so.570.133.10[f742a,783fa3929000+21a000] likely on CPU 14 (core 14, socket 0)
May 10 14:17:55 pve-nas1 kernel: Code: 55 48 89 e5 41 56 41 55 41 54 53 44 8b 66 3c 45 85 e4 74 74 44 8b 97 1c 2a 00 00 49 89 fd 48 89 f3 45 85 d2 74 72 48 8b 47 60 <44> 8b 88 30 01 00 00 45 85 c9 74 62 44 89 e2 45 31 e4 48 c7 43 10
May 10 14:18:04 pve-nas1 kernel: [nvidia-vgpu-vfio] 00000015-0000-0000-0000-000000000100: No mdev vendor driver request callback support, blocked until released by user
May 10 14:18:20 pve-nas1 kernel: nvidia-vgpu-vfio 00000015-0000-0000-0000-000000000100: Removing from iommu group 88
May 10 14:18:20 pve-nas1 kernel: [nvidia-vgpu-vfio] 00000015-0000-0000-0000-000000000100: Failed to post delete device event, 0x56
May 10 14:18:20 pve-nas1 kernel: [nvidia-vgpu-vfio] 00000015-0000-0000-0000-000000000100: vGPU destroy failed: 0xfffffffb
May 10 14:18:20 pve-nas1 kernel: [nvidia-vgpu-vfio] 00000015-0000-0000-0000-000000000100: Failed to destroy vGPU device, ret: -5
May 10 14:18:28 pve-nas1 kernel: watchdog: watchdog0: watchdog did not stop!
May 10 14:18:28 pve-nas1 systemd-shutdown[1]: Using hardware watchdog 'Software Watchdog', version 0, device /dev/watchdog0
May 10 14:18:28 pve-nas1 systemd-shutdown[1]: Watchdog running with a hardware timeout of 10min.
May 10 14:18:28 pve-nas1 systemd-shutdown[1]: Syncing filesystems and block devices.
May 10 14:18:33 pve-nas1 systemd-shutdown[1]: Sending SIGTERM to remaining processes...
May 10 14:18:33 pve-nas1 systemd-journald[1252]: Received SIGTERM from PID 1 (systemd-shutdow).
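For anyone triaging something similar: it may be worth confirming the host-side vGPU daemons and driver version actually came up after boot. A sketch, assuming the standard GRID host driver packaging (service names can differ per driver release):

Code:
# Check the vGPU host daemons (names per the usual GRID host packages)
systemctl status nvidia-vgpud.service nvidia-vgpu-mgr.service

# Confirm the loaded kernel driver version matches the installed package
cat /proc/driver/nvidia/version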
 
I also get this DRM failure; it seems to be trying to enable DRM on the BMC video chipset. I don't recall these on earlier kernels.


Code:
May 10 14:20:38 pve-nas1 kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:20/0000:20:01.1/0000:21:00.1/sound/card0/input5
May 10 14:20:38 pve-nas1 kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:20/0000:20:01.1/0000:21:00.1/sound/card0/input6
May 10 14:20:38 pve-nas1 kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:20/0000:20:01.1/0000:21:00.1/sound/card0/input7
May 10 14:20:38 pve-nas1 kernel: ast 0000:ab:00.0: [drm] *ERROR* Link training failed
May 10 14:20:38 pve-nas1 kernel: ------------[ cut here ]------------
May 10 14:20:38 pve-nas1 kernel: ast 0000:ab:00.0: [drm] drm_WARN_ON(!__ast_dp_wait_enable(ast, enabled))
May 10 14:20:38 pve-nas1 kernel: WARNING: CPU: 0 PID: 470 at drivers/gpu/drm/ast/ast_dp.c:221 ast_dp_set_enable+0xea/0x110 [ast]
May 10 14:20:38 pve-nas1 kernel: Modules linked in: amd_atl intel_rapl_msr intel_rapl_common amd64_edac snd_hda_codec_hdmi edac_mce_amd ipmi_ssif kvm_amd kvm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec polyval_clmul>
May 10 14:20:38 pve-nas1 kernel: CPU: 0 UID: 0 PID: 470 Comm: kworker/0:13 Tainted: P           O       6.14.0-2-pve #1
May 10 14:20:38 pve-nas1 kernel: Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
May 10 14:20:38 pve-nas1 kernel: Hardware name:  GENOAD8UD-2T/X550/GENOAD8UD-2T/X550, BIOS 11.01 01/23/2025
May 10 14:20:38 pve-nas1 kernel: Workqueue: events work_for_cpu_fn
May 10 14:20:38 pve-nas1 kernel: RIP: 0010:ast_dp_set_enable+0xea/0x110 [ast]
May 10 14:20:38 pve-nas1 kernel: Code: 48 8b 5f 50 48 85 db 75 03 48 8b 1f e8 0f fc ff d6 48 c7 c1 70 a0 5f c1 48 89 da 48 c7 c7 11 a4 5f c1 48 89 c6 e8 a6 06 59 d6 <0f> 0b 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff
May 10 14:20:38 pve-nas1 kernel: RSP: 0018:ff6418f1812c7750 EFLAGS: 00010246
May 10 14:20:38 pve-nas1 kernel: RAX: 0000000000000000 RBX: ff3a51d982e96b60 RCX: 0000000000000000
May 10 14:20:38 pve-nas1 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
May 10 14:20:38 pve-nas1 kernel: RBP: ff6418f1812c7778 R08: 0000000000000000 R09: 0000000000000000
May 10 14:20:38 pve-nas1 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000003e8
May 10 14:20:38 pve-nas1 kernel: R13: 0000000000000010 R14: ff3a51d98e124000 R15: ff6418f1814403d4
May 10 14:20:38 pve-nas1 kernel: FS:  0000000000000000(0000) GS:ff3a5208cc800000(0000) knlGS:0000000000000000
May 10 14:20:38 pve-nas1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 10 14:20:38 pve-nas1 kernel: CR2: 00006456ffb7601c CR3: 000000012df4e003 CR4: 0000000000f71ef0
May 10 14:20:38 pve-nas1 kernel: PKRU: 55555554
May 10 14:20:38 pve-nas1 kernel: Call Trace:
May 10 14:20:38 pve-nas1 kernel:  <TASK>
May 10 14:20:38 pve-nas1 kernel:  ? show_regs+0x6c/0x80
May 10 14:20:38 pve-nas1 kernel:  ? __warn+0x8d/0x150
 
I also get this DRM failure; it seems to be trying to enable DRM on the BMC video chipset. I don't recall these on earlier kernels.
Do you have the “firmware-ast” package installed?
 
Do you have the “firmware-ast” package installed?
Not that I know of; also, drivers tend to be a kernel thing?

It's not a general package:


Code:
root@pve-nas1:~#  apt -y install firmware-ast
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package firmware-ast is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

The error goes away when I move back to the 6.8 kernel, so either way it still appears to be a regression.

I am back on 6.8 on all nodes given the massive system instabilities I was having on one node (it kept rebooting in some cases and kernel panicking in others). I think there are multiple issues, especially once one starts passing through large numbers of PCIe devices (I have used all 16). I am soak testing on 6.8 to eliminate the motherboard BIOS as the issue (I had updated that too, along with the NVIDIA vGPU drivers), and may try moving to 6.14 again later in the week.

ChatGPT tells me to run `apt update && apt install -y firmware-misc-nonfree` to get the AST firmware? Interesting that it isn't needed on 6.8?
(Maybe the driver was moved fully into the kernel / the firmware included in 6.14?)

--edit--
I wonder if it is related to CVE-2025-21747 ("drm/ast: astdp: Fix timeout for enabling video signal"). What's odd is that the fix is in 6.13+, so it should already be in the kernel, unless this is a new variant of the issue...
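Until that's sorted out, the ast driver can simply be kept from binding, since the BMC head isn't needed on a headless host; a sketch of the usual module blacklist approach:

Code:
# Stop the ast DRM driver from loading (console falls back to the basic EFI framebuffer)
echo "blacklist ast" > /etc/modprobe.d/blacklist-ast.conf
update-initramfs -u
reboot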
 
I also have some issues with 6.14 after upgrading on different machines.
A NUC that I have and a workstation with a Threadripper became unstable.

Machines become unresponsive, or my workstation simply shuts down. There is no indicator of the problem, and I have not dug deep because I'm not sure where to look.
A reboot works fine and all is working fine again.

Reverted to 6.11 for now.
 
I also have some issues with 6.14 after upgrading on different machines.
A NUC that I have and a workstation with a Threadripper became unstable.
That's sounding similar to issues I have with Fedora 41/42 VMs when they try to use any 6.14 kernel, and it seems linked to balloon memory: it doesn't react fast enough to requests for more RAM, and the OOM watchdog kills processes trying to keep the server "running". Eventually it either locks up the VM or reboots it. The only fixes I've found so far are to either disable ballooning, thus fixing the amount of allocated RAM, or revert to 6.13.

I know this is not quite the same, as this is for the host running 6.14, but perhaps something similar is also happening here. I've tried 3 releases of 6.14 for Fedora so far.

Strangely my host running this 6.14 has so far been ok.
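In case it's useful, pinning a VM's memory by turning off its balloon device is a one-liner; a sketch (VMID 100 is a placeholder):

Code:
# Disable ballooning for a VM so its RAM allocation stays fixed
qm set 100 --balloon 0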
 
That's sounding similar to issues I have with Fedora 41/42 VMs when they try to use any 6.14 kernel, and it seems linked to balloon memory: it doesn't react fast enough to requests for more RAM, and the OOM watchdog kills processes trying to keep the server "running". Eventually it either locks up the VM or reboots it. The only fixes I've found so far are to either disable ballooning, thus fixing the amount of allocated RAM, or revert to 6.13.
My NUC had an unrelated problem: some kernel panic in the network card driver.
But the workstation "seems" more stable again on 6.11... will observe.