Opt-in Linux 6.14 Kernel for Proxmox VE 8 available on test & no-subscription

I have one strange error:
Kernel 6.14, LXC with MySQL on it.
/var/log/kern.log:3651:2025-04-10T10:35:40.646702+02:00 sp19 kernel: [236703.525309] audit: type=1400 audit(1744274140.640:2244): apparmor="DENIED" operation="create" class="net" namespace="root//lxc-1193_<-var-lib-lxc>" profile="/usr/sbin/mysqld" pid=2674395 comm="mysqld" family="unix" sock_type="stream" protocol=0 requested="create" denied="create" addr=none
This doesn't happen with the 6.11 kernel.
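A possible stopgap while this gets sorted (untested, a sketch only): add a raw AppArmor rule to the container config allowing unix stream socket creation, i.e. exactly the operation being denied above. The container ID 1193 below is taken from the log's namespace; use your own. This tweaks the container's profile and may not reach the nested /usr/sbin/mysqld profile inside the guest - in that case, disabling the mysqld profile inside the container (aa-disable /usr/sbin/mysqld) is the blunter stopgap.

Code:
# /etc/pve/lxc/1193.conf  (container ID from the log above; adjust to yours)
lxc.apparmor.raw: unix (create) type=stream,
# Blunter alternative - disables AppArmor confinement for this container entirely:
# lxc.apparmor.profile: unconfined

Restart the container afterwards (pct stop 1193 && pct start 1193) for the profile change to take effect.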
I have this same error and it's preventing Docker in LXC from working for me; I have to revert to 6.11 for now.
 

Maybe (not sure) this is linked to this bug; here & again here?

Edit: I now see it has been picked up here in the forums.
 
While booting I received this error message. It was not present under previous kernels:

[Thu Apr 17 13:57:49 2025] vmbr0: port 2(veth101i0) entered blocking state
[Thu Apr 17 13:57:49 2025] vmbr0: port 2(veth101i0) entered forwarding state
[Thu Apr 17 13:57:49 2025] ------------[ cut here ]------------
[Thu Apr 17 13:57:49 2025] WARNING: CPU: 11 PID: 10245 at net/bridge/br_netfilter_hooks.c:602 br_nf_local_in+0x1b9/0x1e0
[Thu Apr 17 13:57:49 2025] Modules linked in: cfg80211 veth ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_NFLOG xt_limit xt_mac ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel scsi_transport_iscsi softdog nf_tables nvme_fabrics nvme_keyring 8021q garp mrp sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink ipmi_ssif amd_atl intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd irdma i40e kvm vhost_net vhost ast ib_uverbs vhost_iotlb jc42 acpi_ipmi rapl ib_core ipmi_si wmi_bmof ccp pcspkr i2c_algo_bit k10temp ipmi_devintf ee1004 ptdma ipmi_msghandler joydev input_leds mac_hid tap efi_pstore dmi_sysfs ip_tables x_tables autofs4 zfs(PO) spl(O) btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 linear uas usb_storage
[Thu Apr 17 13:57:49 2025] hid_generic usbmouse usbhid hid rndis_host cdc_ether usbnet mii polyval_clmulni polyval_generic ghash_clmulni_intel xhci_pci sha256_ssse3 ice nvme sha1_ssse3 gnss ahci bnxt_en libie nvme_core libahci xhci_hcd i2c_piix4 i2c_smbus nvme_auth wmi aesni_intel crypto_simd cryptd
[Thu Apr 17 13:57:49 2025] CPU: 11 UID: 192 PID: 10245 Comm: systemd-network Tainted: P O 6.14.0-2-pve #1
[Thu Apr 17 13:57:49 2025] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE
[Thu Apr 17 13:57:49 2025] Hardware name: Supermicro AS -1114S-WN10RT/H12SSW-NTR, BIOS 3.0 07/29/2024
[Thu Apr 17 13:57:49 2025] RIP: 0010:br_nf_local_in+0x1b9/0x1e0
[Thu Apr 17 13:57:49 2025] Code: df e8 4b 49 d9 ff 66 83 ab b8 00 00 00 08 eb 92 be 04 00 00 00 48 89 df e8 34 49 d9 ff 66 83 ab b8 00 00 00 04 e9 78 ff ff ff <0f> 0b e9 9b fe ff ff 0f 0b e9 e7 fe ff ff 4c 89 e7 e8 f1 d2 e7 ff
[Thu Apr 17 13:57:49 2025] RSP: 0018:ffffad9f80700990 EFLAGS: 00010202
[Thu Apr 17 13:57:49 2025] RAX: 0000000000000002 RBX: ffff9acf12fd9300 RCX: 0000000000000000
[Thu Apr 17 13:57:49 2025] RDX: ffffad9f80700a00 RSI: ffff9acf12fd9300 RDI: 0000000000000000
[Thu Apr 17 13:57:49 2025] RBP: ffffad9f807009b0 R08: 0000000000000000 R09: 0000000000000000
[Thu Apr 17 13:57:49 2025] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9ad194b7a500
[Thu Apr 17 13:57:49 2025] R13: ffffad9f80700a00 R14: 0000000000000001 R15: ffff9acfa928d180
[Thu Apr 17 13:57:49 2025] FS: 00007ce6ce3e0bc0(0000) GS:ffff9b4c4dd80000(0000) knlGS:0000000000000000
[Thu Apr 17 13:57:49 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Thu Apr 17 13:57:49 2025] CR2: 00005750ac359cd0 CR3: 000000079f35c003 CR4: 0000000000f70ef0
[Thu Apr 17 13:57:49 2025] PKRU: 55555554
[Thu Apr 17 13:57:49 2025] Call Trace:
[Thu Apr 17 13:57:49 2025] <IRQ>
[Thu Apr 17 13:57:49 2025] ? show_regs+0x6c/0x80
[Thu Apr 17 13:57:49 2025] ? __warn+0x8d/0x150
[Thu Apr 17 13:57:49 2025] ? br_nf_local_in+0x1b9/0x1e0
[Thu Apr 17 13:57:49 2025] ? report_bug+0x182/0x1b0
[Thu Apr 17 13:57:49 2025] ? handle_bug+0x6e/0xb0
[Thu Apr 17 13:57:49 2025] ? exc_invalid_op+0x18/0x80
[Thu Apr 17 13:57:49 2025] ? asm_exc_invalid_op+0x1b/0x20
[Thu Apr 17 13:57:49 2025] ? br_nf_local_in+0x1b9/0x1e0
[Thu Apr 17 13:57:49 2025] nf_hook_slow+0x46/0x120
[Thu Apr 17 13:57:49 2025] br_pass_frame_up+0x146/0x1d0
[Thu Apr 17 13:57:49 2025] ? __pfx_br_netif_receive_skb+0x10/0x10
[Thu Apr 17 13:57:49 2025] br_handle_frame_finish+0x3ab/0x690
[Thu Apr 17 13:57:49 2025] ? __pfx_br_handle_frame_finish+0x10/0x10
[Thu Apr 17 13:57:49 2025] br_nf_hook_thresh+0x10a/0x120
[Thu Apr 17 13:57:49 2025] ? __pfx_br_handle_frame_finish+0x10/0x10
[Thu Apr 17 13:57:49 2025] br_nf_pre_routing_finish+0x17e/0x390
[Thu Apr 17 13:57:49 2025] ? __pfx_br_handle_frame_finish+0x10/0x10
[Thu Apr 17 13:57:49 2025] ? ipv4_conntrack_in+0x14/0x20 [nf_conntrack]
[Thu Apr 17 13:57:49 2025] br_nf_pre_routing+0x24b/0x5f0
[Thu Apr 17 13:57:49 2025] ? __pfx_br_nf_pre_routing_finish+0x10/0x10
[Thu Apr 17 13:57:49 2025] br_handle_frame+0x2a3/0x440
[Thu Apr 17 13:57:49 2025] ? __pfx_br_handle_frame_finish+0x10/0x10
[Thu Apr 17 13:57:49 2025] __netif_receive_skb_core.constprop.0+0x29a/0x1250
[Thu Apr 17 13:57:49 2025] ? sched_clock+0x10/0x30
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? psi_task_change+0x89/0xc0
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] __netif_receive_skb_one_core+0x3e/0xa0
[Thu Apr 17 13:57:49 2025] __netif_receive_skb+0x15/0x60
[Thu Apr 17 13:57:49 2025] process_backlog+0x90/0x160
[Thu Apr 17 13:57:49 2025] __napi_poll+0x33/0x1f0
[Thu Apr 17 13:57:49 2025] net_rx_action+0x20c/0x400
[Thu Apr 17 13:57:49 2025] handle_softirqs+0xda/0x2e0
[Thu Apr 17 13:57:49 2025] __do_softirq+0x10/0x18
[Thu Apr 17 13:57:49 2025] do_softirq.part.0+0x3f/0x80
[Thu Apr 17 13:57:49 2025] </IRQ>
[Thu Apr 17 13:57:49 2025] <TASK>
[Thu Apr 17 13:57:49 2025] __local_bh_enable_ip+0x6e/0x70
[Thu Apr 17 13:57:49 2025] __dev_queue_xmit+0x278/0x1010
[Thu Apr 17 13:57:49 2025] ? __alloc_skb+0x60/0x1b0
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? alloc_skb_with_frags+0x61/0x240
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] packet_xmit+0xae/0x120
[Thu Apr 17 13:57:49 2025] packet_sendmsg+0xabd/0x1980
[Thu Apr 17 13:57:49 2025] __sys_sendto+0x242/0x250
[Thu Apr 17 13:57:49 2025] __x64_sys_sendto+0x24/0x40
[Thu Apr 17 13:57:49 2025] x64_sys_call+0x1d04/0x2540
[Thu Apr 17 13:57:49 2025] do_syscall_64+0x7e/0x170
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? __handle_mm_fault+0x840/0x10b0
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? __count_memcg_events+0xc0/0x160
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? count_memcg_events.constprop.0+0x2a/0x50
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? handle_mm_fault+0xae/0x360
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? do_user_addr_fault+0x1ec/0x830
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? irqentry_exit_to_user_mode+0x2d/0x1d0
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? irqentry_exit+0x43/0x50
[Thu Apr 17 13:57:49 2025] ? srso_alias_return_thunk+0x5/0xfbef5
[Thu Apr 17 13:57:49 2025] ? exc_page_fault+0x96/0x1e0
[Thu Apr 17 13:57:49 2025] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[Thu Apr 17 13:57:49 2025] RIP: 0033:0x7ce6cecccff7
[Thu Apr 17 13:57:49 2025] Code: c7 c0 ff ff ff ff eb be 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d d5 a0 0f 00 00 41 89 ca 74 10 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 69 c3 55 48 89 e5 53 48 83 ec 38 44 89 4d d0
[Thu Apr 17 13:57:49 2025] RSP: 002b:00007ffc854642d8 EFLAGS: 00000202 ORIG_RAX: 000000000000002c
[Thu Apr 17 13:57:49 2025] RAX: ffffffffffffffda RBX: 00005750d989a0e0 RCX: 00007ce6cecccff7
[Thu Apr 17 13:57:49 2025] RDX: 0000000000000141 RSI: 00005750d9896fd0 RDI: 0000000000000013
[Thu Apr 17 13:57:49 2025] RBP: 00007ffc854642e0 R08: 00005750d989a120 R09: 0000000000000014
[Thu Apr 17 13:57:49 2025] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000
[Thu Apr 17 13:57:49 2025] R13: 00005750d9896fd0 R14: 0000000000000141 R15: 0000000000000150
[Thu Apr 17 13:57:49 2025] </TASK>
[Thu Apr 17 13:57:49 2025] ---[ end trace 0000000000000000 ]---
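For what it's worth, that WARN is raised in br_netfilter's local-in hook, so it should only be reachable when bridge traffic is handed to iptables. A quick check (diagnostic only, not a fix) whether that path is active on an affected host:

Code:
lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables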
 
I upgraded my EPYC node to 8.4 and opted in to kernel 6.14. This caused my TrueNAS VM to fail to start, with nothing in the logs.

"qm start 108" would simply hang forever. Shutdown of the host would never timeout and I had to power off. The VM would start if I removed the PCIe SATA controller PCIe hardware shared to it. Motherboard is an Asrock Rack SIENAD8-2L2T and PCIE7 slot is set to SATA where I have 5 disks that the TrueNAS VM uses normally.

I reverted to kernel 6.11 and the VM booted fine again.

Some more info here: https://forum.proxmox.com/threads/t...e-to-8-4-need-urgent-help.165189/#post-764699
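For anyone else rolling back: you can keep 6.14 installed and just pin the boot default to a 6.11 kernel with proxmox-boot-tool (the exact version string below is an example - take yours from the list output):

Code:
proxmox-boot-tool kernel list                # show installed kernels
proxmox-boot-tool kernel pin 6.11.11-2-pve   # version string is an example
reboot
# to return to the newest installed kernel later:
proxmox-boot-tool kernel unpin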
 
I have an LXC container that suddenly gets the error

Code:
Failed to open /dev/vhost-net: No such file or directory

On this kernel ... Does anyone have any ideas?
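In case it is something simple: /dev/vhost-net is created by the vhost_net module on the host, so it is worth checking whether the module just isn't loaded under this kernel (a sketch):

Code:
lsmod | grep vhost               # is vhost_net loaded on the host?
ls -l /dev/vhost-net             # does the device node exist?
modprobe vhost_net               # try loading it manually
echo vhost_net >> /etc/modules   # persist across reboots if that was it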
 
16 x AMD Opteron(tm) Processor 6380 (1 Socket)
PVE 8.4.1
ZFS, also as rootfs
Serial Attached SCSI controller: Broadcom / LSI SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
Kernel 6.14.0-2-pve
VM: PBS 3.4.1-1 with kernel 6.14.0-2-pve
No issues so far.
 
Hm - not sure if I've had more issues since moving kernels.

A couple of VMs died, plus a server reboot... one just now.

Unsure if it's me though, as I'm also trying to get a stupid printer to work. A few extracts are below - also a line about a tainted kernel?

Code:
Apr 23 14:43:45 proxServ kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
Apr 23 14:43:45 proxServ kernel: #PF: supervisor instruction fetch in kernel mode
Apr 23 14:43:45 proxServ kernel: #PF: error_code(0x0010) - not-present page


Code:
qemu:cpus_kick_thread: Invalid argumentkvm: warning: Spice: playback:0 (0x570618701c00): c>
Apr 23 14:44:22 proxServ QEMU[7402]: kvm: warning: Spice: record:0 (0x570618b4f0b0): channel->thread_id (0x742c6821a480) != pth>

Code:
kvm_amd: kvm [88219]: vcpu5, guest rIP: 0xfffff85e8d33b8e5 Unhandled WRMSR(0xc0010115) = 0x0,
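On the "Unhandled WRMSR" line specifically: on its own that is usually harmless guest MSR noise, and it can be silenced with the KVM module parameters below - note this only quiets the log and will not fix any crashes:

Code:
# /etc/modprobe.d/kvm.conf  (filename is arbitrary)
options kvm ignore_msrs=1 report_ignored_msrs=0
# apply with: update-initramfs -u -k all && reboot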
 
Yep - another CPU at 100%, then a VM crash - I think it's kernel related?
Code:
Apr 24 11:13:43 proxServ kernel: ------------[ cut here ]------------
Apr 24 11:13:43 proxServ kernel: WARNING: CPU: 9 PID: 88247 at arch/x86/kvm/svm/nested.c:1212 svm_free_nested+0xb2/0xe0 [kvm_amd]
Apr 24 11:13:43 proxServ kernel: Modules linked in: iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 tcp_di>
Apr 24 11:13:43 proxServ kernel:  vfio iommufd parport_pc ppdev lp parport efi_pstore dmi_sysfs ip_tables x_tables autofs4 hid_jabra u>
Apr 24 11:13:43 proxServ kernel: CPU: 9 UID: 0 PID: 88247 Comm: CPU 3/KVM Tainted: P      D    OE      6.14.0-2-pve #1
Apr 24 11:13:43 proxServ kernel: Tainted: [P]=PROPRIETARY_MODULE, [D]=DIE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Apr 24 11:13:43 proxServ kernel: Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 5013 03/22/2024
Apr 24 11:13:43 proxServ kernel: RIP: 0010:svm_free_nested+0xb2/0xe0 [kvm_amd]
Apr 24 11:13:43 proxServ kernel: Code: 00 00 00 48 c7 83 68 1a 00 00 00 00 00 00 48 c7 83 a0 1a 00 00 ff ff ff ff 5b 41 5c 5d 31 c0 31>
Apr 24 11:13:43 proxServ kernel: RSP: 0018:ffffad31befdf8b8 EFLAGS: 00010206
Apr 24 11:13:43 proxServ kernel: RAX: ffff9a280e0a4000 RBX: ffff9a2c75cd39c0 RCX: 0000000000000000
Apr 24 11:13:43 proxServ kernel: RDX: 000000000000004d RSI: 0000000000000000 RDI: ffff9a2c75cd39c0
Apr 24 11:13:43 proxServ kernel: RBP: ffffad31befdf8c8 R08: 0000000000000000 R09: 0000000000000000
Apr 24 11:13:43 proxServ kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
Apr 24 11:13:43 proxServ kernel: R13: 0000000000005d01 R14: ffff9a2c75cd39c0 R15: 0000000000000001
Apr 24 11:13:43 proxServ kernel: FS:  00007d088ddff6c0(0000) GS:ffff9a45af080000(0000) knlGS:0000000000000000
Apr 24 11:13:43 proxServ kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 24 11:13:43 proxServ kernel: CR2: 00007d0630004014 CR3: 000000076cb36000 CR4: 0000000000f50ef0
Apr 24 11:13:43 proxServ kernel: PKRU: 55555554
Apr 24 11:13:43 proxServ kernel: Call Trace:
Apr 24 11:13:43 proxServ kernel:  <TASK>
Apr 24 11:13:43 proxServ kernel:  ? show_regs+0x6c/0x80
Apr 24 11:13:43 proxServ kernel:  ? __warn+0x8d/0x150
Apr 24 11:13:43 proxServ kernel:  ? svm_free_nested+0xb2/0xe0 [kvm_amd]
Apr 24 11:13:43 proxServ kernel:  ? report_bug+0x182/0x1b0
Apr 24 11:13:43 proxServ kernel:  ? handle_bug+0x6e/0xb0
Apr 24 11:13:43 proxServ kernel:  ? exc_invalid_op+0x18/0x80
Apr 24 11:13:43 proxServ kernel:  ? asm_exc_invalid_op+0x1b/0x20
Apr 24 11:13:43 proxServ kernel:  ? svm_free_nested+0xb2/0xe0 [kvm_amd]
Apr 24 11:13:43 proxServ kernel:  ? svm_set_gif+0xd4/0x1d0 [kvm_amd]
Apr 24 11:13:43 proxServ kernel:  svm_set_efer+0x142/0x170 [kvm_amd]
Apr 24 11:13:43 proxServ kernel:  __set_sregs_common.constprop.0+0x270/0x520 [kvm]
Apr 24 11:13:43 proxServ kernel:  kvm_arch_vcpu_ioctl+0x151f/0x1a20 [kvm]
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? __slab_free+0xdf/0x2a0
Apr 24 11:13:43 proxServ kernel:  ? __sigqueue_free+0x3d/0xa0
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  kvm_vcpu_ioctl+0x70f/0xaa0 [kvm]
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? kvm_vcpu_ioctl+0x70f/0xaa0 [kvm]
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? get_sigframe+0x103/0x2f0
Apr 24 11:13:43 proxServ kernel:  __x64_sys_ioctl+0xa7/0xe0
Apr 24 11:13:43 proxServ kernel:  x64_sys_call+0xb45/0x2540
Apr 24 11:13:43 proxServ kernel:  do_syscall_64+0x7e/0x170
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? fpu__restore_sig+0x8e/0xc0
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? restore_sigcontext+0x187/0x1f0
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? __do_sys_rt_sigreturn+0xe2/0x100
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? syscall_exit_to_user_mode+0x38/0x1d0
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? do_syscall_64+0x8a/0x170
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? syscall_exit_to_user_mode+0x38/0x1d0
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? do_syscall_64+0x8a/0x170
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? arch_exit_to_user_mode_prepare.constprop.0+0x22/0xd0
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? syscall_exit_to_user_mode+0x38/0x1d0
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? do_syscall_64+0x8a/0x170
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? irqentry_exit+0x43/0x50
Apr 24 11:13:43 proxServ kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 24 11:13:43 proxServ kernel:  ? exc_page_fault+0x96/0x1e0
Apr 24 11:13:43 proxServ kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Apr 24 11:13:43 proxServ kernel: RIP: 0033:0x7d089915ad1b
Apr 24 11:13:43 proxServ kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89>
Apr 24 11:13:43 proxServ kernel: RSP: 002b:00007d088ddf9c40 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 24 11:13:43 proxServ kernel: RAX: ffffffffffffffda RBX: 0000593a3c0f2000 RCX: 00007d089915ad1b
Apr 24 11:13:43 proxServ kernel: RDX: 00007d088ddf9dc0 RSI: 000000004140aecd RDI: 0000000000000029
Apr 24 11:13:43 proxServ kernel: RBP: 000000004140aecd R08: 0000000000000000 R09: 0000000000000000
Apr 24 11:13:43 proxServ kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007d088ddf9dc0
Apr 24 11:13:43 proxServ kernel: R13: 0000593a3c0f21c0 R14: 0000593a3c0f21f0 R15: 00007d088d5ff000
Apr 24 11:13:43 proxServ kernel:  </TASK>
Apr 24 11:13:43 proxServ kernel: ---[ end trace 0000000000000000 ]---
 
Yep - another CPU at 100%, then a VM crash - I think it's kernel related?
It might be - but it could also be due to the newer kernel expecting some things that were fixed in a more recent BIOS/firmware version:
Hardware name: System manufacturer System Product Name/PRIME X570-PRO, BIOS 5013 03/22/2024
It seems there were some updates available for the board:
https://www.asus.com/de/motherboard...0-pro/helpdesk_bios?model2Name=PRIME-X570-PRO

Running a memory test (as available on the PVE ISOs) can also help to find memory issues.

If neither updating the BIOS nor checking the memory helps, please open a new thread (feel free to mention me, @Stoiko Ivanov).
 
I've updated the BIOS - given that I downgraded to the 6.11 kernel and still had another crash overnight - nothing in journalctl this time.
It's been rock solid on 6.8 for a long time, so I wouldn't have thought it would be hardware... We shall see!
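If the current journal is empty after a crash, the previous boot's kernel messages may still be there (assuming persistent journaling is enabled):

Code:
journalctl --list-boots             # enumerate recorded boots
journalctl -k -b -1 | tail -n 100   # kernel log from the previous boot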
 
GPU passthrough crashes my Linux VMs on kernels from 6.12 upwards (I have no idea what happens on Windows, I don't use it) - I found the problematic kernel commit:
Problematic commit

This commit produces an instant VM crash under certain GPU workloads (in my case, a simple YouTube video in the Brave browser), with the following report from the kernel:

Apr 11 17:28:32 pve QEMU[92966]: RAX=000018cc0e283840 RBX=0000000000000780 RCX=0000000000000780 RDX=0000000000000780
Apr 11 17:28:32 pve QEMU[92966]: RSI=000018cc0e283840 RDI=00007a71e5c0f000 RBP=00007a71ed884960 RSP=00007a71ed884960
Apr 11 17:28:32 pve QEMU[92966]: R8 =0000000000000780 R9 =0000000000000110 R10=00000000000003c0 R11=0000000000000800
Apr 11 17:28:32 pve QEMU[92966]: R12=0000000000000110 R13=00005597198ef358 R14=00007a71e5c0f000 R15=000018cc0e283840
Apr 11 17:28:32 pve QEMU[92966]: RIP=000055971ff041d0 RFL=00010202 [-------] CPL=3 II=0 A20=1 SMM=0 HLT=0
Apr 11 17:28:32 pve QEMU[92966]: ES =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: CS =0033 0000000000000000 ffffffff 00a0fb00 DPL=3 CS64 [-RA]
Apr 11 17:28:32 pve QEMU[92966]: SS =002b 0000000000000000 ffffffff 00c0f300 DPL=3 DS [-WA]
Apr 11 17:28:32 pve QEMU[92966]: DS =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: FS =0000 00007a71ed8876c0 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: GS =0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: LDT=0000 0000000000000000 ffffffff 00c00000
Apr 11 17:28:32 pve QEMU[92966]: TR =0040 fffffe1d5342d000 00004087 00008b00 DPL=0 TSS64-busy
Apr 11 17:28:32 pve QEMU[92966]: GDT= fffffe1d5342b000 0000007f
Apr 11 17:28:32 pve QEMU[92966]: IDT= fffffe0000000000 00000fff
Apr 11 17:28:32 pve QEMU[92966]: CR0=80050033 CR2=00007a71e5c0f000 CR3=000000011ec9a004 CR4=00772ef0
Apr 11 17:28:32 pve QEMU[92966]: DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
Apr 11 17:28:32 pve QEMU[92966]: DR6=00000000ffff0ff0 DR7=0000000000000400
Apr 11 17:28:32 pve QEMU[92966]: EFER=0000000000000d01
Apr 11 17:28:32 pve QEMU[92966]: Code=cc cc cc cc 55 48 89 e5 48 89 f8 48 63 ca 48 89 f7 48 89 c6 <f3> a4 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 89 55 fc f3 0f 6f 07 f3 0f 6f

I did a bisect from 6.11.11 to 6.12-rc1 and landed on commit f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101.
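(For anyone retracing this: against the upstream tree the bisect is roughly the sketch below, with a kernel build, boot and VM test between every step.)

Code:
git bisect start
git bisect bad v6.12-rc1     # first known-bad tag
git bisect good v6.11.11     # last known-good tag
# build, boot, test the VM, then mark the result and repeat:
git bisect good              # or: git bisect bad
# once git names the first bad commit:
git bisect reset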

I reverted that commit on 6.14.4 and the crashes are gone. Another user here, using an AMD card, also reported crashes; I got them with NVIDIA GPUs.

I'm assuming this is a kernel problem and not a QEMU fault, but I don't have the knowledge to answer that.
I'm now running 6.14.4 with that commit reverted and so far no crashes.

The revert patch is attached.
 

After diving in head first with 6.14, I somehow borked my cluster. Going back to 6.11 got everything back to "normal". I have 5 different mini PCs running in the cluster, all on 10Gb or LACP'd 2.5Gb (aka 5Gb) networking. I have no high traffic, and I committed the cardinal sin of not having corosync on a dedicated network - I'm almost afraid to try and change that because everything was working until now...

Something in 6.14 has caused the 5 nodes to lose contact with each other, and I get many retransmit-list errors. Once I reverted all nodes back to 6.11, things stabilized again; if any one of the nodes is on 6.14 and the rest on 6.11, it still has problems. FYI - I have two standalone devices updated to 6.14 without issue so far:

Apr 26 22:01:16 awowfox corosync[18076]: [TOTEM ] Retransmit List: 23
Apr 26 22:01:17 awowfox corosync[18076]: [TOTEM ] Retransmit List: 23
Apr 26 22:01:18 awowfox corosync[18076]: [TOTEM ] Retransmit List: 23
Apr 26 22:01:19 awowfox corosync[18076]: [TOTEM ] Retransmit List: 23
Apr 26 22:01:20 awowfox corosync[18076]: [TOTEM ] Retransmit List: 23

Apr 26 16:43:45 FoxN100 corosync[1128]: [QUORUM] Sync members[4]: 2 3 4 5
Apr 26 16:43:45 FoxN100 corosync[1128]: [QUORUM] Sync left[1]: 1
Apr 26 16:43:45 FoxN100 corosync[1128]: [TOTEM ] A new membership (2.27db) was formed. Members left: 1
Apr 26 16:43:45 FoxN100 corosync[1128]: [TOTEM ] Failed to receive the leave message. failed: 1
Apr 26 16:43:45 FoxN100 corosync[1128]: [QUORUM] Sync members[4]: 2 3 4 5
Apr 26 16:43:45 FoxN100 corosync[1128]: [QUORUM] Sync joined[3]: 2 3 5
Apr 26 16:43:45 FoxN100 corosync[1128]: [QUORUM] Sync left[4]: 1 2 3 5
Apr 26 16:43:45 FoxN100 corosync[1128]: [TOTEM ] A new membership (2.27e3) was formed. Members joined: 2 3 5 left: 2 3 5
Apr 26 16:43:45 FoxN100 corosync[1128]: [TOTEM ] Failed to receive the leave message. failed: 2 3 5
Apr 26 16:43:46 FoxN100 corosync[1128]: [QUORUM] Sync members[4]: 2 3 4 5
Apr 26 16:43:46 FoxN100 corosync[1128]: [QUORUM] Sync left[1]: 1
Apr 26 16:43:46 FoxN100 corosync[1128]: [TOTEM ] A new membership (2.27e7) was formed. Members
 
Hi,
GPU passthrough crashes my Linux VMs on kernels from 6.12 upwards ... with the following report from the kernel: [QEMU register dump, quoted in full above]
Are you sure this is the full log? No backtrace or further description of the error?
I did a bisect from 6.11.11 to 6.12-rc1 and landed on commit f9e54c3a2f5b79ecc57c7bc7d0d3521e461a2101 ... I'm now running 6.14.4 with that commit reverted and so far no crashes.
Seems like there is a follow-up for the problematic commit: 09dfc8a5f2ce8 ("vfio/pci: Fallback huge faults for unaligned pfn")

Could you try with that applied on top instead of the revert?
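(For reference, applying the follow-up on top would look roughly like this in a checkout of the kernel source used for the 6.14.4 build - a sketch assuming the commit cherry-picks cleanly:)

Code:
git remote add mainline https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
git fetch mainline
git cherry-pick 09dfc8a5f2ce8   # "vfio/pci: Fallback huge faults for unaligned pfn"
# rebuild and boot the patched kernel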
 
Hi,

Are you sure this is the full log? No backtrace or further description of the error?

Seems like there is a follow-up for the problematic commit: 09dfc8a5f2ce8 ("vfio/pci: Fallback huge faults for unaligned pfn")

Could you try with that applied on top instead of the revert?
Yes, that's pretty much it (it repeats 2 or 3 times) and the VM crashes.

I have. The revert patch is for both commits - I went down the rabbit hole. Vanilla 6.14.4 still crashed; I'm now running vanilla 6.14.4 with both commits reverted, and no more crashes at the moment.

I have opened a Bugzilla report about this.
 
