PVE 7.0 BUG: kernel NULL pointer dereference, address: 00000000000000c0 - #PF: error_code(0x0000) - no web access, no SSH

Hello everyone. We updated to pve-manager/7.0-11/63d82f4e (running kernel: 5.11.22-4-pve) two weeks ago, and since then I have run into this bug: it disables the Proxmox web interface and SSH access and somehow stalls the host in a way I have not been able to understand. I have to force a reboot, and then the services start up fine again. I checked with zpool and everything is OK, and before the upgrade I made all the necessary configuration changes to be ready for the update.
The server was updated and starts fine, as I said, but I don't know how to solve this kernel BUG (see the kernel trace further down). Our server is a Dell PowerEdge.

Thanks in advance for any help.

Sep 20 16:03:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:03:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:03:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:04:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:04:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:04:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:05:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:05:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:05:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:06:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:06:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:06:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:07:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:07:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:07:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:08:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:08:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:08:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:09:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:09:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:09:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:10:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:10:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:10:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:11:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:11:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:11:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:11:25 proxmox kernel: BUG: kernel NULL pointer dereference, address: 00000000000000c0
Sep 20 16:11:25 proxmox kernel: #PF: supervisor read access in kernel mode
Sep 20 16:11:25 proxmox kernel: #PF: error_code(0x0000) - not-present page
Sep 20 16:11:25 proxmox kernel: PGD 0 P4D 0

Sep 20 16:11:25 proxmox kernel: Oops: 0000 [#1] SMP PTI
Sep 20 16:11:25 proxmox kernel: CPU: 7 PID: 398 Comm: kworker/7:1H Tainted: P O 5.11.22-4-pve #1
Sep 20 16:11:25 proxmox kernel: Hardware name: Dell Inc. PowerEdge R720/046V88, BIOS 2.9.0 12/06/2019
Sep 20 16:11:25 proxmox kernel: Workqueue: kblockd blk_mq_timeout_work
Sep 20 16:11:25 proxmox kernel: RIP: 0010:blk_mq_put_rq_ref+0xa/0x60
Sep 20 16:11:25 proxmox kernel: Code: 15 0f b6 d3 4c 89 e7 be 01 00 00 00 e8 cf fe ff ff 5b 41 5c 5d c3 0f 0b 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 8b 47 10 <48> 8b 80 c0 00 00 00 48 89 e5 48 3b 78 40 74 1f 4c 8d 87 e8 00 00
Sep 20 16:11:25 proxmox kernel: RSP: 0018:ffff9b890ecb7d68 EFLAGS: 00010287
Sep 20 16:11:25 proxmox kernel: RAX: 0000000000000000 RBX: ffff9b890ecb7de8 RCX: 0000000000000002
Sep 20 16:11:25 proxmox kernel: RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffff8f4bff31c000
Sep 20 16:11:25 proxmox kernel: RBP: ffff9b890ecb7da0 R08: 0000000000000000 R09: 000000000000003b
Sep 20 16:11:25 proxmox kernel: R10: 0000000000000008 R11: 0000000000000008 R12: ffff8f4bff31c000
Sep 20 16:11:25 proxmox kernel: R13: ffff8f4bff31b400 R14: 0000000000000000 R15: 0000000000000001
Sep 20 16:11:25 proxmox kernel: FS: 0000000000000000(0000) GS:ffff8f7b2f8c0000(0000) knlGS:0000000000000000
Sep 20 16:11:25 proxmox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 20 16:11:25 proxmox kernel: CR2: 00000000000000c0 CR3: 000000305b246005 CR4: 00000000000626e0
Sep 20 16:11:25 proxmox kernel: Call Trace:
Sep 20 16:11:25 proxmox kernel: ? bt_iter+0x54/0x90
Sep 20 16:11:25 proxmox kernel: blk_mq_queue_tag_busy_iter+0x1a2/0x2d0
Sep 20 16:11:25 proxmox kernel: ? blk_mq_put_rq_ref+0x60/0x60
Sep 20 16:11:25 proxmox kernel: ? blk_mq_put_rq_ref+0x60/0x60
Sep 20 16:11:25 proxmox kernel: blk_mq_timeout_work+0x5f/0x120
Sep 20 16:11:25 proxmox kernel: process_one_work+0x220/0x3c0
Sep 20 16:11:25 proxmox kernel: worker_thread+0x53/0x420
Sep 20 16:11:25 proxmox kernel: ? process_one_work+0x3c0/0x3c0
Sep 20 16:11:25 proxmox kernel: kthread+0x12b/0x150
Sep 20 16:11:25 proxmox kernel: ? set_kthread_struct+0x50/0x50
Sep 20 16:11:25 proxmox kernel: ret_from_fork+0x22/0x30
Sep 20 16:11:25 proxmox kernel: Modules linked in: binfmt_misc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute veth nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace nfs_ssc fscache ebtable_filter ebtables ip_set bonding tls softdog ip6table_nat ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_security iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw xt_tcpudp iptable_filter bpfilter nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel ipmi_ssif mgag200 drm_kms_helper crypto_simd cryptd cec glue_helper rc_core i2c_algo_bit fb_sys_fops syscopyarea dcdbas rapl sysfillrect pcspkr joydev mei_me sysimgblt input_leds mei intel_cstate ipmi_si ipmi_devintf mac_hid ipmi_msghandler acpi_power_meter vhost_net vhost vhost_iotlb tap ib_iser
Sep 20 16:11:25 proxmox kernel: rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) hid_logitech_hidpp btrfs blake2b_generic xor raid6_pq libcrc32c hid_logitech_dj hid_generic usbkbd usbmouse usbhid hid ehci_pci crc32_pclmul lpc_ich ehci_hcd megaraid_sas tg3 wmi
Sep 20 16:11:25 proxmox kernel: CR2: 00000000000000c0
Sep 20 16:11:25 proxmox kernel: ---[ end trace 43b5fd3492cb5d6d ]---
Sep 20 16:11:25 proxmox kernel: RIP: 0010:blk_mq_put_rq_ref+0xa/0x60
Sep 20 16:11:25 proxmox kernel: Code: 15 0f b6 d3 4c 89 e7 be 01 00 00 00 e8 cf fe ff ff 5b 41 5c 5d c3 0f 0b 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 8b 47 10 <48> 8b 80 c0 00 00 00 48 89 e5 48 3b 78 40 74 1f 4c 8d 87 e8 00 00
Sep 20 16:11:25 proxmox kernel: RSP: 0018:ffff9b890ecb7d68 EFLAGS: 00010287
Sep 20 16:11:25 proxmox kernel: RAX: 0000000000000000 RBX: ffff9b890ecb7de8 RCX: 0000000000000002
Sep 20 16:11:25 proxmox kernel: RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffff8f4bff31c000
Sep 20 16:11:25 proxmox kernel: RBP: ffff9b890ecb7da0 R08: 0000000000000000 R09: 000000000000003b
Sep 20 16:11:25 proxmox kernel: R10: 0000000000000008 R11: 0000000000000008 R12: ffff8f4bff31c000
Sep 20 16:11:25 proxmox kernel: R13: ffff8f4bff31b400 R14: 0000000000000000 R15: 0000000000000001
Sep 20 16:11:25 proxmox kernel: FS: 0000000000000000(0000) GS:ffff8f7b2f8c0000(0000) knlGS:0000000000000000
Sep 20 16:11:25 proxmox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 20 16:11:25 proxmox kernel: CR2: 00000000000000c0 CR3: 000000305b246005 CR4: 00000000000626e0
-- Reboot --
Sep 20 17:25:50 proxmox kernel: Linux version 5.11.22-4-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.11.22-8 (Fri, 27 Aug 2021 11:51:34 +0200) ()
Sep 20 17:25:50 proxmox kernel: Command line: BOOT_IMAGE=/vmlinuz-5.11.22-4-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs systemd.unified_cgroup_hierarchy=0 quiet
 

Attachments: syslog.txt (36.3 KB)
Hi,

When does the issue occur? Randomly, under a lot of IO, or only after the system has been running for a while?


Anyhow, the kernel oops you posted gave me enough info to check around a bit, and I have a feeling that this is a regression from:
https://git.kernel.org/pub/scm/linu...y&id=a3362ff0433b9cbd545c35baae59c5788b766e53

Which came in through an upstream kernel-stable update a few weeks ago, and was later fixed by another commit that is not yet included in our kernel:
https://git.kernel.org/pub/scm/linu...y&id=ceffaa61b5bb5296e07cb7f4f494377eb659058f

We'll pick that one up for the next kernel build. In the meantime you could try using an older 5.11 kernel, e.g. by installing pve-kernel-5.11.22-3-pve (if not already installed) and manually selecting it on the next boot.
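A rough sketch of how that could look on the shell (assumption: a Proxmox VE 7 host; the proxmox-boot-tool refresh step only applies to hosts that boot via proxmox-boot-tool, e.g. a ZFS root on UEFI):

Code:
# list which 5.11 kernel packages are already installed
dpkg -l 'pve-kernel-5.11*'

# install the older kernel if it is not present yet
apt update
apt install pve-kernel-5.11.22-3-pve

# hosts booting via proxmox-boot-tool may need a refresh so the older
# kernel shows up as a selectable boot menu entry
proxmox-boot-tool refresh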
 
Hi,

Same problem here, reproduced on two different machines (same hardware) with 5.11.22-4-pve.
It occurs after a few days.

Code:
Sep  7 09:11:51 pve12 kernel: [65320.444899] BUG: kernel NULL pointer dereference, address: 0000000000000000
Sep  7 09:11:51 pve12 kernel: [65320.444941] #PF: supervisor instruction fetch in kernel mode
Sep  7 09:11:51 pve12 kernel: [65320.444965] #PF: error_code(0x0010) - not-present page
Sep  7 09:11:51 pve12 kernel: [65320.444986] PGD 0 P4D 0
Sep  7 09:11:51 pve12 kernel: [65320.445000] Oops: 0010 [#1] SMP NOPTI
Sep  7 09:11:51 pve12 kernel: [65320.445017] CPU: 4 PID: 542 Comm: kworker/4:1H Tainted: P           O      5.11.22-4-pve #1
Sep  7 09:11:51 pve12 kernel: [65320.445049] Hardware name: Quanta Cloud Technology Inc. QuantaPlex T22HF-1U/S5HF MB, BIOS 3A05.ON02 03/20/2019
Sep  7 09:11:51 pve12 kernel: [65320.445083] Workqueue: kblockd blk_mq_timeout_work

The error is not always logged, because the filesystem seems to be the first thing affected (the server and VMs still respond to ping, the console keeps displaying errors...).
 
Hi,

When does the issue occur? Randomly, under a lot of IO, or only after the system has been running for a while?


Anyhow, the kernel oops you posted gave me enough info to check around a bit, and I have a feeling that this is a regression from:
https://git.kernel.org/pub/scm/linu...y&id=a3362ff0433b9cbd545c35baae59c5788b766e53

Which came in through an upstream kernel-stable update a few weeks ago, and was later fixed by another commit that is not yet included in our kernel:
https://git.kernel.org/pub/scm/linu...y&id=ceffaa61b5bb5296e07cb7f4f494377eb659058f

We'll pick that one up for the next kernel build. In the meantime you could try using an older 5.11 kernel, e.g. by installing pve-kernel-5.11.22-3-pve (if not already installed) and manually selecting it on the next boot.
Hi, and thanks.
It occurs after the system has been running for a while; so far I haven't found anything specific that triggers it, since the server has the same workload as before the update.
I'll try what you suggested, thanks a lot!
 
To give more info on my case: these were two fresh reinstalls on servers that previously ran fine on Proxmox VE 6.
The servers were removed from a PVE 6/Ceph cluster, reinstalled from the PVE 7 ISO, and joined to another PVE 7/Ceph cluster (made up of servers with the same hardware that run fine on the previous kernel version).
I downgraded the servers to 5.11.22-3-pve and will wait a few days to see whether that fixes it.
 
The very same thing happened to me last week. Yesterday it took down all my OSDs, so I had to manually activate them all again. Today it caused havoc on one node of three, with these messages:
Code:
get_health_metrics reporting 1 slow ops, oldest is osd_op(client.104774156.0:5385119 2.3f3 2:cff4b302:::rbd_data.0d04dc79291870.0000000000000008:head [write 2781184~4096 in=4096b] snapc 9860=[] ondisk+write+known_if_redirected e77207)

My CPU is:
56 x Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz (2 Sockets)

The Kernel bug:
Code:
[Tue Sep 21 15:44:21 2021] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[Tue Sep 21 15:44:21 2021] fwbr146i0: port 2(veth146i0) entered blocking state
[Tue Sep 21 15:44:21 2021] fwbr146i0: port 2(veth146i0) entered forwarding state
[Tue Sep 21 15:58:23 2021] STREAM_RECEIVER[2056031]: segfault at 50 ip 000055b7278387a5 sp 00007fae31849e40 error 4 in netdata[55b7276f7000+1ff000]
[Tue Sep 21 15:58:23 2021] Code: 84 c0 0f 84 0c 01 00 00 4c 89 c6 41 be c5 9d 1c 81 0f 1f 40 00 45 69 f6 93 01 00 01 48 83 c6 01 41 31 c6 0f b6 06 84 c0 75 eb <39> 6b 10 74 1e 66 0f 1f 44 00 00 48 8b 43 30 49 89 dd 48 85 c0 0f
[Tue Sep 21 16:21:32 2021] BUG: kernel NULL pointer dereference, address: 00000000000000c0
[Tue Sep 21 16:21:32 2021] #PF: supervisor read access in kernel mode
[Tue Sep 21 16:21:32 2021] #PF: error_code(0x0000) - not-present page
[Tue Sep 21 16:21:32 2021] PGD 0 P4D 0
[Tue Sep 21 16:21:32 2021] Oops: 0000 [#1] SMP PTI
[Tue Sep 21 16:21:32 2021] CPU: 5 PID: 2067368 Comm: PLUGIN[proc] Tainted: P           O      5.11.22-4-pve #1
[Tue Sep 21 16:21:32 2021] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.11.0 11/02/2019
[Tue Sep 21 16:21:32 2021] RIP: 0010:blk_mq_put_rq_ref+0xa/0x60
[Tue Sep 21 16:21:32 2021] Code: 15 0f b6 d3 4c 89 e7 be 01 00 00 00 e8 cf fe ff ff 5b 41 5c 5d c3 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 47 10 <48> 8b 80 c0 00 00 00 48 89 e5 48 3b 78 40 74 1f 4c 8d 87 e8 00 00
[Tue Sep 21 16:21:32 2021] RSP: 0018:ffffc04fcc753b08 EFLAGS: 00010287
[Tue Sep 21 16:21:32 2021] RAX: 0000000000000000 RBX: ffffc04fcc753b88 RCX: 0000000000000002
[Tue Sep 21 16:21:32 2021] RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffff9902ca03d400
[Tue Sep 21 16:21:32 2021] RBP: ffffc04fcc753b40 R08: 0000000000000000 R09: 000000000000003a
[Tue Sep 21 16:21:32 2021] R10: 00000000000005e5 R11: 000000000000000d R12: ffff9902ca03d400
[Tue Sep 21 16:21:32 2021] R13: ffff9902ca00cc00 R14: 0000000000000000 R15: 0000000000000001
[Tue Sep 21 16:21:32 2021] FS:  00007f1abff2eb00(0000) GS:ffff9931ffa80000(0000) knlGS:0000000000000000
[Tue Sep 21 16:21:32 2021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Sep 21 16:21:32 2021] CR2: 00000000000000c0 CR3: 0000002dfbd5a004 CR4: 00000000003726e0
[Tue Sep 21 16:21:32 2021] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Tue Sep 21 16:21:32 2021] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Tue Sep 21 16:21:32 2021] Call Trace:
[Tue Sep 21 16:21:32 2021]  ? bt_iter+0x54/0x90
[Tue Sep 21 16:21:32 2021]  blk_mq_queue_tag_busy_iter+0x1a2/0x2d0
[Tue Sep 21 16:21:32 2021]  ? blk_add_rq_to_plug+0x50/0x50
[Tue Sep 21 16:21:32 2021]  ? blk_add_rq_to_plug+0x50/0x50
[Tue Sep 21 16:21:32 2021]  blk_mq_in_flight+0x38/0x60
[Tue Sep 21 16:21:32 2021]  diskstats_show+0x159/0x300
[Tue Sep 21 16:21:32 2021]  seq_read_iter+0x2c6/0x4b0
[Tue Sep 21 16:21:32 2021]  proc_reg_read_iter+0x51/0x80
[Tue Sep 21 16:21:32 2021]  new_sync_read+0x10d/0x190
[Tue Sep 21 16:21:32 2021]  vfs_read+0x15a/0x1c0
[Tue Sep 21 16:21:32 2021]  ksys_read+0x67/0xe0
[Tue Sep 21 16:21:32 2021]  __x64_sys_read+0x1a/0x20
[Tue Sep 21 16:21:32 2021]  do_syscall_64+0x38/0x90
[Tue Sep 21 16:21:32 2021]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Tue Sep 21 16:21:32 2021] RIP: 0033:0x7f1ac16373c3
[Tue Sep 21 16:21:32 2021] Code: c3 8b 07 85 c0 75 24 49 89 fb 48 89 f0 48 89 d7 48 89 ce 4c 89 c2 4d 89 ca 4c 8b 44 24 08 4c 8b 4c 24 10 4c 89 5c 24 08 0f 05 <c3> e9 53 de ff ff 48 63 ff 50 b8 e5 00 00 00 0f 05 48 89 c7 e8 19
[Tue Sep 21 16:21:32 2021] RSP: 002b:00007f1abff2c0c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[Tue Sep 21 16:21:32 2021] RAX: ffffffffffffffda RBX: 00007f1abff2eb00 RCX: 00007f1ac16373c3
[Tue Sep 21 16:21:32 2021] RDX: 0000000000002800 RSI: 0000555558f8a430 RDI: 0000000000000028
[Tue Sep 21 16:21:32 2021] RBP: 0000555558f89000 R08: 0000000000000000 R09: 0000000000000000
[Tue Sep 21 16:21:32 2021] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
[Tue Sep 21 16:21:32 2021] R13: 0000000000000028 R14: 0000555558f8a430 R15: 00007f1ac185e980
[Tue Sep 21 16:21:32 2021] Modules linked in: option usb_wwan usbserial nft_counter xt_state nft_compat nf_tables veth rbd libceph ebt_arp ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel iptable_filter 8021q garp mrp bonding tls ipmi_watchdog xt_tcpudp xt_DSCP iptable_mangle bpfilter nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper dcdbas rapl mgag200 intel_cstate drm_kms_helper pcspkr cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt mxm_wmi mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO)
[Tue Sep 21 16:21:32 2021]  zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq uas usb_storage dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32_pclmul ixgbe igb ahci xfrm_algo ehci_pci i2c_algo_bit mdio lpc_ich libahci ehci_hcd dca megaraid_sas wmi
[Tue Sep 21 16:21:32 2021] CR2: 00000000000000c0
[Tue Sep 21 16:21:32 2021] ---[ end trace 02bc89a81abdd48a ]---
[Tue Sep 21 16:21:32 2021] RIP: 0010:blk_mq_put_rq_ref+0xa/0x60
[Tue Sep 21 16:21:32 2021] Code: 15 0f b6 d3 4c 89 e7 be 01 00 00 00 e8 cf fe ff ff 5b 41 5c 5d c3 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 47 10 <48> 8b 80 c0 00 00 00 48 89 e5 48 3b 78 40 74 1f 4c 8d 87 e8 00 00
[Tue Sep 21 16:21:32 2021] RSP: 0018:ffffc04fcc753b08 EFLAGS: 00010287
[Tue Sep 21 16:21:32 2021] RAX: 0000000000000000 RBX: ffffc04fcc753b88 RCX: 0000000000000002
[Tue Sep 21 16:21:32 2021] RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffff9902ca03d400
[Tue Sep 21 16:21:32 2021] RBP: ffffc04fcc753b40 R08: 0000000000000000 R09: 000000000000003a
[Tue Sep 21 16:21:32 2021] R10: 00000000000005e5 R11: 000000000000000d R12: ffff9902ca03d400
[Tue Sep 21 16:21:32 2021] R13: ffff9902ca00cc00 R14: 0000000000000000 R15: 0000000000000001
[Tue Sep 21 16:21:32 2021] FS:  00007f1abff2eb00(0000) GS:ffff9931ffa80000(0000) knlGS:0000000000000000
[Tue Sep 21 16:21:32 2021] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Tue Sep 21 16:21:32 2021] CR2: 00000000000000c0 CR3: 0000002dfbd5a004 CR4: 00000000003726e0
[Tue Sep 21 16:21:32 2021] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[Tue Sep 21 16:21:32 2021] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[Tue Sep 21 16:26:36 2021] libceph: osd0 (1)10.0.100.1:6841 socket closed (con state OPEN)
[Tue Sep 21 16:26:36 2021] libceph: osd2 (1)10.0.100.1:6865 socket closed (con state OPEN)
[Tue Sep 21 16:26:36 2021] libceph: osd3 (1)10.0.100.1:6801 socket closed (con state OPEN)
[Tue Sep 21 16:26:36 2021] libceph: osd7 (1)10.0.100.1:6857 socket closed (con state OPEN)
[Tue Sep 21 16:26:36 2021] libceph: osd6 (1)10.0.100.1:6825 socket closed (con state OPEN)
[Tue Sep 21 16:26:36 2021] libceph: osd5 (1)10.0.100.1:6833 socket closed (con state OPEN)
 
FYI, there's a new kernel package available on the pvetest repository for Proxmox VE 7.x. It includes the proposed fix for the regression from the suspicious patch I linked in my previous answer; it's pve-kernel-5.11.22-4-pve in version 5.11.22-9.

Note that since I was not able to reproduce the exact issue here (neither our production nor our test lab systems showed such a crash), I cannot guarantee that this is in fact the full fix, but my gut feeling tells me it's not unlikely and it at least won't hurt, so feedback would be welcome.
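A rough sketch of how trying that kernel could look (assumption: a standard Proxmox VE 7 host based on Debian Bullseye; the repository line is the documented pvetest repo and can be removed again afterwards):

Code:
# temporarily enable the pvetest repository
echo "deb http://download.proxmox.com/debian/pve bullseye pvetest" > /etc/apt/sources.list.d/pvetest.list
apt update

# verify that the 5.11.22-9 build is offered, then install it
apt policy pve-kernel-5.11.22-4-pve
apt install pve-kernel-5.11.22-4-pve

# optionally remove the pvetest repository again once the test is done
rm /etc/apt/sources.list.d/pvetest.list
apt update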
 
FYI, there's a new kernel package available on the pvetest repository for Proxmox VE 7.x. It includes the proposed fix for the regression from the suspicious patch I linked in my previous answer; it's pve-kernel-5.11.22-4-pve in version 5.11.22-9.

Note that since I was not able to reproduce the exact issue here (neither our production nor our test lab systems showed such a crash), I cannot guarantee that this is in fact the full fix, but my gut feeling tells me it's not unlikely and it at least won't hurt, so feedback would be welcome.
TYSM Thomas. As soon as I can I'll try it and report back with feedback.
 
Our production suffered so much over the last few days that I'll stick with pve-kernel-5.11.22-3-pve, which has turned out to be stable over the last 24 hours, until the fix is fully confirmed. THX anyway.
 
Our production suffered so much over the last few days that I'll stick with pve-kernel-5.11.22-3-pve, which has turned out to be stable over the last 24 hours, until the fix is fully confirmed. THX anyway.

Is it possible to downgrade to pve-kernel-5.11.22-3-pve without having to choose that version manually on reboot? I only have SSH access to the server.
 
Hi,

My cluster was also affected by this issue.
Kernel 5.11.22-9 is now installed.

I'll keep you posted too.

May I point out that packages move rather fast from testing/no-subscription to the enterprise repository? :p
 
Is it possible to downgrade to pve-kernel-5.11.22-3-pve without having to choose that version manually on reboot? I only have SSH access to the server.
Why not try out the proposed kernel instead? It may well fix the problem for good and does not require pinning kernel versions.
 
May I point out that packages move rather fast from testing/no-subscription to the enterprise repository? :p
I mean it certainly is a nuisance if one is hit by a regression in their setup, so I get the sentiment :)

The problematic kernel was out for about two weeks before it got to enterprise, IIRC, and until then the issue did not pop up anywhere (it took almost four weeks for that); at least I did not notice any report, so it was deemed safe enough to go to enterprise - especially as all of our production setups (and there are a few) have been running on it since quite early in the release cycle. I certainly do not want to make excuses here, just trying to give some context, and ranting slightly about HW/setup-specific issues; those are always a bit of a PITA :)

In general, we always need to strike a balance: packages should stay long enough on each non-production repo that any issue hopefully turns up, but not so long that important bug and security fixes are delayed, so it's a bit of a trade-off.
 
The kernel is a package like any other, so you can install it by typing: apt install pve-kernel-5.11.22-3-pve
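As a rough sketch for hosts that boot via GRUB (the menu entry title below is only an example, so check the exact string in your own /boot/grub/grub.cfg; hosts managed by proxmox-boot-tool may instead offer a 'proxmox-boot-tool kernel pin' subcommand in newer pve-kernel-helper versions):

Code:
apt install pve-kernel-5.11.22-3-pve

# list the GRUB menu entry titles to find the exact name of the older kernel
grep "menuentry '" /boot/grub/grub.cfg | cut -d"'" -f2

# set it as the default in /etc/default/grub, for example (entry title is an example):
# GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.11.22-3-pve"
update-grub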
 
Was also affected today with kernel 5.11.22-8. Updating right now...otherwise I will go back to an older kernel as suggested.
 
Hi.

I have the same problem on many clusters running kernel 5.11.22-4; I have already downgraded to 5.11.22-3.

Is there a chance this issue will be resolved in newer kernels in the near future?
 
