PVE 7.0 BUG: kernel NULL pointer dereference, address: 00000000000000c0-PF:error_code(0x0000) - No web access no ssh

This kernel is available in pve-no-subscription:

Indeed, I guess the hotfix is also present in the latest kernel (5.11.22-10).

 
You are talking about pve-kernel-5.11.22-9-pve which is not available in pve-no-subscription?

I'm not sure, because the newest available package, pve-kernel-5.11.22-5-pve, has "Version: pve-kernel-5.11.22-9"

From this thread:
FYI, there's a new kernel package available on the pvetest repository for Proxmox VE 7.x. It includes the proposed fix for the regression of the suspicious patch I linked in my previous answer, it's pve-kernel-5.11.22-4-pve in version 5.11.22-9.
https://forum.proxmox.com/threads/p...x0000-no-web-access-no-ssh.96598/#post-418744
Kernels have an ABI version and a package version (otherwise a newer fix with the same ABI could not be rolled out).
 
We appear to have been bitten by the same bug this morning; however, we were already running '5.11.22-9'.

pve-kernel-5.11.22-4-pve: 5.11.22-9


Code:
Oct 27 07:52:02 kvm5k kernel: [1936007.710328] BUG: kernel NULL pointer dereference, address: 0000000000000000
Oct 27 07:52:02 kvm5k kernel: [1936007.710371] #PF: supervisor instruction fetch in kernel mode
Oct 27 07:52:02 kvm5k kernel: [1936007.710398] #PF: error_code(0x0010) - not-present page
Oct 27 07:52:02 kvm5k kernel: [1936007.710422] PGD 0 P4D 0
Oct 27 07:52:02 kvm5k kernel: [1936007.710439] Oops: 0010 [#1] SMP NOPTI
Oct 27 07:52:02 kvm5k kernel: [1936007.710460] CPU: 43 PID: 763 Comm: kworker/43:1H Tainted: P           O      5.11.22-4-pve #1
Oct 27 07:52:02 kvm5k kernel: [1936007.710496] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0012.070720200218 07/07/2020
Oct 27 07:52:02 kvm5k kernel: [1936007.710537] Workqueue: kblockd blk_mq_timeout_work
Oct 27 07:52:02 kvm5k kernel: [1936007.710566] RIP: 0010:0x0
Oct 27 07:52:02 kvm5k kernel: [1936007.710581] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Oct 27 07:52:02 kvm5k kernel: [1936007.710603] RSP: 0018:ffffb62c5ccafd60 EFLAGS: 00010246
Oct 27 07:52:02 kvm5k kernel: [1936007.710635] RAX: 0000000000000000 RBX: ffffb62c5ccafde8 RCX: 0000000000000000
Oct 27 07:52:02 kvm5k kernel: [1936007.710666] RDX: ffffb62c5ccafe48 RSI: 0000000000000000 RDI: ffff8b3b6caaa400
Oct 27 07:52:02 kvm5k kernel: [1936007.710692] RBP: ffffb62c5ccafd68 R08: 0000000000000000 R09: 0000000000000029
Oct 27 07:52:02 kvm5k kernel: [1936007.710716] R10: 0000000000000008 R11: 0000000000000008 R12: ffff8b3b6caaa400
Oct 27 07:52:02 kvm5k kernel: [1936007.710746] R13: ffff8b3b6caa9400 R14: 0000000000000000 R15: 0000000000000001
Oct 27 07:52:02 kvm5k kernel: [1936007.710775] FS:  0000000000000000(0000) GS:ffff8b9900dc0000(0000) knlGS:0000000000000000
Oct 27 07:52:02 kvm5k kernel: [1936007.710799] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 27 07:52:02 kvm5k kernel: [1936007.710817] CR2: ffffffffffffffd6 CR3: 0000008480010004 CR4: 00000000007726e0
Oct 27 07:52:02 kvm5k kernel: [1936007.710838] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 27 07:52:02 kvm5k kernel: [1936007.710866] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Oct 27 07:52:02 kvm5k kernel: [1936007.710894] PKRU: 55555554
Oct 27 07:52:02 kvm5k kernel: [1936007.710907] Call Trace:
Oct 27 07:52:02 kvm5k kernel: [1936007.710921]  blk_mq_put_rq_ref+0x47/0x60
Oct 27 07:52:02 kvm5k kernel: [1936007.710941]  bt_iter+0x54/0x90
Oct 27 07:52:02 kvm5k kernel: [1936007.710957]  blk_mq_queue_tag_busy_iter+0x1a2/0x2d0
Oct 27 07:52:02 kvm5k kernel: [1936007.710980]  ? blk_mq_put_rq_ref+0x60/0x60
Oct 27 07:52:02 kvm5k kernel: [1936007.711000]  ? blk_mq_put_rq_ref+0x60/0x60
Oct 27 07:52:02 kvm5k kernel: [1936007.711018]  blk_mq_timeout_work+0x5f/0x120
Oct 27 07:52:02 kvm5k kernel: [1936007.711033]  process_one_work+0x220/0x3c0
Oct 27 07:52:02 kvm5k kernel: [1936007.711051]  worker_thread+0x53/0x420
Oct 27 07:52:02 kvm5k kernel: [1936007.711825]  ? process_one_work+0x3c0/0x3c0
Oct 27 07:52:02 kvm5k kernel: [1936007.712536]  kthread+0x12b/0x150
Oct 27 07:52:02 kvm5k kernel: [1936007.713129]  ? set_kthread_struct+0x50/0x50
Oct 27 07:52:02 kvm5k kernel: [1936007.713686]  ret_from_fork+0x1f/0x30
Oct 27 07:52:02 kvm5k kernel: [1936007.714216] Modules linked in: rbd ceph libceph fscache ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp ip6_udp_tunnel udp_tunnel iptable_filter bpfilter nfnetlink_cttimeout xfs openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_watchdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common ipmi_ssif isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper rapl intel_cstate ast drm_vram_helper drm_ttm_helper pcspkr ttm drm_kms_helper input_leds joydev cec rc_core i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me ioatdma mei intel_pch_thermal dca acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core
Oct 27 07:52:02 kvm5k kernel: [1936007.714285]  iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbkbd usbmouse usbhid hid raid6_pq libcrc32c raid0 multipath linear raid1 crc32_pclmul xhci_pci ahci i2c_i801 xhci_pci_renesas megaraid_sas i40e lpc_ich i2c_smbus xhci_hcd libahci wmi
Oct 27 07:52:02 kvm5k kernel: [1936007.722818] CR2: 0000000000000000
Oct 27 07:52:02 kvm5k kernel: [1936007.723767] ---[ end trace 6bdad251a9cb648e ]---
Oct 27 07:52:02 kvm5k kernel: [1936007.796332] RIP: 0010:0x0
Oct 27 07:52:02 kvm5k kernel: [1936007.797400] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Oct 27 07:52:02 kvm5k kernel: [1936007.798223] RSP: 0018:ffffb62c5ccafd60 EFLAGS: 00010246
Oct 27 07:52:02 kvm5k kernel: [1936007.799048] RAX: 0000000000000000 RBX: ffffb62c5ccafde8 RCX: 0000000000000000
Oct 27 07:52:02 kvm5k kernel: [1936007.799858] RDX: ffffb62c5ccafe48 RSI: 0000000000000000 RDI: ffff8b3b6caaa400
Oct 27 07:52:02 kvm5k kernel: [1936007.800634] RBP: ffffb62c5ccafd68 R08: 0000000000000000 R09: 0000000000000029
Oct 27 07:52:02 kvm5k kernel: [1936007.801368] R10: 0000000000000008 R11: 0000000000000008 R12: ffff8b3b6caaa400
Oct 27 07:52:02 kvm5k kernel: [1936007.802112] R13: ffff8b3b6caa9400 R14: 0000000000000000 R15: 0000000000000001
Oct 27 07:52:02 kvm5k kernel: [1936007.802912] FS:  0000000000000000(0000) GS:ffff8b9900dc0000(0000) knlGS:0000000000000000
Oct 27 07:52:02 kvm5k kernel: [1936007.803735] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 27 07:52:02 kvm5k kernel: [1936007.804468] CR2: ffffffffffffffd6 CR3: 0000008480010004 CR4: 00000000007726e0
Oct 27 07:52:02 kvm5k kernel: [1936007.805211] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 27 07:52:02 kvm5k kernel: [1936007.805966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Oct 27 07:52:02 kvm5k kernel: [1936007.806757] PKRU: 55555554
Oct 27 07:52:52 kvm5k kernel: [1936057.636565] libceph: osd112 down
 
This appears to exclusively affect Intel Xeon Scalable 2nd generation systems (e.g. Intel Xeon Gold 6248).



SSDs are proper data centre grade and Ceph is performing well. High after-hours usage comes from VM backups (many run within the VM), cluster backups (rotating RBD snapshots) and deep scrubs. Latency is perfect.


Another one, where we had hoped that this had been patched in 5.11.22-4-pve (5.11.22-10). The issue appears to trigger during high I/O activity. We're running Ceph Pacific (KRBD) enterprise packages. Is there anything I can do to obtain more specific information?

Code:
Nov  3 23:08:27 kvm5h kernel: [2621661.554647] BUG: kernel NULL pointer dereference, address: 00000000000000c0
Nov  3 23:08:27 kvm5h kernel: [2621661.554678] #PF: supervisor read access in kernel mode
Nov  3 23:08:27 kvm5h kernel: [2621661.554696] #PF: error_code(0x0000) - not-present page
Nov  3 23:08:27 kvm5h kernel: [2621661.554711] PGD 0 P4D 0
Nov  3 23:08:27 kvm5h kernel: [2621661.554723] Oops: 0000 [#1] SMP NOPTI
Nov  3 23:08:27 kvm5h kernel: [2621661.554737] CPU: 20 PID: 3386 Comm: kworker/20:1H Tainted: P           O      5.11.22-4-pve #1
Nov  3 23:08:27 kvm5h kernel: [2621661.554762] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020
Nov  3 23:08:27 kvm5h kernel: [2621661.554789] Workqueue: kblockd blk_mq_timeout_work
Nov  3 23:08:27 kvm5h kernel: [2621661.554811] RIP: 0010:blk_mq_put_rq_ref+0xa/0x60
Nov  3 23:08:27 kvm5h kernel: [2621661.554828] Code: 15 0f b6 d3 4c 89 e7 be 01 00 00 00 e8 cf fe ff ff 5b 41 5c 5d c3 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 47 10 <48> 8b 80 c0 00 00 00 48 89 e5 48 3b 78 40 74 1f 4c 8d 87 e8 00 00
Nov  3 23:08:27 kvm5h kernel: [2621661.554875] RSP: 0018:ffffa00a1e75fd68 EFLAGS: 00010287
Nov  3 23:08:27 kvm5h kernel: [2621661.554891] RAX: 0000000000000000 RBX: ffffa00a1e75fde8 RCX: 0000000000000002
Nov  3 23:08:27 kvm5h kernel: [2621661.554911] RDX: 0000000000000001 RSI: 0000000000000206 RDI: ffff8bb6b8aef400
Nov  3 23:08:27 kvm5h kernel: [2621661.554930] RBP: ffffa00a1e75fda0 R08: 0000000000000000 R09: 000000000000003a
Nov  3 23:08:27 kvm5h kernel: [2621661.554950] R10: 0000000000000008 R11: 0000000000000008 R12: ffff8bb6b8aef400
Nov  3 23:08:27 kvm5h kernel: [2621661.554969] R13: ffff8bb6b8a93000 R14: 0000000000000000 R15: 0000000000000001
Nov  3 23:08:27 kvm5h kernel: [2621661.554989] FS:  0000000000000000(0000) GS:ffff8c743f400000(0000) knlGS:0000000000000000
Nov  3 23:08:27 kvm5h kernel: [2621661.555012] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  3 23:08:27 kvm5h kernel: [2621661.555029] CR2: 00000000000000c0 CR3: 0000000a110b0006 CR4: 00000000007726e0
Nov  3 23:08:27 kvm5h kernel: [2621661.555050] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov  3 23:08:27 kvm5h kernel: [2621661.555070] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov  3 23:08:27 kvm5h kernel: [2621661.555090] PKRU: 55555554
Nov  3 23:08:27 kvm5h kernel: [2621661.555100] Call Trace:
Nov  3 23:08:27 kvm5h kernel: [2621661.555111]  ? bt_iter+0x54/0x90
Nov  3 23:08:27 kvm5h kernel: [2621661.555124]  blk_mq_queue_tag_busy_iter+0x1a2/0x2d0
Nov  3 23:08:27 kvm5h kernel: [2621661.555142]  ? blk_mq_put_rq_ref+0x60/0x60
Nov  3 23:08:27 kvm5h kernel: [2621661.555158]  ? blk_mq_put_rq_ref+0x60/0x60
Nov  3 23:08:27 kvm5h kernel: [2621661.555172]  blk_mq_timeout_work+0x5f/0x120
Nov  3 23:08:27 kvm5h kernel: [2621661.555187]  process_one_work+0x220/0x3c0
Nov  3 23:08:27 kvm5h kernel: [2621661.555204]  worker_thread+0x53/0x420
Nov  3 23:08:27 kvm5h kernel: [2621661.555759]  ? process_one_work+0x3c0/0x3c0
Nov  3 23:08:27 kvm5h kernel: [2621661.556268]  kthread+0x12b/0x150
Nov  3 23:08:27 kvm5h kernel: [2621661.556704]  ? set_kthread_struct+0x50/0x50
Nov  3 23:08:27 kvm5h kernel: [2621661.557149]  ret_from_fork+0x1f/0x30
Nov  3 23:08:27 kvm5h kernel: [2621661.557587] Modules linked in: rbd ceph libceph fscache ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables sctp ip6_udp_tunnel udp_tunnel iptable_filter bpfilter nfnetlink_cttimeout xfs openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ipmi_watchdog nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common isst_if_common skx_edac nfit x86_pkg_temp_thermal intel_powerclamp coretemp ipmi_ssif kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper rapl intel_cstate ast drm_vram_helper pcspkr drm_ttm_helper ttm drm_kms_helper cec joydev input_leds rc_core i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt mei_me ioatdma mei intel_pch_thermal dca acpi_ipmi ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter acpi_pad mac_hid zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core
Nov  3 23:08:27 kvm5h kernel: [2621661.557655]  iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor usbmouse hid_generic usbkbd usbhid hid raid6_pq libcrc32c raid0 multipath linear raid1 crc32_pclmul xhci_pci ahci xhci_pci_renesas i2c_i801 megaraid_sas i40e i2c_smbus lpc_ich xhci_hcd libahci wmi
Nov  3 23:08:27 kvm5h kernel: [2621661.564088] CR2: 00000000000000c0
Nov  3 23:08:27 kvm5h kernel: [2621661.564731] ---[ end trace 0e9eff0915c0397b ]---
Nov  3 23:08:27 kvm5h kernel: [2621661.619277] RIP: 0010:blk_mq_put_rq_ref+0xa/0x60
Nov  3 23:08:27 kvm5h kernel: [2621661.628449] Code: 15 0f b6 d3 4c 89 e7 be 01 00 00 00 e8 cf fe ff ff 5b 41 5c 5d c3 0f 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 8b 47 10 <48> 8b 80 c0 00 00 00 48 89 e5 48 3b 78 40 74 1f 4c 8d 87 e8 00 00
Nov  3 23:08:27 kvm5h kernel: [2621661.629733] RSP: 0018:ffffa00a1e75fd68 EFLAGS: 00010287
Nov  3 23:08:27 kvm5h kernel: [2621661.630264] RAX: 0000000000000000 RBX: ffffa00a1e75fde8 RCX: 0000000000000002
Nov  3 23:08:27 kvm5h kernel: [2621661.631010] RDX: 0000000000000001 RSI: 0000000000000206 RDI: ffff8bb6b8aef400
Nov  3 23:08:27 kvm5h kernel: [2621661.631587] RBP: ffffa00a1e75fda0 R08: 0000000000000000 R09: 000000000000003a
Nov  3 23:08:27 kvm5h kernel: [2621661.632208] R10: 0000000000000008 R11: 0000000000000008 R12: ffff8bb6b8aef400
Nov  3 23:08:27 kvm5h kernel: [2621661.632786] R13: ffff8bb6b8a93000 R14: 0000000000000000 R15: 0000000000000001
Nov  3 23:08:27 kvm5h kernel: [2621661.633395] FS:  0000000000000000(0000) GS:ffff8c743f400000(0000) knlGS:0000000000000000
Nov  3 23:08:27 kvm5h kernel: [2621661.633889] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov  3 23:08:27 kvm5h kernel: [2621661.634613] CR2: 00000000000000c0 CR3: 0000000a110b0006 CR4: 00000000007726e0
Nov  3 23:08:27 kvm5h kernel: [2621661.635265] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Nov  3 23:08:27 kvm5h kernel: [2621661.635877] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Nov  3 23:08:27 kvm5h kernel: [2621661.636310] PKRU: 55555554
Nov  3 23:08:57 kvm5h ceph-osd[2767203]: 2021-11-03T23:08:57.279+0200 7f7440255700 -1 osd.82 503463 get_health_metrics reporting 1 slow ops, oldest is osd_op(client.1048982205.0:1240118895 0.ec 0:3745a054:::rbd_data.3338a3238e1f29.0000000000002b9e:head [write 860160~4096 in=4096b] snapc 45966=[45966] ondisk+write+known_if_redirected e503463)
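When comparing traces like the two above across nodes, it can help to reduce each Oops to its identifying lines (fault address, faulting instruction, workqueue, kernel ABI). The following is only a sketch: `oops_summary` is a hypothetical helper name, and the patterns are taken from the log lines quoted in this thread.

```shell
# Hypothetical helper: keep only the lines that identify an Oops
# (BUG line with the fault address, RIP, workqueue, and the CPU/Comm line
# that carries the kernel ABI). Reads log text on stdin.
oops_summary() {
    grep -E 'BUG: kernel|RIP: [0-9a-f]+:|Workqueue: |Comm: .* Tainted: '
}

# Example usage on a systemd host, assuming a persistent journal so that
# the previous boot (-b -1) is still available:
#   journalctl -k -b -1 --no-pager | oops_summary
```

In the second trace this would surface address 00000000000000c0, `RIP: 0010:blk_mq_put_rq_ref+0xa/0x60`, `Workqueue: kblockd blk_mq_timeout_work` and the `5.11.22-4-pve` ABI string.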
 
Hrm.... So how is it possible that we were running nodes on 5.11.22-4-pve where 'dpkg -l pve-kernel-5.11.22-4-pve' reports it as being 5.11.22-9 but the active kernel in memory is actually 5.11.22-8?


Code:
[admin@kvm5f ~]# uname -r
5.11.22-4-pve
[admin@kvm5f ~]# dpkg -l pve-kernel-5.11.22-4-pve
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                     Version      Architecture Description
+++-========================-============-============-=================================
ii  pve-kernel-5.11.22-4-pve 5.11.22-9    amd64        The Proxmox PVE Kernel Image
[admin@kvm5f ~]# uname -a
Linux kvm5f 5.11.22-4-pve #1 SMP PVE 5.11.22-8 (Fri, 27 Aug 2021 11:51:34 +0200) x86_64 GNU/Linux
 
There's a package version and a kernel ABI version; they are two different things. The ABI is 5.11.22-4-pve in your case and the package version is 5.11.22-8. The reason is that we want to create a totally different package if the ABI changes (it is encoded directly in the package name), to avoid module loads failing because the currently booted kernel was overwritten by a version with an incompatible ABI. The package version is still there because not every change means a kernel ABI break, so we need a way to express that there's a newer version with the same ABI.
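To see both values for the booted kernel, the ABI comes from `uname -r` and the package version is embedded in `uname -v`. A minimal sketch, assuming the "PVE <version>" layout seen in the `uname -a` output quoted in this thread; `pve_build_version` is a hypothetical helper name:

```shell
# Extract the package version from a `uname -v` style string, e.g.
# "#1 SMP PVE 5.11.22-8 (Fri, 27 Aug 2021 11:51:34 +0200)" -> "5.11.22-8".
# Assumption: the string contains "PVE <version>" followed by a space.
pve_build_version() {
    printf '%s\n' "$1" | sed -n 's/.*PVE \([^ ]*\).*/\1/p'
}

# Usage on a live host:
#   echo "ABI:             $(uname -r)"
#   echo "package version: $(pve_build_version "$(uname -v)")"
```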
 
I would really like to recommend that Proxmox bump the ABI whenever they release a new package, either to apply a hotfix to the package or when making use of an upstream update. The scenario we found ourselves in is that we got affected by the bug in this forum post, read that package 5.11.22-9 remediated the issue and then thought that we validated what we were running:

Code:
[admin@kvm5b ~]# uname -r
5.11.22-4-pve
[admin@kvm5b ~]# dpkg -l | grep pve-kernel-5.11.22
ii  pve-firmware                         3.3-2                          all          Binary firmware code for the pve-kernel
ii  pve-kernel-5.11                      7.0-8                          all          Latest Proxmox VE Kernel Image
ii  pve-kernel-5.11.22-4-pve             5.11.22-9                      amd64        The Proxmox PVE Kernel Image
ii  pve-kernel-helper                    7.1-2                          all          Function for various kernel maintenance tasks.


This led us to incorrectly believe that we were running a kernel that included the fix. It appears, however, that there are two kernels with ABI 5.11.22-4-pve, one from package 5.11.22-8 and another from package 5.11.22-9. The kernel Oops only records the ABI; only 'uname -a' yielded a clue that the nodes were in fact running 5.11.22-8:
Code:
[admin@kvm5b ~]# uname -a
Linux kvm5d 5.11.22-4-pve #1 SMP PVE 5.11.22-8 (Fri, 27 Aug 2021 11:51:34 +0200) x86_64 GNU/Linux

It appears the matching ABI resulted in apt updating the installed package without it being clear that the system would need to be restarted. Update command we used:
Code:
apt-get update; apt-get -y dist-upgrade; apt-get autoremove; apt-get autoclean;
 
I would really like to recommend that Proxmox bump the ABI whenever they release a new package, either to apply a hotfix to the package or when making use of an upstream update.
No, we cannot have ABI and version in sync. They are just different things that do not necessarily have to match, and it makes no sense to try faking that.
FYI, there's a new kernel package available on the pvetest repository for Proxmox VE 7.x. It includes the proposed fix for the regression of the suspicious patch I linked in my previous answer, it's pve-kernel-5.11.22-4-pve in version 5.11.22-9.
My post here explicitly talked about both, ABI version and package version, sorry I can't be clearer than that.
Apt shows the ABI in the package name and the package version, well, as the version.
The kernel boot log also shows both.
 
It appears the matching ABI resulted in apt updating the installed package without it being clear that the system would need to be restarted.
Every kernel package update needs a reboot to be actually applied.
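A quick way to notice a pending reboot is to compare the version of the booted kernel build with the version of the installed package. This is a sketch under assumptions, not an official Proxmox tool: `kernel_reboot_hint` is a hypothetical name, and the usage lines assume the "PVE <version>" layout in `uname -v` and the `pve-kernel-<ABI>` package naming shown in this thread.

```shell
# Hypothetical helper: report whether the booted kernel build matches the
# installed package version. Both arguments are plain version strings.
kernel_reboot_hint() {
    running="$1"
    installed="$2"
    if [ "$running" = "$installed" ]; then
        echo "up to date: running $running"
    else
        echo "reboot needed: running $running, installed $installed"
    fi
}

# Usage on a live host:
#   running=$(uname -v | sed -n 's/.*PVE \([^ ]*\).*/\1/p')
#   installed=$(dpkg-query -W -f='${Version}' "pve-kernel-$(uname -r)")
#   kernel_reboot_hint "$running" "$installed"
```

With the values reported above (booted 5.11.22-8, installed 5.11.22-9) the helper prints the "reboot needed" line.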
 
Hi Thomas,

I was under the assumption that the numeric value before '-pve' in the package name 'pve-kernel-5.11.22-4-pve' was a revision number to differentiate packages produced by Proxmox. I subsequently hoped that Proxmox could increment this identifier to avoid situations where 'dpkg -l pve-kernel-5.11.22-4-pve' could refer to two different kernels...

We'll use 'uname -v' in future to validate what we're running...
 
I mean, that's pretty much standard in Debian or other distros too, for example Debian's kernel package is currently linux-image-5.10.0-9-amd64 (ABI version == 5.10.0-9) with package version 5.10.70-1, or a ZFS library package libzpool5linux (ABI version == 5) with package version 2.1.1-pve3.

One just needs both for some sorts of packages, mostly dynamically loaded libraries and the kernel, to ensure that the running system can continue to work OK across ABI-incompatible changes. The package version, in turn, is required to distinguish changes where the ABI stayed the same, for example packaging-only changes.
Hope that helps in understanding the structure of such packages better.
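The split between name and version can be made visible for the installed kernel packages. A small sketch, assuming the `pve-kernel-<ABI>` naming scheme described above; `describe_kernel_pkg` is a hypothetical helper name:

```shell
# Split a "<package name> <version>" line into the ABI (taken from the
# package name) and the package version.
describe_kernel_pkg() {
    name="${1%% *}"      # text before the first space
    version="${1#* }"    # text after the first space
    echo "ABI ${name#pve-kernel-}, package version $version"
}

# Usage on a live host (the glob is meant to skip meta packages such as
# pve-kernel-5.11 and pve-kernel-helper):
#   dpkg-query -W -f='${Package} ${Version}\n' 'pve-kernel-*-pve' |
#       while read -r line; do describe_kernel_pkg "$line"; done
```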
 
Dunno if you guys can help me but it seems like I have a similar issue..

I’m not super qualified with Linux and I hope someone here can help me, since my system randomly reboots from time to time (i.e. every 3-4 days) and I’m clueless…

I had Kdump installed in order to try to understand what is happening but I still don’t get it. I landed here since I had the same “BUG: kernel NULL pointer dereference, address: 0000000000000008” message…

I’m running proxmox 7.1.8

uname -r :
5.13.19-2-pve

zfs --version:
zfs-2.1.1-pve3
zfs-kmod-2.1.1-pve1



Dmesg dump in attachment.
 

Attachments

  • dmsg.txt
Hello everyone. We updated to pve-manager/7.0-11/63d82f4e (running kernel: 5.11.22-4-pve) two weeks ago, and since then I have been hitting this bug, which disables Proxmox web and SSH access and stops the system in a way I have not been able to understand. I need to force a reboot, after which the services start OK. I checked with zpool and everything is OK. Before the upgrade I made all the necessary configuration changes to be ready for the update.
The server was updated and starts fine, as I said, but I don't know how to solve this kernel BUG (see the log below). Our server is a Dell PowerEdge.

Thanks in advance about any help.

Sep 20 16:03:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:03:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:03:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:04:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:04:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:04:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:05:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:05:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:05:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:06:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:06:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:06:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:07:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:07:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:07:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:08:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:08:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:08:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:09:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:09:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:09:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:10:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:10:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:10:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:11:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Sep 20 16:11:02 proxmox systemd[1]: pvesr.service: Succeeded.
Sep 20 16:11:02 proxmox systemd[1]: Finished Proxmox VE replication runner.
Sep 20 16:11:25 proxmox kernel: BUG: kernel NULL pointer dereference, address: 00000000000000c0
Sep 20 16:11:25 proxmox kernel: #PF: supervisor read access in kernel mode
Sep 20 16:11:25 proxmox kernel: #PF: error_code(0x0000) - not-present page
Sep 20 16:11:25 proxmox kernel: PGD 0 P4D 0

Sep 20 16:11:25 proxmox kernel: Oops: 0000 [#1] SMP PTI
Sep 20 16:11:25 proxmox kernel: CPU: 7 PID: 398 Comm: kworker/7:1H Tainted: P O 5.11.22-4-pve #1
Sep 20 16:11:25 proxmox kernel: Hardware name: Dell Inc. PowerEdge R720/046V88, BIOS 2.9.0 12/06/2019
Sep 20 16:11:25 proxmox kernel: Workqueue: kblockd blk_mq_timeout_work
Sep 20 16:11:25 proxmox kernel: RIP: 0010:blk_mq_put_rq_ref+0xa/0x60
Sep 20 16:11:25 proxmox kernel: Code: 15 0f b6 d3 4c 89 e7 be 01 00 00 00 e8 cf fe ff ff 5b 41 5c 5d c3 0f 0b 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 8b 47 10 <48> 8b 80 c0 00 00 00 48 89 e5 48 3b 78 40 74 1f 4c 8d 87 e8 00 00
Sep 20 16:11:25 proxmox kernel: RSP: 0018:ffff9b890ecb7d68 EFLAGS: 00010287
Sep 20 16:11:25 proxmox kernel: RAX: 0000000000000000 RBX: ffff9b890ecb7de8 RCX: 0000000000000002
Sep 20 16:11:25 proxmox kernel: RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffff8f4bff31c000
Sep 20 16:11:25 proxmox kernel: RBP: ffff9b890ecb7da0 R08: 0000000000000000 R09: 000000000000003b
Sep 20 16:11:25 proxmox kernel: R10: 0000000000000008 R11: 0000000000000008 R12: ffff8f4bff31c000
Sep 20 16:11:25 proxmox kernel: R13: ffff8f4bff31b400 R14: 0000000000000000 R15: 0000000000000001
Sep 20 16:11:25 proxmox kernel: FS: 0000000000000000(0000) GS:ffff8f7b2f8c0000(0000) knlGS:0000000000000000
Sep 20 16:11:25 proxmox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 20 16:11:25 proxmox kernel: CR2: 00000000000000c0 CR3: 000000305b246005 CR4: 00000000000626e0
Sep 20 16:11:25 proxmox kernel: Call Trace:
Sep 20 16:11:25 proxmox kernel: ? bt_iter+0x54/0x90
Sep 20 16:11:25 proxmox kernel: blk_mq_queue_tag_busy_iter+0x1a2/0x2d0
Sep 20 16:11:25 proxmox kernel: ? blk_mq_put_rq_ref+0x60/0x60
Sep 20 16:11:25 proxmox kernel: ? blk_mq_put_rq_ref+0x60/0x60
Sep 20 16:11:25 proxmox kernel: blk_mq_timeout_work+0x5f/0x120
Sep 20 16:11:25 proxmox kernel: process_one_work+0x220/0x3c0
Sep 20 16:11:25 proxmox kernel: worker_thread+0x53/0x420
Sep 20 16:11:25 proxmox kernel: ? process_one_work+0x3c0/0x3c0
Sep 20 16:11:25 proxmox kernel: kthread+0x12b/0x150
Sep 20 16:11:25 proxmox kernel: ? set_kthread_struct+0x50/0x50
Sep 20 16:11:25 proxmox kernel: ret_from_fork+0x22/0x30
Sep 20 16:11:25 proxmox kernel: Modules linked in: binfmt_misc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute veth nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace nfs_ssc fscache ebtable_filter ebtables ip_set bonding tls softdog ip6table_nat ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_security iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_raw xt_tcpudp iptable_filter bpfilter nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul ghash_clmulni_intel aesni_intel ipmi_ssif mgag200 drm_kms_helper crypto_simd cryptd cec glue_helper rc_core i2c_algo_bit fb_sys_fops syscopyarea dcdbas rapl sysfillrect pcspkr joydev mei_me sysimgblt input_leds mei intel_cstate ipmi_si ipmi_devintf mac_hid ipmi_msghandler acpi_power_meter vhost_net vhost vhost_iotlb tap ib_iser
Sep 20 16:11:25 proxmox kernel: rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) hid_logitech_hidpp btrfs blake2b_generic xor raid6_pq libcrc32c hid_logitech_dj hid_generic usbkbd usbmouse usbhid hid ehci_pci crc32_pclmul lpc_ich ehci_hcd megaraid_sas tg3 wmi
Sep 20 16:11:25 proxmox kernel: CR2: 00000000000000c0
Sep 20 16:11:25 proxmox kernel: ---[ end trace 43b5fd3492cb5d6d ]---
Sep 20 16:11:25 proxmox kernel: RIP: 0010:blk_mq_put_rq_ref+0xa/0x60
Sep 20 16:11:25 proxmox kernel: Code: 15 0f b6 d3 4c 89 e7 be 01 00 00 00 e8 cf fe ff ff 5b 41 5c 5d c3 0f 0b 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 8b 47 10 <48> 8b 80 c0 00 00 00 48 89 e5 48 3b 78 40 74 1f 4c 8d 87 e8 00 00
Sep 20 16:11:25 proxmox kernel: RSP: 0018:ffff9b890ecb7d68 EFLAGS: 00010287
Sep 20 16:11:25 proxmox kernel: RAX: 0000000000000000 RBX: ffff9b890ecb7de8 RCX: 0000000000000002
Sep 20 16:11:25 proxmox kernel: RDX: 0000000000000001 RSI: 0000000000000202 RDI: ffff8f4bff31c000
Sep 20 16:11:25 proxmox kernel: RBP: ffff9b890ecb7da0 R08: 0000000000000000 R09: 000000000000003b
Sep 20 16:11:25 proxmox kernel: R10: 0000000000000008 R11: 0000000000000008 R12: ffff8f4bff31c000
Sep 20 16:11:25 proxmox kernel: R13: ffff8f4bff31b400 R14: 0000000000000000 R15: 0000000000000001
Sep 20 16:11:25 proxmox kernel: FS: 0000000000000000(0000) GS:ffff8f7b2f8c0000(0000) knlGS:0000000000000000
Sep 20 16:11:25 proxmox kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 20 16:11:25 proxmox kernel: CR2: 00000000000000c0 CR3: 000000305b246005 CR4: 00000000000626e0
-- Reboot --
Sep 20 17:25:50 proxmox kernel: Linux version 5.11.22-4-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.11.22-8 (Fri, 27 Aug 2021 11:51:34 +0200) ()
Sep 20 17:25:50 proxmox kernel: Command line: BOOT_IMAGE=/vmlinuz-5.11.22-4-pve root=ZFS=rpool/ROOT/pve-1 ro root=ZFS=rpool/ROOT/pve-1 boot=zfs systemd.unified_cgroup_hierarchy=0 quiet
Hello everyone,

I have the same error on Proxmox Backup Server version 3.0-1 with kernel 6.2.16-3-pve.
My server crashes every day while the backup job is running. When idle, everything is fine.

How can I fix this issue?

The dmesg crash file is attached.
Regards
Kevin
 

Attachments

  • dmesg.202307260034.txt
