Host crashing under small VM load

Moatasem

New Member
Apr 14, 2022
Hi all

I have a brand-new 3-node Proxmox cluster with full-mesh Ceph storage.
Specs per host are:
24 x 13th Gen Intel(R) Core(TM) i7-13700 (1 Socket)
128 GiB RAM
2 x 1 Gbps NICs for network traffic
2 x 10 Gbps NICs for Ceph traffic
2 x 4 TB SSDs for storage
1 x 2 TB NVMe SSD for the OS

On host 1, when I put it under load by booting or cloning VMs, the host sometimes crashes and shuts down, and I have to start it up again manually.
Looking at the logs for host 1, I find the following:

May 14 16:49:14 AVHOST01 kernel: BUG: unable to handle page fault for address: ffff8d90d6f42210
May 14 16:49:14 AVHOST01 kernel: #PF: supervisor write access in kernel mode
May 14 16:49:14 AVHOST01 kernel: #PF: error_code(0x0002) - not-present page
May 14 16:49:14 AVHOST01 kernel: PGD a38601067 P4D a38601067 PUD 0
May 14 16:49:14 AVHOST01 kernel: Oops: 0002 [#1] SMP NOPTI
May 14 16:49:14 AVHOST01 kernel: CPU: 2 PID: 121889 Comm: kvm Tainted: P O 5.15.102-1-pve #1
May 14 16:49:14 AVHOST01 kernel: Hardware name: Gigabyte Technology Co., Ltd. B760 DS3H AX DDR4/B760 DS3H AX DDR4, BIOS F1 10/03/2022
May 14 16:49:14 AVHOST01 kernel: RIP: 0010:remove_wait_queue+0x29/0x50
May 14 16:49:14 AVHOST01 kernel: Code: 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 48 89 f3 e8 39 d3 c7 00 48 8b 53 18 4c 89 e7 48 89 c6 48 8b 43 20 48 89 42 08 <48> 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 18 48 83 c0 22 48
May 14 16:49:14 AVHOST01 kernel: RSP: 0018:ffff9b374afa79f8 EFLAGS: 00010046
May 14 16:49:14 AVHOST01 kernel: RAX: ffff8d90d6f42210 RBX: ffff8db1097886e0 RCX: 0000000000000000
May 14 16:49:14 AVHOST01 kernel: RDX: ffff8db0d6f42210 RSI: 0000000000000282 RDI: ffff8db0d6f42208
May 14 16:49:14 AVHOST01 kernel: RBP: ffff9b374afa7a08 R08: 0000000000000003 R09: ffff8dafd36c43a0
May 14 16:49:14 AVHOST01 kernel: R10: ffff9b3741b33c28 R11: 0000000000000000 R12: ffff8db0d6f42208
May 14 16:49:14 AVHOST01 kernel: R13: ffff8db109788000 R14: ffff9b374afa7be8 R15: 0000000000000009
May 14 16:49:14 AVHOST01 kernel: FS: 00007fd4fda40200(0000) GS:ffff8dceff680000(0000) knlGS:0000000000000000
May 14 16:49:14 AVHOST01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 14 16:49:14 AVHOST01 kernel: CR2: ffff8d90d6f42210 CR3: 0000000235234002 CR4: 0000000000772ee0
May 14 16:49:14 AVHOST01 kernel: PKRU: 55555554
May 14 16:49:14 AVHOST01 kernel: Call Trace:
May 14 16:49:14 AVHOST01 kernel: <TASK>
May 14 16:49:14 AVHOST01 kernel: poll_freewait+0x6f/0xb0
May 14 16:49:14 AVHOST01 kernel: do_sys_poll+0x56e/0x690
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: __x64_sys_ppoll+0xbc/0x150
May 14 16:49:14 AVHOST01 kernel: do_syscall_64+0x59/0xc0
May 14 16:49:14 AVHOST01 kernel: ? syscall_exit_to_user_mode+0x27/0x50
May 14 16:49:14 AVHOST01 kernel: ? do_syscall_64+0x69/0xc0
May 14 16:49:14 AVHOST01 kernel: ? do_syscall_64+0x69/0xc0
May 14 16:49:14 AVHOST01 kernel: ? do_syscall_64+0x69/0xc0
May 14 16:49:14 AVHOST01 kernel: ? sysvec_apic_timer_interrupt+0x4e/0x90
May 14 16:49:14 AVHOST01 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xcb
May 14 16:49:14 AVHOST01 kernel: RIP: 0033:0x7fd5003cee26
May 14 16:49:14 AVHOST01 kernel: Code: 7c 24 08 e8 7c 0f f9 ff 4c 8b 54 24 18 48 8b 74 24 10 41 b8 08 00 00 00 41 89 c1 48 8b 7c 24 08 4c 89 e2 b8 0f 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2a 44 89 cf 89 44 24 08 e8 a6 0f f9 ff 8b 44
May 14 16:49:14 AVHOST01 kernel: RSP: 002b:00007ffec0464ef0 EFLAGS: 00000293 ORIG_RAX: 000000000000010f
May 14 16:49:14 AVHOST01 kernel: RAX: ffffffffffffffda RBX: 00005612e12956b0 RCX: 00007fd5003cee26
May 14 16:49:14 AVHOST01 kernel: RDX: 00007ffec0464f10 RSI: 0000000000000049 RDI: 00005612e1d031f0
May 14 16:49:14 AVHOST01 kernel: RBP: 00007ffec0464f7c R08: 0000000000000008 R09: 0000000000000000
May 14 16:49:14 AVHOST01 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffec0464f10
May 14 16:49:14 AVHOST01 kernel: R13: 00005612e12956b0 R14: 00007ffec0464f80 R15: 0000000000000000
May 14 16:49:14 AVHOST01 kernel: </TASK>
May 14 16:49:14 AVHOST01 kernel: Modules linked in: veth snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel nf_tables nfnetlink_cttimeout bonding tls openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 softdog nfnetlink_log nfnetlink i915 ttm drm_kms_helper cec intel_rapl_msr rc_core intel_rapl_common i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt x86_pkg_temp_thermal intel_powerclamp snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi btusb mt7921e snd_hda_codec coretemp btrtl mt76_connac_lib btbcm mt76 snd_hda_core btintel snd_hwdep bluetooth kvm_intel snd_pcm ecdh_generic mei_hdcp ecc mac80211 kvm irqbypass snd_timer crct10dif_pclmul snd cfg80211 ghash_clmulni_intel aesni_intel gigabyte_wmi crypto_simd cryptd wmi_bmof pcspkr efi_pstore libarc4 soundcore ov01a1s mei_me power_ctrl_logic mei v4l2_fwnode
May 14 16:49:14 AVHOST01 kernel: v4l2_async videodev intel_hid mc acpi_pad sparse_keymap acpi_tad zfs(PO) zunicode(PO) mac_hid zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb hid_generic usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32_pclmul nvme atlantic xhci_pci ahci xhci_pci_renesas macsec r8169 libahci nvme_core xhci_hcd realtek wmi video
May 14 16:49:14 AVHOST01 kernel: CR2: ffff8d90d6f42210
May 14 16:49:14 AVHOST01 kernel: ---[ end trace 361f80607622d2a5 ]---
May 14 16:49:14 AVHOST01 kernel: RIP: 0010:remove_wait_queue+0x29/0x50
May 14 16:49:14 AVHOST01 kernel: Code: 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 48 89 f3 e8 39 d3 c7 00 48 8b 53 18 4c 89 e7 48 89 c6 48 8b 43 20 48 89 42 08 <48> 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 18 48 83 c0 22 48
May 14 16:49:14 AVHOST01 kernel: RSP: 0018:ffff9b374afa79f8 EFLAGS: 00010046
May 14 16:49:14 AVHOST01 kernel: RAX: ffff8d90d6f42210 RBX: ffff8db1097886e0 RCX: 0000000000000000
May 14 16:49:14 AVHOST01 kernel: RDX: ffff8db0d6f42210 RSI: 0000000000000282 RDI: ffff8db0d6f42208
May 14 16:49:14 AVHOST01 kernel: RBP: ffff9b374afa7a08 R08: 0000000000000003 R09: ffff8dafd36c43a0
May 14 16:49:14 AVHOST01 kernel: R10: ffff9b3741b33c28 R11: 0000000000000000 R12: ffff8db0d6f42208
May 14 16:49:14 AVHOST01 kernel: R13: ffff8db109788000 R14: ffff9b374afa7be8 R15: 0000000000000009
May 14 16:49:14 AVHOST01 kernel: FS: 00007fd4fda40200(0000) GS:ffff8dceff680000(0000) knlGS:0000000000000000
May 14 16:49:14 AVHOST01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 14 16:49:14 AVHOST01 kernel: CR2: ffff8d90d6f42210 CR3: 0000000235234002 CR4: 0000000000772ee0
May 14 16:49:14 AVHOST01 kernel: PKRU: 55555554
May 14 16:49:16 AVHOST01 pvedaemon[122639]: VM 101 qmp command failed - VM 101 not running
-- Reboot --

Any thoughts?
 
Hi,
Please post the output of pveversion -v and uname -a, as well as the VM config from qm config <VMID>.

You can try the following:
 
Hi Chris

Here are the outputs:
#pveversion -v
root@AVHOST01:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
openvswitch-switch: 2.15.0+ds1-2+deb11u4
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1


#uname -a
Linux AVHOST01 5.15.102-1-pve #1 SMP PVE 5.15.102-1 (2023-03-14T13:48Z) x86_64 GNU/Linux


When I try to update the kernel I get the following:
root@AVHOST01:~# apt install pve-kernel-6.2
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package pve-kernel-6.2
E: Couldn't find any package by glob 'pve-kernel-6.2'
 
Make sure the apt repositories are configured correctly. If you have no support subscription, use the no-subscription repository; see https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_no_subscription_repo
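
For reference, a minimal sketch of what the no-subscription entry typically looks like on Proxmox VE 7.x, which is based on Debian 11 "bullseye" (matching the pveversion output above). The file name is only an assumption; the line can live in /etc/apt/sources.list or in any .list file under /etc/apt/sources.list.d/. Check it against the wiki page before using it.

```
# /etc/apt/sources.list.d/pve-no-subscription.list   (file name is an assumption)
deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription
```

After adding the entry, running apt update and then repeating the apt install pve-kernel-6.2 command should find the package. Without a subscription, the enterprise repository entry usually also needs to be commented out, otherwise apt update reports authorization errors for it.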
 
Hi Chris
I've done the kernel update on all 3 hosts.
They do perform better now, but 2 of them just randomly crash.
Sometimes they power off completely, and sometimes they just hang and I have to hold the power button down until they turn off, then turn them back on.

I'm not sure what to do now.
 
Please provide the journal from around the time when the host crashes: journalctl --since <DATETIME> --until <DATETIME>, replacing each <DATETIME> with a meaningful start and end of the range.
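
As an illustration of picking that range, here is a small sketch that computes a window around a known crash timestamp. The helper name crash_window and the 10-minute default are hypothetical, the year in the example is an assumption (the log lines above do not include one), and GNU date is assumed, which is the default on Debian-based hosts.

```shell
#!/bin/sh
# Print a start and end timestamp (one per line) around a crash time.
# Usage: crash_window "YYYY-MM-DD HH:MM:SS" [minutes]
# crash_window is a hypothetical helper, not part of Proxmox or systemd.
crash_window() {
    crash_time="$1"
    minutes="${2:-10}"                      # half-width of the window; arbitrary default
    epoch=$(date -d "$crash_time" +%s) || return 1
    offset=$((minutes * 60))
    since=$(date -d "@$((epoch - offset))" '+%Y-%m-%d %H:%M:%S')
    until_ts=$(date -d "@$((epoch + offset))" '+%Y-%m-%d %H:%M:%S')
    printf '%s\n%s\n' "$since" "$until_ts"
}

# Example, to be run on the affected host (year assumed):
#   window=$(crash_window "2023-05-14 16:49:14" 10)
#   journalctl --since "$(echo "$window" | sed -n 1p)" \
#              --until "$(echo "$window" | sed -n 2p)"
```

If the journal is stored persistently, journalctl -b -1 is another way to get at the same data: it shows everything logged during the previous boot, which ends at the crash.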

Do all 3 nodes have the same specifications?
 
Hi Chris

The issue is resolved. It was a hardware issue: the CPU was overheating and shutting down.
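
For anyone who wants to rule out the same cause, here is a rough sketch of reading temperatures straight from the kernel's thermal sysfs interface while the host is under load. The function name show_thermal_zones is hypothetical, and the optional directory argument exists only so the function can be tried against a test directory. Which zones appear depends on the hardware and loaded drivers (the module list above includes x86_pkg_temp_thermal, so a package temperature zone is likely on this board).

```shell
#!/bin/sh
# List thermal zones with their type and temperature in degrees Celsius.
# The kernel reports readings in millidegrees under /sys/class/thermal.
# show_thermal_zones is a hypothetical helper name.
show_thermal_zones() {
    base_dir="${1:-/sys/class/thermal}"
    for zone in "$base_dir"/thermal_zone*; do
        [ -r "$zone/temp" ] || continue                     # skip if no readable sensor
        millideg=$(cat "$zone/temp" 2>/dev/null) || continue # some zones fail on read
        zone_type=$(cat "$zone/type" 2>/dev/null || echo unknown)
        printf '%s %s %d C\n' "$(basename "$zone")" "$zone_type" "$((millideg / 1000))"
    done
}

# Example: print readings every 2 seconds while booting or cloning VMs.
#   while true; do show_thermal_zones; sleep 2; done
```

A hard thermal shutdown often leaves nothing in the journal, so watching the readings climb under load can be more telling than the logs.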

Thank you for your help.