Host crashing under small VM load

Moatasem

New Member
Apr 14, 2022
Hi all

I have a brand-new 3-node Proxmox cluster with full-mesh Ceph storage.
Specs per host:
24 x 13th Gen Intel(R) Core(TM) i7-13700 (1 Socket)
128 GiB RAM
2 x 1 Gbps NIC for network traffic
2 x 10 Gbps NIC for Ceph traffic
2 x 4 TB SSD for storage
1 x 2 TB NVMe SSD for the OS

On host 1, when I put load on it by booting or cloning VMs, the host sometimes crashes and shuts down, and I have to start it again manually.
Looking at the logs for host 1, I find the following:

May 14 16:49:14 AVHOST01 kernel: BUG: unable to handle page fault for address: ffff8d90d6f42210
May 14 16:49:14 AVHOST01 kernel: #PF: supervisor write access in kernel mode
May 14 16:49:14 AVHOST01 kernel: #PF: error_code(0x0002) - not-present page
May 14 16:49:14 AVHOST01 kernel: PGD a38601067 P4D a38601067 PUD 0
May 14 16:49:14 AVHOST01 kernel: Oops: 0002 [#1] SMP NOPTI
May 14 16:49:14 AVHOST01 kernel: CPU: 2 PID: 121889 Comm: kvm Tainted: P O 5.15.102-1-pve #1
May 14 16:49:14 AVHOST01 kernel: Hardware name: Gigabyte Technology Co., Ltd. B760 DS3H AX DDR4/B760 DS3H AX DDR4, BIOS F1 10/03/2022
May 14 16:49:14 AVHOST01 kernel: RIP: 0010:remove_wait_queue+0x29/0x50
May 14 16:49:14 AVHOST01 kernel: Code: 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 48 89 f3 e8 39 d3 c7 00 48 8b 53 18 4c 89 e7 48 89 c6 48 8b 43 20 48 89 42 08 <48> 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 18 48 83 c0 22 48
May 14 16:49:14 AVHOST01 kernel: RSP: 0018:ffff9b374afa79f8 EFLAGS: 00010046
May 14 16:49:14 AVHOST01 kernel: RAX: ffff8d90d6f42210 RBX: ffff8db1097886e0 RCX: 0000000000000000
May 14 16:49:14 AVHOST01 kernel: RDX: ffff8db0d6f42210 RSI: 0000000000000282 RDI: ffff8db0d6f42208
May 14 16:49:14 AVHOST01 kernel: RBP: ffff9b374afa7a08 R08: 0000000000000003 R09: ffff8dafd36c43a0
May 14 16:49:14 AVHOST01 kernel: R10: ffff9b3741b33c28 R11: 0000000000000000 R12: ffff8db0d6f42208
May 14 16:49:14 AVHOST01 kernel: R13: ffff8db109788000 R14: ffff9b374afa7be8 R15: 0000000000000009
May 14 16:49:14 AVHOST01 kernel: FS: 00007fd4fda40200(0000) GS:ffff8dceff680000(0000) knlGS:0000000000000000
May 14 16:49:14 AVHOST01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 14 16:49:14 AVHOST01 kernel: CR2: ffff8d90d6f42210 CR3: 0000000235234002 CR4: 0000000000772ee0
May 14 16:49:14 AVHOST01 kernel: PKRU: 55555554
May 14 16:49:14 AVHOST01 kernel: Call Trace:
May 14 16:49:14 AVHOST01 kernel: <TASK>
May 14 16:49:14 AVHOST01 kernel: poll_freewait+0x6f/0xb0
May 14 16:49:14 AVHOST01 kernel: do_sys_poll+0x56e/0x690
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: ? __pollwait+0xe0/0xe0
May 14 16:49:14 AVHOST01 kernel: __x64_sys_ppoll+0xbc/0x150
May 14 16:49:14 AVHOST01 kernel: do_syscall_64+0x59/0xc0
May 14 16:49:14 AVHOST01 kernel: ? syscall_exit_to_user_mode+0x27/0x50
May 14 16:49:14 AVHOST01 kernel: ? do_syscall_64+0x69/0xc0
May 14 16:49:14 AVHOST01 kernel: ? do_syscall_64+0x69/0xc0
May 14 16:49:14 AVHOST01 kernel: ? do_syscall_64+0x69/0xc0
May 14 16:49:14 AVHOST01 kernel: ? sysvec_apic_timer_interrupt+0x4e/0x90
May 14 16:49:14 AVHOST01 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xcb
May 14 16:49:14 AVHOST01 kernel: RIP: 0033:0x7fd5003cee26
May 14 16:49:14 AVHOST01 kernel: Code: 7c 24 08 e8 7c 0f f9 ff 4c 8b 54 24 18 48 8b 74 24 10 41 b8 08 00 00 00 41 89 c1 48 8b 7c 24 08 4c 89 e2 b8 0f 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 2a 44 89 cf 89 44 24 08 e8 a6 0f f9 ff 8b 44
May 14 16:49:14 AVHOST01 kernel: RSP: 002b:00007ffec0464ef0 EFLAGS: 00000293 ORIG_RAX: 000000000000010f
May 14 16:49:14 AVHOST01 kernel: RAX: ffffffffffffffda RBX: 00005612e12956b0 RCX: 00007fd5003cee26
May 14 16:49:14 AVHOST01 kernel: RDX: 00007ffec0464f10 RSI: 0000000000000049 RDI: 00005612e1d031f0
May 14 16:49:14 AVHOST01 kernel: RBP: 00007ffec0464f7c R08: 0000000000000008 R09: 0000000000000000
May 14 16:49:14 AVHOST01 kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffec0464f10
May 14 16:49:14 AVHOST01 kernel: R13: 00005612e12956b0 R14: 00007ffec0464f80 R15: 0000000000000000
May 14 16:49:14 AVHOST01 kernel: </TASK>
May 14 16:49:14 AVHOST01 kernel: Modules linked in: veth snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel nf_tables nfnetlink_cttimeout bonding tls openvswitch nsh nf_conncount nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 softdog nfnetlink_log nfnetlink i915 ttm drm_kms_helper cec intel_rapl_msr rc_core intel_rapl_common i2c_algo_bit fb_sys_fops syscopyarea sysfillrect sysimgblt x86_pkg_temp_thermal intel_powerclamp snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi btusb mt7921e snd_hda_codec coretemp btrtl mt76_connac_lib btbcm mt76 snd_hda_core btintel snd_hwdep bluetooth kvm_intel snd_pcm ecdh_generic mei_hdcp ecc mac80211 kvm irqbypass snd_timer crct10dif_pclmul snd cfg80211 ghash_clmulni_intel aesni_intel gigabyte_wmi crypto_simd cryptd wmi_bmof pcspkr efi_pstore libarc4 soundcore ov01a1s mei_me power_ctrl_logic mei v4l2_fwnode
May 14 16:49:14 AVHOST01 kernel: v4l2_async videodev intel_hid mc acpi_pad sparse_keymap acpi_tad zfs(PO) zunicode(PO) mac_hid zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 btrfs blake2b_generic xor zstd_compress raid6_pq simplefb hid_generic usbhid hid dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c crc32_pclmul nvme atlantic xhci_pci ahci xhci_pci_renesas macsec r8169 libahci nvme_core xhci_hcd realtek wmi video
May 14 16:49:14 AVHOST01 kernel: CR2: ffff8d90d6f42210
May 14 16:49:14 AVHOST01 kernel: ---[ end trace 361f80607622d2a5 ]---
May 14 16:49:14 AVHOST01 kernel: RIP: 0010:remove_wait_queue+0x29/0x50
May 14 16:49:14 AVHOST01 kernel: Code: 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 53 48 89 f3 e8 39 d3 c7 00 48 8b 53 18 4c 89 e7 48 89 c6 48 8b 43 20 48 89 42 08 <48> 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 43 18 48 83 c0 22 48
May 14 16:49:14 AVHOST01 kernel: RSP: 0018:ffff9b374afa79f8 EFLAGS: 00010046
May 14 16:49:14 AVHOST01 kernel: RAX: ffff8d90d6f42210 RBX: ffff8db1097886e0 RCX: 0000000000000000
May 14 16:49:14 AVHOST01 kernel: RDX: ffff8db0d6f42210 RSI: 0000000000000282 RDI: ffff8db0d6f42208
May 14 16:49:14 AVHOST01 kernel: RBP: ffff9b374afa7a08 R08: 0000000000000003 R09: ffff8dafd36c43a0
May 14 16:49:14 AVHOST01 kernel: R10: ffff9b3741b33c28 R11: 0000000000000000 R12: ffff8db0d6f42208
May 14 16:49:14 AVHOST01 kernel: R13: ffff8db109788000 R14: ffff9b374afa7be8 R15: 0000000000000009
May 14 16:49:14 AVHOST01 kernel: FS: 00007fd4fda40200(0000) GS:ffff8dceff680000(0000) knlGS:0000000000000000
May 14 16:49:14 AVHOST01 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 14 16:49:14 AVHOST01 kernel: CR2: ffff8d90d6f42210 CR3: 0000000235234002 CR4: 0000000000772ee0
May 14 16:49:14 AVHOST01 kernel: PKRU: 55555554
May 14 16:49:16 AVHOST01 pvedaemon[122639]: VM 101 qmp command failed - VM 101 not running
-- Reboot --

Any thoughts?
 
Hi,
please post the output of pveversion -v and uname -a, as well as the VM configuration from qm config <VMID>.

You can try the following:
 
Hi Chris

Here are the outputs:
#pveversion -v
root@AVHOST01:~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.102-1-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.3-3
pve-kernel-5.15.102-1-pve: 5.15.102-1
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-3
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-1
libpve-rs-perl: 0.7.5
libpve-storage-perl: 7.4-2
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
openvswitch-switch: 2.15.0+ds1-2+deb11u4
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.6.3
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20221111-1
pve-firewall: 4.3-1
pve-firmware: 3.6-4
pve-ha-manager: 3.6.0
pve-i18n: 2.11-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1


#uname -a
Linux AVHOST01 5.15.102-1-pve #1 SMP PVE 5.15.102-1 (2023-03-14T13:48Z) x86_64 GNU/Linux


When I try to update the kernel I get the following:
root@AVHOST01:~# apt install pve-kernel-6.2
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package pve-kernel-6.2
E: Couldn't find any package by glob 'pve-kernel-6.2'
 
Make sure to correctly configure the apt repositories, use the no-subscription repo if you have no support subscription, see https://pve.proxmox.com/wiki/Package_Repositories#sysadmin_no_subscription_repo
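
As a rough sketch for this particular setup: the pveversion output above shows Proxmox VE 7.4, which is based on Debian 11 (bullseye), so the no-subscription entry would look something like the fragment below. The file name is only an assumption; any .list file under /etc/apt/sources.list.d/ (or /etc/apt/sources.list itself) should do.

```
# /etc/apt/sources.list.d/pve-no-subscription.list  (file name is an assumption)
deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription
```

After running apt update, the pve-kernel-6.2 package should then be locatable.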
 
Hi Chris
I've done the kernel update on all 3 hosts.
They do perform better now, but 2 of them just randomly crash.
Sometimes they power off completely, and sometimes they just hang and I have to hold the power button until they turn off, then power them back on.

I'm not sure what to do now.
 
Please provide the journal from around the time the host crashes, using journalctl --since <DATETIME> --until <DATETIME> and replacing each <DATETIME> with a meaningful start and end time.
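
If it helps, here is a minimal sketch of how the time range could be built. crash_window is a hypothetical helper (not part of Proxmox or systemd) that takes an approximate crash time and a margin in minutes and prints the matching journalctl command. It assumes GNU date, which Proxmox VE ships.

```shell
#!/usr/bin/env bash
# crash_window: hypothetical helper, not a Proxmox or systemd tool.
# Prints a journalctl command covering +/- N minutes around a crash time.
# Usage: crash_window "YYYY-MM-DD HH:MM:SS" [minutes]
crash_window() {
    local crash="$1" minutes="${2:-10}"
    local epoch start_ts end_ts
    # Convert to epoch seconds first so the +/- arithmetic is unambiguous.
    epoch=$(date -d "$crash" +%s) || return 1
    start_ts=$(date -d "@$((epoch - minutes * 60))" '+%Y-%m-%d %H:%M:%S')
    end_ts=$(date -d "@$((epoch + minutes * 60))" '+%Y-%m-%d %H:%M:%S')
    printf 'journalctl --since "%s" --until "%s"\n' "$start_ts" "$end_ts"
}
```

For example, crash_window "2023-05-14 16:49:14" 10 prints a command covering 16:39:14 to 16:59:14 on that day. The timestamp here is only a placeholder (the year is a guess); use the time of the actual crash.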

Do all 3 nodes have the same specifications?
 
Hi Chris

The issue is resolved. It was a hardware issue: the CPU was overheating and shutting down.

Thank you for your help.
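
For anyone landing here with similar symptoms, a quick way to rule overheating in or out is to watch the temperatures the kernel exposes under /sys/class/thermal while the host is under load. The sketch below is only an illustration: print_zone_temps is a hypothetical helper, and which zones appear depends on the board and the loaded drivers (the module list above includes x86_pkg_temp_thermal, so a package temperature zone is likely but not guaranteed).

```shell
#!/usr/bin/env bash
# print_zone_temps: hypothetical helper, not a Proxmox tool.
# Prints "<zone> <type> <temperature in degrees C>" for each thermal zone.
# Usage: print_zone_temps [base_dir]   (default: /sys/class/thermal)
print_zone_temps() {
    local base="${1:-/sys/class/thermal}"
    local zone kind milli
    for zone in "$base"/thermal_zone*; do
        # Skip zones without a readable temp file (also covers an unmatched glob).
        [ -r "$zone/temp" ] || continue
        kind=$(cat "$zone/type" 2>/dev/null || echo unknown)
        milli=$(cat "$zone/temp" 2>/dev/null) || continue
        # sysfs reports millidegrees Celsius; print one decimal place.
        printf '%s %s %d.%d\n' "${zone##*/}" "$kind" \
            "$((milli / 1000))" "$(((milli % 1000) / 100))"
    done
}
```

Running print_zone_temps under watch while booting or cloning VMs should show whether the package temperature climbs toward the CPU's limit shortly before a crash.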
 
