PVE hangs with 7.2

ozgurerdogan

I am not sure whether this is related to 7.2, but the host hangs and needs a hard reboot.
Code:
May 23 10:03:02 s7 kernel: [893783.712903] show_signal: 8 callbacks suppressed
May 23 10:03:02 s7 kernel: [893783.712906] traps: pvescheduler[2627784] general protection fault ip:55c0dffe3f94 sp:7ffd3cb21a60 error:0 in perl[55c0dff2c000+185000]
May 23 10:04:11 s7 pvescheduler[2630746]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
May 23 10:04:48 s7 kernel: [893889.958084] BUG: Bad page state in process kvm  pfn:ffffff220738ad96
May 23 10:04:48 s7 kernel: [893889.958949] page:00000000fa32aaf3 refcount:-14506 mapcount:0 mapping:0000000000000000 index:0xffff8fd80e2b65b0 pfn:0xffffff220738ad96
May 23 10:04:48 s7 kernel: [893889.960356] memcg:ffff8fd80e2b65d0
May 23 10:04:48 s7 kernel: [893889.960356] flags: 0xffffc756b8aaeec8(waiters|dirty|workingset|slab|owner_priv_1|arch_1|private|private_2|writeback|mappedtodisk|swapbacked|mlocked|hwpoison|node=1023|zone=7|lastcpupid=0x1f1d5a)
May 23 10:04:48 s7 kernel: [893889.963103] raw: ffffc756b8aaeec8 dead000000000100 dead000000000122 ffff8fd80e2b65b0
May 23 10:04:48 s7 kernel: [893889.963103] raw: ffff8fd80e2b65b0 ffffc756862cc008 ffffc756b3610108 ffff8fd80e2b65d0
May 23 10:04:48 s7 kernel: [893889.963103] page dumped because: page still charged to cgroup
May 23 10:04:48 s7 kernel: [893889.963103] Modules linked in: joydev input_leds hid_generic usbmouse usbkbd usbhid hid uas usb_storage veth tcp_diag inet_diag ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_mac ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_tcpudp xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip_set_hash_net ip_set nf_tables softdog bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common edac_mce_amd amdgpu snd_hda_codec_realtek kvm_amd snd_hda_codec_generic ledtrig_audio kvm snd_hda_codec_hdmi iommu_v2 gpu_sched drm_ttm_helper irqbypass ttm snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi crct10dif_pclmul snd_hda_codec drm_kms_helper ghash_clmulni_intel aesni_intel snd_hda_core cec rc_core snd_hwdep crypto_simd i2c_algo_bit snd_pcm cryptd fb_sys_fops eeepc_wmi syscopyarea snd_timer asus_wmi rapl sysfillrect sysimgblt snd platform_profile soundcore
May 23 10:04:48 s7 kernel: [893889.963103]  sparse_keymap ccp video pcspkr k10temp efi_pstore wmi_bmof mac_hid vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi msr drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs blake2b_generic xor zstd_compress raid6_pq libcrc32c simplefb xhci_pci xhci_pci_renesas crc32_pclmul nvme ahci i2c_piix4 r8169 realtek xhci_hcd libahci nvme_core wmi gpio_amdpt gpio_generic
May 23 10:04:48 s7 kernel: [893889.971099] CPU: 12 PID: 228872 Comm: kvm Tainted: P           O      5.15.35-1-pve #1
May 23 10:04:48 s7 kernel: [893889.975104] Hardware name: ASUS System Product Name/PRIME B550M-K, BIOS 1401 12/03/2020
May 23 10:04:48 s7 kernel: [893889.975104] Call Trace:
May 23 10:04:48 s7 kernel: [893889.975104]  <TASK>
May 23 10:04:48 s7 kernel: [893889.975104]  dump_stack_lvl+0x4a/0x5f
May 23 10:04:48 s7 kernel: [893889.975104]  dump_stack+0x10/0x12
May 23 10:04:48 s7 kernel: [893889.975104]  bad_page.cold+0x63/0x94
May 23 10:04:48 s7 kernel: [893889.975104]  check_free_page_bad+0x66/0x70
May 23 10:04:48 s7 kernel: [893889.975104]  free_pcppages_bulk+0x1c3/0x390
May 23 10:04:48 s7 kernel: [893889.975104]  free_unref_page_commit.constprop.0+0x12b/0x170
May 23 10:04:48 s7 kernel: [893889.975104]  free_unref_page_list+0x1b3/0x320
May 23 10:04:48 s7 kernel: [893889.975104]  release_pages+0x165/0x530
May 23 10:04:48 s7 kernel: [893889.983110]  free_pages_and_swap_cache+0x48/0x60
May 23 10:04:48 s7 kernel: [893889.983110]  tlb_finish_mmu+0x89/0x1c0
May 23 10:04:48 s7 kernel: [893889.983110]  zap_page_range+0x120/0x170
May 23 10:04:48 s7 kernel: [893889.983110]  do_madvise.part.0+0x8ca/0xf20
May 23 10:04:48 s7 kernel: [893889.983110]  ? do_syscall_64+0x69/0xc0
May 23 10:04:48 s7 kernel: [893889.983110]  ? exit_to_user_mode_prepare+0x37/0x1b0
May 23 10:04:48 s7 kernel: [893889.983110]  __x64_sys_madvise+0x58/0x70
May 23 10:04:48 s7 kernel: [893889.987101]  do_syscall_64+0x5c/0xc0
May 23 10:04:48 s7 kernel: [893889.987101]  ? do_syscall_64+0x69/0xc0
May 23 10:04:48 s7 kernel: [893889.987101]  ? do_syscall_64+0x69/0xc0
May 23 10:04:48 s7 kernel: [893889.987101]  ? asm_sysvec_apic_timer_interrupt+0xa/0x20
May 23 10:04:48 s7 kernel: [893889.987101]  entry_SYSCALL_64_after_hwframe+0x44/0xae
May 23 10:04:48 s7 kernel: [893889.991106] RIP: 0033:0x7f3b047d5cf7
May 23 10:04:48 s7 kernel: [893889.991106] Code: ff ff ff ff c3 66 0f 1f 44 00 00 48 8b 15 91 51 0c 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 b8 1c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 69 51 0c 00 f7 d8 64 89 01 48
May 23 10:04:48 s7 kernel: [893889.991106] RSP: 002b:00007f3af8958e68 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
May 23 10:04:48 s7 kernel: [893889.991106] RAX: ffffffffffffffda RBX: 0000556ee42f6350 RCX: 00007f3b047d5cf7
May 23 10:04:48 s7 kernel: [893889.991106] RDX: 0000000000000004 RSI: 0000000000200000 RDI: 00007f3a3b400000
May 23 10:04:48 s7 kernel: [893889.991106] RBP: 00000000ffffffff R08: 0000000100000000 R09: 0000000000000000
May 23 10:04:48 s7 kernel: [893889.995104] R10: 00000000ffffffff R11: 0000000000000246 R12: 0000000000200000
May 23 10:04:48 s7 kernel: [893889.995104] R13: 00007f3a3b400000 R14: 00007f3af895c098 R15: 000000004f600000
May 23 10:04:48 s7 kernel: [893889.995104]  </TASK>
Any suggestions are welcome.
 
was this a one-off occurrence or can you reproduce it?
 
did you notice whether any VMs crashed at the same time?
 
were there any backups running at the time?
 
no - but please double check
- your memory (e.g., using memtest86, available on the 7.2 ISO as an option in the boot menu)
- your on-disk files (e.g., using debsums)

it's of course also possible that it is a kernel bug and you are just the first person on PVE to trigger it, so if the tests above come back clean and the hang re-occurs, please tell us so here!
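
To help answer whether this was a one-off, the saved logs can be scanned for the two kernel messages seen above. The following is only a minimal sketch: the function names `scan` and `summarize` are made up for illustration, and the patterns assume the syslog-style line format shown in the log excerpt in this thread (other log formats may need adjusted patterns).

```python
import re
from collections import Counter

# Common prefix of a syslog-style kernel line, e.g.
# "May 23 10:04:48 s7 kernel: [893889.958084] ..."
# The bracketed uptime stamp is treated as optional (assumption: some
# log sources omit it).
PREFIX = r"^(?P<stamp>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+(?:\[\s*[\d.]+\]\s+)?"

# The two message types that appear in the excerpt from this thread.
PATTERNS = {
    "bad_page": re.compile(PREFIX + r"BUG: Bad page state in process (?P<proc>\S+)"),
    "gp_fault": re.compile(
        PREFIX + r"traps: (?P<proc>[^\[\s]+)\[\d+\] general protection fault"
    ),
}


def scan(lines):
    """Return (timestamp, kind, process) for every matching log line."""
    events = []
    for line in lines:
        for kind, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match:
                events.append((match["stamp"], kind, match["proc"]))
                break
    return events


def summarize(events):
    """Count events per (kind, process) pair."""
    return Counter((kind, proc) for _, kind, proc in events)


# Demo on three lines copied from the log posted above.
sample = [
    "May 23 10:03:02 s7 kernel: [893783.712906] traps: pvescheduler[2627784] "
    "general protection fault ip:55c0dffe3f94 sp:7ffd3cb21a60 error:0 in "
    "perl[55c0dff2c000+185000]",
    "May 23 10:04:11 s7 pvescheduler[2630746]: jobs: cfs-lock 'file-jobs_cfg' "
    "error: got lock request timeout",
    "May 23 10:04:48 s7 kernel: [893889.958084] BUG: Bad page state in "
    "process kvm  pfn:ffffff220738ad96",
]

for stamp, kind, proc in scan(sample):
    print(stamp, kind, proc)
```

`scan` accepts any iterable of lines, so an open log file can be passed to it directly. If several boots show the same messages for different processes, that would point more towards hardware (memory) than towards a single misbehaving VM, but the counts alone cannot prove either.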
 
