Kernel Panic in Proxmox 6.2

Kosten

Sep 21, 2020
Hello. I have a recurring problem: the hypervisor hangs, with no network ping or SSH access to the hypervisor or the VMs. The display shows a black screen and the keyboard does not respond. Only the reset button helps. Memtest86 did not find any problems. All updates are installed. No subscription key. It is a clean install of PVE.

I have set up kernel dumps on hangs, with automatic reboot. I have attached the kernel log as a zip file. I also have memory dumps (2-5 GB each), but I can't analyze them.

My env:
pveversion: pve-manager/6.2-11/22fb4983 (running kernel: 5.4.60-1-pve)

CPU: AMD Ryzen 5 3500
MB: Gigabyte B450M DS3H-CF bios F51 (latest)
RAM: Samsung, 4x8 GB

Please help me resolve this problem.
 


@Kosten The log messages show that the NFS server connection comes and goes. How is your PVE host connected to the NFS server? Is the NFS server a physical storage server? Are you using a separate network card on the PVE host for NFS?
 
Yes, my PVE host is connected to NFS storage. It's a separate physical PC running FreeNAS 11. The PVE host and the NFS storage each have only one network card. As far as I can see in the logs, the NFS errors occur during backups, when the NFS link is heavily utilized.

The kernel crash is not linked to NFS utilization: the 2020-09-17 crash occurred during working hours, when NFS wasn't in use.

Code:
[27076.710390] nfs: server 10.0.12.11 OK
[66446.337797] perf: interrupt took too long (2531 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[90134.500452] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: do_idle+0x26e/0x270
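
To check how long the host had been up when it crashed, and whether the last NFS message was anywhere near the panic, a log like the one above can be scanned with a small script. This is only a rough sketch: the marker list and the helper names `find_fatal_events` and `format_uptime` are my own assumptions, not part of any Proxmox or kernel tooling.

```python
import re

# Substrings that usually mark a fatal kernel event in dmesg-style output.
# This list is an assumption and is not exhaustive.
FATAL_MARKERS = ("Kernel panic", "BUG:", "Oops:")

# Matches "[  123.456789] message" and captures the uptime and the message.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")


def find_fatal_events(log_text):
    """Return (uptime_seconds, message) for each line with a fatal marker."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        seconds, message = float(match.group(1)), match.group(2)
        if any(marker in message for marker in FATAL_MARKERS):
            events.append((seconds, message))
    return events


def format_uptime(seconds):
    """Convert seconds since boot into an 'Hh MMm SSs' string."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


sample = """\
[27076.710390] nfs: server 10.0.12.11 OK
[66446.337797] perf: interrupt took too long (2531 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[90134.500452] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: do_idle+0x26e/0x270
"""

for seconds, message in find_fatal_events(sample):
    print(format_uptime(seconds), "|", message)
```

For this log it reports the panic at about 25 hours of uptime, roughly 17.5 hours after the last NFS message.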
 
Fresh crash log.

[309609.415549] nfs: server 10.0.12.11 OK
[322572.015463] hrtimer: interrupt took 1012 ns
[323549.300216] BUG: kernel NULL pointer dereference, address: 0000000000000056
[323549.300242] #PF: supervisor read access in kernel mode
[323549.300256] #PF: error_code(0x0000) - not-present page
[323549.300270] PGD 0 P4D 0
[323549.300279] Oops: 0000 [#1] SMP NOPTI
[323549.300291] CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: P O 5.4.60-1-pve #1
[323549.300313] Hardware name: Gigabyte Technology Co., Ltd. B450M DS3H/B450M DS3H-CF, BIOS F51 07/29/2020
[323549.300339] RIP: 0010:load_balance+0x826/0xaf0
[323549.300352] Code: ff 01 c0 89 43 48 e9 9e 00 00 00 48 8b 85 28 ff ff ff c7 00 00 00 00 00 48 8b 85 38 ff ff ff 48 85 c0 74 1c f6 45 a8 01 75 16 <48> 8b 40 10 48 8b 40 10 8b 50 28 85 d2 74 07 c7 40 28 00 00 00 00
[323549.300397] RSP: 0018:ffffa70980003de0 EFLAGS: 00010246
[323549.300411] RAX: 0000000000000046 RBX: 0000000000000000 RCX: 0000000000000000
[323549.300429] RDX: 0000000000000000 RSI: ffff90317b76c000 RDI: 0000000000000000
[323549.300448] RBP: ffffa70980003ec0 R08: 0000000000000000 R09: 000000000000eb00
[323549.300466] R10: 0000000000000002 R11: 0000000000000000 R12: ffff90317aed1500
[323549.300484] R13: 0000000000000000 R14: 0000000104d159eb R15: 0000000000000000
[323549.300503] FS: 0000000000000000(0000) GS:ffff90317e800000(0000) knlGS:0000000000000000
[323549.300523] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[323549.300538] CR2: 0000000000000056 CR3: 00000007a41b0000 CR4: 0000000000340ef0
[323549.300556] Call Trace:
[323549.300565] <IRQ>
[323549.300573] rebalance_domains+0x24f/0x2e0
[323549.300586] ? enqueue_hrtimer+0x3c/0x90
[323549.300598] run_rebalance_domains+0x7a/0xa0
[323549.300611] __do_softirq+0xdc/0x2d4
[323549.300623] irq_exit+0xa9/0xb0
[323549.300633] smp_apic_timer_interrupt+0x79/0x130
[323549.300646] apic_timer_interrupt+0xf/0x20
[323549.300657] </IRQ>
[323549.300665] RIP: 0010:cpuidle_enter_state+0xbd/0x450
[323549.300680] Code: ff e8 57 91 84 ff 80 7d c7 00 74 17 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 63 03 00 00 31 ff e8 7a 01 8b ff fb 66 0f 1f 44 00 00 <45> 85 ed 0f 88 8d 02 00 00 49 63 cd 48 8b 75 d0 48 2b 75 c8 48 8d
[323549.300726] RSP: 0018:ffffffffac203de8 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[323549.300746] RAX: ffff90317e82ad40 RBX: ffffffffac366680 RCX: 000000000000001f
[323549.300764] RDX: 00012644311b3452 RSI: 00000000239f5376 RDI: 0000000000000000
[323549.300782] RBP: ffffffffac203e28 R08: 0000000000000002 R09: 000000000002a5c0
[323549.300800] R10: 00042170370e8be4 R11: ffff90317e8299e0 R12: ffff9031706d4c00
[323549.300818] R13: 0000000000000001 R14: ffffffffac3666f8 R15: ffffffffac3666e0
[323549.300837] ? cpuidle_enter_state+0x99/0x450
[323549.300850] cpuidle_enter+0x2e/0x40
[323549.300861] call_cpuidle+0x23/0x40
[323549.300871] do_idle+0x22c/0x270
[323549.300881] cpu_startup_entry+0x1d/0x20
[323549.300893] rest_init+0xae/0xb0
[323549.300904] arch_call_rest_init+0xe/0x1b
[323549.300916] start_kernel+0x54c/0x56e
[323549.300926] x86_64_start_reservations+0x24/0x26
[323549.300939] x86_64_start_kernel+0x74/0x77
[323549.300952] secondary_startup_64+0xa4/0xb0
[323549.300964] Modules linked in: tcp_diag inet_diag nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter softdog nfnetlink_log nfnetlink zfs(PO) zunicode(PO) zlua(PO) zavl(PO) edac_mce_amd icp(PO) snd_hda_codec_hdmi kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio nouveau mxm_wmi video ttm snd_hda_intel drm_kms_helper snd_intel_dspcfg snd_hda_codec drm snd_hda_core input_leds snd_hwdep i2c_algo_bit serio_raw pcspkr snd_pcm wmi_bmof fb_sys_fops syscopyarea sysfillrect snd_timer sysimgblt snd soundcore ccp k10temp mac_hid zcommon(PO) znvpair(PO) spl(O) vhost_net vhost tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi sunrpc ip_tables x_tables autofs4 btrfs xor zstd_compress raid6_pq dm_thin_pool
[323549.300992] dm_persistent_data dm_bio_prison dm_bufio libcrc32c hid_generic usbmouse usbhid hid i2c_piix4 r8169 realtek xhci_pci ahci xhci_hcd libahci wmi gpio_amdpt gpio_generic
[323549.306354] CR2: 0000000000000056
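
To compare several crashes, it may help to reduce each oops to the faulting function and the reliable call trace frames. The following is a minimal sketch under my own assumptions: `parse_oops` is a hypothetical helper, it only handles the `symbol+0xoff/0xlen` frame format seen in this log, and it skips frames prefixed with `?` because the kernel marks those as unreliable.

```python
import re

# Strips the leading "[seconds.micros] " timestamp from a dmesg line.
TS_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# Faulting instruction pointer, e.g. "RIP: 0010:load_balance+0x826/0xaf0".
RIP_RE = re.compile(r"RIP: [0-9a-f]{4}:([A-Za-z_][\w.]*)\+0x")
# A reliable frame, e.g. "rebalance_domains+0x24f/0x2e0" (no leading "?").
FRAME_RE = re.compile(r"^([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+")


def parse_oops(log_text):
    """Return the first RIP symbol and the reliable frames of the call trace."""
    rip = None
    frames = []
    in_trace = False
    for raw in log_text.splitlines():
        body = TS_RE.sub("", raw.strip())
        if rip is None:
            match = RIP_RE.search(body)
            if match:
                rip = match.group(1)
        if body.startswith("Call Trace:"):
            in_trace = True
            continue
        if body.startswith("Modules linked in:"):
            in_trace = False
        if in_trace:
            match = FRAME_RE.match(body)
            if match:
                frames.append(match.group(1))
    return {"rip": rip, "frames": frames}


sample = """\
[323549.300216] BUG: kernel NULL pointer dereference, address: 0000000000000056
[323549.300339] RIP: 0010:load_balance+0x826/0xaf0
[323549.300556] Call Trace:
[323549.300565] <IRQ>
[323549.300573] rebalance_domains+0x24f/0x2e0
[323549.300586] ? enqueue_hrtimer+0x3c/0x90
[323549.300598] run_rebalance_domains+0x7a/0xa0
[323549.300611] __do_softirq+0xdc/0x2d4
[323549.300964] Modules linked in: tcp_diag inet_diag nfsv3
"""

result = parse_oops(sample)
print("faulting function:", result["rip"])
print("call trace:", " <- ".join(result["frames"]))
```

On the excerpt above, the faulting function is `load_balance`, reached from the scheduler softirq path and not from NFS code.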