Proxmox crashes every 1-5 days

wrealcon

Sep 20, 2021
Hello,

After upgrading Proxmox from 6.2 to 6.4, I started getting crashes. I then upgraded to 7.0, but the crashes still happen.
In the syslog, a crash looks like this:

Code:
Sep  8 10:34:29 buildmachine kernel: [592357.174566] BUG: kernel NULL pointer dereference, address: 0000000000000008
Sep  8 10:34:29 buildmachine kernel: [592357.174585] #PF: supervisor write access in kernel mode
Sep  8 10:34:29 buildmachine kernel: [592357.174589] #PF: error_code(0x0002) - not-present page
Sep  8 10:34:29 buildmachine kernel: [592357.174593] PGD 424c58067 P4D 424c58067 PUD 1e8270067 PMD 0
Sep  8 10:34:29 buildmachine kernel: [592357.174599] Oops: 0002 [#1] SMP NOPTI
Sep  8 10:34:29 buildmachine kernel: [592357.174604] CPU: 64 PID: 3593568 Comm: cc1plus Tainted: P           O      5.11.22-3-pve #1
Sep  8 10:34:29 buildmachine kernel: [592357.174610] Hardware name: Gigabyte Technology Co., Ltd. TRX40 DESIGNARE/TRX40 DESIGNARE, BIOS F4q 04/12/2021
Sep  8 10:34:29 buildmachine kernel: [592357.174615] RIP: 0010:get_page_from_freelist+0x926/0x1160
Sep  8 10:34:29 buildmachine kernel: [592357.174623] Code: 4d 89 f8 49 83 e8 08 0f 84 52 01 00 00 0f 1f 44 00 00 49 8b 37 49 8b 57 08 44 8d 49 ff 48 bf 22 01 00 00 00 00 ad de 4d 63 c9 <48> 89 56 08 48 89 32 48 8d 14 80 48 be 00 01 00 00 00 00 ad de 49
Sep  8 10:34:29 buildmachine kernel: [592357.174630] RSP: 0000:ffffac2702acbc58 EFLAGS: 00010086
Sep  8 10:34:29 buildmachine kernel: [592357.174635] RAX: 0000000000000000 RBX: 0000000000000010 RCX: 0000000000000000
Sep  8 10:34:29 buildmachine kernel: [592357.174640] RDX: ffff9cdeeefd6c50 RSI: 0000000000000000 RDI: dead000000000122
Sep  8 10:34:29 buildmachine kernel: [592357.174644] RBP: ffffac2702acbd48 R08: fffff1bc49b84890 R09: ffffffffffffffff
Sep  8 10:34:29 buildmachine kernel: [592357.174648] R10: 0000000000000100 R11: 0000000000000000 R12: ffff9cddede32180
Sep  8 10:34:29 buildmachine kernel: [592357.174652] R13: 0000000000000010 R14: ffff9cdeeefd6b80 R15: fffff1bc49b84898
Sep  8 10:34:29 buildmachine kernel: [592357.174656] FS:  00007f8ff84afac0(0000) GS:ffff9cddede00000(0000) knlGS:0000000000000000
Sep  8 10:34:29 buildmachine kernel: [592357.174661] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep  8 10:34:29 buildmachine kernel: [592357.174665] CR2: 0000000000000008 CR3: 000000022c9b0000 CR4: 0000000000350ee0
Sep  8 10:34:29 buildmachine kernel: [592357.174669] Call Trace:
Sep  8 10:34:29 buildmachine kernel: [592357.174675]  __alloc_pages_nodemask+0x164/0x310
Sep  8 10:34:29 buildmachine kernel: [592357.174681]  alloc_pages_vma+0x87/0x270
Sep  8 10:34:29 buildmachine kernel: [592357.174692]  handle_mm_fault+0xf72/0x1a70
Sep  8 10:34:29 buildmachine kernel: [592357.174697]  do_user_addr_fault+0x1a3/0x450
Sep  8 10:34:29 buildmachine kernel: [592357.174702]  ? exit_to_user_mode_prepare+0x37/0x190
Sep  8 10:34:29 buildmachine kernel: [592357.174709]  exc_page_fault+0x6c/0x150
Sep  8 10:34:29 buildmachine kernel: [592357.174715]  ? asm_exc_page_fault+0x8/0x30
Sep  8 10:34:29 buildmachine kernel: [592357.174720]  asm_exc_page_fault+0x1e/0x30
Sep  8 10:34:29 buildmachine kernel: [592357.174724] RIP: 0033:0x7f8ff85477cb
Sep  8 10:34:29 buildmachine kernel: [592357.174729] Code: fb b3 14 00 48 89 73 60 48 39 cb 0f 95 c1 48 83 ca 01 0f b6 c9 48 c1 e1 02 4c 09 e9 48 83 c9 01 48 89 48 08 8b 05 99 e6 14 00 <48> 89 56 08 85 c0 0f 84 ff f8 ff ff e9 6a fc ff ff 64 49 8b 04 24
Sep  8 10:34:29 buildmachine kernel: [592357.174736] RSP: 002b:00007fff8c85e5b0 EFLAGS: 00010206
Sep  8 10:34:29 buildmachine kernel: [592357.174740] RAX: 0000000000000000 RBX: 00007f8ff8692ba0 RCX: 0000000000001f71
Sep  8 10:34:29 buildmachine kernel: [592357.174744] RDX: 0000000000008861 RSI: 00000000026f17a0 RDI: 0000000000000003
Sep  8 10:34:29 buildmachine kernel: [592357.1747Sep  8 11:04:13 deimos dmeventd[1632]: dmeventd ready for processing.
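When an oops does make it to disk, the fields most useful for comparing crashes are the BUG line, the process name (`Comm`), the kernel version, the faulting `RIP` and the call trace. Below is a minimal Python sketch that pulls those out of syslog text. It assumes the `... kernel: [timestamp] message` line format shown above, and the function name `summarize_oops` is hypothetical, not part of any Proxmox tool. It keeps only the first oops and skips call-trace frames prefixed with `?`, which the x86 unwinder uses to mark addresses that may not be part of the real call chain.

```python
import re
from typing import Dict, List, Optional

# Matches "... kernel: [592357.174566] message" and captures the message part
KERNEL_MSG = re.compile(r"kernel: \[\s*\d+\.\d+\] (?P<msg>.*)$")


def summarize_oops(lines: List[str]) -> Dict[str, object]:
    """Extract the headline fields of the first kernel oops in syslog lines."""
    bug: Optional[str] = None
    comm: Optional[str] = None
    kernel: Optional[str] = None
    rip: Optional[str] = None
    trace: List[str] = []
    in_trace = False

    for line in lines:
        m = KERNEL_MSG.search(line)
        if m is None:
            in_trace = False  # non-kernel or truncated line ends the trace
            continue
        msg = m.group("msg")
        if in_trace:
            if msg.startswith(" "):  # trace frames are indented
                frame = msg.strip()
                # "?" frames are unreliable guesses, so leave them out
                if not frame.startswith("?"):
                    trace.append(frame)
                continue
            in_trace = False
        if bug is None and msg.startswith("BUG:"):
            bug = msg
        elif comm is None and msg.startswith("CPU:"):
            c = re.search(r"Comm: (\S+)", msg)
            k = re.search(r"(\d+\.\d+\.\d+\S*) #\d+", msg)
            comm = c.group(1) if c else None
            kernel = k.group(1) if k else None
        elif rip is None and msg.startswith("RIP:"):
            # "RIP: 0010:symbol+off/len" -> keep only "symbol+off/len"
            rip = msg.split(":", 2)[2]
        elif msg == "Call Trace:":
            in_trace = True

    return {"bug": bug, "comm": comm, "kernel": kernel, "rip": rip, "call_trace": trace}


# A few lines from the log above, used as sample input
SAMPLE = """\
Sep  8 10:34:29 buildmachine kernel: [592357.174566] BUG: kernel NULL pointer dereference, address: 0000000000000008
Sep  8 10:34:29 buildmachine kernel: [592357.174604] CPU: 64 PID: 3593568 Comm: cc1plus Tainted: P           O      5.11.22-3-pve #1
Sep  8 10:34:29 buildmachine kernel: [592357.174615] RIP: 0010:get_page_from_freelist+0x926/0x1160
Sep  8 10:34:29 buildmachine kernel: [592357.174669] Call Trace:
Sep  8 10:34:29 buildmachine kernel: [592357.174675]  __alloc_pages_nodemask+0x164/0x310
Sep  8 10:34:29 buildmachine kernel: [592357.174681]  alloc_pages_vma+0x87/0x270
Sep  8 10:34:29 buildmachine kernel: [592357.174715]  ? asm_exc_page_fault+0x8/0x30
Sep  8 10:34:29 buildmachine kernel: [592357.174724] RIP: 0033:0x7f8ff85477cb
"""

summary = summarize_oops(SAMPLE.splitlines())
print(summary)
```

Running it over several crash logs makes patterns easier to spot. Oopses in unrelated processes and code paths are often read as a hint towards hardware such as RAM, while the same RIP every time points more towards a specific kernel bug. Neither is proof on its own.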

Very often, though, the syslog instead contains garbage like this:
Code:
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@ [... long run of ^@ characters omitted ...]
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Sep 20 11:26:02 deimos dmeventd[1656]: dmeventd ready for processing.
Sep 20 11:26:02 buildmachine kernel: [    0.000000] Linux version 5.11.22-4-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.11.22-8 (Fri, 27 Aug 2021 11:51:34 +0200) ()
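The `^@` sequences are caret notation for NUL (0x00) bytes. One plausible explanation, not confirmed here, is that the log file had already grown on disk but the data for that region was never written before the machine went down, so it reads back as zeros. The minimal Python sketch below locates those runs so the last intact line before each one can be inspected. The function names `find_nul_runs` and `strip_nul_runs` are hypothetical helpers, and the sample bytes are a shortened stand-in for a real log tail.

```python
import re
import sys
from typing import List, Tuple

# One or more consecutive NUL bytes
NUL_RUN = re.compile(rb"\x00+")


def find_nul_runs(data: bytes, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (offset, length) for every NUL run at least min_len bytes long."""
    return [
        (m.start(), m.end() - m.start())
        for m in NUL_RUN.finditer(data)
        if m.end() - m.start() >= min_len
    ]


def strip_nul_runs(data: bytes) -> bytes:
    """Remove NUL bytes so the surrounding text lines read normally."""
    return NUL_RUN.sub(b"", data)


# Shortened stand-in: a truncated line, zero padding, then the next boot
sample = (
    b"Sep  8 10:34:29 buildmachine kernel: [592357.1747"
    + b"\x00" * 16
    + b"Sep 20 11:26:02 buildmachine kernel: [    0.000000] Linux version 5.11.22-4-pve\n"
)
runs = find_nul_runs(sample)
print(runs)
print(strip_nul_runs(sample).decode())

# Optional: scan a real file passed on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        content = fh.read()
    for offset, length in find_nul_runs(content, min_len=64):
        # Show the tail of the text just before the zeros
        before = content[max(0, offset - 200):offset]
        print(f"NUL run of {length} bytes at offset {offset}")
        print("  preceded by:", before.decode(errors="replace").splitlines()[-1:])
```

For example, running it as `python3 find_nuls.py /var/log/syslog` (script name hypothetical) prints each long NUL run together with the text just before it. That text gives a rough upper bound on when the last successful log write happened before a crash.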


This machine runs around 30 containers and VMs in total (Linux guests as well as Windows XP, 7 and 10). Some of the containers also host Docker containers.

What options do I have to investigate the root cause of this issue?

Kind regards.
 
We have the same suspicion, so next week we plan some downtime to run memory tests. I hope this turns out to be a hardware issue.
 
