PVE 8.2.2 freezes every 2-3 days

wingyiulam

New Member
Jun 6, 2024
Hi all, I am new to Proxmox. I can't tell whether my mini PC is bad or PVE itself is at fault: the PC needs a hard restart every 2-3 days due to a crash/freeze. I already turned off C-states & turbo boost in the BIOS to improve stability, but no luck. Here is a recent log from when all the green icons in the PVE GUI turned into question marks, with no crash yet. I restarted pvestatd and the GUI looks normal now; however, I have no clue when it is going to freeze again.

Mini PC
N100 (C-states & turbo boost disabled)
4 x 2.5GbE Intel I225-V
32GB RAM DDR5
Samsung 970 Pro NVMe
1 disk of 500GB NVMe with ZFS (options zfs zfs_arc_max=4294967296, options zfs zfs_arc_min=2147483648)
No overheating issue
No subscription
No cluster/Ceph
Backup through PBS 3.2-3 on an external server (PBS is stable, no issues)
Running 4 VMs & LXCs daily: pfSense, cloudflared, Bitwarden & vsftp server.
CPU idles at 12% & goes up to 45%. Memory usage 44%. Storage usage 7%.
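
For reference, the two ARC values in the list above are byte counts. A quick sketch of the arithmetic in Python, assuming the "32gb" of RAM means 32 GiB (the variable names are only illustrative):

```python
# Convert the zfs_arc_max / zfs_arc_min values from the post (bytes) to GiB.
GIB = 1024 ** 3

arc_max_bytes = 4294967296  # from: options zfs zfs_arc_max=4294967296
arc_min_bytes = 2147483648  # from: options zfs zfs_arc_min=2147483648
ram_gib = 32                # assumption: "32gb" of RAM is 32 GiB


def to_gib(n_bytes: int) -> float:
    """Return a byte count expressed in GiB."""
    return n_bytes / GIB


arc_max_gib = to_gib(arc_max_bytes)
arc_min_gib = to_gib(arc_min_bytes)
arc_max_share = arc_max_gib / ram_gib

print(f"zfs_arc_max = {arc_max_gib:.0f} GiB ({arc_max_share:.1%} of RAM)")
print(f"zfs_arc_min = {arc_min_gib:.0f} GiB")
```

So the ARC is limited to between 2 GiB and 4 GiB, which on its own should not put a 32 GiB machine under memory pressure.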

Here is my recent log.

Thanks,
Wing
 

Attachments

  • pve_log.txt (32.8 KB)
  • Screen Shot 2024-06-14 at 12.04.14 PM.png (311.3 KB)
  • Screen Shot 2024-06-14 at 12.12.26 PM.png (571.9 KB)

The system just crashed; restarting pvestatd doesn't work anymore. All VMs & LXCs are still running, but without any stats in the GUI. In my experience, they will all stop running sooner or later.
 

Attachments

  • Screen Shot 2024-06-14 at 10.54.04 PM.png (564 KB)
  • Screen Shot 2024-06-14 at 10.54.12 PM.png (609.6 KB)
  • Screen Shot 2024-06-14 at 10.54.19 PM.png (552.5 KB)
  • Screen Shot 2024-06-14 at 10.53.57 PM.png (574.8 KB)
  • Screen Shot 2024-06-14 at 10.52.12 PM.png (463.6 KB)
  • Screen Shot 2024-06-14 at 10.48.55 PM.png (755.9 KB)
  • pve_log2.txt (37.5 KB)
  • journalctl -b.txt (90.2 KB)

Looks like some kind of hardware problem. Check the condition of the disk, memory, power supply, and processor cooling.
 

Code:
Jun 14 22:28:17 pve pvestatd[2401728]: got timeout
Jun 14 22:28:17 pve kernel: BUG: unable to handle page fault for address: ffff9815bb5b0db0
Jun 14 22:28:17 pve kernel: #PF: supervisor read access in kernel mode
Jun 14 22:28:17 pve kernel: #PF: error_code(0x0000) - not-present page
Jun 14 22:28:17 pve kernel: PGD 5b0001067 P4D 5b0001067 PUD 0
Jun 14 22:28:17 pve kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Jun 14 22:28:17 pve kernel: CPU: 2 PID: 2862319 Comm: pvestatd Tainted: P    BU     O       6.8.4-3-pve #1
Jun 14 22:28:17 pve kernel: Hardware name: Default string Default string/Default string, BIOS GF1264NP124LV11R007 11/20/2023
Jun 14 22:28:17 pve kernel: RIP: 0010:anon_vma_interval_tree_remove+0x253/0x300
Jun 14 22:28:17 pve kernel: Code: 43 08 49 8b 7c 24 30 49 8b 4c 24 28 48 85 ff 0f 85 f7 fd ff ff 49 8b 74 24 20 48 89 f2 48 83 e2 fc 48 89 d0 48 83 fe 03 76 67 <4c> 3b 6a 10 74 1f 48 89 4a 08 48 85 c9 75 1f 83 e6 01 74 75 48 85
Jun 14 22:28:17 pve kernel: RSP: 0018:ffffaf19e229bbe8 EFLAGS: 00010292
Jun 14 22:28:17 pve kernel: RAX: ffff9815bb5b0da0 RBX: ffff981daca0d258 RCX: 0000000000000000
Jun 14 22:28:17 pve kernel: RDX: ffff9815bb5b0da0 RSI: ffff9815bb5b0da0 RDI: 0000000000000000
Jun 14 22:28:17 pve kernel: RBP: ffffaf19e229bc00 R08: 0000000000000000 R09: 0000000000000000
Jun 14 22:28:17 pve kernel: R10: 0000000000000000 R11: 0000000000000000 R12: ffff981db631ba40
Jun 14 22:28:17 pve kernel: R13: ffff981db631ba60 R14: ffff981daca0d208 R15: ffff981db631ba40
Jun 14 22:28:17 pve kernel: FS:  0000000000000000(0000) GS:ffff98219fb00000(0000) knlGS:0000000000000000
Jun 14 22:28:17 pve kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 14 22:28:17 pve kernel: CR2: ffff9815bb5b0db0 CR3: 00000005af236000 CR4: 0000000000f52ef0
Jun 14 22:28:17 pve kernel: PKRU: 55555554
Jun 14 22:28:17 pve kernel: Call Trace:
Jun 14 22:28:17 pve kernel:  <TASK>
Jun 14 22:28:17 pve kernel:  ? show_regs+0x6d/0x80
Jun 14 22:28:17 pve kernel:  ? __die+0x24/0x80
Jun 14 22:28:17 pve kernel:  ? page_fault_oops+0x176/0x500
Jun 14 22:28:17 pve kernel:  ? anon_vma_interval_tree_remove+0x253/0x300
Jun 14 22:28:17 pve kernel:  ? kernelmode_fixup_or_oops+0xb2/0x140
Jun 14 22:28:17 pve kernel:  ? __bad_area_nosemaphore+0x1a5/0x270
Jun 14 22:28:17 pve kernel:  ? bad_area_nosemaphore+0x16/0x30
Jun 14 22:28:17 pve kernel:  ? do_kern_addr_fault+0x7b/0xa0
Jun 14 22:28:17 pve kernel:  ? exc_page_fault+0x10d/0x1b0
Jun 14 22:28:17 pve kernel:  ? asm_exc_page_fault+0x27/0x30
Jun 14 22:28:17 pve kernel:  ? anon_vma_interval_tree_remove+0x253/0x300
Jun 14 22:28:17 pve kernel:  unlink_anon_vmas+0xb4/0x1c0
Jun 14 22:28:17 pve kernel:  free_pgtables+0x12d/0x1c0
Jun 14 22:28:17 pve kernel:  exit_mmap+0x19b/0x3f0
Jun 14 22:28:17 pve kernel:  __mmput+0x41/0x140
Jun 14 22:28:17 pve kernel:  mmput+0x31/0x40
Jun 14 22:28:17 pve kernel:  do_exit+0x324/0xae0
Jun 14 22:28:17 pve kernel:  do_group_exit+0x35/0x90
Jun 14 22:28:17 pve kernel:  __x64_sys_exit_group+0x18/0x20
Jun 14 22:28:17 pve kernel:  x64_sys_call+0x1822/0x24b0
Jun 14 22:28:17 pve kernel:  do_syscall_64+0x81/0x170
Jun 14 22:28:17 pve kernel:  ? do_user_addr_fault+0x343/0x6b0
Jun 14 22:28:17 pve kernel:  ? irqentry_exit_to_user_mode+0x7b/0x260
Jun 14 22:28:17 pve kernel:  ? irqentry_exit+0x43/0x50
Jun 14 22:28:17 pve kernel:  ? exc_page_fault+0x94/0x1b0
Jun 14 22:28:17 pve kernel:  entry_SYSCALL_64_after_hwframe+0x78/0x80
Jun 14 22:28:17 pve kernel: RIP: 0033:0x7f1b586c3349
Jun 14 22:28:17 pve kernel: Code: Unable to access opcode bytes at 0x7f1b586c331f.
Jun 14 22:28:17 pve kernel: RSP: 002b:00007ffd7bab3ce8 EFLAGS: 00000206 ORIG_RAX: 00000000000000e7
Jun 14 22:28:17 pve kernel: RAX: ffffffffffffffda RBX: 00005cdac14382a0 RCX: 00007f1b586c3349
Jun 14 22:28:17 pve kernel: RDX: 000000000000003c RSI: 00000000000000e7 RDI: 0000000000000000
Jun 14 22:28:17 pve kernel: RBP: 0000000000000001 R08: ffffffffffffff78 R09: 0000000000000000
Jun 14 22:28:17 pve kernel: R10: 00007f1b586011e0 R11: 0000000000000206 R12: 0000000000000004
Jun 14 22:28:17 pve kernel: R13: 0000000000000000 R14: 00005cdac6fe7218 R15: 00005cdac1991a08
Jun 14 22:28:17 pve kernel:  </TASK>

Proxmox does not ship debug kernels because they are "too big", so start with memtest to rule out the RAM (swap out the modules, test one at a time, and check the compatibility charts). Is this a brand-new setup, or did the machine run nonstop for ages on some other system/kernel? Try running a different kernel.
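
For anyone reading the trace above, here is a rough Python sketch that pulls out the kernel version, the taint letters and the faulting function from the oops. The two log lines are copied from the excerpt. The function name `parse_oops` and the `TAINT_MEANINGS` table are made up for this example, and the table only covers the four letters that appear here, with meanings taken from the kernel's tainted-kernels documentation.

```python
import re

# Meanings of the taint letters seen in this trace (kernel tainted-kernels doc).
# Only these four are listed; all other letters are left out of this sketch.
TAINT_MEANINGS = {
    "P": "proprietary module was loaded",
    "B": "bad page referenced or some unexpected page flags",
    "U": "taint requested by userspace application",
    "O": "externally-built (out-of-tree) module was loaded",
}

# Two lines copied from the journal excerpt above.
LOG = """\
Jun 14 22:28:17 pve kernel: CPU: 2 PID: 2862319 Comm: pvestatd Tainted: P    BU     O       6.8.4-3-pve #1
Jun 14 22:28:17 pve kernel: RIP: 0010:anon_vma_interval_tree_remove+0x253/0x300
"""


def parse_oops(log: str) -> dict:
    """Extract kernel version, taint letters and faulting symbol from an oops."""
    # The taint field is a run of letters and spaces followed by the version.
    taint = re.search(r"Tainted: (.+?)\s+(\d+\.\d+\.\S+) #", log)
    # Code segment 0010 marks the kernel-mode RIP line (0033 is user mode).
    rip = re.search(r"RIP: 0010:(\w+)\+0x", log)
    flags = [c for c in taint.group(1) if c != " "] if taint else []
    return {
        "kernel": taint.group(2) if taint else None,
        "taint": flags,
        "symbol": rip.group(1) if rip else None,
    }


info = parse_oops(LOG)
print(info["kernel"], info["symbol"])
for flag in info["taint"]:
    print(flag, "-", TAINT_MEANINGS.get(flag, "not listed in this sketch"))
```

The `B` flag means the kernel had already noticed a bad page state before this oops. That is consistent with memory corruption but does not prove faulty RAM on its own. `P` and `O` are expected on a host with the ZFS module loaded.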
 
Thank you guys! It's a new setup that splits the workload off from the main server, and the VMs/LXCs on the new setup keep the same config as before the move. I haven't tried a different kernel yet; the current kernel is 6.8.4-3. I did some research on the Intel N100: 32GB of RAM works for some people and not for others. I suspect the 32GB of RAM is causing the crashes, since the N100 chip only supports 16GB of RAM. I already ordered 16GB of RAM and will give it a shot.
 
I am having random kernel freezes too on the same version. I was told to downgrade the kernel to 6.5, but it still happens. It's not a hardware issue for sure: it works like a charm on ESXi, where I reboot once a year... I might go back.
 
