Proxmox Freezing (new installation) 9.2.2 - Intel 12700T

Dec 1, 2023
Good morning, good afternoon, or good evening.

I would appreciate your help in identifying the cause of a problem I'm facing. So far I have run out of ideas.

I migrated my Proxmox to a Lenovo M80q mini PC, Intel i7 12700T, originally with 16GB of RAM, but upgraded to 32GB. I use it for home infrastructure, Home Assistant, Frigate (12 cameras and Coral), and other small LXCs that don't require much hardware. Currently on Proxmox version 9.2.2.

In the first few days after the migration, it had problems caused by a network interface incompatibility, which is already documented and which I have already fixed with the help of this thread.

But it's not over yet. I started experiencing complete freezes, and worse, there's no log to help pinpoint the cause, as you can see in this log fragment below. Note that nothing is recorded from the time of the freeze until the reboot.


May 29 01:00:01 pve CRON[1846905]: pam_unix(cron:session): session closed for user root
May 29 01:05:01 pve CRON[1873732]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 29 01:05:01 pve CRON[1873735]: (root) CMD (/root/ha_cpu_temp_mqtt.sh)
May 29 01:05:01 pve CRON[1873732]: pam_unix(cron:session): session closed for user root
May 29 01:10:01 pve CRON[1900930]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 29 01:10:01 pve CRON[1900933]: (root) CMD (/root/ha_cpu_temp_mqtt.sh)
May 29 01:10:01 pve CRON[1900930]: pam_unix(cron:session): session closed for user root
May 29 01:15:01 pve CRON[1927539]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 29 01:15:01 pve CRON[1927542]: (root) CMD (/root/ha_cpu_temp_mqtt.sh)
May 29 01:15:01 pve CRON[1927539]: pam_unix(cron:session): session closed for user root
May 29 01:15:37 pve smartd[966]: Device: /dev/nvme0, SMART/Health value: Percentage Used changed from 5 to 6
May 29 01:17:01 pve CRON[1938479]: pam_unix(cron:session): session opened for user root(uid=0) by root(uid=0)
May 29 01:17:01 pve CRON[1938481]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
May 29 01:17:01 pve CRON[1938479]: pam_unix(cron:session): session closed for user root
-- Reboot --
May 29 05:38:02 pve kernel: Linux version 7.0.2-6-pve (build@proxmox) (gcc (Debian 14.2.0-19) 14.2.0, GNU ld (GNU Binutils for Debian) 2.44) #1 SMP PREEMPT_DYNAMIC PMX 7.0.2-6 (2026-05-20T08:55Z) ()
May 29 05:38:02 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-7.0.2-6-pve root=/dev/mapper/pve-root ro quiet
May 29 05:38:02 pve kernel: KERNEL supported cpus:
May 29 05:38:02 pve kernel: Intel GenuineIntel
May 29 05:38:02 pve kernel: AMD AuthenticAMD
May 29 05:38:02 pve kernel: Hygon HygonGenuine
May 29 05:38:02 pve kernel: Centaur CentaurHauls
May 29 05:38:02 pve kernel: zhaoxin Shanghai
May 29 05:38:02 pve kernel: x86/tme: not enabled by BIOS
May 29 05:38:02 pve kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
May 29 05:38:02 pve kernel: BIOS-provided physical RAM map:
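
The length of the silent window around `-- Reboot --` can be measured from a text export of the journal. Below is a minimal Python sketch, assuming lines in the same syslog-style format as the fragment above. The names `parse_stamp` and `silent_gaps` are illustrative only, and the year is an assumption because the stamps omit it (2026 is taken from the kernel build date in the boot line).

```python
from datetime import datetime

REBOOT_MARKER = "-- Reboot --"


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog-style line."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def silent_gaps(lines, year=2026):
    """Yield (last_before, first_after) timestamps around each reboot marker.

    The year is an assumption: syslog-style stamps do not carry one.
    """
    previous = None  # most recent timestamp seen so far
    pending = None   # last timestamp seen before a reboot marker
    for line in lines:
        line = line.rstrip("\n")
        if line.strip() == REBOOT_MARKER:
            pending = previous
            continue
        try:
            stamp = parse_stamp(line, year)
        except ValueError:
            continue  # skip lines without a leading timestamp
        if pending is not None:
            yield pending, stamp
            pending = None
        previous = stamp


if __name__ == "__main__":
    sample = [
        "May 29 01:17:01 pve CRON[1938479]: pam_unix(cron:session): session closed for user root",
        "-- Reboot --",
        "May 29 05:38:02 pve kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-7.0.2-6-pve root=/dev/mapper/pve-root ro quiet",
    ]
    for before, after in silent_gaps(sample):
        print("last entry:", before, "| first entry after boot:", after, "| gap:", after - before)
```

Note that the gap only bounds the freeze: the machine may have hung at any point after the last logged entry, and the end of the gap is simply when it was powered on again.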


I made step-by-step changes to try to find a solution: I switched the thermal profile from Performance to Balanced, limited the C-states to C1, and then disabled them entirely, to make sure that no power fluctuations could interfere with the availability of the CPU or PCIe.
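
To double-check from the running system which idle states the kernel actually exposes after such a change, the cpuidle entries in sysfs can be listed. This is only a sketch, assuming the standard Linux layout under `/sys/devices/system/cpu/cpu0/cpuidle`; the function name `list_idle_states` is made up for illustration, and the `disable` file reflects only the OS-side setting, not necessarily what the firmware was told to do.

```python
from pathlib import Path


def list_idle_states(cpu_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (state, name, disabled) tuples for one CPU's idle states."""
    base = Path(cpu_dir)
    if not base.is_dir():
        return []  # cpuidle not exposed (e.g. no driver loaded)
    states = []
    # Sort numerically so that state10 comes after state2.
    for state_dir in sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:])):
        name = (state_dir / "name").read_text().strip()
        disable_file = state_dir / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() != "0"
        states.append((state_dir.name, name, disabled))
    return states


if __name__ == "__main__":
    for state, name, disabled in list_idle_states():
        print(state, name, "disabled" if disabled else "enabled")
```

If states deeper than C1 still show up as enabled here, the firmware setting may not have been applied the way it was intended.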

Any advice would be helpful, and I would be immensely grateful for it.
 
Did you also recently do this RAM upgrade? If so, it might be good to run a memtest.
It's too early to celebrate, but it seems one of the RAM sticks failed the memtest, both when tested together with the other stick and when tested on its own. I've replaced it with a new stick and will monitor the situation.

Thank you for prompting a test I was reluctant to do because, theoretically, it didn't make sense due to the lack of OOM logs, for example.
 
Hi,

Thank you for prompting a test I was reluctant to do because, theoretically, it didn't make sense due to the lack of OOM logs, for example.
Well, bad RAM generally produces crashes, reboots (of an application and/or the whole machine), or segfaults.
An OOM occurs when there is not enough RAM. It could possibly happen on a server with ECC memory, if the server detects the faulty stick and disables it (I have never seen this symptom on consumer hardware).
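
When a log does survive, the two cases can be told apart roughly by counting the typical kernel messages for each. Here is a small Python sketch under stated assumptions: the patterns are based on common Linux kernel message wording, which varies between kernel versions, and the names `PATTERNS` and `classify` are only illustrative. A hard freeze may of course leave none of these lines behind.

```python
import re

# Assumed message fragments; exact wording differs between kernel versions.
PATTERNS = {
    "oom": re.compile(r"invoked oom-killer|Out of memory: Killed process"),
    "segfault": re.compile(r"segfault at [0-9a-f]+"),
    "machine_check": re.compile(r"mce: \[Hardware Error\]"),
}


def classify(lines):
    """Count log lines matching each symptom category."""
    counts = {label: 0 for label in PATTERNS}
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (assumed): journalctl -k --no-pager | python3 this_script.py
    print(classify(sys.stdin))
```

Segfaults or machine-check lines with no OOM entries would point toward hardware rather than memory exhaustion.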

Best regards,