Proxmox VE Crashes

railcar

Feb 26, 2025
Hey, I have a problem with one of my Proxmox VE installations.

pve-manager/8.3.2
8 x Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
Linux 6.8.12-5-pve (2024-12-03T10:26Z)
32GB RAM

After a short while (sometimes only hours, other times a day or two at most) the system completely locks up. If I attach a screen and keyboard, I can't even type anything in the console. It's not dead: I can see the console, but it accepts no input. SSH dies, the Proxmox web UI becomes unreachable, and all LXCs and VMs die.

Where should I look in the log files to see what causes this? It's a single machine with a modest installation, only a couple of LXCs and 1 VM.
 
Hello railcar! Could you please post the output of journalctl --since <TIME> > journal.txt with a time at least 30 minutes before the issue occurred. Then please attach the file here.
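
A minimal sketch of how the journal could be captured, wrapped in two small helper functions. The function names and the timestamp are placeholders chosen for illustration; `save_previous_boot` assumes the journal is stored persistently (i.e. /var/log/journal exists), which is worth checking after a hard reset.

```shell
# save_journal SINCE OUTFILE
# Write every journal entry from SINCE onwards to OUTFILE.
save_journal() {
    journalctl --since "$1" > "$2"
}

# save_previous_boot OUTFILE
# After a forced power cycle, dump the journal of the boot that froze.
# Assumption: persistent journal storage is enabled on this host.
save_previous_boot() {
    journalctl -b -1 > "$1"
}

# Example usage (run as root on the PVE host; the timestamp is a placeholder
# and should be at least 30 minutes before the freeze):
#   save_journal "2025-02-26 09:00:00" journal.txt
#   save_previous_boot journal-previous-boot.txt
```

`journalctl --list-boots` shows which boots are available if it is unclear which one contains the freeze.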

Just wondering, did the server work without issues until now? Did something change before the issues started to happen?
 
Sure, it is here. :)

I am not entirely sure when it started. I thought it might be when I added an LXC, so I had 3 running instead of only 2, but then I went back to only 2 LXCs for a while and the machine still froze.
 


The journal reports some Ext4-related warnings:
Feb 26 09:29:55 pve kernel: EXT4-fs warning (device dm-11): ext4_multi_mount_protect:328: MMP interval 42 higher than expected, please wait.
Also, at the end:
Feb 26 09:59:45 pve smartd[748]: Device: /dev/sdb [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 111 to 113

It seems that there is an issue related to /dev/sdb. In general, this can happen for several reasons:
  • broken filesystem (e.g. after a power loss)
  • faulty disk
  • faulty SATA cables
Just to be sure, could you please post the output of smartctl -a /dev/sda and smartctl -a /dev/sdb?
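
To pull lines like the two quoted above out of a saved journal file, a simple filter along these lines may help. The function name and the pattern list are assumptions for illustration; the pattern only covers ext4 warnings/errors, smartd messages and generic I/O errors, so it is not exhaustive.

```shell
# disk_messages FILE
# Print journal lines that hint at filesystem or disk trouble.
# The pattern is a rough starting point, not a complete list.
disk_messages() {
    grep -E 'EXT4-fs (warning|error)|smartd\[[0-9]+\]|I/O error' "$1"
}

# Example usage:
#   disk_messages journal.txt
```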
 
The HDD at /dev/sdb has a high number of errors, so at the moment it seems like your HDD is failing. You might want to try running smartctl -t short /dev/sdb just to be sure, then check the results using smartctl -a /dev/sdb (or smartctl -l selftest /dev/sdb).

Either way, you should prepare backups before it's too late. In case you want to replace the failing HDD, keep in mind that we recommend datacenter/enterprise SSDs with power-loss protection.
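
The SMART checks mentioned above could be wrapped roughly like this. The function names are made up for illustration; the smartctl options themselves (`-t short`, `-l selftest`, `-H`, `-A`) are standard smartmontools flags. The short self-test runs in the background, and smartctl prints an estimated completion time when it starts.

```shell
# start_short_test DEV
# Start a short SMART self-test on DEV (returns immediately).
start_short_test() {
    smartctl -t short "$1"
}

# show_selftest_log DEV
# Show the self-test log once the test has finished.
show_selftest_log() {
    smartctl -l selftest "$1"
}

# show_health DEV
# Print the overall health verdict and the attribute table.
show_health() {
    smartctl -H -A "$1"
}

# Example usage (as root):
#   start_short_test /dev/sdb
#   # wait for the estimated time printed by smartctl, then:
#   show_selftest_log /dev/sdb
#   show_health /dev/sdb
```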
 
The hard drive is old, but should it cause the entire system to halt? I mean, it's "just" a second disk for storing the LXCs, not Proxmox itself.
 
Thanks for the output! You are right, the HDD should not affect the whole system. Does it still freeze when you don't start the LXC containers?

Also, could you please do the same tests above on /dev/sda?
 
I haven't encountered that, but even if the disk dies and the LXCs stop working, I would assume the web GUI and the Proxmox host would still function, so I don't see why there would be a correlation?

I've attached the output after the scan of /dev/sda.
 


Alright, then at least we know that the RAM is probably not the issue.

Just to rule out other issues (as far as possible), could you please:
  1. Update PVE to the latest version and reboot. Your kernel is not fully up to date.
  2. Install the latest BIOS update for your system.
  3. Enable the non-free-firmware Debian repository and install the CPU-vendor specific microcode package. Make sure to restart afterwards.
  4. Check the temperature of the CPU.
  5. Perform a CPU stress test and check the CPU temperature.
  6. If nothing helped, try the newer opt-in kernel 6.11.
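
The steps above can be sketched as a handful of shell helpers. This is only a rough outline under a few assumptions: the function names are invented for illustration, the host is PVE 8 on Debian 12 with an Intel CPU (as in the first post), and lm-sensors and stress-ng are just one possible choice of tools for steps 4 and 5. The BIOS update (step 2) is vendor-specific and not covered.

```shell
# Step 1: update PVE, then reboot manually afterwards.
update_host() {
    apt update && apt full-upgrade
}

# Step 3a: check whether any APT source (under the given files/directories)
# already enables the non-free-firmware component.
has_nonfree_firmware() {
    grep -rqsE '^[^#]*non-free-firmware' "$@"
}

# Step 3b: install the Intel microcode package, then reboot.
install_intel_microcode() {
    apt install intel-microcode
}

# Steps 4 and 5: tools for reading temperatures and loading the CPU.
install_test_tools() {
    apt install lm-sensors stress-ng
}

# Step 5: load all online CPUs for 10 minutes.
# Watch temperatures in a second terminal with: watch -n 2 sensors
cpu_stress() {
    stress-ng --cpu 0 --timeout 10m
}

# Step 6: install the opt-in 6.11 kernel, then reboot.
install_optin_kernel() {
    apt install proxmox-kernel-6.11
}

# Example usage (as root):
#   has_nonfree_firmware /etc/apt/sources.list /etc/apt/sources.list.d \
#       || echo "add non-free-firmware to the Debian entries first"
```

If `has_nonfree_firmware` reports nothing, the `non-free-firmware` component has to be added to the Debian lines in the APT sources, followed by `apt update`, before the microcode package can be installed.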