PVE randomly crashes

garnoux

Hello, I have a problem with PVE installed on an OVH dedicated server.

Since May 15th, roughly every 10 to 15 days, I've been experiencing completely unexpected system crashes. The host suddenly stops responding, and both SSH and KVM access are lost. The server doesn't reboot on its own, so I have to force a hard reset via IPMI.


The installed version is:
pve-manager/8.2.2/9355359cd7afbae4 (running kernel: 6.8.4-3-pve)


Attached are three journalctl dumps captured around the crashes.

Should we consider the possibility of a hardware problem?
I remain at your disposal for further information.
 

Ouch. Looking over the latest of those crash dumps (the 2024.06.10 one), I see some errors about it not being able to contact something on port 8007.

From some searching online, that's probably related to a Proxmox Backup Server installation (8007 is its default port) and might not be relevant to the crashes.

As a first thought, it might be a good idea to drop back to the slightly older 6.5.x series of kernels. A lot of weird problems have been reported with the 6.8.x kernels (hopefully they get fixed in the coming weeks).

Switching to the older kernel isn't hard. The steps for it are listed here under the "Kernel 6.8" heading:

https://pve.proxmox.com/wiki/Roadmap#Known_Issues_&_Breaking_Changes
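
For reference, here is a rough sketch of what installing and pinning the older kernel can look like. It is not a copy of the wiki steps, so check the page above for the authoritative version. The version string `6.5.13-5-pve` is only an example and an assumption about what is available on your host; use whatever `proxmox-boot-tool kernel list` actually shows. The `run` and `pin_older_kernel` helpers are made up for this post. The script only prints the commands unless you set `APPLY=1`, and it needs to run as root to apply anything.

```shell
#!/bin/sh
# Sketch: install the 6.5 kernel series and pin it as the default boot kernel.
# Dry-run by default: commands are printed, not executed. Set APPLY=1 to run them.

# ASSUMPTION: example version only; replace it with a version listed by
# `proxmox-boot-tool kernel list` on your own host.
KERNEL_VERSION="${KERNEL_VERSION:-6.5.13-5-pve}"

# Hypothetical helper: execute the command when APPLY=1, otherwise just print it.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Hypothetical helper: install the 6.5 kernel meta-package, show the installed
# kernels, then pin the requested version. Stops at the first failing step.
pin_older_kernel() {
    run apt update &&
    run apt install -y proxmox-kernel-6.5 &&
    run proxmox-boot-tool kernel list &&
    run proxmox-boot-tool kernel pin "$1"
}

pin_older_kernel "$KERNEL_VERSION"
```

After pinning, reboot and confirm with `uname -r` that the 6.5 kernel is running. Once the 6.8 issues are sorted out, `proxmox-boot-tool kernel unpin` should put the host back on the default (newest) kernel.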