Proxmox Crashing

yunmkfed

Member
Sep 2, 2023
www.alanbonnici.com
Hi,

Proxmox is crashing and taking down all the VMs and containers. I am not at home, so I asked for a screenshot of the console.

Proxmox has been stable for months. The issues started soon after I upgraded from VirtIO 0.1.248 to 0.1.262. I was getting an x86/split lock detection warning, which I posted about at https://forum.proxmox.com/threads/x...plit-lock-trap-at-address.153568/#post-698924.

Today I am on 0.1.262. To work around the split lock detection system freeze, I disabled split lock detection. Yet, as the screen photo shows, this did not work.

I am willing to do a fresh install of Proxmox if this helps stabilise the system.

Help really appreciated.



WhatsApp Image 2024-09-09 at 11.12.45_8324b886.jpg
 
Should I scrap the existing Proxmox installation and do a fresh install and restore the backed up VMs and containers?

If this instability is frequent, then it might prove to be unsustainable for me.
 
I don't think VirtIO would make the whole host crash... To me this seems more like a hardware issue with the storage or memory, given that the screenshot shows that ip6tables-save is the binary that segfaulted, which is not directly related to PVE/QEMU/LXC.

Your freezes are completely unrelated to split-lock detection. It logs programs that make misaligned memory accesses and makes them sleep for 10ms, which makes them run slower instead of affecting the performance of the whole host.
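To see which processes are actually triggering the warnings, the kernel log can be tallied per process. This is only a rough sketch: it assumes the log lines look like `x86/split lock detection: #AC: <comm>/<pid> took a split_lock trap at address: 0x...`, which is the format I have seen, and the function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed kernel message format:
#   "... #AC: <comm>/<pid> took a split_lock trap at address: 0x<hex>"
# <comm> may itself contain spaces and slashes (e.g. "CPU 3/KVM").
SPLIT_LOCK_RE = re.compile(
    r"#AC: (?P<comm>.+)/(?P<pid>\d+) took a split_lock trap "
    r"at address: (?P<addr>0x[0-9a-fA-F]+)"
)


def count_split_lock_traps(log_text):
    """Count split-lock trap messages per (process name, pid)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SPLIT_LOCK_RE.search(line)
        if match:
            counts[(match.group("comm"), int(match.group("pid")))] += 1
    return counts


def count_from_file(path):
    """Convenience wrapper: read a saved kernel log and tally it."""
    with open(path, errors="replace") as handle:
        return count_split_lock_traps(handle.read())
```

For example, save the kernel log with `journalctl -k > kernel.log` and call `count_from_file("kernel.log").most_common()`. If the names are all QEMU vCPU threads, the warnings come from inside a VM rather than from host software.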
 
Is the error the same every time? In the other post, you say the error is unrelated to VirtIO drivers. Split lock is an indication of your VM doing some weird stuff. I have seen the issue if you are passing through certain hardware to a VM as well, but generally it is related to very buggy software/OS or faulty hardware, everything from bad memory, bad CPU to bad power supply. Based on the error (related to memory mapping), I would start with RAM.
 
Thanks for your input. Proxmox was very stable for a very long time.

I did have the split-lock error, and it was bringing down the entire system. Below is a screenshot of the split-lock crash. If I understand you correctly, split-lock should not impact the Proxmox system.

Would you suggest I perform memory and secondary storage tests to verify whether a component has failed? Are there any tools you suggest?

Thanks
IMG_20240828_211430.png
 
Those split-lock events happened around the same time something else crashed in the host, but they are not the cause of the host failing.
I would definitely start by booting the Proxmox ISO and running memtest, then stress the CPU [1]. Check thermals, check for clogged fans/ducts, etc.

Once the hardware looks fine, I would take a look at the running versions in that host and maybe update. We can't rule out that you are running a QEMU/KVM version with some bug that for some reason didn't show up until now (maybe due to apps running in the VMs that got updated, who knows).

[1] https://www.tecmint.com/linux-cpu-load-stress-test-with-stress-ng-tool/
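For the "check thermals" part, here is a minimal sketch that reads the temperatures Linux exposes under `/sys/class/thermal` while a stress test is running. It assumes the host has `thermal_zone*` entries there with `type` and `temp` files (values in millidegrees Celsius). The 85 °C limit is an arbitrary illustrative threshold, not a figure for any particular CPU, and the function names are hypothetical.

```python
from pathlib import Path


def read_thermal_zones(base="/sys/class/thermal"):
    """Return {"<zone> (<type>)": temperature in degrees C} for each zone."""
    readings = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            zone_type = (zone / "type").read_text().strip()
            # The kernel reports temperatures in millidegrees Celsius.
            millidegrees = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone not readable or value malformed; skip it
        readings[f"{zone.name} ({zone_type})"] = millidegrees / 1000.0
    return readings


def hot_zones(readings, limit_c=85.0):
    """Keep only the zones at or above limit_c (an assumed threshold)."""
    return {name: temp for name, temp in readings.items() if temp >= limit_c}
```

Calling `hot_zones(read_thermal_zones())` every few seconds during the stress run gives a quick indication of whether cooling is keeping up.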
 
As suggested, I performed a PassMark test and no errors were reported. Since I don't really know how to test a PSU, I decided to replace it. In the process, I updated the BIOS of the motherboard.

When I booted back into the console I did an apt update. Below is the photo of the console.

IMG_20240917_112157.png

I don't know whether the crash has been resolved because it sometimes took 6 days for Proxmox to stop working.

Are there any other tests I could perform?

What baffles me is how VMs that were operationally stable for months suddenly started giving the split lock error. Should a split lock error in a Windows VM bring down the entire system?

Thanks
 
I would like to share that the homelab is stable after replacing the PSU. I didn't post before to make sure that all is OK, and exactly one month on I feel comfortable doing so.

One question: should I reverse the split-lock changes? If yes, what is the default and how do I get back to it?

Thanks
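On the default: as far as I know, on CPUs that support it the kernel runs with `split_lock_detect=warn` unless told otherwise, so removing the override and rebooting should restore it. The thread does not say how detection was disabled, so the sketch below assumes it was done with a `split_lock_detect=off` kernel parameter and simply reports what the running kernel was booted with. The helper names are made up for illustration.

```python
def split_lock_mode(cmdline):
    """Return the split_lock_detect value from a kernel command line.

    Returns None when the parameter is absent, in which case the kernel
    default applies (assumed to be "warn" on supporting CPUs).
    """
    mode = None
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if key == "split_lock_detect" and sep:
            mode = value  # report the last occurrence if repeated
    return mode


def read_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read()
```

If `split_lock_mode(read_cmdline())` returns `"off"`, the override is still active. On a typical Proxmox install the parameter would live in `/etc/default/grub` (then `update-grub`) or in `/etc/kernel/cmdline` (then `proxmox-boot-tool refresh`), depending on the bootloader in use.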
 
