Proxmox crashed constantly pve/data corrupted

Vin

New Member
Mar 6, 2023
Hello everybody,

Proxmox keeps crashing, even though I have set everything up multiple times.

After a crash the pve/data pool is often corrupted, and even though at first it can be repaired with lvconvert --repair pve/data, eventually the corruption becomes persistent and I have to set everything up all over again.

Can anybody make sense of the logs, or tell me how to troubleshoot the issues below and the crashes?

Thank you in advance

dmesg
https://pastebin.com/HnP5SF3v

journal
https://pastebin.com/ghGvbLjU

root@Snake:~# vgchange -a y pve
Check of pool pve/data failed (status:1). Manual repair required!
2 logical volume(s) in volume group "pve" now active
root@Snake:~# lvchange -a y pve/data
Check of pool pve/data failed (status:1). Manual repair required!
root@Snake:~# lvconvert --repair pve/data
Child 4234 exited abnormally
Repair of thin metadata volume of thin pool pve/data failed (status:-1). Manual repair required!
 
Most probably a hardware error (disk? controller?). As far as I understand, the LVM setup in the pve volume group becomes corrupt after a while. It may help to have a look at LVM's history, shown by the archive and backup files:
Code:
cat /etc/lvm/archive/*
cat /etc/lvm/backup/*
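Since a failing disk or controller is the suspect, it may also be worth looking at the drive's SMART data; a minimal sketch, assuming the Proxmox system disk is /dev/nvme0n1 (adjust the device name to your setup):
Code:
apt install smartmontools        # if not already installed
smartctl -a /dev/nvme0n1         # overall health, error counters, wear/spare indicators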
 
I have reinstalled the entire setup multiple times by now.

I also just changed the NVMe to a brand new one.
Only Proxmox was installed, and it crashed again with a corrupted file system.

So unfortunately I don't have log files anymore.

Is there a way to see the I/O errors in a running system?

Also, I apparently have some corrupted sectors on my NAS HDDs; can those lead to the crashes?
I tried to fix them via GParted, but I just ran into errors.
 

Attachments: 1.png (129.5 KB), 2.png (48 KB), 3.png (56.9 KB), 5.png (725.3 KB), 6.png (841.8 KB)
Is there a way to see the I/O errors in a running system?

Try to boot via external live media and then investigate the files from the previous Proxmox installation.
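For watching I/O errors on a running system, the standard kernel log tools should be enough; a quick sketch (not Proxmox-specific):
Code:
dmesg -wT                                       # follow new kernel messages with readable timestamps
journalctl -kf | grep -iE 'error|nvme|ata|blk'  # follow the kernel log, filtered for typical I/O error patterns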

Also, I apparently have some corrupted sectors on my NAS HDDs; can those lead to the crashes?
I tried to fix them via GParted, but I just ran into errors.
The NAS is used as data storage only (or is it?), so it should not be able to cause the Proxmox crashes you reported. To rule out any influence from the NAS, I suggest running Proxmox without it for the moment (and configuring it again once Proxmox is stable).
 
I reinstalled everything from scratch; it still crashes.

dmesg Proxmox
https://pastebin.com/AkPiDT6j

Regarding the NAS: I pass the disks through to a Debian VM with OMV installed on it.
In this particular dmesg from Proxmox, I only passed one SSD to the VM.

I suspect an I/O problem caused by I/O load, as described years ago here:
https://bugzilla.kernel.org/show_bug.cgi?id=199727#c0

I changed all my VM disks to VirtIO SCSI single, Cache = Write back, Discard = 1, IO Thread = 1, Async IO = threads, SSD emulation.
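For reference, on the command line that roughly corresponds to something like this (VM ID 100 and the storage/disk names are just placeholders for my actual setup):
Code:
qm set 100 --scsihw virtio-scsi-single
qm set 100 --scsi0 local-lvm:vm-100-disk-0,cache=writeback,discard=on,iothread=1,aio=threads,ssd=1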
 
It's impossible to use Proxmox like this.

I really do think it's about the I/O handling of Proxmox itself.


As stated above, I already set my HDDs and NVMes to IO Thread and Async IO = threads.
I still get a high I/O delay of around 40-50% when copying things between HDDs.
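To see which disk is actually saturated while copying, something like iostat from the sysstat package can help; a quick sketch:
Code:
apt install sysstat
iostat -x 2        # extended per-device stats (%util, await) refreshed every 2 seconds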

I pass specific SATA HDDs and NVMes through to the VM directly, without an HBA.
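For context, this kind of per-disk passthrough is typically done by attaching the raw block device to the VM via its stable /dev/disk/by-id path; a minimal sketch (the VM ID, SCSI slot and disk ID are placeholders):
Code:
ls -l /dev/disk/by-id/ | grep -v part          # find the stable ID of the physical disk
qm set 100 --scsi1 /dev/disk/by-id/ata-EXAMPLE_SERIAL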


The Debian VM where OMV is located also crashes constantly:

Message from syslogd@debian at Apr 4 12:24:53 ...
kernel:[ 1071.553173] watchdog: BUG: soft lockup - CPU#6 stuck for 26s! [kworker/6:1:119]

Message from syslogd@debian at Apr 4 12:25:21 ...
kernel:[ 1099.552508] watchdog: BUG: soft lockup - CPU#6 stuck for 53s! [kworker/6:1:119]
 
Since Proxmox is a Debian-based environment, you could try installing a standard Debian kernel for your system. You can then check whether your system has trouble running Debian in general or whether the Proxmox kernel causes the issue on your side.
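A rough sketch of what that could look like (the metapackage name assumes an amd64 Debian base; this is not an officially supported configuration):
Code:
apt update
apt install linux-image-amd64     # stock Debian kernel metapackage
# reboot and select the Debian kernel in the GRUB menu to test with it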

I am encountering a similar issue on our side when running Proxmox with fewer CPUs on a VMware cluster. On heavy disk activity the system stops and reboots.

https://forum.proxmox.com/threads/proxmox-7-3-host-always-reboots-on-snapshot-via-vmware.125269/
 
