Help to diagnose random crash

I had similar issues with Proxmox 8.x. I back up to a Synology NAS via NFS, and a host completely froze up during a backup.

I also had migration issues: both live and cold migrations would randomly crash.
Backup restores showed the same symptoms as migrations.

However:
I had 2 clusters: one using LVM and the other using ZFS. The ZFS cluster had no issues with backups and migrations; the LVM cluster was the one that would freeze during backups and migrations.

BTW, these issues did not happen with the Proxmox 7.x series, even with the testing kernel and LVM.
A server in my ZFS cluster locked up recently. It happened over the weekend, which is when I run backups to an NFS share.

The odd thing is that 3 different servers in 2 different clusters (ZFS & LVM) have been affected.
The only commonality I can find is running backups to an NFS share on a Synology.
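For context, the NFS storage on those clusters is defined in /etc/pve/storage.cfg in the usual way; the entry below is only a sketch with placeholder names, paths, and addresses, not my actual values:

nfs: synology-backup
        export /volume1/proxmox-backups
        path /mnt/pve/synology-backup
        server 192.168.1.50
        content backup
        options vers=3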
 
I have the same problem, both with 7.4-17 and with 8.0.3,
installed on a TerraMaster NAS.

I noticed that they all hang while the log is writing the same operations.

Try removing the gateway from the network card, so Proxmox can no longer reach the internet.

Let me know if it stops crashing.
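To be concrete, what I mean is commenting out (or deleting) the gateway line on the management bridge in /etc/network/interfaces and then applying the change with ifreload -a or a reboot. The snippet below is only a sketch; the interface name and addresses are examples, adjust them to your own setup:

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10/24
        # gateway 192.168.1.1   <- remove or comment out this line
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0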



Oct 25 18:03:23 pmox01 systemd[1]: systemd-update-utmp-runlevel.service: Succeeded.
Oct 25 18:03:23 pmox01 systemd[1]: Finished Update UTMP about System Runlevel Changes.
Oct 25 18:03:23 pmox01 systemd[1]: Startup finished in 34.495s (firmware) + 5.661s (loader) + 3.335s (kernel) + 9.892s (userspace) = 53.385s.
Oct 25 18:03:24 pmox01 chronyd[840]: Selected source 212.45.144.206 (2.debian.pool.ntp.org)
Oct 25 18:03:24 pmox01 chronyd[840]: System clock TAI offset set to 37 seconds
Oct 25 18:03:25 pmox01 chronyd[840]: Selected source 80.211.137.82 (2.debian.pool.ntp.org)
Oct 25 18:03:44 pmox01 systemd[1]: systemd-fsckd.service: Succeeded.
Oct 25 18:03:48 pmox01 pvedaemon[1060]: <root@pam> successful auth for user 'root@pam'
Oct 25 18:06:41 pmox01 chronyd[840]: Selected source 85.199.214.99 (2.debian.pool.ntp.org)
Oct 25 18:17:01 pmox01 CRON[3004]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Oct 25 18:17:01 pmox01 CRON[3005]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 25 18:17:01 pmox01 CRON[3004]: pam_unix(cron:session): session closed for user root
Oct 25 18:18:41 pmox01 systemd[1]: Starting Cleanup of Temporary Directories...
Oct 25 18:18:41 pmox01 systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Oct 25 18:18:41 pmox01 systemd[1]: Finished Cleanup of Temporary Directories.
-- Reboot --
Oct 26 14:10:19 pmox01 kernel: Linux version 5.15.126-1-pve (build@proxmox) (gcc (Debian 10.2.1-6) 10.2.1 20210110, GNU ld (GNU Binutils for Debian) 2.35.2) #1 SMP PVE 5.15.126-1 (2023-10-03T17:24Z) ()
Oct 26 14:10:19 pmox01 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.126-1-pve root=/dev/mapper/pve-root ro quiet
Oct 26 14:10:19 pmox01 kernel: KERNEL supported cpus:
Oct 26 14:10:19 pmox01 kernel: Intel GenuineIntel
Oct 26 14:10:19 pmox01 kernel: AMD AuthenticAMD
Oct 26 14:10:19 pmox01 kernel: Hygon HygonGenuine
Oct 26 14:10:19 pmox01 kernel: Centaur CentaurHauls
Oct 26 14:10:19 pmox01 kernel: zhaoxin Shanghai
Oct 26 14:10:19 pmox01 kernel: x86/split lock detection: #AC: crashing the kernel on kernel split_locks and warning on user-space split_locks
Oct 26 14:10:19 pmox01 kernel: BIOS-provided physical RAM map:
Oct 26 14:10:19 pmox01 kernel: BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
Oct 26 14:10:19 pmox01 kernel: BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
Oct 26 14:10:19 pmox01 kernel: BIOS-e820: [mem 0x0000000000100000-0x0000000076042fff] usable
Oct 26 14:10:19 pmox01 kernel: BIOS-e820: [mem 0x0000000076043000-0x0000000078542fff] reserved
Oct 26 14:10:19 pmox01 kernel: BIOS-e820: [mem 0x0000000078543000-0x00000000787c2fff] ACPI data
Oct 26 14:10:19 pmox01 kernel: BIOS-e820: [mem 0x00000000787c3000-0x00000000788c2fff] ACPI NVS
Oct 26 14:10:19 pmox01 kernel: BIOS-e820: [mem 0x00000000788c3000-0x0000000078efefff] reserved
Oct 26 14:10:19 pmox01 kernel: BIOS-e820: [mem 0x0000000078eff000-0x0000000078ffefff] type 20
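For anyone who wants to pull the same kind of log after a freeze: the dump above is just the journal from the boot before the crash. Assuming the journal is persisted to disk on your install, something like this should retrieve it:

journalctl -b -1 -e        # full journal from the previous boot, jump to the end
journalctl -k -b -1        # kernel messages only from the previous boot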
 
I wanted to know if anyone here has solved the problem permanently.

If so, how?

crc-error-79? TOLF?
 
Yes, I did.
I think it was the hardware I was using, because since I moved to a used Xeon E3-1240 everything has been working fine; my uptime is now 55 days since the last reboot after an update.
 
