I'm facing a similar issue where the machine itself shuts down abruptly without leaving any crash messages in the systemd journal logs.
My setup:
Kernel version: Linux 6.8.4-3-pve
Intel Core i3 9100F
MSI Z390A Pro
RTX 3070
56 GB DDR4 RAM (16+16+16+8)
Proxmox installed on a 500 GB Samsung 980 Pro NVMe
5x 4 TB WD SATA drives and 1x 256 GB SSD passed through to a VM (used as a NAS)
6 VMs in total, no LXCs
Things I've tried:
1. Disabled all power-saving features of the CPU in the BIOS. (Didn't work; Proxmox crashed without any error logs within 24 hours.)
2. Added the kernel parameters intel_idle.max_cstate=1 libata.force=noncq to /etc/kernel/cmdline to limit CPU idle states and turn off NCQ for the ATA drives. (Didn't work; crashed within 24 hours.)
3. Removed memory and ran the system on just a single memory stick. (Didn't work; crashed within 24 hours.)
4. Tried again with different memory. (Didn't work.)
5. Replaced the NVMe boot drive with a brand new one. (Didn't work.)
6. Disabled TPM 2.0 in the BIOS.
I tried the last option 48 hours ago, and this is the longest uptime I can remember. I'll report back if it crashes again.
Edit 1:
It crashed today, after 3 days. So disabling TPM 2.0 did not work either; it may have improved stability, but I can't say for sure.
7. Added enable_dc=0 to the boot parameters.
8. Ran ethtool -K eno1 tso off gso off from the command line to fix the hardware unit hang error on the network card.
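In case it helps anyone following along, here is a minimal sketch for confirming that the boot parameters from steps 2 and 7 actually reached the running kernel. The has_param helper is a hypothetical name of my own, not a Proxmox tool. As far as I understand it, on installs that boot through systemd-boot an edit to /etc/kernel/cmdline only takes effect after running proxmox-boot-tool refresh and rebooting, so a check like this is worth doing before drawing conclusions.

```shell
#!/bin/sh
# Sketch: report whether given parameters are on the kernel command line.
# has_param is a hypothetical helper written for this post.
has_param() {
    # $1 = parameter to look for
    # $2 = command line to search (defaults to the running kernel's)
    param=$1
    cmdline=${2:-$(cat /proc/cmdline)}
    # Rely on word splitting to walk the space-separated parameters.
    for word in $cmdline; do
        [ "$word" = "$param" ] && return 0
    done
    return 1
}

# Parameters taken from the steps above.
for p in intel_idle.max_cstate=1 libata.force=noncq enable_dc=0; do
    if has_param "$p"; then
        echo "$p: active"
    else
        echo "$p: NOT active"
    fi
done
```

If a parameter shows as NOT active after a reboot, the change was probably written to the wrong bootloader config or not refreshed.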
I also tried following this guide to set up a remote log server:
https://pve.proxmox.com/wiki/Kernel_Crash_Trace_Log
The guide simply does not work in my setup because netconsole on the latest Proxmox does not support bonded/bridged interfaces.
Console error:
Code:
netconsole: network logging stopped on interface eno1 as it is joining a master device
With the default eth0 or vmbr0, the error is that the interface is not found. To make netconsole work I would have to install an additional network card, which is not feasible for me right now.
Will try to figure out other ways to get kernel crash logs.
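One thing I plan to check first, sketched below under some assumptions: whether the journal is kept across reboots at all, and whether the kernel left anything in pstore. With journald's default Storage=auto, logs are only persistent if /var/log/journal exists. The journal_storage helper is a hypothetical name of my own, and a sudden power loss may still leave nothing behind in either place.

```shell
#!/bin/sh
# Sketch: look for leftover evidence from the previous boot.
# journal_storage is a hypothetical helper written for this post.
journal_storage() {
    # $1 = journal directory to test (defaults to the standard location)
    dir=${1:-/var/log/journal}
    if [ -d "$dir" ]; then
        echo persistent
    else
        echo volatile
    fi
}

echo "journal storage: $(journal_storage)"

# Last kernel messages from the previous boot, if the journal kept them.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -k -b -1 -n 50 --no-pager || echo "no previous-boot journal available"
fi

# Records the kernel may have saved to pstore during a panic.
if [ -d /sys/fs/pstore ]; then
    ls -l /sys/fs/pstore
fi
```

If the first line prints volatile, creating /var/log/journal and restarting systemd-journald should make later boots persistent.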