PVE goes offline without any notice in logfiles, dmesg, even without kdump

Exzellius

New Member
Nov 1, 2022
Hi guys,

I could not find any guidelines for opening posts like this, so here we go:
The environment:
PVE host on a Hetzner root server (AX41)
PVE version: 7.3-6
CPU: 12 x AMD Ryzen 5 3600 6-Core Processor (1 Socket)
RAM: 64 GB
Storage:
2x 1 TB HDD in RAID1 as storage for the OS, VMs and containers
a small Hetzner Storage Box (free with the server) via SMB as ISO storage
PBS at a different hoster, around 360 GB
one public IP, which is passed through to a pfSense VM that provides NAT for the VMs and a VPN for management
7 containers
3 VMs

One special config:
1 ramdisk configured with the following line in /etc/fstab for the small VMs/CTs that need extra speed (I know it is not persistent):
tmpfs /mnt/ramdisk tmpfs defaults,size=10g 0 0
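For anyone who wants to double-check the ramdisk setting: a tmpfs only uses memory as files are written to it, but it can grow up to the size= limit, so this entry can take up to 10 GB of the 64 GB RAM. Below is a minimal sketch that reads the configured limit back from an fstab file. The function name tmpfs_size is made up for illustration; it is not an existing tool.

```shell
#!/bin/sh
# Hypothetical helper: print the size= option of the tmpfs entry
# for a given mount point.
#   $1: mount point (e.g. /mnt/ramdisk)
#   $2: fstab file to read (defaults to /etc/fstab)
tmpfs_size() {
    awk -v mp="$1" '
        # skip comment lines, match mount point and filesystem type
        $1 !~ /^#/ && $2 == mp && $3 == "tmpfs" {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^size=/) {
                    sub(/^size=/, "", opts[i])
                    print opts[i]
                }
        }' "${2:-/etc/fstab}"
}
```

Calling tmpfs_size /mnt/ramdisk prints the configured limit. How much of it is actually in use can be seen with df -h /mnt/ramdisk, and findmnt /mnt/ramdisk shows whether the mount is active at all.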

The problem:
The server goes offline ... without any notice in any logfile, and dmesg doesn't show anything (as it gets reset on reboot).
Also, the server doesn't start up again automatically (that could just be a Hetzner thing, I unfortunately don't know), so I need to push the power button to get it running again.
I even configured kdump (and tested it successfully with "echo c >/proc/sysrq-trigger"), but it doesn't write a dump when the problem happens.
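Since dmesg is lost on reboot and kdump only fires when the kernel itself panics (a hard reset or power loss never gives it the chance), the checks I keep repeating after each incident can be sketched roughly as below. This assumes a persistent journal (so that journalctl -b -1 can reach the previous boot) and a kernel with kexec support. The function names are made up for illustration, and the optional file arguments exist only so the helpers can be tried against test files.

```shell
#!/bin/sh
# Hypothetical helper: print the crashkernel= boot parameter, if any.
#   $1: kernel command line file (defaults to /proc/cmdline)
crashkernel_param() {
    grep -o 'crashkernel=[^ ]*' "${1:-/proc/cmdline}"
}

# Hypothetical helper: succeed if a crash kernel is currently loaded.
# The kernel reports 1 in this sysfs file when kdump is armed.
#   $1: status file (defaults to /sys/kernel/kexec_crash_loaded)
kdump_armed() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}

# Hypothetical helper: show the last kernel messages of the previous boot.
# Only works if the journal is persistent.
#   $1: number of lines (defaults to 50)
prev_boot_kernel_log() {
    journalctl -k -b -1 --no-pager | tail -n "${1:-50}"
}
```

If crashkernel_param prints nothing or kdump_armed fails after a reboot, the dump could not have been written in any case. If both look fine and prev_boot_kernel_log simply stops without a panic or oops, that points more towards a reset or power problem than towards a kernel crash.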

The problem appeared (times from last entry in journalctl):
30th of January 03:21 AM -> I thought it was a once-in-a-lifetime thing and didn't troubleshoot much; I got it back up and that was it
26th of March 05:04 AM -> it crashed again. I investigated and found nothing at all. Since I had configured the ramdisk a week earlier, I tested some of the RAM (online, as I unfortunately have no console access to run memtest)
27th of March 12:42 AM -> the second time in a row. I investigated, found nothing, then configured kdump and tested it successfully
28th of March 01:53 AM -> damn, again ... I checked for crash dumps and found nothing, not even the copied dmesg file from the test. I am clueless

Any ideas what I could check next? I have no clue where to go next, so I thought why not get help from the community.
If you need any more info, just let me know, I am grateful for everything you can provide.

Have a nice day and thanks in advance.

Best regards,
Ex
 
I also opened a support ticket with Hetzner so they can check the hardware. It seems weird that no crash dump gets written; this smells like a hardware issue.
 
