PVE 6.4 - random reboots

Fred Saunier

Hello all,

We are experiencing what seem to be random reboots on our 2-node cluster.

Node 1 has 144 GB of RAM + 128 GB swap, and is hosting a single Windows VM with 128 GB of RAM (no ballooning).
Node 2 has 256 GB of RAM + 256 GB swap, and is hosting a single Windows VM with 230 GB of RAM (no ballooning). This VM replicates to Node 1 once a week.

The last syslog entry before Node 2's most recent reboot is:
Code:
systemd[1]: Started Proxmox VE replication runner.

This reboot happened on a Wednesday at 10:00, whereas replication is scheduled for Fridays at 19:00.

Any help or pointers to troubleshoot this would be welcome, as adding swap did not seem to stop the reboots.

Thanks
Fred
 
Did you enable HA?

Also, since you mentioned replication, I guess you use ZFS on those hosts?
Maybe the hosts run out of memory and swapping is not fast enough? By default, ZFS can use up to 50% of the host's memory.
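
To put rough numbers on that, here is a back-of-the-envelope check using the figures from the first post. It is only a sketch: it assumes the ZFS cache really grows to the default 50% cap (a worst case, not a measurement) and it ignores the memory the host OS itself needs. The function name `headroom_gb` is made up for illustration.

```python
# Worst-case memory budget per node, using the numbers from the first post.
# Assumption: the ZFS cache grows to its default cap of 50% of host RAM.

def headroom_gb(host_ram_gb, vm_ram_gb, arc_fraction=0.5):
    """RAM left over (GB) once the VM and a full ZFS cache are accounted for."""
    arc_gb = host_ram_gb * arc_fraction
    return host_ram_gb - vm_ram_gb - arc_gb

# (host RAM, VM RAM) in GB, as stated in the first post
nodes = {"node1": (144, 128), "node2": (256, 230)}

for name, (host, vm) in nodes.items():
    print(f"{name}: headroom {headroom_gb(host, vm):+.0f} GB")
```

A negative result means the VM plus a full cache would not fit in RAM. ZFS does shrink its cache under memory pressure, so this does not prove an out-of-memory condition; it only shows why the default limit is worth looking at on these two hosts.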

Here you can find info on how to limit ZFS memory usage: https://pve.proxmox.com/wiki/ZFS_on_Linux#sysadmin_zfs_limit_memory_usage
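
The limit described on that page is set through the `zfs_arc_max` module parameter, which takes a value in bytes. A small sketch for generating the line that goes into `/etc/modprobe.d/zfs.conf` follows; the 8 GiB figure is an arbitrary example, not a recommendation for these hosts, and `arc_max_line` is a hypothetical helper name.

```python
GIB = 1024 ** 3  # bytes per GiB

def arc_max_line(limit_gib):
    """Return a modprobe options line capping the ZFS ARC at limit_gib GiB."""
    return f"options zfs zfs_arc_max={limit_gib * GIB}"

# Example value only; pick a limit that fits the RAM left over after the VM.
print(arc_max_line(8))
```

The wiki page linked above covers how to apply the setting, including the extra initramfs step when the root filesystem is on ZFS.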

Otherwise, the full log may be helpful or, if that does not show anything, setting up a remote syslog might catch errors that a local log does not.

Lastly, I'd check the hardware; crashes are often caused by faulty or undersized hardware (e.g. faulty RAM, a bad PSU, etc.).
 
