PVE cluster with Ceph - random VM reboots when a node is shut down

Mar 27, 2021
Hi community,

Recently we started observing weird behavior, as follows:
- VMs are migrated off the cluster node (1/7)
- The norecover and norebalance OSD flags are set
- The node (pve12) is shut down for HW maintenance (RAM and battery replacement)
- A random number of VMs are rebooted on another node, pve11 (HA is starting them)
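
For reference, the flag handling around the maintenance can be sketched roughly as below. This is only a minimal sketch, assuming the flags are set with the standard `ceph osd set` / `ceph osd unset` commands; the `run` wrapper, the `DRY_RUN` variable, and the `maintenance_flags` function are hypothetical names used for illustration, not part of Ceph or Proxmox. With `DRY_RUN=1` (the default here) the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: set or clear the Ceph OSD flags used around node maintenance.
# DRY_RUN is a hypothetical switch for this example; 1 means "print only".
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# $1 is "set" (before shutting the node down) or "unset" (after it is back).
maintenance_flags() {
    action="$1"
    for flag in norecover norebalance; do
        run ceph osd "$action" "$flag"
    done
}
```

Calling `maintenance_flags set` before the shutdown and `maintenance_flags unset` once the node has rejoined mirrors the steps listed above; the migration of the VMs and the shutdown itself are not covered by this sketch.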

1658338208191.png

Looking for any suggestions as to what may be causing this.
The latest changes we made were in the Ceph config: disabling debug logs and setting osd_memory_target=64g.
Current ceph config is attached.

I cannot see anything in the Ceph logs or in the system logs of the node.
Well, I can see that this is happening, but there is no explanation why. The logs are also attached.

Any comments are welcome.

Many thanks,
hepo
 

Attachments
