Proxmox/Ceph cluster failed; nodes running, but VMs offline.

Jason King

Oct 30, 2016
I have a 3-node Proxmox/Ceph cluster at OVH. Everything had been running without problems for the past month, until this morning, when I found that all the VMs on the cluster had gone down. The HA VMs were down and the restart process seemed to be locked up. It took only minor intervention to recover: I logged in to the web GUI, stopped some of the restart tasks, and restarted them, after which all VMs came back online. By then they had been down for 5 hours. Now I need to find out what happened, but I'm not sure which logs to review. I know this isn't much to go on, but if someone could point me to the logs I should check to find the cause of the failure, I'd greatly appreciate it.
 
