Hello!
On a new PVE 8.3 testing cluster:
- 4 nodes
- Ceph
* 4 nodes with OSDs
* 3 nodes with managers & monitors
* Dedicated corosync VLANs on the management bond (2×10GbE)
* Separate bonds for Ceph access, Ceph replication, and VM traffic
I mistakenly rebooted a node without moving the VMs beforehand. From what I can remember, a similar operation in the past would trigger a batch migration of the VMs, but in this case a batch shutdown was issued instead.
The VMs on the other 3 nodes kept running happily.
The thing is, I was not able to restart the VMs that had been running on that node until it came back online (a firmware upgrade was in progress), so those VMs were out of service for 10 minutes.
Is this expected? With 3 out of 4 nodes still up, I would expect the cluster to have quorum and be able to overrule any consistency doubt.
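For reference, this is a minimal sketch of the votequorum arithmetic I have in mind, assuming the default of 1 vote per node and no QDevice or special quorum options (the function name and numbers are just illustrative):

```python
# Sketch of corosync votequorum majority arithmetic, assuming the
# default of 1 vote per node and no qdevice / two_node /
# last_man_standing options.

def is_quorate(votes_online: int, expected_votes: int) -> bool:
    # A simple majority is required: more than half of the expected votes.
    majority = expected_votes // 2 + 1
    return votes_online >= majority

# 4-node cluster with one node rebooting: 3 of 4 votes present.
print(is_quorate(3, 4))  # True  -> remaining nodes should keep quorum
print(is_quorate(2, 4))  # False -> a 2/2 split would lose quorum
```

If that arithmetic is right, quorum itself should not be the limiting factor here, which is why the behaviour surprises me.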
This is currently just a testing environment, but for production this would be a deal breaker.