what is the point of CEPH, if there is no HA?

jpv31

Mar 28, 2024
Hello,

I use Proxmox regularly and I think I understand how it works, except for what happens when CEPH is installed without HA.

We have a cluster of 3 servers.

CEPH is installed on each of the servers, and we have defined OSDs and pools, with replication across the 3 servers.

If HA is installed and correctly configured, and for example if node n°1 fails, the virtual machines running on node n°1 will migrate and automatically start on nodes n°2 and n°3.

So far it’s clear!

Now if there is no HA and node n°1 goes down, what will happen?

The VMs from node n°1 will not automatically migrate and start on node n°2 and 3.

In this case what is the point of CEPH, if there is no HA?

Will the VMs which were on node n° 1 migrate, without starting automatically, to nodes n° 2 and n° 3?

Will it nevertheless be possible to start them manually on the other nodes?

Thank you in advance for your clarifications!
Best regards
 
The VMs from node n°1 will not automatically migrate and start on node n°2 and 3.
No, they won't be restarted automatically; that is exactly what HA is for. Without HA configured for your guests, nothing restarts them for you.

In this case what is the point of CEPH, if there is no HA?
You still have a distributed storage system with redundancy, so you can manually restart the VMs on another node without bringing the missing node back up. It also makes maintenance easier, because you don't have to migrate a guest's storage to another node's local storage first. And if Ceph is set up properly, the loss of a node shouldn't result in the loss of data.

Will the VMs which were on node n° 1 migrate, without starting automatically, to nodes n° 2 and n° 3?
They won't start automatically. You'd need to migrate them yourself and then restart them. This should be doable since their storage is on CEPH and not local to node 1.

Will it nevertheless be possible to start them manually on the other nodes?
Yes.
 
Hello Sterzy,

Thank you for taking the time to answer such a simple question in such detail!

But there are some things that I don't quite understand.

I return to my example of node 1 which fails.

I understood that thanks to CEPH the VMs are distributed across the 3 nodes of the cluster.

If my node n° 1 fails, how do I manually start on node n° 2 or node n° 3 the VMs that were on node n° 1?

In fact, these VMs do not appear directly in the list of VMs of other nodes.

They only appear in the pool, in the "VM Disks" tab.

And I don't see how it is possible to migrate a VM if the node hosting this VM is down.

I don't know if my question is expressed well enough...
Thanks in advance for the help.
 
Hi @jpv31

Not sure about the GUI, but you should be able to move all VMs from a bad pve1 server to a good pve2 server from the CLI using a command like this: 'mv /etc/pve/nodes/pve1/qemu-server/*.conf /etc/pve/nodes/pve2/qemu-server/'. Then you should be able to start them.
/etc/pve is a replicated filesystem available on clustered Proxmox servers as long as the cluster has quorum (e.g. 2 out of 3 nodes are up).
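
For anyone who wants that as a reusable step, here is a minimal sketch of the same move wrapped in a shell function. The function name move_guest_configs and the PVE_NODES_DIR variable are made up for illustration (the variable only exists so the function can be tried against a scratch directory); the node names pve1 and pve2 are the ones from the example above. It assumes the failed node is really powered off, since a VM that is still running there and gets started a second time elsewhere could corrupt its disk.

```shell
#!/bin/sh
# Sketch only: move all VM configs from a failed node to a healthy one.
# Assumptions: the failed node is powered off, and the cluster still has
# quorum (check with `pvecm status`), otherwise /etc/pve is read-only.
# PVE_NODES_DIR is a hypothetical override used for testing.
move_guest_configs() {
    src_node=$1
    dst_node=$2
    nodes_dir=${PVE_NODES_DIR:-/etc/pve/nodes}
    src="$nodes_dir/$src_node/qemu-server"
    dst="$nodes_dir/$dst_node/qemu-server"

    # Refuse to continue if either directory is missing.
    if [ ! -d "$src" ] || [ ! -d "$dst" ]; then
        echo "missing $src or $dst" >&2
        return 1
    fi

    for conf in "$src"/*.conf; do
        [ -e "$conf" ] || continue   # the glob matched nothing
        mv -- "$conf" "$dst"/ || return 1
        echo "moved $(basename "$conf") to $dst_node"
    done
}
```

On a surviving node this would be called as 'move_guest_configs pve1 pve2', after which each guest can be started with 'qm start <vmid>'.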
 
Hello mfed,

I tested this morning and it is indeed possible to migrate VMs from a node that is out of order to a node that is working, using the command line.

The command is indeed "mv /etc/pve/nodes/pve1/qemu-server/*.conf /etc/pve/nodes/pve2/qemu-server/".

Thanks a lot!
 
To add to this, keep in mind that Ceph adds data protection (i.e. reduces the chances of data loss), while HA adds VM uptime protection (i.e. reduces the downtime of the VMs on some events like host failure). They are solutions to different problems and together increase the availability of the whole system.

The HA service essentially does that mv of the config files internally when it moves a VM from a failed node to a running one.
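
If you later decide you do want that automation, guests are put under HA control one by one with ha-manager. A rough sketch follows; the helper name ha_enroll, the APPLY switch, and the VMIDs 100 and 101 are placeholders for illustration, not anything from this thread. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch only: enroll the given VMIDs as HA resources.
# By default the commands are printed; set APPLY=1 to actually run them
# on a Proxmox node. ha_enroll and APPLY are hypothetical names.
ha_enroll() {
    for vmid in "$@"; do
        if [ "${APPLY:-0}" = 1 ]; then
            ha-manager add "vm:$vmid" --state started
        else
            echo "ha-manager add vm:$vmid --state started"
        fi
    done
}

# Dry run for two example VMIDs.
ha_enroll 100 101
```

Afterwards 'ha-manager status' shows which resources are managed and on which node they currently run.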