Proxmox HA and Ceph

rbeard.js

Aug 11, 2022
Hi there!
I'm trying to learn more about Ceph storage so we can use it in an upcoming installation.
We have a database running on Windows Server that most of the company relies on. I was looking into getting a 4-blade server and running Proxmox VE on 3 of the blades and PBS (Proxmox Backup Server) as a slim install on the last blade, to save on colo costs since this server will be offsite.

On site where I am, we have a Proxmox cluster running on ZFS with replication set up for our more critical VMs. Replication runs every 15 minutes to catch any changes, so if a VM goes down we lose at most about 15 minutes of data.
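To make the 15-minute figure concrete, here is a minimal Python sketch that computes how much un-replicated data is lost when a node fails at a given moment. The function name and the fixed schedule are illustrative assumptions, not how the PVE replication scheduler works internally: it assumes syncs start at a fixed time, repeat exactly every interval, and finish instantly.

```python
from datetime import datetime, timedelta


def data_loss_window(failure: datetime, first_sync: datetime, interval: timedelta) -> timedelta:
    """Time span of changes lost if the source node fails at `failure`.

    Assumption: syncs run exactly every `interval` starting at `first_sync`
    and complete instantly (real replication runs take time, so reality is worse).
    """
    if failure < first_sync:
        raise ValueError("failure before the first sync: nothing was replicated yet")
    # Number of whole intervals elapsed since the first sync (timedelta // timedelta -> int).
    completed = (failure - first_sync) // interval
    last_sync = first_sync + completed * interval
    return failure - last_sync


start = datetime(2022, 8, 11, 9, 0)
every_15 = timedelta(minutes=15)
print(data_loss_window(datetime(2022, 8, 11, 9, 14, 59), start, every_15))
print(data_loss_window(datetime(2022, 8, 11, 9, 30), start, every_15))
```

A failure just before the next sync loses almost the full 15 minutes, while a failure right after a sync loses almost nothing. Because real replication runs need time to complete, the actual worst case is somewhat longer than the interval.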

For this new build, the database and server are much more critical. I like that Ceph replicates the data on every write, so I'm not relying on scheduled replication.
However, if the network or a node goes down, connections still drop, it takes a few minutes for the VM to be restarted on another node, and I still lose everything that was in RAM.

My question is: is there a solution that makes HA instant, with no drop-off? If a node dies, I would like the VM to pick up on another node instantly so users don't notice anything changed. I'm not sure whether this is within Ceph's capabilities or whether there is another storage system I should be looking at.
I would also like to ask whether there is a way to increase speeds when doing a migration. I only have 6GB of RAM on my test setup and the speed is not amazing. I know Ceph has to write to each node on every write, but if users are in the system during a migration, the RAM contents change a lot by the time the migration completes.
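On the migration-speed question: with Ceph as shared storage, a live migration only has to copy the VM's RAM, not its disks. QEMU's live migration uses a pre-copy approach by default: it copies all memory while the VM keeps running, then re-sends the pages the guest dirtied in the meantime, repeating until the remainder is small enough to send during a brief pause. The Python sketch below is a simplified model of that loop, not QEMU's actual algorithm; the uniform dirty rate, fixed bandwidth, and the stop threshold are all assumptions.

```python
def precopy_rounds(ram_gib: float, bandwidth_gibps: float, dirty_gibps: float,
                   stop_threshold_gib: float = 0.1, max_rounds: int = 30):
    """Simplified pre-copy live-migration model.

    Round 1 sends all RAM; each later round re-sends what the guest dirtied
    while the previous round was in flight. Returns (rounds, seconds, converged).
    """
    if bandwidth_gibps <= 0:
        raise ValueError("bandwidth must be positive")
    remaining = ram_gib
    total = 0.0
    for rounds in range(1, max_rounds + 1):
        t = remaining / bandwidth_gibps             # time to send this round
        total += t
        remaining = min(ram_gib, dirty_gibps * t)   # memory dirtied meanwhile (capped at RAM size)
        if remaining <= stop_threshold_gib:
            # Final stop-and-copy: pause the VM briefly and send the last dirty pages.
            total += remaining / bandwidth_gibps
            return rounds, total, True
    return max_rounds, total, False


# Hypothetical numbers: 6 GiB guest over a ~1 GiB/s migration link.
print(precopy_rounds(6, 1.0, 0.1))   # light write load: converges quickly
print(precopy_rounds(6, 1.0, 2.0))   # guest dirties RAM faster than the link: never converges
```

If the guest dirties memory about as fast as the link can carry it, the rounds never shrink. In this model the practical levers are more migration bandwidth (for example a faster, dedicated migration network) or migrating while the VM is less busy.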

Any and all information is welcome, thank you!
 
You may want to post your question at /r/ceph since they can answer DB/VM questions.
 
If the node dies, I would like the VM to pick up on the other node instantly so users dont notice anything changed.
This is not possible.

(Without a malfunction, it is possible to live migrate a VM from one node to another so that users don't notice anything.)

A normal application runs in a VM on one node. When that node dies, the KVM process of that VM dies with it. It is gone. It cannot be resurrected on another node within a microsecond, including the current state of all user sessions.

The HA mechanism we have will start a fresh instance of that VM on another node. That new instance has to restart the services that were running in the lost VM on the now dead node.


Then there is the idea of an FT (Fault Tolerant) service. This would require two (or more) nodes to keep a KVM process in sync every microsecond or so. In that scenario, a user of the service would not notice that a node had died.

As far as I know the current feature set of KVM / PVE does not include this.
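The FT idea can be illustrated with state-machine replication: if two replicas receive the same inputs in the same order and process them deterministically, the secondary holds exactly the primary's state at every step, so a failover loses nothing. The toy Python below (the `Replica` class and the account-balance "service" are invented for illustration) shows only that principle. Real FT for a whole VM must also capture nondeterministic events such as interrupts and timer reads, which is what makes it hard.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A deterministic toy service: its state depends only on the inputs applied, in order."""
    balances: dict = field(default_factory=dict)

    def apply(self, op: tuple) -> None:
        account, amount = op
        self.balances[account] = self.balances.get(account, 0) + amount


def run_lockstep(ops, fail_after: int) -> Replica:
    """Feed every input to both replicas; the primary 'dies' before op number `fail_after`.

    Returns whichever replica is serving once all ops are processed.
    """
    primary, secondary = Replica(), Replica()
    active = primary
    for i, op in enumerate(ops):
        if i == fail_after:
            active = secondary  # failover: the secondary already has identical state
        if active is primary:
            primary.apply(op)   # a dead primary no longer processes inputs
        secondary.apply(op)
    return active


ops = [("alice", 100), ("bob", 50), ("alice", -30), ("bob", 20)]
survivor = run_lockstep(ops, fail_after=2)
print(survivor.balances)
```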

I would be happy if someone could tell us that I am wrong...

Best regards
 
Gotcha, so at most I'll potentially lose what's in RAM when using Ceph. In theory, all writes are replicated across my servers, so if a node goes down, the VM will restart on another node and hopefully only lose what was in RAM at that time.

I think I'll post this on /r/ceph too to see if something you mentioned would be a possibility but even this level of fault tolerance is a lot better than what we are doing now on our less critical systems.
 
I think I'll post this on /r/ceph too to see if something you mentioned would be a possibility
My question is if there is a solution here that makes the HA instant with no drop off?
These things are unrelated. The Ceph backend provides just the storage. What you ask for is not currently available with Proxmox; you would need VMware (vSphere Fault Tolerance).

The Proxmox HA engine does not have hot-replica functionality at this time. I'm not sure if it's even on the roadmap.
 
Hello,

As explained in https://localhost:8006/pve-docs/chapter-pveceph.html, it is recommended to set up at least 3 monitors in your Ceph cluster. That way, if a node goes down, the cluster enters a degraded state but remains functional, and it can recover once the node comes back up.
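The reasoning behind "at least 3 monitors" is majority quorum: Ceph monitors only operate while a strict majority of them can reach each other. A small Python sketch of that arithmetic (the function name is illustrative):

```python
def mon_quorum(n_monitors: int) -> tuple[int, int]:
    """Ceph monitors need a strict majority to form a quorum.

    Returns (monitors needed for quorum, monitor failures tolerated).
    """
    if n_monitors < 1:
        raise ValueError("need at least one monitor")
    needed = n_monitors // 2 + 1
    return needed, n_monitors - needed


for n in (1, 2, 3, 4, 5):
    print(n, mon_quorum(n))
```

So 3 monitors survive one failure, and a 4th monitor adds no extra tolerance over 3. Whether the data itself stays writable is a separate matter, governed by the pool's `size`/`min_size` settings.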

Regarding downtime: if node A goes down, it takes about 2 minutes for its VMs to be recovered on another node when HA is enabled. That's because the cluster cannot tell what happened to node A, or whether it will come back up sooner than the time required to recover the VMs, so it has to wait until node A is safely fenced before starting the same VMs elsewhere.

If the HA recovery time is too long for you, consider setting up HA at the application level: run a second VM with the same service on another node, so it can take over if the first node goes down. You can use HA groups with the "restricted" option to make sure the two VMs are never on the same physical node, for example by putting each VM in its own restricted group with no nodes in common.
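One way to reason about the "restricted" approach, sketched in Python under stated assumptions: a VM in a restricted HA group may only run on that group's nodes, so two VMs whose groups share no node can never be co-located. The node and group names are hypothetical, and this models the placement rule only, not `ha-manager` itself.

```python
def allowed_nodes(group: set[str], online: set[str]) -> set[str]:
    """Nodes where a VM in a restricted HA group may run right now.

    An empty set means the VM stays stopped until a group member is back online.
    """
    return group & online


def can_share_node(group_a: set[str], group_b: set[str]) -> bool:
    """True if two VMs in these restricted groups could ever land on the same node."""
    return not group_a.isdisjoint(group_b)


# Hypothetical layout: the primary DB VM and the standby DB VM get separate groups.
db_primary_group = {"pve1"}
db_standby_group = {"pve2", "pve3"}

print(can_share_node(db_primary_group, db_standby_group))
print(allowed_nodes(db_standby_group, online={"pve1", "pve3"}))
print(allowed_nodes(db_primary_group, online={"pve2", "pve3"}))
```

The trade-off shows in the last case: if every node of a restricted group is offline, that VM stays stopped. That is acceptable here because the application-level HA lets the other instance carry the load.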
 