Proxmox VE with 6 nodes and Ceph across 2 DCs

pveadventure

Jul 1, 2025
Hi there.
I need some advice about this scenario: 6 Dell servers running Proxmox and Ceph.
3 nodes will be located in DC1 and 3 nodes in DC2.
The two DCs are 10 km apart, and there is an MPLS link between them.
The ping response is about 1 ms.
So I wonder whether Ceph will work properly in this scenario.
Is any special configuration needed?
I read something about stretch Ceph or something like that. Is this needed?

Thanks for any help.

Best regards.
 
Hi...

How fast is the MPLS link? 10 Gbps?
The link is a 100 Gbps MPLS.

I am wondering about this QDevice and Ceph witness. Is it a VM with Proxmox on it, set up like a regular PVE and Ceph server (of course with no OSDs)?
One VM on each side?

Thanks
 
Keep in mind that 3 nodes at each site is the minimum; more is better.

From what I've read on this, Ceph stretch mode is aware of site topology. You can assign MONs to datacenters and add a witness MON in a VM in a third location.
  • In case of a site failure, the surviving site + witness can automatically maintain quorum.
  • It’s designed for exactly this kind of dual-DC setup.
So while stretch mode isn’t mandatory, it’s strongly recommended for your scenario. It adds a layer of safety and automation that’s hard to replicate manually.

Without stretch mode, if the link between the locations fails you risk a "split-brain" situation. You would also need to intervene manually to restore service.
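For reference, a rough sketch of what enabling stretch mode looks like on the Ceph side, based on the upstream Ceph stretch-mode documentation. The datacenter bucket names (dc1, dc2, dc3), the host names and the tiebreaker MON name are placeholders for your environment, so double-check everything against the Ceph docs before using any of this:

Code:
# Put each site's hosts into a "datacenter" CRUSH bucket
ceph osd crush add-bucket dc1 datacenter
ceph osd crush add-bucket dc2 datacenter
ceph osd crush move dc1 root=default
ceph osd crush move dc2 root=default
ceph osd crush move node1 datacenter=dc1    # ...and so on for all six hosts

# Tell the monitors where they live; the tiebreaker MON sits at a third site
ceph mon set_location node1 datacenter=dc1
ceph mon set_location node4 datacenter=dc2
ceph mon set_location tiebreaker datacenter=dc3

# Stretch mode requires the connectivity election strategy
ceph mon set election_strategy connectivity

# Enable stretch mode; "stretch_rule" is a CRUSH rule (added beforehand by
# editing the CRUSH map, see the Ceph docs) that places 2 replicas per datacenter
ceph mon enable_stretch_mode tiebreaker stretch_rule datacenter

The tiebreaker only runs a MON (no OSDs), so a small VM or box at a third location is enough for it.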
 
A QDevice can be a VM, but it is typically a separate hardware device so that it keeps running independently of the cluster: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_corosync_external_vote_support. It can be a Pi or a Debian server, typically not running PVE. I believe it can also run on a PBS host.

You would not want two QDevices. After installing the first, you would have an odd number of votes; adding a second would bring you back to an even number of votes, which is a Bad Idea with a QDevice: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_supported_setups
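For reference, the basic QDevice setup from that chapter looks roughly like this. The IP is a placeholder for your external witness host, which should sit at a third location:

Code:
# On the external witness host (a Pi, a small Debian server, a PBS host, ...):
apt install corosync-qnetd

# On every Proxmox VE cluster node:
apt install corosync-qdevice

# On one cluster node, register the QDevice with the cluster:
pvecm qdevice setup <QDEVICE-IP>

# Check that the cluster now counts the extra vote:
pvecm status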
 
I have found this.

That is exactly my scenario. I was just unsure about the VM witness.
 
Hi folks...

Just another question: if I create a VM to be the QDevice/witness for the PVE cluster/Ceph cluster, can I put this VM under HA?
That way, when DC1 goes south, the witness/QDevice VM moves to the other DC and acts as witness for those 3 servers.
Is that right?
(I hope I could make myself clear!)
 
Just another question: if I create a VM to be the QDevice/witness for the PVE cluster/Ceph cluster, can I put this VM under HA?
That won't work.
That way, when DC1 goes south, the witness/QDevice VM moves to the other DC and acts as witness for those 3 servers.
If you lose half of your nodes (due to a fire in DC1) then the other half won't have more than half the votes and it won't consider itself quorate (and it won't start VMs).

If one node goes down, the cluster will continue working fine and the HA VMs (when on the broken node) will be restarted on a working node. If you lose quorum (due to losing too many nodes at the same time) then it has been proven impossible to recover automatically. However, you can recover manually by telling the remaining nodes to expect fewer votes. Maybe this is good enough for you? Personally, I would go with more than 3 nodes in each DC, because it becomes vulnerable when one DC burns down and you then lose another node to a hardware problem.

Both DCs communicate with each other, usually via a single point of failure (SPOF) that also connects the users to both DCs and the VMs running on them. Put the QDevice on the SPOF that connects the users to the (distributed over two DCs) cluster, or anywhere else closer to the users, and it will choose the surviving DC automatically.

This has been discussed several times before on this forum, and every time someone tries to be clever, but it's mathematically impossible to prevent split-brain automatically with two DCs without a third site and a QDevice.
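For completeness, the manual recovery mentioned above is a single command on a surviving node. A sketch, assuming DC1 with its 3 nodes is permanently gone and 3 nodes survive in DC2:

Code:
# On a surviving node in DC2, after confirming DC1 is really down:
pvecm status       # the surviving partition reports itself as not quorate
pvecm expected 3   # lower the expected votes to the 3 surviving nodes
pvecm status       # the surviving nodes should now report quorum again

Test this by yanking (virtual) wires in a lab before relying on it in production.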
 
That won't work.

If you lose half of your nodes (due to a fire in DC1) then the other half won't have more than half the votes and it won't consider itself quorate (and it won't start VMs).

If one node goes down, the cluster will continue working fine and the HA VMs (when on the broken node) will be restarted on a working node. If you lose quorum (due to losing too many nodes at the same time) then it has been proven impossible to recover automatically. However, you can recover manually by telling the remaining nodes to expect fewer votes. Maybe this is good enough for you? Personally, I would go with more than 3 nodes in each DC, because it becomes vulnerable when one DC burns down and you then lose another node to a hardware problem.

Both DCs communicate with each other, usually via a single point of failure (SPOF) that also connects the users to both DCs and the VMs running on them. Put the QDevice on the SPOF that connects the users to the (distributed over two DCs) cluster, or anywhere else closer to the users, and it will choose the surviving DC automatically.

This has been discussed several times before on this forum, and every time someone tries to be clever, but it's mathematically impossible to prevent split-brain automatically with two DCs without a third site and a QDevice.
So this configuration here is totally incorrect?
I am not trying to be a smart guy, but even this thread and the Ceph documentation page recommend stretched Ceph when two DCs are involved.

I am just trying to understand all the facts before putting it into practice.
 
I am not trying to be a smart guy, but even this thread and the Ceph documentation page recommend stretched Ceph when two DCs are involved.
Please note that that is Ceph and uses 3 DCs. I assumed you wanted to do this with 2 DCs and PVE. I suggested a solution with 3 sites as well, but also pointed out that 2 sites are not enough. At least not for automatic recovery when half of the nodes are lost (manually it is possible).
I did not read any thread closely and in detail, so I might be way off, in which case I'm sorry, and I'll be interested to learn.
 
Please note that that is Ceph and uses 3 DCs. I assumed you wanted to do this with 2 DCs and PVE. I suggested a solution with 3 sites as well, but also pointed out that 2 sites are not enough.
I did not read any thread closely and in detail, so I might be way off, in which case I'm sorry, and I'll be interested to learn.
Perhaps I misunderstood that thread, because I was imagining that DC3 could be a VM or another physical server inside DC1 or DC2.
But it seems that, from a quorum point of view, the VM or other physical server should be outside DC1/DC2.
 
Perhaps I misunderstood that thread, because I was imagining that DC3 could be a VM or another physical server inside DC1 or DC2.
A "virtual third site" does not work when one of the two real sites burns down. That was the point I was trying to make. Sorry that this was not more clear.
But it seems that, from a quorum point of view, the VM or other physical server should be outside DC1/DC2.
Yes: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum . My suggestion would be to put it at the site of the users (and use that as the third site), if that does not coincide with DC1 or DC2.

EDIT: Don't trust me (as I'm just a stranger on the internet) and please do try your ideas out and test them by yanking (virtual) wires before putting any of this into production.
 
A "virtual third site" does not work when one of the two real sites burns down. That was the point I was trying to make. Sorry that this was not more clear.

Yes: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_quorum . My suggestion would be to put it at the site of the users (and use that as the third site), if that does not coincide with DC1 or DC2.

EDIT: Don't trust me (as I'm just a stranger on the internet) and please do try your ideas out and test them by yanking (virtual) wires before putting any of this into production.
But even so, if I have a third "DC", Ceph will end up with 3 servers + a QDevice/VM witness, i.e. 4 servers, which by design is not an ideal configuration for Ceph, right?
Forgive me if I sound like such a newbie. I'm still trying to understand all the pieces.
 
I sent another question, but it was put in moderation. I am not sure why!
So, here we go again:
"But even so, if I have a third "DC", Ceph will end up with 3 servers + a QDevice/VM witness, i.e. 4 servers, which by design is not an ideal configuration for Ceph, right?
Forgive me if I sound like such a newbie. I'm still trying to understand all the pieces."
 
I sent another question, but it was put in moderation. I am not sure why!
So, here we go again:
"But even so, if I have a third "DC", Ceph will end up with 3 servers + a QDevice/VM witness, i.e. 4 servers, which by design is not an ideal configuration for Ceph, right?
Forgive me if I sound like such a newbie. I'm still trying to understand all the pieces."
Well... now the message has completely disappeared... I wonder why!
Previously it was on moderation hold; now it has disappeared!
Nice. Thank you!
 
Well... now the message has completely disappeared... I wonder why!
Previously it was on moderation hold; now it has disappeared!
Nice. Thank you!
I see "But even so, if I have a third "DC", the CEPH will be with 3 server + qdevice/vm witness, i.e, 4 server, which by design it's not ideal configuration to Ceph, right?
Forgive me if I look so newbie. I'm still try to understand the whole pieces." three times now: https://forum.proxmox.com/threads/proxmox-ve-with-6-nodes-and-ceph-with-2-dc.168028/post-781458 , https://forum.proxmox.com/threads/proxmox-ve-with-6-nodes-and-ceph-with-2-dc.168028/post-781459 and https://forum.proxmox.com/threads/proxmox-ve-with-6-nodes-and-ceph-with-2-dc.168028/post-781681 .
But I have no experience with Ceph and don't know how to answer.
 
I see "But even so, if I have a third "DC", the CEPH will be with 3 server + qdevice/vm witness, i.e, 4 server, which by design it's not ideal configuration to Ceph, right?
Forgive me if I look so newbie. I'm still try to understand the whole pieces." three times now: https://forum.proxmox.com/threads/proxmox-ve-with-6-nodes-and-ceph-with-2-dc.168028/post-781458 , https://forum.proxmox.com/threads/proxmox-ve-with-6-nodes-and-ceph-with-2-dc.168028/post-781459 and https://forum.proxmox.com/threads/proxmox-ve-with-6-nodes-and-ceph-with-2-dc.168028/post-781681 .
But I have no experience with Ceph and don't know how to answer.
That's OK. I will manage by myself. Thank you.