Feature request: Dedicated quorum node

witoldcebularz · May 21, 2025

Hello, I'm a system engineer at an IT outsourcing company. My job is to design and deploy virtualization solutions. I've used Proxmox for many years now, but our company only recently started to offer Proxmox as an alternative to ESXi/Hyper-V. I know Proxmox thoroughly, including its capabilities and limitations.

I've noticed a very important thing that's missing from Proxmox - a dedicated quorum node, specifically for a Ceph Stretch Cluster configuration. I've made a post on Reddit about that. You can read the post and the comments to get my perspective.

Why not another node? Because it would be virtualization-capable and, as such, would have to be fully licensed under some guest-software licensing rules (e.g. Windows Server, whose licensing is based on the host CPUs in the cluster), and that would be unviable. As for HA groups - from what I've been told by a Microsoft representative, HA groups don't matter for licensing and all the host CPUs in the cluster have to be licensed, whether or not the host is in an HA group made for the Windows Server VM.

Why is qdevice not an option? Because clients want to have easy access, monitoring, and notifications from the quorum node. Many of them are not familiar with the Linux ecosystem and want an elegant and convenient GUI solution. Notifications are important, as they need to know if the qdevice is up and running and if the cluster can reach it. Another thing is the Ceph MON that is necessary for a Ceph Stretch Cluster configuration. Sure, you can do qdevice with Ceph MON, but this configuration is too awkward for my clients; they perceive it as a hacky, non-production workaround and it's unacceptable for them.

From our perspective, Proxmox quorum is very inconvenient as it is (most of the competition has different solutions that are much cheaper, such as a cluster disk or network share), and the need to cover licenses for another quorum host significantly reduces the interest in Proxmox. And qdevice/qdevice+ceph-mon is off the table for every enterprise client. It's the biggest flaw of Proxmox in my opinion, since every other enterprise software can make do with a 2-node quorum or has quorum options other than another node, which, as I pointed out earlier, causes some licensing issues.

What would be a solution:

a checkbox in Proxmox installer, something like "quorum node"
such a quorum node would have no capability to virtualize (no QEMU/KVM or LXC installed)
inability to put such a node in an HA group
ability to use Proxmox SDN stack, as it may be useful in some network configurations or FRR openfabric when it gets support in GUI
full Ceph support, as it would be needed to have a Ceph MON for Stretch Cluster and, additionally, it could function as non-HCI host
full GUI representation, with a different icon than virtualization host, and with ordinary settings (network, disks) as they are for a standard host
full notification and monitoring support

And most importantly, it has to be officially documented and supported by Proxmox, so that in case of problems Proxmox support doesn't turn the client away.

Are there any plans to implement such a feature? Would that be feasible any time soon?

t.lamprecht · May 22, 2025

Hi,

Thanks for your feedback and thoughts.

witoldcebularz said:
What would be a solution:

a checkbox in Proxmox installer, something like "quorum node"

such a quorum node would have no capability to virtualize (no QEMU/KVM or LXC installed)

inability to put such a node in an HA group

ability to use Proxmox SDN stack, as it may be useful in some network configurations or FRR openfabric when it gets support in GUI

full Ceph support, as it would be needed to have a Ceph MON for Stretch Cluster and, additionally, it could function as non-HCI host

full GUI representation, with a different icon than virtualization host, and with ordinary settings (network, disks) as they are for a standard host

full notification and monitoring support

This would add a lot of edge cases and corresponding code to maintain and test, and besides that, there are better ways to improve the status quo (see below). And I'm really not sure that with that list you would pass the "not virtualization-capable" rule of any vendor.
And by "not sure" I actually mean I'm rather sure it won't be enough. Having been part of discussions that included such companies, I'm fairly certain it won't be enough unless the core system is altered to add a DRM-like system that forbids installing any tool that might allow running guests. That not only goes totally against our core values (providing FLOSS), it would also not be technically feasible to really guarantee. And how would vendors even check this? A user could just run the virtualization-disabled node during the check and then reboot into a co-installed "normal" PVE.

In short and to be frank, to avoid any false hope: no, this most definitely won't be implemented as requested here.

Besides, if a soft-block were enough, you could just blacklist the KVM module and optionally replace the qemu and lxc executables using dpkg-divert; that would be just four commands to run after installation.

In essence such a soft-block could look something like:

Code:

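# Divert the packaged binaries; paths are the ones from this post, verify them on your system
dpkg-divert --add --rename --divert /usr/libexec/qemu-system-x86_64 /usr/bin/qemu-system-x86_64
dpkg-divert --add --rename --divert /usr/libexec/lxc /usr/bin/lxc
# Put a stub that always fails in place of each diverted binary
ln -s /usr/bin/false /usr/bin/qemu-system-x86_64
ln -s /usr/bin/false /usr/bin/lxc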

Using dpkg-divert means that the packaging system knows about the file moves, making the block resistant to future updates, which would otherwise overwrite the symlinks to the false binary.
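The module blacklist mentioned earlier is not part of the four commands above. A minimal sketch of that part could look like the following. It assumes the usual `kvm`, `kvm_intel` and `kvm_amd` module names; the file name `pve-no-kvm.conf` and the function name are made up for this example. The target directory is a parameter so the function can be tried on a scratch directory before touching `/etc/modprobe.d`.

```shell
#!/bin/sh
# Sketch only: write a modprobe config that keeps the KVM modules from loading.
# The file name pve-no-kvm.conf is hypothetical, not an official Proxmox file.
write_kvm_block() {
    dir="${1:-/etc/modprobe.d}"
    conf="$dir/pve-no-kvm.conf"
    : > "$conf" || return 1
    for mod in kvm kvm_intel kvm_amd; do
        # "blacklist" stops alias-based autoloading; the "install" override
        # also makes an explicit "modprobe <mod>" fail.
        printf 'blacklist %s\ninstall %s /bin/false\n' "$mod" "$mod" >> "$conf"
    done
}
```

Run as root, `write_kvm_block` without arguments writes to `/etc/modprobe.d`; the change takes effect once the modules are unloaded or the node is rebooted. `dpkg-divert --list '/usr/bin/*'` shows the active diversions, and `dpkg-divert --rename --remove /usr/bin/qemu-system-x86_64` (after deleting the symlink) undoes one, which also illustrates how easily such a soft-block can be reverted.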

In essence, that's quite probably as good as it gets without adding any DRM. So the above is in effect either enough to address your stated reason why a "normal" PVE node cannot work, or it needs DRM, which is simply a no-go for us anyway.

If this is really common, we could provide a more definitive how-to for achieving this, and potentially also a small tool to roll it out conveniently. But since it is that simple and needed mostly in enterprise environments, most sites should probably be able to handle this through their (hopefully existing) automation stack.

witoldcebularz said:
Why is qdevice not an option? Because clients want to have easy access, monitoring, and notifications from the quorum node. Many of them are not familiar with the Linux ecosystem and want an elegant and convenient GUI solution. Notifications are important, as they need to know if the qdevice is up and running and if the cluster can reach it. Another thing is the Ceph MON that is necessary for a Ceph Stretch Cluster configuration. Sure, you can do qdevice with Ceph MON, but this configuration is too awkward for my clients; they perceive it as a hacky, non-production workaround and it's unacceptable for them.

So, basically the question can also be rephrased as: what's needed to make qdevice a valid option for users who need full (well, at least some basic) UI integration, plus good, approachable documentation (e.g. a how-to) for setting up an external ceph-mon, potentially with tooling integrated in PVE to simplify this?

Seeing the qdevice status and simplifying its addition is definitely something that would be nice to have. For the former, an enhancement request already exists on our https://bugzilla.proxmox.com/ IIRC; for the latter I'm not so sure. FWIW, one idea that is not fully decided yet is to also provide qdevice "server" integration in the in-development PDM (Proxmox Datacenter Manager); with that, nice integration would be relatively simple for those planning to use that project.
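Until such status integration exists, a notification hook can be scripted around the existing CLI. The sketch below reads the text of `pvecm status` (or `corosync-quorumtool -s`) from stdin and reports whether the QDevice currently contributes a vote. It assumes the membership table contains a row whose last field is `Qdevice` and whose second field is the vote count; the function name is made up for this example and the exact output layout should be checked against the installed version.

```shell
#!/bin/sh
# Sketch only: succeed (exit 0) if the quorum status text on stdin shows a
# QDevice row with a non-zero vote count, fail (exit 1) otherwise.
# Assumption: the row looks like "<nodeid> <votes> Qdevice".
qdevice_has_vote() {
    awk '
        # Skip the "Flags: ... Qdevice" line; only accept a numeric vote column.
        $NF == "Qdevice" && $1 != "Flags:" && $2 ~ /^[0-9]+$/ { votes = $2 }
        END { exit (votes > 0 ? 0 : 1) }
    '
}
```

A cron job or systemd timer could then run something like `pvecm status | qdevice_has_vote || logger -p daemon.warning "QDevice is not providing a vote"` and leave delivery of the alert to whatever log monitoring is already in place.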

ceph-mon might be a bit harder to integrate nicely, but basic status and the like should be doable as we already get lots of information from ceph directly. And a ceph stretch cluster is IMO not really something an inexperienced admin does, so needing to be a bit more hands-on for the initial setup is IMO not really a limiting factor in itself, as long as the documentation gets created and improved to provide clear directions for doing this.
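To give an idea of the hands-on part, the sketch below only prints (it does not run) the monitor-related commands that the upstream Ceph stretch-mode documentation describes. The monitor names, site names and the function name are placeholders, and the CRUSH rule `stretch_rule` is assumed to exist already; this is an illustration, not an official Proxmox procedure.

```shell
#!/bin/sh
# Sketch only: print the Ceph commands that tag each monitor with its site
# and then enable stretch mode with the given tiebreaker monitor.
# Usage: print_stretch_cmds <tiebreaker-mon> <mon>=<site> [<mon>=<site> ...]
print_stretch_cmds() {
    tiebreaker="$1"; shift          # monitor on the third, quorum-only site
    echo "ceph mon set election_strategy connectivity"
    for pair in "$@"; do            # each argument has the form mon=site
        echo "ceph mon set_location ${pair%%=*} datacenter=${pair#*=}"
    done
    echo "ceph mon enable_stretch_mode $tiebreaker stretch_rule datacenter"
}
```

For example, `print_stretch_cmds witness pve1=site1 pve2=site2 witness=site3` prints one `set_location` line per monitor, followed by the `enable_stretch_mode` call naming `witness` as the tiebreaker, so the sequence can be reviewed before anything is executed.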

Feel free to open enhancement requests for the qdevice integration and the ceph-mon integration/docs improvements wherever no entries exist yet.

witoldcebularz · May 22, 2025

Hello, thank you for your answer and sorry for the late reply.

Let me give you some background on how we work so you can understand this better. There are 2 kinds of deployments that we do: 1. for clients that already have or want to have an active IT outsourcing contract with us, and 2. one-time deployments. In neither of these am I the main user of Proxmox - the client is. The client is the IT department of a company, and they are the ones who decide on the virtualization solution - we only lay out the options based on the requirements. In both cases we need Proxmox Support, but in the first one we can monitor the state of the cluster, correct mistakes and provide best-effort support. In the second one, however, we really need to make sure the configuration is easy to maintain (update, monitor), future-proof, based only on official Proxmox recommendations, and something that Proxmox Support can service.

Also, we have to work not only with technical limitations, but very often also with policies and laws, as is the case with smaller governmental bodies. These can severely limit the configuration options.

t.lamprecht said:
And by "not sure" I actually mean I'm rather sure it won't be enough. Having been part of discussions that included such companies, I'm fairly certain it won't be enough unless the core system is altered to add a DRM-like system that forbids installing any tool that might allow running guests. That not only goes totally against our core values (providing FLOSS), it would also not be technically feasible to really guarantee. And how would vendors even check this? A user could just run the virtualization-disabled node during the check and then reboot into a co-installed "normal" PVE.

EVERYTHING I'M ABOUT TO SAY ABOUT LICENSING MAY BE WRONG, I'M NEITHER A LAWYER NOR AN EXPERT ON LICENSING, so feel free to correct my mistakes.
From what I've read and what I know, I think that making an appliance that is not designed for virtualization would probably do the trick. For example, you can theoretically install QEMU on a QDevice, PBS or PDM host, yet I don't think Microsoft auditors would have a problem with PBS's CPUs not being covered by Windows Server licenses, even though it has deep (and very convenient) integration with PVE. As far as I'm aware (I'm not sure), and drawing conclusions from other software working with PVE (PBS, PDM), it would only require Proxmox to make a dedicated quorum node installer and state: "this product is not designed for virtualization, it's only intended for quorum and storage services". It could probably also be designed so that it cannot easily join a cluster as a normal node (just like PBS). In my opinion, drawing parallels to the Windows ecosystem, where the same VM licensing rules apply on Hyper-V as on any other virtualization solution (apart from a bonus license for a host, of course), Windows also has the ability to install/uninstall the Hyper-V role or add/remove a host from a cluster. Let me know your opinion on this take. Again, it's only my speculation and thoughts.

As for the soft-block you've provided, I don't think it would cut it, simply because PVE was built for virtualization and is meant for virtualization, all of its components are meant for virtualization (HA, resource mappings), and the soft-block is easy to revert.

t.lamprecht said:
ceph-mon might be a bit harder to integrate nicely, but basic status and the like should be doable as we already get lots of information from ceph directly. And a ceph stretch cluster is IMO not really something an inexperienced admin does, so needing to be a bit more hands-on for the initial setup is IMO not really a limiting factor in itself, as long as the documentation gets created and improved to provide clear directions for doing this.

Well it's not about the initial configuration, it's about other things, such as:

Easy, long-term maintainability - as I said, after many deployments I won't be there to take care of the cluster, and my clients (the IT department of the company we deploy for) want an easy way to maintain (update) the QDevice directly through the GUI. They love the Proxmox GUI and would often prefer Proxmox over the competition simply because of it, but QDevice is inconvenient to the point that it destroys the initial excitement and breaks the biggest advantage of Proxmox for them - an amazing, convenient and solid GUI.
Official Proxmox Support - some of the clients will have to manage the cluster by themselves, and in case of problems, they'll reach out to Proxmox Support. I wouldn't like them to hear "sorry, this configuration is not officially supported by Proxmox, we can't help you", because they'd contact our company demanding an explanation of why we deployed an officially unsupported configuration, potentially threatening to take legal action against us. BTW, is such a configuration (an external Ceph node installed from the Ceph repos [which are a bit ahead of the PVE ones]) officially supported? If I were to deploy such a configuration to production, would Proxmox Support help my client with that non-PVE Ceph MON?
Somewhat connected to the second point: it needs to be something done by Proxmox, an almost set-and-forget solution (apart from updates) without any corrections or incompatibilities with the hosts' packages over the cluster lifetime. It would have to be "synchronized" with the PVE repo.

And YES, the documentation for making QDevice with Ceph MON would be a huge step forward!!! Why? Let me explain my perspective:

It would provide an official guide from Proxmox, with Proxmox-recommended settings. That would also enable large-scale testing of this one specific configuration.
It would standardize the configuration, so that if a client has problems with it and approaches Proxmox Support, instead of explaining the entire configuration, they'd just say "QDevice with Ceph MON" and support would instantly know the topology and how it was configured - that's extremely important for me.
During the consulting phase, before the deployment, and while presenting PVE to a client, if they have any doubts about such a configuration, I could simply point the client to the Proxmox Wiki and show them that it's a standard procedure with official instructions - that would surely make them more comfortable.

Okay, to wrap it all up, why is Proxmox quorum such an important topic for me? The truth is that the current system is inconvenient for us and the clients, and quorum is the single reason why all the clients interested in Proxmox backed out and went for a different solution entirely. Here's a breakdown of the clientele:

Small users with few VMs that want HA at low cost (typically aiming for Community/Basic, rarely Standard plans) - they are used to a Hyper-V setup with 2 hosts and a disk array (main storage and witness) over directly attached, redundant FC. When they hear the requirement of having 3 nodes for quorum and fully shared storage, they're discouraged, as it makes the deployment much more expensive, simply because of the additional, unneeded Microsoft licenses. In the majority of cases they could afford it, but it would be a waste of money compared to just running Hyper-V, which they're already familiar with. The solution would be a quorum server with a full Ceph setup, which would essentially be bought instead of a disk array. It would also be more redundant: all the pros without the cons, especially with FRR openfabric. They would love it.
Somewhat bigger users, interested in Standard/Premium, rarely Basic plans - 4-8 servers - they don't want the host to be the failure domain; they typically have 2 locations (+ a colocation or a smaller site with limited infrastructure) and want full redundancy between them, which is why I'm so "obsessed" with stretch cluster. They want to be prepared for an instant, full outage of a site, for whatever reason. As they are typically, in our case, small government organizations, they usually have policies that prevent storing sensitive data in, for example, a colocated site or the cloud - only quorum is allowed there. It would be great not to have to pay for Microsoft licenses for another node, especially because the competition allows that and does so more elegantly with pre-made solutions (cloud witness, which also removes the need to buy additional hardware and pay for colocation [QDevice is not recommended in the cloud, so I'm not even considering it]). A quorum node with Ceph MON would be a compromise many clients would be willing to make, just to have the great Proxmox Backup solution, for example. Additional note: Things like stretch cluster or a datacenter-level failure domain are what make SDS solutions such as Ceph really stand out, since the equivalent configuration on disk arrays is a bit clunky, overcomplicated, expensive and, for some vendors, poorly documented, while it is "natural" with Ceph.

We typically don't have bigger clients, and if we do, they go straight for VMware without thinking twice.

I like Proxmox and would like it to succeed, that's why I'm providing you with feedback and the reasons why some clients opted for a different solution. I simply think that the lack of this feature causes Proxmox to lose a lot of smaller, but paying customers, at least in my bubble.

PS. I think PBS and its integration is right now the biggest feature that makes Proxmox stand out, especially for IT staff who had to deal with Hyper-V + Nakivo. They're mesmerized by how nicely and seamlessly it works, and some are even willing to give up snapshots on shared storage just because of PBS. The only feature missing is official support for SMB/iSCSI shares + running it in a VM.

Hope you appreciate the feedback. Have a nice day!

alexskysilk · May 22, 2025

So, allow me to summarize your argument:
You have customers who don't want three nodes. Those customers balk at having a quorum device that looks like a PVE node, or a quorum witness appliance. for reasons. Is that about the gist?

Let me ask you something else: what, exactly, are the services YOU provide to those customers? Why is the above not included in those? When I architect a solution, I stand by the design, and if I hand over the keys to the customer's ops team I also give them documentation. It is RARELY a single-vendor affair.

witoldcebularz · May 23, 2025

alexskysilk said:
for reasons

For good reasons outlined in my post, such as needlessly buying additional licenses.
