[SOLVED] 2-node PVE cluster with PBS as a Qdevice

QSA

New Member
Aug 19, 2024
Hello everyone.

I have a 2-node PVE cluster and a PBS server that will be used for backing up the VMs running on the PVE Cluster.
I wanted to add the PBS as a Qdevice in order to achieve 3 votes for quorum.

As I was going through the setup of the Qdevice, I noticed that I could only specify one network link, whereas when setting up the PVE cluster itself I could specify multiple network links with priorities.

> Here is my question: is it possible to specify more than one network link (IP) for a Qdevice, or is it limited to just one? I would rather add a "slower" secondary link than have the cluster lose quorum.
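For context, the setup procedure I was following is just the one from the PVE docs, and it indeed only accepts a single address (the IP below is only a placeholder for the PBS):

```
# on the external host (the PBS here): the QNetd daemon
apt install corosync-qnetd

# on every PVE cluster node: the QDevice client
apt install corosync-qdevice

# on one PVE node: register the QDevice -- only one IP can be given here
pvecm qdevice setup 192.0.2.10
```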

If that isn't feasible, maybe there is a way to make it work by properly adding the PBS node to the PVE cluster's corosync.conf file (and to the corosync.conf file on the PBS)?

And if that is not possible either, couldn't I just install PVE and then PBS (or vice versa) and only use the PBS part of it on port 8007? If some PVE functionality goes unused, and some certainly will, can I disable it safely?

Thank you in advance for your help.

Kind regards,
 
AFAIK just one but IIRC the Qdevice is more forgiving than corosync. Would be nice though.

https://pbs.proxmox.com/docs/installation.html#install-proxmox-backup-server-on-proxmox-ve has docs but also a warning.

First of all, thank you for your fast answer.

You are right; I hope the ability to specify multiple network links (IPv4 and IPv6) will be added one day.

Regarding the doc's warning: even if the backup server is a separate physical server, if the PBS fails... it fails, right? You cannot access the backups anymore.
Or maybe I'm missing something that the warning was trying to explain?

Maybe this warning was aimed at someone who wants to run PVE and PBS together while also running VMs on it? In that case, of course, it is problematic.

But if I'm not running virtual machines on it and only use PVE for the pve-cluster part (so I can specify multiple network links), shouldn't that be alright?
 
> Regarding the doc's warning: even if the backup server is a separate physical server, if the PBS fails... it fails, right? You cannot access the backups anymore.
> Or maybe I'm missing something that the warning was trying to explain?

The warning has two sides. First, you don't want your backups on the same machine and the same storage as your VMs/LXCs, since a failure of that storage would take your backups with it. Second, PBS and PVE together might hinder an update (for example, the PVE 9 beta update instructions say that a parallel PBS needs to be version 4, which isn't released yet, not even as a beta). And a PBS inside a VM obviously needs a running PVE, so you have additional complexity and restore work in case of an emergency.

Another thing to consider is security: if one can log in via SSH on one node of the cluster, one can also SSH to the other without additional authentication. So in case of an attack, the attacker only needs to compromise the cluster and will also be able to take over your PBS if it's installed inside your cluster. If, on the other hand, the PBS is separated from the cluster, you avoid this risk.
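To make that concrete: on a PVE cluster the root SSH trust is cluster-wide, because every node's key ends up in a shared authorized_keys file. A quick, purely illustrative way to see it:

```
# on any PVE cluster node: the local authorized_keys is a symlink
# into the cluster filesystem shared by all nodes
ls -l /root/.ssh/authorized_keys
# -> /etc/pve/priv/authorized_keys

# this file holds the root public key of every cluster member,
# so root on one node can SSH to all the others without a password
cat /etc/pve/priv/authorized_keys
```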
 

Thanks a lot for your reply too.

If I understood correctly, in my use case, as I will not be running VMs on it, the two major drawbacks are 1. updates of the two products that could interfere with each other, and 2. security, as you said.

Regarding security, I wonder if the same thing could happen with the PBS as a Qdevice. (The ability to SSH directly to another node without additional authentication comes from the cluster configuration -> ssh-keygen and public key exchange, right?)

If I remember correctly, the same exchange is also done while setting up the Qdevice, so even when using the PBS only as a Qdevice, it is going to be vulnerable to the same attack.

Edit :

"I hope that I am wrong but right now I can't think of any other way of doing what I want initialy. "

I meant in a secure way but maybe my idea is just not feasible at all. It won't have the same result as having the PBS isolated and only "accessible" by the PVE nodes through a properly configured "Proxmox Backup Authentication Server Realm" user for backups.
 
One can SSH from PVE into a Qdevice without a password, yes.
Thank you for the confirmation, I also just tested it before seeing your answer. :D

Taking security into account (obviously), if the backup server must be isolated from the production environment so that it is not directly reachable from it (I'm thinking of ransomware in the worst case), it looks like the only way I can achieve quorum is with the 2 PVE nodes + 1 VPS/another dedicated server as a Qdevice, and finally the isolated PBS for backups.
 
It is possible to give one PVE node 2 votes, but then it must always be on. Or use a VM that can migrate between the nodes, but then the VM must also be running. Or put a VM on a client PC, say under Hyper-V. A Qdevice can also be something small like a Pi.
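For illustration only, giving a node two votes is just an edit of its node entry in corosync.conf; a minimal sketch (node name and address are placeholders, and remember to increase config_version in the totem section when editing /etc/pve/corosync.conf):

```
# /etc/pve/corosync.conf (excerpt) -- temporary workaround, not for production
node {
    name: pve1
    nodeid: 1
    quorum_votes: 2    # default is 1
    ring0_addr: 10.0.0.1
}
```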
 
> It is possible to give one PVE node 2 votes, but then it must always be on. Or use a VM that can migrate between the nodes, but then the VM must also be running. Or put a VM on a client PC, say under Hyper-V. A Qdevice can also be something small like a Pi.
Thank you for your advice.

The 2 votes on one PVE node seem dangerous in production, and so does a VM that has to be running all the time.

A Raspberry Pi kind of seems like a good idea, but I can't put one in place because I'm not on-premise, and even if I were, is a Qdevice really "that much" more forgiving than corosync when it comes to latency?
Am I right that you were talking about latency for the Qdevice? If so, how much is acceptable? And couldn't that Qdevice latency delay a VM switch/migration to another node even more?

Kind regards,
 
I am not sure; however, I seem to recall posts asking about a two-datacenter cluster, suggesting a Qdevice in a third location to provide a vote in case one DC goes offline. So I would think it's somewhat more tolerant.
 
> One can SSH from PVE into a Qdevice without a password, yes.
Good point, I didn't think this through *sigh* Then I would run the third node as a single-node PVE and run the qdevice in a small Debian VM. PBS could then be installed in its own VM or in parallel to the PVE.
 
> Thank you for your advice.
>
> The 2 votes on one PVE node seem dangerous in production, and so does a VM that has to be running all the time.

Yes, the two votes are more of a temporary workaround in case you need to troubleshoot something, not for production.

> A Raspberry Pi kind of seems like a good idea, but I can't put one in place because I'm not on-premise, and even if I were, is a Qdevice really "that much" more forgiving than corosync when it comes to latency?
According to the PVE documentation, this seems to be the case:

> The only requirements for the external host are that it needs network access to the cluster and to have a corosync-qnetd package available. We provide a package for Debian based hosts, and other Linux distributions should also have a package available through their respective package manager.
>
> Note: Unlike corosync itself, a QDevice connects to the cluster over TCP/IP. The daemon can also run outside the LAN of the cluster and isn't limited to the low latencies requirements of corosync.

https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support
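If you want to check what the QDevice is actually doing once it is set up, something like the following should show it (output formats may differ between versions):

```
# on a PVE node: quorum overview, including the Qdevice vote
pvecm status

# on a PVE node: status of the qdevice daemon
corosync-qdevice-tool -s

# on the QNetd host (the PBS in this scenario): list connected clusters
corosync-qnetd-tool -l
```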
 
> Yes, the two votes are more of a temporary workaround in case you need to troubleshoot something, not for production.
>
> According to the PVE documentation, this seems to be the case:

Again, thank you all for your time.

I was looking for a comparison of the latency requirements of a Qdevice and corosync but couldn't find anything on the subject.
I assume that all that matters is that the Qdevice delivers a valid vote that can keep the cluster alive.

Regarding some reddit posts about the same or almost the same case as mine (2 PVE nodes with 1 PBS as a Qdevice), they do not mention the security risk of doing so, because even if PBS has mechanisms against ransomware like immutable backups or encryption, an attacker who is logged in as root on the PBS can delete everything.

I found one way to counter that, and it will also make the 3-2-1 rule easier to attain: a second PBS node off-site with a sync job.

PBS documentation says :
https://pbs.proxmox.com/docs/storage.html#the-3-2-1-rule-with-proxmox-backup-server

> You can configure sync jobs to not remove snapshots if they vanished on the remote-source to avoid that an attacker that took over the source can cause deletions of backups on the target hosts. If the source-host became victim of a ransomware attack, there is a good chance that sync jobs will fail, triggering an error notification.
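If I go that route, I imagine the pull sync would be configured on the off-site PBS roughly like this (all names, addresses and the schedule are placeholders, not tested yet):

```
# on the off-site PBS: register the local (source) PBS as a remote
proxmox-backup-manager remote create local-pbs \
    --host 203.0.113.10 \
    --auth-id 'sync@pbs' \
    --fingerprint '<certificate fingerprint of the local PBS>' \
    --password '<password>'

# pull sync job; remove-vanished stays off so deletions on the source
# are not propagated to the off-site copy
proxmox-backup-manager sync-job create pull-from-local \
    --store offsite-store \
    --remote local-pbs \
    --remote-store main-store \
    --schedule 'daily' \
    --remove-vanished false
```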

I will probably explore and try 2 routes from now on:
1. Have a VPS/small Debian VM with as low a latency as possible (even if a Qdevice is not as demanding as corosync), and have the PBS isolated and only accessible via its "backup" user from the PVE GUI. I will also still need to find a way to move backups to an off-site location for the 3-2-1 rule.
2. Have the 3rd server (the PBS) as a Qdevice and a 4th server (another PBS) outside the cluster with a sync job, immutable backups (WORM if PBS can do it, I'll verify that), and encryption.

Edit: forgot "and corosync" in the first sentence.
 
> 2. Have the 3rd server (the PBS) as a Qdevice and a 4th server (another PBS) outside the cluster with a sync job, immutable backups (WORM if PBS can do it, I'll verify that), and encryption.

This is the way to go. I'm not sure about WORM, but if the drives are attached as normal external USB drives (like a hard disk) you could use them as a "removable datastore". If you secure your PBS accordingly, this isn't needed though: https://pbs.proxmox.com/docs/storage.html#ransomware-protection-recovery

The basic idea is to configure your PBS servers this way (a command-line sketch follows after the list):
- The PVE nodes can create backups on the local PBS and restore them, but not edit or remove them. Removing is only possible on the PBS itself and is usually done with prune + garbage collection jobs.
- The remote PBS is allowed to read backups from the local PBS (via a pull sync job), but not remove or edit them. The local PBS and PVE are not allowed to do anything on the remote PBS. Again, removing backups is only possible directly on the remote PBS and is usually done via housekeeping jobs.
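A rough sketch of what that could look like on the local PBS, with placeholder user and datastore names (the roles are the built-in PBS ones; passwords or API tokens still need to be set, and the exact role for the sync user may need adjusting):

```
# user the PVE nodes back up as; DatastoreBackup allows creating and
# restoring backups but not pruning or removing them
proxmox-backup-manager user create pve-backup@pbs
proxmox-backup-manager acl update /datastore/main-store DatastoreBackup --auth-id pve-backup@pbs

# read-only user the off-site PBS uses for its pull sync
proxmox-backup-manager user create offsite-sync@pbs
proxmox-backup-manager acl update /datastore/main-store DatastoreReader --auth-id offsite-sync@pbs
```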

This approach can be combined with iptables/firewall rules on the PBS servers, so that the remote PBS can't be accessed from your network at all, except via VPN for administration. This way, even if a bad actor manages to take over all your local infrastructure, they can't access the remote PBS.

@Falk R. has mentioned several times in the German forum that this is his basic setup for his corporate customers. For the off-site PBS he uses one of the cloud storage offerings (for small amounts of data) or rents a dedicated server from Hetzner or another dedicated server provider (Hetzner has so-called storage servers which give you a lot of storage space for your buck).
 
Hi.

I will try to get that solution working during the week; if I have another question I'll reply here.

Thank you for your time and help.

Kind regards,
 
Hi, I'm back.

Little update :

I was mainly trying to set up Linstor DRBD for high-availability shared storage, and I was skeptical that it would work as intended on PBS, because the PBS kernel might differ from that of PVE, certain packages might not be compatible, etc.

I asked some questions about it on LINBIT's forum (in case it helps someone in the same situation as mine: https://forums.linbit.com/t/linstor-drbd-on-proxmox-backup-server/945/8), and I managed to make it work.

Yesterday, someone from LINBIT (the company behind Linstor DRBD) replied to me, summarizing the essential points:

> Welcome to the forums.
>
> Looks like you're on the right track. LINSTOR needs at least two diskful nodes with storage; the 3rd node can be exclusively diskless. This means it is also a LINSTOR satellite node, and DRBD is required even if there isn't local storage. This 3rd node is also a good candidate to run the LINSTOR controller as a combined LINSTOR node.
>
> Your cluster configuration is exactly what we refer to as the "minimal" LINSTOR deployment pattern, you just so happen to be using PBS on your 3rd node.
>
> One thing to mention about kernels: it is always recommended to run the same kernels throughout a cluster if possible (not just the same DRBD kernel module versions). If the "PVE kernel" is easily installable on the PBS node, even if you have to add a Proxmox repo to make the same kernel available on PBS, that's what I would recommend.

Out of curiosity @Falk R., I was wondering what you use for shared storage so that high availability can function properly. Would you mind telling me whether you're using Ceph, Linstor DRBD, a SAN, or something else entirely?

Thank you for your time and help as always.

Kind regards,
 
> Out of curiosity @Falk R., I was wondering what you use for shared storage so that high availability can function properly. Would you mind telling me whether you're using Ceph, Linstor DRBD, a SAN, or something else entirely?
Hi,
I use many different solutions. For new setups with 2 nodes, only ZFS with replication.
For customers who come from VMware and already have shared storage or storage virtualization such as DataCore, the existing storage is connected via FC, iSCSI or NVMe over TCP.
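For the two-node ZFS case, replication is configured per guest, either in the GUI or on the CLI; a minimal sketch with placeholder VM ID, target node and schedule:

```
# on the node currently running VM 100: replicate it to pve2 every 15 minutes
pvesr create-local-job 100-0 pve2 --schedule '*/15'

# list replication jobs and their state
pvesr status
```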

The PBS has the same kernel as PVE and is one of the best options for a qdevice.
 
Thank you for your answer and clear information :)

As I don't have any further questions, I will mark this as solved.

Thank you all again for your time and help.

Have a nice day,

Kind regards,