A Conversation about Backup Strategies and PBS

Jan 12, 2022
I would like to have a constructive conversation with both the Proxmox team and the community about the conceptual backup strategy that this product is currently following.
This may not be the best place for such a topic, but I wanted it to be in the sight of the community rather than just an email inbox somewhere.


From a security perspective, it is less secure to have backup job configurations, crons, and restore functionality located on the hypervisor host.
This is known as a "push" configuration, which is generally considered less secure than a "pull" configuration.

For a specific example, in the event of a ransomware attack, an attacker with access to the hypervisor host would be able to damage, degrade, or destroy backup jobs, resulting in the loss of future backups if monitoring is not properly performed.
An attacker could also access the backup server and delete the backup data, because the hypervisor initiates network connections to the backup server on the same port that also serves the management interface.


I would like to see PBS change to a pull backup strategy.
For something as critical as backup data, I think it is incredibly important to make security a priority. Preventing unprivileged PBS users from changing existing data is an important step in the right direction.

I thoroughly enjoy this product and I think it has the potential to be a secure and comprehensive backup solution, but not with its current trajectory.
I do realize this is not a simple change and would require a significant amount of work.

I would love to hear about Proxmox's concerns as to why they did/did not follow this type of backup strategy.
I completely understand if the reason for not undertaking this concept is because it would require too much work for little gain.
 
Security is important to me, but it is usually worth very little to others.
I may be misjudging the majority of people on here.

I see where you are coming from, and I agree it is a better strategy. But push backups are an inescapable truth in the enterprise. Your most frequent attack vector against backups will be a time bomb, where the attacker taints all of your backups for a month or two and then activates the ransomware. If your hypervisor is compromised, you have much more significant issues. Keeping your management LAN separate, using keys and 2FA for SSH, and using 2FA on the web console will have a far greater impact than push vs. pull.

I just want to be clear: your VMs (and most computers on your network) should never be able to communicate with your hypervisor, and there should be multiple layers of firewalls and a VLAN to prevent that.
 

Yes, very true, using push backups as an attack vector is not the most likely scenario.
But just because something is unlikely doesn't mean no attempt should be made to mitigate the vulnerability. Defense in depth.

I would also like to point out that Veeam uses a pull configuration. I would also like to point out that I do not like their product very much.
I'm not saying the Proxmox team should follow Veeam's example, but Veeam is very widely used and almost an industry standard for virtualization backup.


Security issues aside, if Proxmox's reason for recommending that PBS be deployed on its own dedicated hardware is to guard against PVE host destruction, does it not make more sense from a restore point of view (and for mean time to recovery) to have the backup job configurations and restore functionality located on that dedicated hardware?
 
Yes, very true, using push backups as an attack vector is not the most likely scenario.
But just because something is unlikely doesn't mean no attempt should be made to mitigate the vulnerability. Defense in depth.
Very true, I cannot argue with that.
I would also like to point out that Veeam uses a pull configuration. I would also like to point out that I do not like their product very much.
I'm not saying the Proxmox team should follow Veeam's example, but Veeam is very widely used and almost an industry standard for virtualization backup.
Also very true.
Security issues aside, if Proxmox's reason for recommending that PBS be deployed on its own dedicated hardware is to guard against PVE host destruction, does it not make more sense from a restore point of view (and for mean time to recovery) to have the backup job configurations and restore functionality located on that dedicated hardware?
In my research of PBS, I got more of a sense that it is protecting against host destruction in the physical sense: fire, water, etc. We can also rest assured that a compromised VM cannot directly tamper with its own backups, because the machine has no idea it was backed up, whereas a machine that sees a Veeam process may interfere with the backup at the service level.

I think tapes would be a better way to address your concerns. I don't mean to derail the conversation, but honestly, the fact that Proxmox seems not to care about Amazon's S3 virtual tape storage is more of a concern to me. If other backup systems like Bacula (also open source) and Veeam can support AWS Tape Gateway, then I would imagine that Proxmox can too. I'm not a programmer; I'm a sysadmin. But this would provide immutable, low-maintenance, long-term backups that would (I believe) resolve most of your issues.

IMO this is a security issue. I am told the lack of support is because AWS doesn't support some LTO-4 features. I wonder how important those features truly are, and whether they can be worked around without breaking support for physical tapes.
 

All good points, thank you for weighing in.

I hope to see a staff member weigh in, and see what the team's perspective is.
 
Quick amendment: I see now that Veeam has a Proxmox integration (which I don't feel like trying) and would be able to complete snapshot backups too.
One of the big problems I had with Veeam was the licensing requirement for more than 10 VMs, and their stubborn refusal to ever develop a Linux-based backup and replication server.
It was a significant part of why I dumped ESXi and moved to PVE.
 
An attacker could also access the backup server and delete the backup data, because the hypervisor initiates network connections to the backup server on the same port that also serves the management interface.
You can create different users/tokens and limit their rights. For example, you can create a token for your PVE so that the PVE may create and/or restore backups but is forbidden to prune/delete them. Pruning can then be done as a job on the PBS, in which case you can ignore the backup retention settings on the PVE side. That should prevent a ransomware-compromised PVE host from altering any backups.

I think a push configuration was also chosen because that way you don't need to trust whoever hosts the PBS. You can encrypt your backups on the PVE and push them to the PBS, so the PBS hoster has zero knowledge of the data.
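As a rough illustration of the token idea, here is a simplified Python model of how a backup-only role differs from one that may also prune. DatastoreBackup and DatastorePowerUser are real PBS role names, but the three-action model, the `Action` enum and the `is_allowed` helper are hypothetical simplifications (in PBS, for example, DatastorePowerUser may only prune backups it owns), not the actual PBS permission code.

```python
from enum import Enum, auto


class Action(Enum):
    BACKUP = auto()   # create a new backup snapshot
    RESTORE = auto()  # read back an owned snapshot
    PRUNE = auto()    # delete/prune existing snapshots


# Simplified, hypothetical mapping of two real PBS role names to actions.
# The real privilege system is richer (ownership checks, ACL paths, ...).
ROLE_ACTIONS = {
    "DatastoreBackup": {Action.BACKUP, Action.RESTORE},
    "DatastorePowerUser": {Action.BACKUP, Action.RESTORE, Action.PRUNE},
}


def is_allowed(role: str, action: Action) -> bool:
    """Return True if a token holding `role` may perform `action`."""
    return action in ROLE_ACTIONS.get(role, set())


# The token used by the PVE host gets the backup-only role, so a compromised
# hypervisor can add new backups but cannot delete existing ones; pruning is
# left to a job that runs on the PBS itself.
pve_token_role = "DatastoreBackup"
for action in Action:
    print(f"{pve_token_role} {action.name}: {is_allowed(pve_token_role, action)}")
```

The point of the split is that retention moves to the side you trust more: the hypervisor's credentials simply lack the prune permission.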
 

I briefly mentioned the users/tokens feature originally; I think it's a great security feature and use it myself.

Your point about not trusting the PBS host is an interesting backup model.
If PBS is being used as an offsite backup, then this model does make sense, and is used frequently in the wild (see Synology Hyper Backup).

I do believe the more common backup model is the PBS being the more trusted and protected asset in the network. Backups are critical to sustaining any sort of failure, malware, or user error, and thus should be highly protected and behind layers of defense.
But when it is used on an internal network, I don't particularly see the need for storage encryption (although there are reasons to use it internally), and would say the PBS should be a more trusted host in a more trusted subnet.

I think a good middle ground would be offering both features: a zero-trust offsite backup repository or a highly trusted centralized backup manager, depending on the user's needs.
 
For a specific example, in the event of a ransomware attack, an attacker with access to the hypervisor host would be able to damage, degrade, or destroy backup jobs, resulting in the loss of future backups if monitoring is not properly performed.
First, without monitoring you also won't detect if the storage deteriorates and every backup (push or pull) is just garbage, and that can happen without any attacker, so this is rather a moot point anyway. But let's assume it isn't: with a pull-based approach, the attacker would just block the incoming PBS connections or alter the guest state locally, which has the exact same impact as you describe.
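Since the argument hinges on monitoring, here is a minimal Python sketch of a backup freshness check that flags guests whose newest backup is missing or too old. It assumes you have already collected the newest snapshot time per guest from somewhere (for example, the PBS snapshot list); the guest IDs, timestamps and the `stale_guests` helper are made up for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def stale_guests(expected: list[str],
                 last_backup: dict[str, datetime],
                 now: datetime,
                 max_age: timedelta) -> list[str]:
    """Return expected guests whose newest backup is missing or older than max_age."""
    stale = []
    for guest in sorted(expected):
        newest = last_backup.get(guest)
        if newest is None or now - newest > max_age:
            stale.append(guest)
    return stale


now = datetime(2022, 1, 12, 12, 0, tzinfo=timezone.utc)
expected = ["vm/100", "vm/101", "vm/102", "ct/200"]
last_backup = {
    "vm/100": now - timedelta(hours=20),
    "vm/101": now - timedelta(days=3),   # e.g. its job was silently disabled
    "ct/200": now - timedelta(hours=2),
    # "vm/102" has no backup at all
}
print(stale_guests(expected, last_backup, now, max_age=timedelta(days=1)))
# ['vm/101', 'vm/102']
```

Comparing against an inventory of expected guests, rather than only against the backups that exist, is what catches a job that was deleted outright, whether backups are pushed or pulled.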
An attacker could also access the backup server and delete the backup data, because the hypervisor initiates network connections to the backup server on the same port that also serves the management interface.
No, it cannot delete any, not even its own older backups, at least if the user sets up the access token with a backup-creation-only privilege, i.e., DatastoreBackup vs. DatastorePowerUser.
Please see https://pbs.proxmox.com/docs/user-management.html#access-control
A different port wouldn't change anything; a pull connection allows talking back anyhow due to connection tracking, as otherwise no data could be "pulled" back.
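To make the connection-tracking point concrete, here is a toy Python model of a stateful firewall that only lets hypervisors open new connections to the PBS on port 8007 (the default PBS API and web UI port), yet still passes the PBS replies back over those connections. The host names and the `StatefulFirewall` class are hypothetical; real filtering would be done by the kernel's connection tracking (for example via nftables or iptables), not by anything like this.

```python
from dataclasses import dataclass

PBS_HOST = "pbs"
PBS_PORT = 8007  # default PBS API/web UI port (backup traffic and management share it)


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int
    sport: int


class StatefulFirewall:
    """Toy stateful filter: only hypervisors may open NEW connections to PBS:8007."""

    def __init__(self, hypervisors):
        self.hypervisors = set(hypervisors)
        # Tracked connections as (client, client_port, server, server_port).
        self.tracked = set()

    def allow(self, pkt: Packet) -> bool:
        # Replies on a tracked connection pass (the "ESTABLISHED" case), which is
        # why data can always flow back over a connection once it is opened.
        if (pkt.dst, pkt.dport, pkt.src, pkt.sport) in self.tracked:
            return True
        # NEW connections: only hypervisor -> PBS on the API port.
        if pkt.src in self.hypervisors and pkt.dst == PBS_HOST and pkt.dport == PBS_PORT:
            self.tracked.add((pkt.src, pkt.sport, pkt.dst, pkt.dport))
            return True
        return False


fw = StatefulFirewall(["pve1", "pve2"])
print(fw.allow(Packet("pve1", "pbs", 8007, 50000)))  # True: PVE opens the connection
print(fw.allow(Packet("pbs", "pve1", 50000, 8007)))  # True: PBS replies on it
print(fw.allow(Packet("pbs", "pve1", 22, 40000)))    # False: PBS cannot open a new one
```

Note that this filter cannot tell backup traffic from management traffic, since both arrive on the same port; that distinction would have to be made at the application level.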

I would like to see PBS change to a pull backup strategy.
For backup this is not planned due to security concerns. We'd need to give the PBS access to the whole infrastructure, with a highly privileged access token and incoming network access open for the PBS, as otherwise it cannot access all of the guests' data at any time. Trusted client-side encryption would also be harder to get right, as having it configured on the PVE would violate the very vector (hypervisor gets attacked) you used to argue for your "pull" strategy, so we'd effectively lose a major security feature, or at least a good amount of its trust/security benefits.
This would then make the backup server a possible central point of failure, allowing access on all hypervisors with the full production workload and data, far from ideal.

A pull-based strategy must be bidirectional by nature (one way sends control/initiation commands, the other way data is sent), so an attacker can mess with the data streams on either side to influence the other side. A push-based approach, by contrast, can be unidirectional, so it is much easier to secure (code- and setup-wise): there's only one incoming direction that needs to be checked, and that's rather easy to do there, as we already treat any incoming connection to the REST server as untrusted from the beginning.

For something as critical as backup data, I think it is incredibly important to put security as a priority.
Exactly, that's why we went for push-based backups with optional client-side encryption, and API access tokens that can use heavily reduced privileges to do only the job they must and nothing more.
 
A second approach, which I always use, is to have a local PBS and a remote PBS. The remote PBS pulls from the local one without removing vanished backups, and keeps them for longer than the local one does.

Backups are pushed to the local PBS, and the remote PBS pulls them from the local one.

Example:
The local PBS keeps 12 days and the remote keeps 30 days.

I'm sure that this is not the best solution but it's reasonable.

Diaolin
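The local/remote scheme can be simulated in a few lines of Python. This is a sketch under simplifying assumptions: one backup per day, retention modelled as a plain keep-last count, and the remote sync keeping snapshots that have vanished from the local PBS (the behaviour you get by leaving the sync job's "remove vanished" option off). `keep_last` and `sync_pull` are hypothetical helpers, not PBS code.

```python
def keep_last(snapshots, n):
    """Simplified retention: keep only the n newest snapshots."""
    return sorted(snapshots)[-n:]


def sync_pull(remote, local, remove_vanished=False):
    """Copy new snapshots from local to remote; optionally drop ones gone from local."""
    merged = set(remote) | set(local)
    if remove_vanished:
        merged &= set(local)
    return sorted(merged)


local, remote = [], []
for day in range(1, 41):               # 40 daily backups, snapshot id = day number
    local.append(day)                  # PVE pushes to the local PBS
    local = keep_last(local, 12)       # local retention: 12 days
    remote = sync_pull(remote, local)  # remote PBS pulls, keeping vanished snapshots
    remote = keep_last(remote, 30)     # remote retention: 30 days

print(len(local), local[0], local[-1])     # 12 29 40
print(len(remote), remote[0], remote[-1])  # 30 11 40
```

With `remove_vanished=True` the remote would only ever mirror the 12 local snapshots, which is why that option is left off in this scheme.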
 

You make some great points, and I'd like to take a moment to address some details in your response.

If an attacker compromised the hypervisor and also had the PBS management root credentials, there is nothing stopping them from logging into the management interface from a PVE host. A network firewall can do nothing about this because you use a single port. This would be the same problem as in a Pull configuration, so this issue is more a problem of having multiple services exposed on a single port.
I can see a possible solution to this using IP whitelisting at the application/service level on the backup server itself, to ensure a hypervisor cannot send any data to the management service.
I do not know exactly how you have the web service set up, so admittedly I do not know how feasible that is.
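One way to picture the application-level whitelisting idea is a Python sketch that decides per request, based on source subnet and API path, whether it may proceed. The subnets are a hypothetical address plan, and treating `/api2/json/backup` and `/api2/json/reader` as the only paths a hypervisor needs is an assumption for illustration; a real allowlist would have to cover every endpoint the backup client actually calls.

```python
import ipaddress

# Hypothetical address plan.
HYPERVISOR_NET = ipaddress.ip_network("10.0.10.0/24")
ADMIN_NET = ipaddress.ip_network("10.0.99.0/24")

# Assumption: the API path prefixes a backup client needs; a real list
# would have to be derived from what the client actually calls.
BACKUP_PREFIXES = ("/api2/json/backup", "/api2/json/reader")


def route_allowed(client_ip: str, path: str) -> bool:
    """Admins may reach any path; hypervisors only the backup/restore paths."""
    addr = ipaddress.ip_address(client_ip)
    if addr in ADMIN_NET:
        return True
    if addr in HYPERVISOR_NET:
        return any(path == p or path.startswith(p + "/") for p in BACKUP_PREFIXES)
    return False


print(route_allowed("10.0.10.5", "/api2/json/backup"))        # True
print(route_allowed("10.0.10.5", "/api2/json/access/users"))  # False
print(route_allowed("10.0.99.7", "/api2/json/access/users"))  # True
```

Because the check runs inside the service rather than in a network firewall, it can separate management calls from backup calls even though both share one port.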

Giving PBS access to the whole infrastructure isn't necessarily bad. It's simply a different backup model where the backup server is more trusted than the hypervisor.

I would also like to comment on a Pull strategy being bidirectional. If you mean it is a TCP connection and data flows both directions, then yes it is bidirectional.

But this is not FTP; there is no need to make a network connection one way only to tell the host to make another connection back the other way.
For example, rsync used in a pull configuration uses a single TCP connection but can pull or push data depending on the source/destination specified; the network connection is only initiated once.
In a PBS example, PBS can initiate a TCP connection to the PVE host and pull backup data back inside the same TCP connection.

Push configurations are not unidirectional. A TCP connection is initiated from the PVE host to PBS. Yes, the majority of the data flows PVE -> PBS, but there is still data flowing in the other direction for packet error recovery and checksums.

In a pull configuration, you can also have proper user access control. Having user accounts on the PVE host with read-only permissions that only allow PBS to copy data is one example.

Overall, I definitely do realize Push backups have a use case, and should stay as an option in PBS. They are undeniably great in a network model where the backup server is untrusted.
I would really like to see the optional feature where a Pull strategy could be chosen by the user.
If the backup server is the more trusted device in a network, I think it makes a lot of sense to have the backup job configuration and restore functionality located centrally on the backup server.
 
Giving PBS access to the whole infrastructure isn't necessarily bad. It's simply a different backup model where the backup server is more trusted than the hypervisor.
Sure, nothing is black and white, but it has a much larger attack surface, and one that exposes the raw data (vs. client-encrypted data).
It's not that we did not think about that; rather, we explicitly decided against that approach after much discussion between developers (not only for that reason, but it definitely had some weight).
But this is not FTP; there is no need to make a network connection one way only to tell the host to make another connection back the other way.
For example, rsync used in a pull configuration uses a single TCP connection but can pull or push data depending on the source/destination specified; the network connection is only initiated once.
In a PBS example, PBS can initiate a TCP connection to the PVE host and pull backup data back inside the same TCP connection.
The connection count can always be one; that's not what I meant. The communication being bidirectional != a higher connection count. It's just that push is a simple one-way data flow, where the backup source never needs to interpret backup-protocol-level server control messages (= additional attack surface), whereas pull-based is two-way.

Push configurations are not unidirectional. A TCP connection is initiated from the PVE host to PBS. Yes, the majority of the data flows PVE -> PBS, but there is still data flowing in the other direction for packet error recovery and checksums.
We're talking past each other on that one. Every connection having two ends != both sides needing to handle logic, interpret commands, ...
I'd lightly suggest prototyping the approach you'd like to persuade people of; IMO that's the simplest way to see the fallacies/drawbacks of that approach. That's not meant in any derogatory way, but IMO it's indeed the best way to evaluate those things. We also played around with and discussed different approaches when PBS was initially developed in ~2019 before settling on the current design; that doesn't mean it's fixed forever, but the rough direction will be. As we're fully open source, you could also see what it would take to implement your request in the actual PBS code base.

In a pull configuration, you can also have proper user access control. Having user accounts on the PVE host with read-only permissions that only allow PBS to copy data is one example.
You already have proper, fine-grained access control now, so even if one could add it to a pull-based approach, it wouldn't be a new benefit, as it exists already.

I would really like to see the optional feature where a Pull strategy could be chosen by the user.
If the backup server is the more trusted device in a network, I think it makes a lot of sense to have the backup job configuration and restore functionality located centrally on the backup server.
I can understand if one argues for possible convenience benefits for their respective setup, or idea for a setup, as the last quoted paragraph above seems to do; security-wise, it's just not arguable (IMO/IME).

But, to be straight with anybody requesting this, and to avoid false hopes: implementing a pull-based backup strategy with one PBS connecting everywhere is not planned for the foreseeable future. For one, we just see no security benefit (on the contrary, for most setups that are not carefully configured there's a big risk of a SPoF), and it doesn't allow modelling any setups that cannot already be modelled with the existing architecture (small uses of existing technology for workarounds are deemed OK), so the ROI is just too low for the amount of work this would require.
 
to add: if you really really really require this kind of semantics - you can always co-install PBS with PVE (per node or per cluster) and have your central big PBS pull from all those PBS via sync ;)
 

We definitely disagree on the value of a pull model, but I completely understand the ROI for this feature.

At the end of the day, I'm nobody and you guys are the devs making this all happen, and I appreciate the work y'all have done.
 
Ok... backup strategies...

What I'm going to say has no relation to pushes and pulls (and sorry if this sounds like thread hijacking), but Proxmox in general (and not only PBS) has had a serious issue regarding backup for years.

All "strategies" and tools focus on backing up your VMs and containers. Great!

But what about all those glorious terabytes of data that I've been accumulating for the last ten years in my ZFS datasets? pve-zsync? zrepl? zfs-autobackup? No, thanks.

I know that PVE is (only?) a hypervisor, but it really needs to address direct backup of the underlying ZFS storage through the web UI, independently of the VMs and containers around.
 
I'm not sure what else you would use your hypervisor for. Maybe these other tasks would be better suited to an LXC container rather than the host OS? I am actually interested and would like to hear your use case.

If you want to make backups of a filesystem within your host (but not a container), you can use the Proxmox Backup Client and point your backup tasks at your mount points. A cron job running a command like this would work:
Code:
# proxmox-backup-client backup disk1.pxar:/mnt/disk1 disk2.pxar:/mnt/disk2
See https://pbs.proxmox.com/docs/backup-client.html
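If the command is meant to run from cron, it can help to generate it from a list of mount points. The Python sketch below only builds the argument list (it does not run anything); the archive-name-to-path mapping and the repository string `backup@pbs@pbs.example.lan:store1` are placeholders, and passing the repository via `--repository` instead of the `PBS_REPOSITORY` environment variable is a choice made for the example.

```python
import shlex


def build_backup_command(mounts: dict, repository: str) -> list:
    """Build a proxmox-backup-client call that stores each mount point as a .pxar archive."""
    cmd = ["proxmox-backup-client", "backup"]
    for archive_name, path in mounts.items():
        # Archive spec is "<name>.pxar:<path>", as in the command in the post.
        cmd.append(f"{archive_name}.pxar:{path}")
    cmd += ["--repository", repository]
    return cmd


cmd = build_backup_command(
    {"disk1": "/mnt/disk1", "disk2": "/mnt/disk2"},
    "backup@pbs@pbs.example.lan:store1",  # placeholder user@realm@host:datastore
)
print(shlex.join(cmd))
```

On a host where the client is installed, the list could be passed to `subprocess.run(cmd, check=True)`; the password or API token would still need to be supplied, for example via `PBS_PASSWORD`.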


Lastly, you're totally "thread hijacking" lol :D
 
Hi!

Thanks for replying and thanks for the interest in my case.

As far as I can understand, my underlying storage filesystem is ZFS only at the host level. Once it is bind mounted inside a container, it is not ZFS anymore...

So my point is that I want to back up my pools as ZFS pools/datasets, and not as simple files and directories as they appear when I browse them.

Maybe your example using proxmox-backup-client could be a solution to my question, but I confess that I do not really understand what (or, better said, "how") it does. In other words: is this the equivalent of "zfs send" here and "zfs recv" there?

Because if so, then we are talking about a revamped pve-zsync.

And if not, what's the difference between pbs-client and backing up VMs and containers with my "files" in them?

A good example: TrueNAS with its snapshot/replication tasks. They work just fine. I can back up my storage as ZFS pools at the host level, regardless of whether they are inside a jail, with the difference that I can use the web UI to configure them.

Lastly, you're totally "thread hijacking" lol :D
I swear that wasn't my intention... ;-)
 
