Lesson learned: How a backup job took down my PVE node (NFS + snapshot)

Flancer

Mar 27, 2026
Hi everyone!
Apologies if this has been brought up before.
Here's my scenario: this morning, the PVE web interface on the node went down, along with several VMs. The node itself is beefy – packed with RAM, all SSDs, multiple ZFS pools, and even the root filesystem lives on ZFS. When I SSH'd in, I saw 0 free space on the Proxmox root filesystem and a hung backup process. Tried to kill it… no luck. Even a reboot from the shell wouldn't go through. I got lucky managing to free up a few GBs on the rpool, after which I power-cycled the node. It came back up, and everything started working.
Digging into the backup logs, I figured out the reason. Overnight, a backup job kicked off targeting PBS with 'snapshot' mode. Since everything was on ZFS, it had been working flawlessly. However, I recently added an NFS storage, moved some mountpoints of an LXC container there, and didn't double-check the backup settings.
During the night, the backup job attempted to snapshot this LXC. Obviously, that's not an option for a mountpoint on NFS, so it automatically fell back to 'suspend' mode. In that mode the container's data is first copied into a temporary folder so the container only has to stay frozen briefly, and the backup is then taken from that copy. The default temp folder is /var/tmp, which on my node sits on the root filesystem. You can guess the rest: the copy filled up the entire root partition and the job hung. PVE limped along for a few hours before starting to crumble.
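For anyone who wants to check their own containers for the same trap, here is a rough Python sketch. It scans the text of a container config (as printed by `pct config <vmid>`) and flags every rootfs/mpN volume on a storage type that cannot take snapshots. The function name, the `SNAPSHOT_CAPABLE` set and the storage names in the usage example are my own assumptions for illustration, so adjust them to your setup.

```python
import re

# Assumption: storage types treated as snapshot-capable for containers.
# Adjust this set to match the storage plugins you actually use.
SNAPSHOT_CAPABLE = {"zfspool", "lvmthin", "rbd", "btrfs"}

# Matches lines such as "rootfs: local-zfs:subvol-101-disk-0,size=8G"
# or "mp0: nfs1:101/vm-101-disk-0.raw,mp=/data,size=32G".
VOLUME_KEY = re.compile(r"^(rootfs|mp\d+):\s*(\S+)")


def volumes_without_snapshots(ct_config, storage_types):
    """Return (key, storage) pairs for volumes on non-snapshot storage.

    ct_config     -- container config as text
    storage_types -- hypothetical mapping of storage name to storage type,
                     e.g. {"local-zfs": "zfspool", "nfs1": "nfs"}
    """
    flagged = []
    for line in ct_config.splitlines():
        if line.startswith("["):
            # Sections for existing snapshots start here; stop scanning.
            break
        match = VOLUME_KEY.match(line)
        if not match:
            continue
        key, spec = match.groups()
        volume = spec.split(",")[0]
        if volume.startswith("/"):
            # Bind or device mount given as a host path, not a storage volume.
            continue
        storage = volume.split(":", 1)[0]
        if storage_types.get(storage) not in SNAPSHOT_CAPABLE:
            flagged.append((key, storage))
    return flagged
```

Usage, with made-up storage names:

```python
config = """rootfs: local-zfs:subvol-101-disk-0,size=8G
mp0: nfs1:101/vm-101-disk-0.raw,mp=/data,size=32G
"""
print(volumes_without_snapshots(config, {"local-zfs": "zfspool", "nfs1": "nfs"}))
```

Any container that shows up in the result is a candidate for the silent fallback to 'suspend' mode.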
Sure, that's on me for not thoroughly studying the docs on how the backup client works. But I was genuinely surprised that a backup job with near-default settings could so easily take down a PVE node.

PS. I'm relatively new to PVE and PBS; before this, I spent a long time working with infrastructure on the Microsoft stack. I really like PVE; I'm impressed that an open-source product can be this mature and functional. I registered on the forum and wrote this post hoping it might come in handy for someone. Also, maybe in future versions, the backup client could check free space in the temp folder before writing to it.
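Until something like that exists, a pre-flight check is easy to script yourself. Below is a minimal Python sketch that asks whether a temp folder can take a copy of a given size and still leave some headroom. The function name `has_room` and the 2 GiB default reserve are my own assumptions, not anything the backup tooling provides. Pointing vzdump's `tmpdir` option at a filesystem other than root is another way to keep a runaway copy from filling the root partition.

```python
import shutil


def has_room(tmpdir, needed_bytes, reserve_bytes=2 * 1024**3):
    """Return True if tmpdir can hold needed_bytes and keep a reserve free.

    tmpdir        -- directory the temporary copy would be written to
    needed_bytes  -- estimated size of the data to be copied
    reserve_bytes -- free space to keep afterwards (assumed default: 2 GiB)
    """
    free = shutil.disk_usage(tmpdir).free
    return free - needed_bytes >= reserve_bytes


if __name__ == "__main__":
    # Example: would a 50 GiB copy fit into /var/tmp with the reserve intact?
    print(has_room("/var/tmp", 50 * 1024**3))
```

The reserve is there because a root filesystem that is technically not full but down to a few MB is already enough to make services misbehave.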
 
Thanks for sharing this painful but valuable lesson!

Just to confirm my understanding: when using storage that doesn't natively support snapshots (like NFS, or standard directory storage on EXT4/XFS), we need to pay special attention to two critical points:
  1. Temporary Space Requirements: Without snapshots, the backup process relies heavily on a temporary directory (defaulting to /var/tmp). You must ensure that the partition hosting this directory (usually the root filesystem) has enough free space to handle the backup data.
  2. Service Interruption: Because a snapshot isn't possible, the system automatically falls back to "suspend" mode. This means the LXC container will be frozen for part of the backup run, which causes a temporary service interruption.
It's definitely an easy trap to fall into, especially when you are used to how seamlessly ZFS handles this. Thanks again for taking the time to write this up and warn the community!