Windows VM Freezing During Backup

Hi everyone.

So I'm new to both PVE and PBS, but I have experience with virtualization. Quick backstory: this weekend, I moved my small home lab/server setup from a single Windows 11 machine running a couple of VMs in VMware Workstation Pro to a PVE setup. Getting to this point has been a bit of a nightmare, with nothing working the way it was supposed to and a lot of error messages to figure out. But I persisted, and it's now up and running.

I also set up PBS this weekend, using this guide to set up an SMB datastore. The SMB share that stores the backups is attached to one of the VMs itself. I know that seems very odd, but without boring you with the details of the setup, this is the way I have to do things. I actually had everything set up and working and had successfully completed a backup job. Then there was a power outage yesterday, and since then nothing's worked right.

I fixed multiple other issues, but the problem I'm having now is that when PBS starts backing up the Windows VM, that VM essentially pauses. Since that's where the share lives, the backup tries and fails to write its data. As soon as it gives up and moves on to the next VM, the Windows VM resumes, the share is available again, and all of the other VMs back up successfully. To be clear, this didn't happen when I first set things up; it only started today.

The QEMU guest agent is installed and working on the Windows VM and is set to freeze the file system, though turning that off didn't help. I also double-checked that the backup is set to use snapshot mode. I've rebuilt the datastore and even gone so far as to nuke and redo the PBS setup from scratch, since it's not that complicated, but this is still happening. It was working, then suddenly it wasn't, and I have no idea what happened or how to fix it. All the other VMs are Ubuntu or Debian, and they back up fine with no interruption to their operation.

I'm baffled and more than a bit frustrated, but I have a feeling there's an easy-ish solution that I just don't see because I'm not super experienced with PBS. If anyone has any insights, I'd be very grateful. Thanks all!
 
You really shouldn't back up a guest that provides the very storage you're backing up to. My suggestion is to exclude this VM from the PBS backup job and back it up via normal PVE backups to a different storage.
To tell you more, we'd need to see the storage and VM config. We'll likely need more than that, but let's start there.

Bash:
cat /etc/pve/storage.cfg
qm config VMIDHERE --current
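For illustration, here is a minimal sketch of the split suggested above. One `vzdump` run sends every guest except the share-hosting VM to PBS, and a separate run sends that VM to another storage. It assumes VMID 150 and the storage names `backups-to-windows-smb` and `local`, both of which appear in the configs posted later in the thread (`local` allows `backup` content). The script is a dry run by default: with `RUN=echo` it only prints the commands. In practice you would more likely set this up as two scheduled backup jobs in the GUI.

```shell
#!/bin/sh
# Sketch only: VMID and storage names are assumptions taken from the
# configs posted in this thread; adjust them for your own setup.
VMID=150                            # Windows guest that hosts the SMB share
PBS_STORAGE=backups-to-windows-smb  # PBS storage entry from storage.cfg
SEPARATE_STORAGE=local              # directory storage with "content backup"

# Dry run by default: RUN=echo prints each command. Set RUN="" to execute.
RUN=${RUN-echo}

plan_backups() {
    # 1) Every guest except the share-hosting VM goes to PBS.
    $RUN vzdump --all --exclude "$VMID" --storage "$PBS_STORAGE" --mode snapshot
    # 2) The share-hosting VM is backed up on its own to a different storage.
    $RUN vzdump "$VMID" --storage "$SEPARATE_STORAGE" --mode snapshot --compress zstd
}

plan_backups
```

The point of the split is that the VM providing the backup target is never frozen or snapshotted while a backup is being written to its own share.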
 
I didn't want to make my original post too long with semi-relevant details. I work in IT, and this is definitely not something I'd do for a client in a production environment, but with the equipment I have, this is what I'm able to do at the moment. And like I said, it worked for a day for...some reason. So here's the short version:

I have 14 hard drives of various sizes in USB docks, combined into a single NTFS volume by a Windows application called StableBit DrivePool. It sounds jank, but it actually works extremely well and has for years. My original server ran Windows 11 as the bare-metal OS. Now the USB drives are passed through to the VM, and that pool is where I'm trying to put the backups. I also use Backblaze to back up most of what's on this pool volume to the cloud, and since Backblaze only backs up local volumes, it also has to run on this machine.

Here's the info you wanted:

Bash:
dir: local
        path /var/lib/vz
        content backup,vztmpl,iso

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

lvmthin: seagatehdd
        thinpool seagatehdd
        vgname seagatehdd
        content rootdir,images
        nodes proxmox

pbs: backups-to-windows-smb
        datastore windows-vm-smb
        server 192.168.1.153
        content backup
        fingerprint 4e:d3:90:13:4f:04:7a:72:37:ae:0f:2b:e7:36:bd:33:19:42:99:02:c8:37:d0:94:21:2e:77:ea:42:41:42:c2
        prune-backups keep-all=1
        username root@pam

Bash:
agent: 1,fstrim_cloned_disks=1
bios: ovmf
boot: order=scsi0;net0;sata0
cores: 6
cpu: x86-64-v2-AES
efidisk0: local-lvm:vm-150-disk-1,efitype=4m,pre-enrolled-keys=1,size=4M
machine: pc-q35-9.2+pve1
memory: 16384
meta: creation-qemu=9.2.0,ctime=1752285761
name: PXA-Server
net0: virtio=BC:24:11:D8:7F:78,bridge=vmbr1,firewall=1
numa: 0
onboot: 1
ostype: win11
sata0: local:iso/virtio-win-0.1.271.iso,media=cdrom,size=709474K
scsi0: local-lvm:vm-150-disk-0,iothread=1,size=512G
scsihw: virtio-scsi-single
smbios1: uuid=59871417-d360-45f1-9204-f8e1fd1080a5
sockets: 2
startup: order=1,up=15
tablet: 1
tpmstate0: local-lvm:vm-150-disk-2,size=4M,version=v2.0
usb0: host=152d:0567
usb1: host=152d:0567
usb2: host=152d:0567
usb3: host=152d:0567
vga: qxl,memory=128
vmgenid: 225f262c-ba74-4406-a8f0-ef9f199b4f4c

If anything else is needed, ask away.

If this working for a day was some kind of fluke and it turns out this really isn't doable, I suppose I could make another Windows VM with little CPU and RAM that does nothing but host StableBit and Backblaze, and just exclude that from the Proxmox backups.
 
Here are a few things that look off or non-ideal to me:
  • No discard for the local-lvm disks
  • Same USB device passed multiple times
  • Two sockets but no NUMA
I'm not sure whether any of this can cause the issue, but since I noticed it while looking, I might as well tell you about it.
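One rough way to spot the three points above is to scan the VM config. The function below is a sketch with simple assumptions. It only looks at disk lines on `local-lvm` and skips `efidisk0` and `tpmstate0`. It treats `host=vendor:product` USB entries as device IDs. It compares `sockets` against `numa`. It is not a full config linter.

```shell
#!/bin/sh
# Reads `qm config <vmid>` output on stdin and prints one warning per finding.
# Matching is deliberately simple; this is a sketch, not a complete checker.
check_vm_config() {
    awk -F': ' '
        # Disks on local-lvm without discard=on (efidisk/tpmstate are not matched).
        /^(scsi|virtio|sata|ide)[0-9]+: local-lvm:/ && $2 !~ /discard=on/ {
            print "no discard: " $1
        }
        # Same host=vendor:product ID used by more than one usbN entry.
        /^usb[0-9]+: host=[0-9a-f]+:[0-9a-f]+/ {
            id = $2; sub(/^host=/, "", id); sub(/,.*/, "", id)
            if (seen[id]++ == 1) print "duplicate USB ID: " id
        }
        /^sockets: / { sockets = $2 }
        /^numa: /    { numa = $2 }
        END {
            # More than one socket while NUMA is off (or not set).
            if (sockets + 0 > 1 && numa + 0 != 1)
                print "sockets=" sockets " but NUMA disabled"
        }
    '
}
```

On the PVE host you could pipe the live config into it, e.g. `qm config 150 --current | check_vm_config`. For the config posted above, it should flag `scsi0`, the repeated `152d:0567` ID, and two sockets without NUMA.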
In theory, disabling fs-freeze should make this work; the backup just won't be consistent, and this kind of setup can cause loops/issues. Also see here. Perhaps the change just wasn't applied yet? Would you mind sharing the backup task's output?
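If you turn fs-freeze off from the CLI instead of the GUI, the agent option string must keep its other settings. The helper below is a sketch that rewrites an existing `agent:` value to include `freeze-fs-on-backup=0`. That sub-option name comes from the qemu-server agent options in recent PVE versions. Check `man qm` on your version before relying on it.

```shell
#!/bin/sh
# Prints an "agent:" value with fs-freeze on backup disabled while keeping the
# other agent options. Assumes freeze-fs-on-backup is not the first item.
agent_without_freeze() {
    # Drop any existing freeze-fs-on-backup=... item, then append =0.
    printf '%s\n' "$1" |
        sed -e 's/,\{0,1\}freeze-fs-on-backup=[^,]*//' \
            -e 's/$/,freeze-fs-on-backup=0/'
}
```

For the posted config this would be used as `qm set 150 --agent "$(agent_without_freeze '1,fstrim_cloned_disks=1')"`. Afterwards, `qm pending 150` shows whether the new value is active or still pending.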
 
It's not actually the same USB device; it's four units of the same Sabrent USB dock connected to different ports. Can you elaborate on the "no discard" point?

I've attached a log of the entire backup job. The part that confuses me, as you'll see, is that it appears to freeze and thaw the VM successfully, but everything chokes right as it starts trying to write the image. Then the Windows VM's backup times out and the rest proceed normally. It didn't do this on the first day.
 

As for discard, click the text/link. Make sure fs-freeze is disabled for both the PBS guest and the guest providing the SMB share. The option should not be shown in orange in the GUI, since orange means the change is still pending. I'd back them up separately. All the USB IDs are the same, so it may make sense to pass through the whole port here.
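Both hardware suggestions can be sketched with `qm set`. Passing USB devices by port uses the `host=<bus>-<port>` form instead of `vendor:product`, so each `usbN` entry points at one physical dock. The port numbers here are hypothetical; the real ones would come from `lsusb -t` on the host. Re-setting the existing `scsi0` volume with `discard=on` lets discard/TRIM requests from the guest reach the LVM-thin pool. As before, `RUN=echo` keeps this a dry run.

```shell
#!/bin/sh
# Dry-run helpers: RUN=echo prints the commands, RUN="" executes them.
# VMID 150 is from the posted config; bus-port values are hypothetical.
RUN=${RUN-echo}
VMID=150

# Pass each dock by physical port (host=<bus>-<port>) instead of the shared
# vendor:product ID 152d:0567, so usb0..usbN each map to one specific dock.
pass_usb_ports() {
    i=0
    for port in "$@"; do
        $RUN qm set "$VMID" --usb$i "host=$port"
        i=$((i + 1))
    done
}

# Re-set the existing OS disk with discard=on (volume name from the config).
enable_discard() {
    $RUN qm set "$VMID" --scsi0 "local-lvm:vm-${VMID}-disk-0,iothread=1,discard=on"
}
```

For the four docks, you would call something like `pass_usb_ports 1-1 1-2 2-1 2-2` (hypothetical ports), check the printed commands, and then rerun with `RUN=` to apply them. The disk option may show as pending until the VM is powered off and started again.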
 
So, I discovered another odd thing this morning. I can't enter the UEFI BIOS for that Windows VM now either; I get an error saying the device can't be found, even though it's there and was created when the VM was built. I'm beginning to think that, despite being only a couple of days old, something's broken with this VM as a whole. Tonight I'm going to try an experiment: create a fresh Windows 11 VM with a file share, direct the backups there, and see if anything changes. If it does, I might try building a fresh VM, reassigning the OS disk to it, and seeing whether that helps.

I'll post my findings here. Promise, I won't be one of those people who says they'll update a thread and then doesn't. I hate when that happens. :)
 
Alright, I've officially given up on this issue. I tried creating a fresh Windows VM and moving the disk to it, and that made zero difference. I've also discovered that every time either the PBS or PVE server is rebooted, the SMB link breaks, and the only way to reestablish it is to remove and re-add the datastore in PBS. I can't really be mad about it, because this was a wholly unorthodox solution. It's just so weird that it worked fine for a day, but I guess that's one of life's mysteries.

What I've done instead is hook up another external hard drive to be PBS's exclusive repository. Since this is a home lab and these backups are really just for rollback purposes, this is sufficient. If the external drive dies, I'll just put in another one. The data on my drive pool that needs multiple layers of protection is already redundant within the pool itself, plus it goes to Backblaze. The only things I'd want backed up independently of this are my Home Assistant and blog data. HA already backs up to the drive pool, and I can set up a cron job on my blog VM to run an export from Ghost and copy that to the pool as well. The VMs total about 700GB, and since they're all thin provisioned, actual usage is much less than that. Doing the math, I'll have plenty of space for this, even with the 14 days of retention I have planned, and it's easy enough to upgrade the drive later.
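The cron job mentioned above could look roughly like this. It is a sketch that rests on several assumptions:

- How the Ghost export is produced is left open.
- The newest `*.json` file in an export directory is taken to be the export.
- The pool share is mounted at some path on the blog VM.
- All paths are hypothetical.

The function copies the newest export into the pool under a timestamped name and keeps only the newest N copies.

```shell
#!/bin/sh
# copy_latest_export EXPORT_DIR DEST_DIR KEEP
# Hypothetical helper: copies the newest *.json from EXPORT_DIR into DEST_DIR
# as ghost-<timestamp>.json and deletes all but the newest KEEP copies.
# Parses ls output, so it assumes file names without newlines.
copy_latest_export() {
    export_dir=$1
    dest_dir=$2
    keep=$3
    latest=$(ls -t "$export_dir"/*.json 2>/dev/null | head -n 1)
    if [ -z "$latest" ]; then
        echo "no export found in $export_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    cp "$latest" "$dest_dir/ghost-$(date +%Y%m%d-%H%M%S).json" || return 1
    # Newest first; everything after the first $keep entries is removed.
    ls -t "$dest_dir"/ghost-*.json | tail -n +"$((keep + 1))" | while IFS= read -r old; do
        rm -f -- "$old"
    done
}
```

A crontab entry such as `30 3 * * * /usr/local/bin/ghost-export-copy.sh` could run this nightly. Here the script name is hypothetical and would call something like `copy_latest_export /var/lib/ghost-exports /mnt/pool/ghost 14` to match the planned 14-day retention.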

Impact, thank you for your help, though. I appreciate you being willing to offer it to someone who was trying to implement a really jank solution. Cheers mate!
 