VM stuck in fsfreeze after failed snapshot backup

vmuser@dotcom

Setup:


  • Proxmox PVE 8.4.1 (LVM-thin storage)
    – VM running Jenkins
  • Proxmox Backup Server 3.4.0
    – Used as backup storage
    – Backup job configured on PVE in snapshot mode

During one of the backup runs, a network error occurred and the backup process did not complete properly:
Bash:
INFO:  95% (66.9 GiB of 70.3 GiB) in 24m 17s, read: 48.3 KiB/s, write: 48.3 KiB/s
ERROR: backup write data failed: command error: protocol canceled
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 100 failed - backup write data failed: command error: protocol canceled
INFO: Failed at 2025-09-03 22:54:22
INFO: Backup job finished with errors
INFO: notified via target `mail-to-root`
TASK ERROR: job errors

At the start of the snapshot backup, fsfreeze was called inside the VM:

Bash:
Sep 03 22:30:04 jenkins qemu-ga[629]: info: guest-fsfreeze called

According to the log, the VM was “resumed” after the backup failure:

Bash:
INFO: resuming VM again

But in reality, the filesystem inside the guest never got unfrozen. The VM stayed in a frozen state and eventually crashed.


It looks like freeze and unfreeze are tied to the same script, and if the process fails, unfreeze is skipped despite the log showing “resumed.”
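For reference: if the qemu-guest-agent inside the VM still responds, the frozen state can in principle be checked and cleared manually from the PVE host. A minimal sketch, using VMID 100 from the logs above:

Bash:
# Ask the guest agent whether the filesystem is still frozen
qm guest cmd 100 fsfreeze-status
# If it reports "frozen", thaw it manually
qm guest cmd 100 fsfreeze-thaw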


Questions:


  • Has anyone else encountered this mismatch (log says resumed, but guest is still frozen)?
  • Is there a way to modify the script or add a safeguard to guarantee fsfreeze --unfreeze is always executed even if the backup fails?
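One possible safeguard (a sketch on my side, not a tested solution): vzdump supports a hook script, registered via the script: option in /etc/vzdump.conf, which is called with the phase, mode, and VMID for each backup phase. A hook that forces a thaw on backup-abort could look like this:

Bash:
#!/bin/bash
# Hypothetical hook script; register it in /etc/vzdump.conf as:
#   script: /usr/local/bin/vzdump-thaw-hook.sh
# vzdump calls the hook as: <script> <phase> <mode> <vmid> for the backup-* phases.
phase="$1"; mode="$2"; vmid="$3"

if [ "$phase" = "backup-abort" ] && [ -n "$vmid" ]; then
    # If the guest agent still reports the FS as frozen, force a thaw
    if qm guest cmd "$vmid" fsfreeze-status 2>/dev/null | grep -q frozen; then
        echo "VM $vmid still frozen after aborted backup, issuing fsfreeze-thaw"
        qm guest cmd "$vmid" fsfreeze-thaw
    fi
fi
exit 0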
 
guest-agent log:

Sep 03 22:30:04 jenkins qemu-ga[629]: info: guest-fsfreeze called
Sep 04 09:20:48 jenkins qemu-ga[629]: info: guest-ping called
Sep 04 09:20:59 jenkins qemu-ga[629]: info: guest-ping called
Sep 04 09:21:09 jenkins qemu-ga[629]: info: guest-ping called
 
Finally found the backup log:

Bash:
2025-09-03 22:30:03 INFO: Starting Backup of VM 100 (qemu)
2025-09-03 22:30:03 INFO: status = running
2025-09-03 22:30:03 INFO: VM Name: prod-jenkins
2025-09-03 22:30:03 INFO: include disk 'scsi0' 'iss1-lvt:vm-100-disk-0' 200G
2025-09-03 22:30:03 INFO: include disk 'scsi1' 'iss1-lvt:vm-100-disk-1' 1T
2025-09-03 22:30:04 INFO: backup mode: snapshot
2025-09-03 22:30:04 INFO: ionice priority: 7
2025-09-03 22:30:04 INFO: creating Proxmox Backup Server archive 'vm/100/2025-09-03T19:30:03Z'
2025-09-03 22:30:04 INFO: issuing guest-agent 'fs-freeze' command
2025-09-03 22:30:05 INFO: issuing guest-agent 'fs-thaw' command
2025-09-03 22:30:05 INFO: started backup task '868fb741-5998-4066-bee5-04c0f4556a82'
2025-09-03 22:30:05 INFO: resuming VM again
2025-09-03 22:30:05 INFO: scsi0: dirty-bitmap status: OK (872.0 MiB of 200.0 GiB dirty)
2025-09-03 22:30:05 INFO: scsi1: dirty-bitmap status: OK (69.5 GiB of 1.0 TiB dirty)
2025-09-03 22:30:05 INFO: using fast incremental mode (dirty-bitmap), 70.3 GiB dirty of 1.2 TiB total
2025-09-03 22:30:08 INFO:   1% (740.0 MiB of 70.3 GiB) in 3s, read: 246.7 MiB/s, write: 246.7 MiB/s
2025-09-03 22:30:11 INFO:   2% (1.5 GiB of 70.3 GiB) in 6s, read: 257.3 MiB/s, write: 257.3 MiB/s
2025-09-03 22:30:15 INFO:   3% (2.1 GiB of 70.3 GiB) in 10s, read: 169.0 MiB/s, write: 169.0 MiB/s
2025-09-03 22:30:18 INFO:   4% (2.9 GiB of 70.3 GiB) in 13s, read: 268.0 MiB/s, write: 265.3 MiB/s
2025-09-03 22:30:21 INFO:   5% (3.8 GiB of 70.3 GiB) in 16s, read: 308.0 MiB/s, write: 306.7 MiB/s
2025-09-03 22:30:24 INFO:   6% (4.3 GiB of 70.3 GiB) in 19s, read: 176.0 MiB/s, write: 174.7 MiB/s
2025-09-03 22:30:27 INFO:   7% (5.0 GiB of 70.3 GiB) in 22s, read: 224.0 MiB/s, write: 222.7 MiB/s
2025-09-03 22:30:31 INFO:   8% (5.7 GiB of 70.3 GiB) in 26s, read: 189.0 MiB/s, write: 188.0 MiB/s
2025-09-03 22:30:36 INFO:   9% (6.6 GiB of 70.3 GiB) in 31s, read: 171.2 MiB/s, write: 171.2 MiB/s
.....

2025-09-03 22:38:42 INFO:  94% (66.1 GiB of 70.3 GiB) in 8m 37s, read: 185.0 MiB/s, write: 183.0 MiB/s
2025-09-03 22:38:49 INFO:  95% (66.9 GiB of 70.3 GiB) in 8m 44s, read: 110.3 MiB/s, write: 102.9 MiB/s
2025-09-03 22:54:22 INFO:  95% (66.9 GiB of 70.3 GiB) in 24m 17s, read: 48.3 KiB/s, write: 48.3 KiB/s
2025-09-03 22:54:22 ERROR: backup write data failed: command error: protocol canceled
2025-09-03 22:54:22 INFO: aborting backup job
2025-09-03 22:54:22 INFO: resuming VM again
2025-09-03 22:54:22 ERROR: Backup of VM 100 failed - backup write data failed: command error: protocol canceled


The VM was unfrozen, but why did it crash?
The VM log is full of messages like:
sd ... timing out command, I/O error, dev sda|sdb, Buffer I/O error
 
Issue solved. Carefully read man vzdump: in my situation, the network failure during the backup was a disaster, and the fleecing option from vzdump solved it.
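For anyone finding this later: fleecing redirects copy-before-write data to a local image instead of blocking guest writes on the backup target. It can be enabled per run on the CLI or as a default in /etc/vzdump.conf; a sketch (the storage names pbs-store and local-lvm are examples, substitute your own):

Bash:
# One-off backup with fleecing, keeping copy-before-write data on local storage
vzdump 100 --storage pbs-store --mode snapshot --fleecing enabled=1,storage=local-lvm

# Or as a default for all jobs, in /etc/vzdump.conf:
#   fleecing: enabled=1,storage=local-lvm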
 
Hello all,

pve-manager/9.0.5
Backup Server 3.4.6

I am having a similar issue with a backup using the fleecing option. In my case, I know there was a brief network issue during the backup and PVE was not able to fully communicate with PBS during this run. This has left the VM locked in PVE and a fleecing file in place, and I'm not sure of the safest way to remove it.

snippet from backup log:

INFO: creating Proxmox Backup Server archive 'vm/10000/2025-09-07T20:01:16Z'
INFO: drive-virtio0: attaching fleecing image data_pool:vm-10000-fleece-0 to QEMU
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'xxx'
INFO: resuming VM again
INFO: virtio0: dirty-bitmap status: OK (156.6 GiB of 200.0 GiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 156.6 GiB dirty of 200.0 GiB total
INFO: 0% (536.0 MiB of 156.6 GiB) in 3s, read: 178.7 MiB/s, write: 176.0 MiB/s
INFO: 1% (1.6 GiB of 156.6 GiB) in 9s, read: 182.0 MiB/s, write: 164.0 MiB/s

.........

INFO: 100% (156.6 GiB of 156.6 GiB) in 7m 17s, read: 677.3 MiB/s, write: 0 B/s
INFO: Waiting for server to finish backup validation...
INFO: backup is sparse: 116.83 GiB (74%) total zero data
INFO: backup was done incrementally, reused 166.30 GiB (83%)
INFO: transferred 156.64 GiB in 440 seconds (364.5 MiB/s)
INFO: adding notes to backup
unable to open file '/etc/pve/nodes/XXX/qemu-server/1000.conf.tmp.1195655' - Permission denied
WARN: attempt to clean up fleecing images failed - unable to open file '/etc/pve/nodes/XXXX/qemu-server/10000.conf.tmp.1195655' - Permission denied
INFO: Finished Backup of VM 10000 (00:07:24)
INFO: Backup finished at 2025-09-07 22:08:40
INFO: Backup job finished successfully

The remaining fleecing file, judging by its size, is the default one that was created:
data/vm-10000-fleece-0 89.5K 89.5K 9.99T

ls: cannot access '/etc/pve/nodes/XXX/qemu-server/10000.conf.tmp.1195655': No such file or directory

Any guidance would be much appreciated :)
 
Is the VM file system locked?
When we faced this issue, the FS was locked and there were ~600K errors on the VM disk, so we decided to reboot the VM, unmount the drive, and fix the errors.
I think fleecing saved your VM's FS.
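(For context, a rough sketch of that recovery inside the guest; /dev/sdb1 and /data are example names, adjust to your layout:)

Bash:
# After rebooting the guest out of the frozen state:
umount /data            # unmount the affected filesystem
fsck -y /dev/sdb1       # repair the accumulated errors non-interactively
mount /dev/sdb1 /data   # remount once clean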
 

Nope, the FS was unlocked and VM operations continued as normal. The backup log indicates this:

INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'xxx'
INFO: resuming VM again
 
Check /var/log/vzdump/qemu-xxxx.log, where xxxx is your VM ID; it is more informative.

We also got this during the failed backup:

2025-09-03 22:30:04 INFO: issuing guest-agent 'fs-freeze' command
2025-09-03 22:30:05 INFO: issuing guest-agent 'fs-thaw' command


but the FS was frozen because, without fleecing, vzdump must write every changed block to the destination backup server before the guest's local write can complete.
I think some process is locking the fleecing image in your situation.
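If nothing references the image anymore, my understanding (an assumption; verify against your own config first) is that the stale lock and the orphaned fleecing volume can be removed manually:

Bash:
# Confirm the fleecing image is no longer referenced in the VM config
qm config 10000 | grep -i fleece

# Remove the stale backup lock on the VM
qm unlock 10000

# Free the orphaned fleecing volume on the storage
pvesm free data_pool:vm-10000-fleece-0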
 