Full backups after qcow snapshot?

gvs

New Member
Apr 4, 2025
I have a VM with large drives (2 TB) which takes a long time to back up.
Lately, I've noticed that if I create and then delete a snapshot of the VM (I do this for upgrades), the next backup in PBS runs as a full backup instead of an incremental one, and that doesn't finish inside our backup window on weekdays...

Is this correct behavior or did I hit a bug? PBS 3.3 / PVE 8.3
 
Hi,
failed to reproduce your issue here. Taking a snapshot while the VM is running leaves the dirty bitmap OK, and only differential backups are made. Do you stop the VM at some point in the process? Are you always backing up to the same target (including namespace)? Changing the target would invalidate the bitmap.
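For reference, one quick way to check whether the bitmap was actually reused is to grep the per-guest vzdump log on the PVE node. This is only a sketch: it assumes the default vzdump log location, and VMID 105 is just an example.
Code:
# On the PVE node; replace 105 with your VMID
grep 'dirty-bitmap' /var/log/vzdump/qemu-105.log
# An incremental run shows:  scsi0: dirty-bitmap status: OK (... dirty)
# After a VM stop/start or a target change you will instead see
# "created new" or "existing bitmap was invalid and has been cleared"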
 
I tested a VM where I created and removed a snapshot before doing the backup; it reported using "fast incremental mode".
Code:
INFO: starting new backup job: vzdump 90011 --notification-mode auto --notes-template '{{cluster}}, {{guestname}}, {{node}}, {{vmid}}' --remove 0 --storage PBS --node pve65 --mode snapshot
INFO: Starting Backup of VM 90011 (qemu)
INFO: Backup started at 2025-04-04 17:06:01
INFO: status = running
INFO: VM Name: WinSrv2025Std-x64
INFO: include disk 'scsi0' 'local-zfs2:vm-90011-disk-0' 100G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/90011/2025-04-04T09:06:01Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '9b909c87-2a91-4dc3-9999-35ba87fedc50'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: OK (156.0 MiB of 100.0 GiB dirty)
INFO: using fast incremental mode (dirty-bitmap), 156.0 MiB dirty of 100.0 GiB total
INFO: 100% (156.0 MiB of 156.0 MiB) in 3s, read: 52.0 MiB/s, write: 46.7 MiB/s
INFO: Waiting for server to finish backup validation...
INFO: backup was done incrementally, reused 99.86 GiB (99%)
INFO: transferred 156.00 MiB in 5 seconds (31.2 MiB/s)
INFO: adding notes to backup
INFO: Finished Backup of VM 90011 (00:00:06)
INFO: Backup finished at 2025-04-04 17:06:07
INFO: Backup job finished successfully
INFO: notified via target `mail-to-root`
TASK OK
I guess your VM may have lost some backups between those two full backup jobs. In my case, after the test ended I removed 5 (incremental) backups belonging to the VM, and the next time I ran the backup, the job started as a full backup and the log reported "dirty-bitmap status: existing bitmap was invalid and has been cleared".
Code:
INFO: starting new backup job: vzdump 90011 --remove 0 --notification-mode auto --notes-template '{{cluster}}, {{guestname}}, {{node}}, {{vmid}}' --node pve65 --storage PBS --mode snapshot
INFO: Starting Backup of VM 90011 (qemu)
INFO: Backup started at 2025-04-04 17:08:52
INFO: status = running
INFO: VM Name: WinSrv2025Std-x64
INFO: include disk 'scsi0' 'local-zfs2:vm-90011-disk-0' 100G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/90011/2025-04-04T09:08:52Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task '7589188e-413c-43ab-97be-51d9a1c56926'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO:   2% (2.8 GiB of 100.0 GiB) in 3s, read: 956.0 MiB/s, write: 262.7 MiB/s
   .
   .
   .
INFO:  92% (92.2 GiB of 100.0 GiB) in 27s, read: 8.6 GiB/s, write: 0 B/s
INFO: 100% (100.0 GiB of 100.0 GiB) in 30s, read: 2.6 GiB/s, write: 1.3 MiB/s
INFO: Waiting for server to finish backup validation...
INFO: backup is sparse: 77.57 GiB (77%) total zero data
INFO: backup was done incrementally, reused 97.43 GiB (97%)
INFO: transferred 100.00 GiB in 31 seconds (3.2 GiB/s)
INFO: adding notes to backup
INFO: Finished Backup of VM 90011 (00:00:32)
INFO: Backup finished at 2025-04-04 17:09:24
INFO: Backup job finished successfully
INFO: notified via target `mail-to-root`
TASK OK
What does your VM backup log output show?
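If you want to verify on the server side that the previous backup snapshot (the one the bitmap was registered against) still exists, you can list the group's snapshots with proxmox-backup-client. A sketch only: the repository string and server address below are example values.
Code:
# Repository format is user@realm@server:datastore (example values below)
proxmox-backup-client snapshot list vm/90011 --repository root@pam@192.0.2.10:eindhoven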
 
Hi,
failed to reproduce your issue here. Taking a snapshot while the VM is running leaves the dirty bitmap OK, and only differential backups are made. Do you stop the VM at some point in the process? Are you always backing up to the same target (including namespace)? Changing the target would invalidate the bitmap.
Yes! I do stop the VM. Removing the snapshot of the 2 TB disk times out if I don't.
Is that the cause? Will stopping a VM always trigger a full backup?

And yes, it's always the same target, same settings.
 
I get:
Code:
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/105/2025-04-03T22:00:03Z'
INFO: started backup task '45b88572-f119-4353-861d-64b7f3fd8165'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: created new
INFO: scsi1: dirty-bitmap status: created new
 
Then my core problem becomes: what is going wrong that I cannot remove a snapshot from a large running VM within the time limit?
Please share the snapshot delete task log, the VM config obtained via qm config <VMID> --current, as well as the storage config (cat /etc/pve/storage.cfg).
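For completeness, a sketch of how that information can be gathered on the PVE node. VMID 105 is taken from the config posted below, and the task index path is the default one; adjust as needed.
Code:
# VM and storage configuration
qm config 105 --current
cat /etc/pve/storage.cfg
# One way to find the snapshot delete task in the node's task index
grep qmdelsnapshot /var/log/pve/tasks/index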
 
Please share the snapshot delete task log, the VM config obtained via qm config <VMID> --current, as well as the storage config (cat /etc/pve/storage.cfg).
Config:
Code:
balloon: 0
boot: order=scsi0;ide2;net0
cores: 8
ide2: none,media=cdrom
memory: 49152
meta: creation-qemu=6.1.0,ctime=1639999078
name: zimbra
net0: virtio=B6:79:01:BF:8E:E6,bridge=vmbr0
numa: 1
onboot: 1
ostype: l26
parent: Ticket22893
scsi0: Pool1:105/vm-105-disk-0.qcow2,cache=writeback,size=20G
scsi1: Pool1:105/vm-105-disk-1.qcow2,cache=writeback,size=2T
scsihw: virtio-scsi-pci
smbios1: uuid=c3380739-06c2-4741-9fb3-866f07ea8285
sockets: 1
vmgenid: 135e71cb-bed8-419f-9b2d-343528b453c2

Storage:
Code:
dir: local
        path /var/lib/vz
        content vztmpl,iso,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content rootdir,images

nfs: Pool2
        export /mnt/Pool1/VM
        path /mnt/pve/Pool2
        server 10.0.210.12
        content iso,images
        options vers=4.1
        prune-backups keep-all=1

zfspool: LocalZFS
        disable
        pool VM
        content rootdir,images
        mountpoint /VM
        nodes prox00
        sparse 0

nfs: Pool1
        export /mnt/Pool1/VM
        path /mnt/pve/Pool1
        server 10.0.210.11
        content iso,images
        options vers=4.1
        prune-backups keep-all=1

nfs: backup
        disable
        export /export/backup
        path /mnt/pve/backup
        server 192.168.210.71
        content iso,backup
        prune-backups keep-all=1

pbs: PBS
        datastore eindhoven
        server 142.132.150.159
        content backup
        fingerprint xxx
        prune-backups keep-all=1
        username xxxx

zfspool: localZFSpool
        pool dpool
        content images,rootdir
        nodes prox00
        sparse 1

Log:
TASK ERROR: VM 105 qmp command 'blockdev-snapshot-delete-internal-sync' failed - got timeout
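To see what the deletion actually has to work through, the internal snapshots inside the large qcow2 can be inspected read-only. A sketch: the path assumes the standard directory layout of the Pool1 NFS mount shown above, and -U (--force-share) only opens the image for reading without taking the write lock.
Code:
# Read-only inspection of the 2T disk; the "Snapshot list:" section shows
# the internal snapshot(s), e.g. Ticket22893, and their sizes
qemu-img info -U /mnt/pve/Pool1/images/105/vm-105-disk-1.qcow2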
 
So, as I feared, you are running the VM with the qcow2 on NFS. A possible workaround is https://forum.proxmox.com/threads/g...est-disks-on-local-storage.113854/post-494013
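If you go that route, a hedged sketch of moving the large disk off NFS with qm: the target storage local-lvm is only an example, the existing snapshot may need to be removed first, and the exact options should be checked against the qm manpage on your version.
Code:
# Example only: move the 2T disk to a local storage and drop the NFS copy
# (remove any remaining snapshots of VM 105 beforehand)
qm disk move 105 scsi1 local-lvm --delete 1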