Backup error - zfs error: cannot create snapshot dataset already exists

Hugo Matos

Member
Mar 9, 2016
Hi,

Does anyone have this error?

INFO: starting new backup job: vzdump 130 --compress lzo --node local02 --storage local --mode snapshot --remove 0
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp28291 for temporary files
INFO: Starting Backup of VM 130 (lxc)
INFO: status = running
INFO: CT Name:
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
snapshot create failed: starting cleanup
no lock found trying to remove 'backup' lock
ERROR: Backup of VM 130 failed - zfs error: cannot create snapshot 'rpool/data/subvol-130-disk-2@vzdump': dataset already exists
INFO: Backup job finished with errors
TASK ERROR: job errors

I have no backup in the Backup tab, but there is still a vzdump snapshot on rpool/data:

root@local02:~# zfs list -t snapshot
NAME                                  USED  AVAIL  REFER  MOUNTPOINT
rpool/data/subvol-130-disk-2@vzdump   137G      -   246G  -


Does anyone know how to solve this?

Thanks
 

Attachments

  • Screenshot from 2019-03-26 12-15-33.png
You probably have a leftover snapshot from an incomplete backup (something went wrong, or something interrupted a previous backup run).

Just remove it with
Code:
zfs destroy rpool/data/subvol-130-disk-2@vzdump
and then try the backup again.
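If you want to be careful, you can first confirm what is there and do a dry run before destroying anything (these are standard zfs flags; the dataset name is the one from your log):
Code:
# show the leftover snapshot and check for holds that could block destruction
zfs list -t snapshot -o name,used,creation rpool/data/subvol-130-disk-2@vzdump
zfs holds rpool/data/subvol-130-disk-2@vzdump
# dry run: -n only prints what would be destroyed, -v is verbose
zfs destroy -nv rpool/data/subvol-130-disk-2@vzdump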
 
Hi,
I had the same problem on PBS and simply solved it by destroying the incomplete backup.

But the real problem is that PBS did not say anything about this in the log, nor in the report mails: I only discovered it by manually checking the PBS backup list. Is there a way to get an alert when something like this happens?

Thank you PROXMOX for PBS, which is a very nice piece of software and will certainly replace some of my backup scripts :)
 
The above does not fix it for me:

root@proxmox4:~# zfs list -t snapshot
NAME                                USED  AVAIL  REFER  MOUNTPOINT
RAIDZ/vm-200-disk-0@__migration__  21.5G      -  80.8G  -

root@proxmox4:~# zfs destroy rpool/data/RAIDZ/vm-200-disk-0@__migration__
cannot open 'rpool/data/RAIDZ/vm-200-disk-0': dataset does not exist
root@proxmox4:~#

What did I do wrong?
 

You need to run zfs destroy RAIDZ/vm-200-disk-0@__migration__ instead, since the name of your pool is different (the snapshot lives on RAIDZ, not on rpool/data).
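If you are unsure about the exact snapshot name or pool, you can list the full snapshot names first and pass exactly what is printed to zfs destroy, for example:
Code:
# print full snapshot names and filter for the leftover ones
zfs list -H -t snapshot -o name | grep -e '@__migration__' -e '@vzdump'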
 
I ran into this issue on a backup of a guest with multiple disks, each of which had the @vzdump snapshot. I was able to delete all of them using:
Code:
zfs destroy -r tank@vzdump

Maybe there could be an option to automatically destroy any existing @vzdump snapshot prior to backup. I've had this happen several times.
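Note that the recursive destroy removes the @vzdump snapshot on every child dataset under tank, so it can be worth previewing with a dry run first (standard zfs flags, pool name as in the example above):
Code:
# -r recurse into child datasets, -n dry run, -v list what would be destroyed
zfs destroy -rnv tank@vzdump
# if the list looks right, run it again without -n
zfs destroy -rv tank@vzdump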
 

Can the remnants of an incomplete backup be removed automatically? Or how would I automate/script a check for any such remnants that should be removed?
 
Hi,
Leftover snapshots should only occur after a hard failure where cleanup couldn't complete, and they will be removed when the next backup is taken.

If that is not enough, you need a script that checks for the vzdump snapshot on the volumes and in the container configs and removes those leftovers, making sure they don't belong to an active backup, of course!
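Such a check could look roughly like the sketch below (not an official tool; it assumes leftover snapshots are always named @vzdump, uses a crude pgrep test for running backup tasks, and only reports findings instead of destroying them):
Code:
#!/bin/bash
# skip the check entirely if a vzdump task appears to be running (crude test)
if pgrep -f vzdump >/dev/null; then
    echo "a vzdump task seems to be running, skipping" >&2
    exit 0
fi

# report any leftover @vzdump snapshots; uncomment zfs destroy to remove them
zfs list -H -t snapshot -o name | grep '@vzdump$' | while read -r snap; do
    echo "leftover snapshot: $snap"
    # zfs destroy "$snap"
done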
 
I'm sorry I'm bumping this, but I have an issue with a failed backup of my container:

INFO: including mount point rootfs ('/') in backup
INFO: found old vzdump snapshot (force removal)
zfs error: cannot destroy snapshot rpool/data/subvol-226-disk-0@vzdump: dataset is busy
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
snapshot create failed: starting cleanup
no lock found trying to remove 'backup' lock
ERROR: Backup of VM 226 failed - zfs error: cannot create snapshot 'rpool/data/subvol-226-disk-0@vzdump': dataset already exists
INFO: Failed at 2025-02-17 02:03:19

The real issue is that I can't destroy rpool/data/subvol-226-disk-0@vzdump:

root@p40:~# zfs destroy -r rpool/data/subvol-226-disk-0@vzdump
cannot destroy snapshot rpool/data/subvol-226-disk-0@vzdump: dataset is busy

Not even turning the container off and trying to destroy the dataset at that point works, and rebooting the host is just not feasible... any other suggestions?
 
Not directly helping, but this might help avoid getting into such a situation again:

1. Nowadays, I separate out my data and back that up with rsync in as-is form, i.e. not as an image (see the sketch below)
2. I script the setup of most, if not all, of my applications

It is more manageable than large images, which can't easily be picked apart.
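For point 1, the rsync call is roughly something like this (the paths are just placeholders, adjust them to your own layout):
Code:
# archive mode, preserve hard links/ACLs/xattrs, delete files removed at the source
rsync -aHAX --delete /srv/appdata/ backuphost:/backups/appdata/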
 
Hi,
please post the output of pveversion -v and zpool status -v.
 
root@p40:~# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-1-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
pve-kernel-5.15: 7.4-14
proxmox-kernel-6.8: 6.8.12-1
proxmox-kernel-6.8.12-1-pve-signed: 6.8.12-1
proxmox-kernel-6.8.8-1-pve-signed: 6.8.8-1
pve-kernel-5.4: 6.4-20
pve-kernel-5.15.158-1-pve: 5.15.158-1
pve-kernel-5.15.116-1-pve: 5.15.116-1
pve-kernel-5.4.203-1-pve: 5.4.203-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 16.2.11+ds-2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown: residual config
ifupdown2: 3.2.0-1+pmx9
intel-microcode: 3.20231114.1~deb11u1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.2
libpve-guest-common-perl: 5.1.4
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libqb0: 1.0.5-1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.5.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.3
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.13-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.2-2
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.4
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1

root@p40:~# zpool status -v
  pool: rpool
 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub repaired 0B in 03:04:12 with 0 errors on Sun Feb 9 03:28:14 2025
config:

        NAME                                          STATE     READ WRITE CKSUM
        rpool                                         ONLINE       0     0     0
          mirror-0                                    ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA5432053U1P6KGN-part3  ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA542203H31P6KGN-part3  ONLINE       0     0     0
          mirror-1                                    ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA540202NC1P6KGN        ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA543200X81P6KGN        ONLINE       0     0     0
          mirror-2                                    ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA54430A2N1P6KGN        ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA544307EY1P6KGN        ONLINE       0     0     0
          mirror-3                                    ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA543200X21P6KGN        ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA5443057Q1P6KGN        ONLINE       0     0     0
          mirror-4                                    ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA54430A3A1P6KGN        ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA543201LW1P6KGN        ONLINE       0     0     0
          mirror-5                                    ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA5412053L1P6KGN        ONLINE       0     0     0
            ata-VK1600GEYJU_BTWA542202L01P6KGN        ONLINE       0     0     0

errors: No known data errors
 
Can you check if there is still a backup task for the container running, e.g. ps faxl | grep vzdump? What is the output of fgrep -e vzsnap -e vzdump /proc/*/mounts?
 

Nothing seems to be running. I had also checked these things first.

root@p40:~# ps faxl | grep vzdump
0 0 3964496 3964410 20 0 8860 1280 pipe_r S+ pts/4 0:00 \_ grep vzdump

root@p40:~# fgrep -e vzsnap -e vzdump /proc/*/mounts


But there is an unmount process for vzsnap still running, if that matters:

root@p40:~# ps faxl | grep vzsnap
0 0 3972010 3964410 20 0 8860 1280 pipe_r S+ pts/4 0:00 | \_ grep vzsnap
4 0 1807729 1 20 0 4032 1920 taskq_ D ? 0:07 umount -l -d /mnt/vzsnap0/
 
Yes, that very likely matters. This is where the snapshot is/was mounted. Unfortunately, the task is in uninterruptible D state, meaning it is blocked on IO or something low-level in the kernel. Does it work if you run
Code:
umount -d /mnt/vzsnap0/
?
 
In addition to the test @fiona suggested, could you please also check whether you have any messages from ZFS/the kernel which indicate why the umount hangs?
* `dmesg`
* `journalctl -b` (or restrict it to the time when the backup originally started with `--since` - see `man journalctl`)
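For example, restricting the journal to the window around when the backup failed would look something like this (the timestamp is taken from the log above, adjust as needed):
Code:
journalctl -b --since "2025-02-17 01:30" --until "2025-02-17 02:30"
dmesg -T | grep -iE 'zfs|arc_prune|vzsnap'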

Thanks!
 
What could also be interesting is the output of grep '' /proc/1807729/task/*/stack. Best to get this before attempting the umount command.
 

root@p40:/mnt# grep '' /proc/1807729/task/*/stack
[<0>] taskq_wait_outstanding+0xc4/0x110 [spl]
[<0>] arc_remove_prune_callback+0xaf/0x100 [zfs]
[<0>] zfs_umount+0x2d/0x110 [zfs]
[<0>] zpl_put_super+0x2c/0x50 [zfs]
[<0>] generic_shutdown_super+0x7c/0x180
[<0>] kill_anon_super+0x18/0x50
[<0>] zpl_kill_sb+0x1a/0x30 [zfs]
[<0>] deactivate_locked_super+0x35/0xc0
[<0>] deactivate_super+0x46/0x60
[<0>] cleanup_mnt+0xc6/0x170
[<0>] __cleanup_mnt+0x12/0x20
[<0>] task_work_run+0x61/0xa0
[<0>] syscall_exit_to_user_mode+0x25a/0x260
[<0>] do_syscall_64+0x8d/0x170
[<0>] entry_SYSCALL_64_after_hwframe+0x78/0x80


This error in dmesg is, I think, from the failed backup:

[Sun Feb 16 02:51:51 2025] BUG: unable to handle page fault for address: 0000000200000000
[Sun Feb 16 02:51:51 2025] #PF: supervisor instruction fetch in kernel mode
[Sun Feb 16 02:51:51 2025] #PF: error_code(0x0010) - not-present page
[Sun Feb 16 02:51:51 2025] PGD 0 P4D 0
[Sun Feb 16 02:51:51 2025] Oops: 0010 [#1] PREEMPT SMP PTI
[Sun Feb 16 02:51:51 2025] CPU: 10 PID: 750 Comm: arc_prune Tainted: P O 6.8.8-1-pve #1
[Sun Feb 16 02:51:51 2025] Hardware name: Supermicro X9DRD-7LN4F(-JBOD)/X9DRD-EF/X9DRD-7LN4F, BIOS 3.3 08/23/2018
[Sun Feb 16 02:51:51 2025] RIP: 0010:0x200000000
[Sun Feb 16 02:51:51 2025] Code: Unable to access opcode bytes at 0x1ffffffd6.
[Sun Feb 16 02:51:51 2025] RSP: 0018:ffffafe5c1d47cd0 EFLAGS: 00010246
[Sun Feb 16 02:51:51 2025] RAX: 0000000200000000 RBX: ffff982bca734000 RCX: 0000000000000000
[Sun Feb 16 02:51:51 2025] RDX: 0000000000000000 RSI: ffffafe5c1d47d30 RDI: ffff9830b0605b80
[Sun Feb 16 02:51:51 2025] RBP: ffffafe5c1d47d80 R08: 0000000000000000 R09: 0000000000000000
[Sun Feb 16 02:51:51 2025] R10: 0000000000000000 R11: 0000000000000000 R12: ffffafe5c1d47d94
[Sun Feb 16 02:51:51 2025] R13: 0000000000031b63 R14: ffff9830b0605b80 R15: ffff982bca7340f8
[Sun Feb 16 02:51:51 2025] FS: 0000000000000000(0000) GS:ffff98483f400000(0000) knlGS:0000000000000000
[Sun Feb 16 02:51:51 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun Feb 16 02:51:51 2025] CR2: 0000000200000000 CR3: 000000082ce36001 CR4: 00000000001726f0
[Sun Feb 16 02:51:51 2025] Call Trace:
[Sun Feb 16 02:51:51 2025] <TASK>
[Sun Feb 16 02:51:51 2025] ? show_regs+0x6d/0x80
[Sun Feb 16 02:51:51 2025] ? __die+0x24/0x80
[Sun Feb 16 02:51:51 2025] ? page_fault_oops+0x176/0x500
[Sun Feb 16 02:51:51 2025] ? spl_kmem_cache_free+0x137/0x1f0 [spl]
[Sun Feb 16 02:51:51 2025] ? kmem_cache_free+0x331/0x3f0
[Sun Feb 16 02:51:51 2025] ? do_user_addr_fault+0x2f9/0x6b0
[Sun Feb 16 02:51:51 2025] ? exc_page_fault+0x83/0x1b0
[Sun Feb 16 02:51:51 2025] ? asm_exc_page_fault+0x27/0x30
[Sun Feb 16 02:51:51 2025] ? zfs_prune+0xa4/0x4c0 [zfs]
[Sun Feb 16 02:51:51 2025] ? __schedule+0x409/0x15e0
[Sun Feb 16 02:51:51 2025] zpl_prune_sb+0x35/0x60 [zfs]
[Sun Feb 16 02:51:51 2025] arc_prune_task+0x22/0x40 [zfs]
[Sun Feb 16 02:51:51 2025] taskq_thread+0x282/0x4c0 [spl]
[Sun Feb 16 02:51:51 2025] ? finish_task_switch.isra.0+0x8c/0x310
[Sun Feb 16 02:51:51 2025] ? __pfx_default_wake_function+0x10/0x10
[Sun Feb 16 02:51:51 2025] ? __pfx_taskq_thread+0x10/0x10 [spl]
[Sun Feb 16 02:51:51 2025] kthread+0xf2/0x120
[Sun Feb 16 02:51:51 2025] ? __pfx_kthread+0x10/0x10
[Sun Feb 16 02:51:51 2025] ret_from_fork+0x47/0x70
[Sun Feb 16 02:51:51 2025] ? __pfx_kthread+0x10/0x10
[Sun Feb 16 02:51:51 2025] ret_from_fork_asm+0x1b/0x30
[Sun Feb 16 02:51:51 2025] </TASK>
[Sun Feb 16 02:51:51 2025] Modules linked in: iptable_nat nf_nat iptable_mangle uas usb_storage act_police cls_basic sch_ingress sch_htb 8021q garp mrp veth ebt_arp ebtable_filter ebtables ip6table_raw ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables iptable_raw xt_mac ipt_REJECT nf_reject_ipv4 xt_mark xt_set xt_physdev xt_addrtype xt_comment xt_multiport xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xt_tcpudp iptable_filter ip_set_hash_net ip_set sctp ip6_udp_tunnel udp_tunnel scsi_transport_iscsi nf_tables softdog sunrpc binfmt_misc bonding tls nfnetlink_log nfnetlink intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_ssif kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel crypto_simd cryptd rapl acpi_ipmi mei_me ipmi_si intel_cstate pcspkr mei mgag200 ipmi_devintf ioatdma ipmi_msghandler input_leds acpi_pad joydev mac_hid vhost_net vhost vhost_iotlb tap efi_pstore dmi_sysfs ip_tables x_tables
[Sun Feb 16 02:51:51 2025] autofs4 zfs(PO) spl(O) btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid0 mlx4_ib ib_uverbs ib_core mlx4_en raid1 ses enclosure hid_generic usbkbd usbmouse usbhid hid igb mpt3sas nvme i2c_algo_bit crc32_pclmul mlx4_core ahci dca raid_class ehci_pci nvme_core i2c_i801 libahci ehci_hcd lpc_ich i2c_smbus scsi_transport_sas nvme_auth wmi
[Sun Feb 16 02:51:51 2025] CR2: 0000000200000000
[Sun Feb 16 02:51:51 2025] ---[ end trace 0000000000000000 ]---
[Sun Feb 16 02:51:51 2025] RIP: 0010:0x200000000
[Sun Feb 16 02:51:51 2025] Code: Unable to access opcode bytes at 0x1ffffffd6.
[Sun Feb 16 02:51:51 2025] RSP: 0018:ffffafe5c1d47cd0 EFLAGS: 00010246
[Sun Feb 16 02:51:51 2025] RAX: 0000000200000000 RBX: ffff982bca734000 RCX: 0000000000000000
[Sun Feb 16 02:51:51 2025] RDX: 0000000000000000 RSI: ffffafe5c1d47d30 RDI: ffff9830b0605b80
[Sun Feb 16 02:51:51 2025] RBP: ffffafe5c1d47d80 R08: 0000000000000000 R09: 0000000000000000
[Sun Feb 16 02:51:51 2025] R10: 0000000000000000 R11: 0000000000000000 R12: ffffafe5c1d47d94
[Sun Feb 16 02:51:51 2025] R13: 0000000000031b63 R14: ffff9830b0605b80 R15: ffff982bca7340f8
[Sun Feb 16 02:51:51 2025] FS: 0000000000000000(0000) GS:ffff98483f400000(0000) knlGS:0000000000000000
[Sun Feb 16 02:51:51 2025] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sun Feb 16 02:51:51 2025] CR2: 0000000200000000 CR3: 000000082ce36001 CR4: 00000000001726f0
[Sun Feb 16 02:51:51 2025] note: arc_prune[750] exited with irqs disabled



root@p40:/mnt# umount -l -d /mnt/vzsnap0/
umount: /mnt/vzsnap0/: not mounted.

root@p40:/mnt# kill -9 1807729

root@p40:/mnt# ps faxl | grep vzsnap
0 0 15594 3964410 20 0 8860 1280 pipe_r S+ pts/4 0:00 | \_ grep vzsnap
4 0 1807729 1 20 0 4032 1920 taskq_ D ? 0:07 umount -l -d /mnt/vzsnap0/
 