[SOLVED] how do I remove a ceph vzdump snapshot

RobFantini

Snapshot backup for one of many LXC containers fails with:
Code:
2446: Aug 23 21:13:43 ERROR: Backup of VM 2446 failed - rbd snapshot 'vm-2446-disk-1' error: rbd: failed to create snapshot: (17) File exists

so I can list the snapshot
Code:
sys3  ~ # rbd --pool ceph-lxc snap ls vm-2446-disk-1
SNAPID NAME       SIZE
   124 vzdump 12288 MB

reading the manpage I still have not been able to figure out how to remove the snap.
Code:
sys3  ~ # rbd --pool ceph-lxc snap rm  vm-2446-disk-1 vzdump
rbd: too many arguments
sys3  ~ # rbd --pool ceph-lxc snap rm  vzdump
rbd: snap name was not specified
sys3  ~ # rbd --pool ceph-lxc snap rm  124
rbd: snap name was not specified

Could someone please suggest the correct syntax for 'snap rm'?


thanks
 
I had a similar issue yesterday.

As far as I recall, the host was restarted last week while backups were in progress.

After that, backups of that VM failed. I got it fixed; this info may help someone else.

Code:
INFO: starting new backup job: vzdump 9001 --compress lzo --remove 0 --storage bkup-nfs --node sys8 --mode snapshot
INFO: Starting Backup of VM 9001 (lxc)
INFO: status = running
INFO: CT Name: p4test
INFO: found old vzdump snapshot (force removal)
rbd: sysfs write failed
can't unmap rbd volume vm-9001-disk-1: rbd: sysfs write failed
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
snapshot create failed: starting cleanup
no lock found trying to remove 'backup'  lock
ERROR: Backup of VM 9001 failed - rbd snapshot 'vm-9001-disk-1' error: rbd: failed to create snapshot: (17) File exists
INFO: Backup job finished with errors
TASK ERROR: job errors
So the snapshot already exists. I solved that by removing it, as the earlier part of the thread showed:
Code:
# rbd snap rm  ceph/vm-9001-disk-1@vzdump
Removing snap: 100% complete...done.
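For anyone comparing this with the failed attempts in the first post: those passed the snapshot name as a separate argument, while the form that worked here gives pool, image and snapshot as one spec, `pool/image@snap`. A minimal sketch of building that spec follows. The helper names `snap_spec` and `snap_rm_cmd` are made up for illustration, and the script only prints the command so it can be checked before running it by hand.

```shell
#!/bin/sh
# Build an rbd snapshot spec of the form pool/image@snap.
# snap_spec and snap_rm_cmd are hypothetical helper names, not part of rbd.
snap_spec() {
    # $1 = pool, $2 = image, $3 = snapshot name
    printf '%s/%s@%s\n' "$1" "$2" "$3"
}

# Print (do not run) the removal command so it can be reviewed first.
snap_rm_cmd() {
    printf 'rbd snap rm %s\n' "$(snap_spec "$1" "$2" "$3")"
}

# Pool, image and snapshot names taken from the first post in this thread.
snap_rm_cmd ceph-lxc vm-2446-disk-1 vzdump
```

Printing instead of executing is deliberate: removing the wrong snapshot is not something you can undo.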
This time that did not completely solve the issue, as the mount point symlink also got in the way:
Code:
INFO: starting new backup job: vzdump 9001 --compress lzo --remove 0 --storage bkup-nfs --node sys8 --mode snapshot
INFO: Starting Backup of VM 9001 (lxc)
INFO: status = running
INFO: CT Name: p4test
INFO: found old vzdump snapshot (force removal)
rbd: sysfs write failed
can't unmap rbd volume vm-9001-disk-1: rbd: sysfs write failed
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
snapshot create failed: starting cleanup
no lock found trying to remove 'backup'  lock
ERROR: Backup of VM 9001 failed - rbd snapshot 'vm-9001-disk-1' error: rbd: failed to create snapshot: (17) File exists
INFO: Backup job finished with errors
TASK ERROR: job errors
Solved by removing the snapshot again, then removing the leftover symlink:
Code:
# ls -lRa /dev/rbd/ceph
/dev/rbd/ceph:
total 0
drwxr-xr-x 2 root root 120 Sep  1 15:13 ./
drwxr-xr-x 3 root root  60 Aug 25 04:51 ../
lrwxrwxrwx 1 root root  10 Aug 25 04:54 vm-213-disk-1 -> ../../rbd1
lrwxrwxrwx 1 root root  10 Aug 25 15:14 vm-7520-disk-1 -> ../../rbd2
lrwxrwxrwx 1 root root  10 Aug 23 19:18 vm-9001-disk-1 -> ../../rbd0
lrwxrwxrwx 1 root root  10 Aug 25 15:14 vm-9001-disk-1\@vzdump -> ../../rbd3

# rm /dev/rbd/ceph/vm-9001-disk-1\@vzdump
/bin/rm: remove symbolic link '/dev/rbd/ceph/vm-9001-disk-1@vzdump'? y

I am not certain these notes are perfect. They are at least hints of things to check if a backup was interrupted by a system reboot and left behind a snapshot and a mount point symlink.
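As a rough way to do the "things to check" part, here is a small sketch that only reports leftover `<image>@vzdump` symlinks under the device directory and changes nothing. The function name `find_stale_vzdump_links` is hypothetical, and `/dev/rbd/ceph` assumes the pool is called `ceph` as on this node. Whether deleting the symlink is the right cleanup is not established in this thread; `rbd showmapped` should show whether the snapshot is still mapped to a device.

```shell
#!/bin/sh
# Report leftover "<image>@vzdump" symlinks in an rbd device directory.
# Read-only: nothing is removed or unmapped.
# find_stale_vzdump_links is a hypothetical helper name.
find_stale_vzdump_links() {
    # $1 = directory to scan, e.g. /dev/rbd/ceph (pool name is an assumption)
    dir=$1
    [ -d "$dir" ] || return 0
    for entry in "$dir"/*@vzdump; do
        # If nothing matches, the pattern stays literal and fails the -L test.
        if [ -L "$entry" ]; then
            printf '%s\n' "$entry"
        fi
    done
    return 0
}

find_stale_vzdump_links /dev/rbd/ceph
```

On a node with no leftovers (or no such directory) this prints nothing.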
 
It turned out the backup did not work.
Code:
INFO: starting new backup job: vzdump 9001 --remove 0 --compress lzo --mode snapshot --storage bkup-nfs --node sys8
INFO: Starting Backup of VM 9001 (lxc)
INFO: status = running
INFO: CT Name: p4test
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
/dev/rbd4
INFO: creating archive '/mnt/pve/bkup-nfs/dump/vzdump-lxc-9001-2018_09_01-21_27_31.tar.lzo'
INFO: remove vzdump snapshot
rbd: sysfs write failed
can't unmap rbd volume vm-9001-disk-1: rbd: sysfs write failed
ERROR: Backup of VM 9001 failed - command 'set -o pipefail && tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*'
'--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/bkup/vzdumptmp1154657' ./etc/vzdump/pct.conf
'--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored ./ | lzop
>/mnt/pve/bkup-nfs/dump/vzdump-lxc-9001-2018_09_01-21_27_31.tar.dat' failed:
interrupted by signal
INFO: Backup job finished with errors
TASK ERROR: job errors
Sun Sep 2 11:09:39 EDT 2018
 
This showed up at the console on the node. It may be related to 'operator doing something wrong' or to a bug.
Code:
[1416690.813983]
                 Assertion failure in rbd_queue_workfn() at line 4035:
                 
                        rbd_assert(op_type == OBJ_OP_READ || rbd_dev->spec->snap_id == CEPH_NOSNAP);

[1416690.815887] ------------[ cut here ]------------
[1416690.816243] kernel BUG at drivers/block/rbd.c:4035!
[1416690.816677] invalid opcode: 0000 [#2] SMP PTI
[1416690.817066] Modules linked in: udp_diag tcp_diag inet_diag ipt_REJECT nf_reject_ipv4 xt_multiport veth rbd libceph nfsv3 nfs_acl nfs lockd grace fscache ip_set ip6table_filter ip6_tables xfs libcrc32c
iptable_filter bonding lz4 lz4_compress softdog binfmt_misc nfnetlink_log nfnetlink intel_rapl sb_edac ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel pcbc mxm_wmi mgag200 ttm aesni_intel drm_kms_helper aes_x86_64 crypto_simd glue_helper drm cryptd snd_pcm snd_timer snd intel_cstate i2c_algo_bit soundcore fb_sys_fops
syscopyarea input_leds joydev sysfillrect intel_rapl_perf pcspkr sysimgblt lpc_ich mei_me mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi mac_hid sch_fq_codel vhost_net vhost tap ib_iser
rdma_cm
[1416690.820633]  iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor
zstd_compress raid6_pq hid_generic usbmouse usbkbd usbhid hid ixgbe mdio ahci i2c_i801 libahci isci libsas mpt3sas igb(O) raid_class dca scsi_transport_sas ptp pps_core
[1416690.822443] CPU: 29 PID: 478894 Comm: kworker/29:0 Tainted: P      D    O     4.15.18-1-pve #1
[1416690.823064] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[1416690.823683] Workqueue: rbd rbd_queue_workfn [rbd]
[1416690.824313] RIP: 0010:rbd_queue_workfn+0x462/0x4f0 [rbd]
[1416690.824980] RSP: 0018:ffffbb66ee87be18 EFLAGS: 00010286
[1416690.825669] RAX: 0000000000000086 RBX: ffff9b58805c2800 RCX: 0000000000000006
[1416690.826396] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9b511fcd6490
[1416690.827104] RBP: ffffbb66ee87be60 R08: 0000000000000000 R09: 000000000000085e
[1416690.827765] R10: 0000000000000254 R11: 00000000ffffffff R12: ffff9b508adc1140
[1416690.828435] R13: ffff9b5322897080 R14: 0000000000000000 R15: 0000000000001000
[1416690.829133] FS:  0000000000000000(0000) GS:ffff9b511fcc0000(0000) knlGS:0000000000000000
[1416690.829874] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1416690.830592] CR2: 000055921bdc5448 CR3: 0000000929e0a003 CR4: 00000000001626e0
[1416690.831317] Call Trace:
[1416690.832026]  ? __schedule+0x3e8/0x870
[1416690.832712]  process_one_work+0x1e0/0x400
[1416690.833444]  worker_thread+0x4b/0x420
[1416690.834195]  kthread+0x105/0x140
[1416690.834888]  ? process_one_work+0x400/0x400
[1416690.835594]  ? kthread_create_worker_on_cpu+0x70/0x70
[1416690.836311]  ret_from_fork+0x35/0x40
[1416690.836998] Code: 00 48 83 78 20 fe 0f 84 6a fc ff ff 48 c7 c1 a8 18 de c0 ba c3 0f 00 00 48 c7 c6 b0 2c de c0 48 c7 c7 90 0d de c0 e8 ae 1c f1 cb <0f> 0b 48 8b 75 d0 4d 89 d0 44 89 f1 4c 89 fa 48 89
df 4c 89 55
[1416690.838481] RIP: rbd_queue_workfn+0x462/0x4f0 [rbd] RSP: ffffbb66ee87be18
[1416690.839258] ---[ end trace e2df66044f68ca99 ]---
From the PVE web interface (https) I tried to restart the node. That did not work, so I had to use an IPMI reset.

After that:
Code:
# pct start 9001
CT is locked (snapshot-delete)

sys8  ~ # pct unlock 9001
sys8  ~ # pct start 9001

* It took 1-2 minutes for the start to work.
* Tried a backup: it worked.
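Before running `pct unlock` it may be worth looking at which lock is actually set, since the message names it. A minimal sketch that pulls the lock name out of a message like the one above; `ct_lock_name` is a hypothetical helper, and the message format is assumed to match what `pct start` printed here.

```shell
#!/bin/sh
# Extract the lock name from a message such as "CT is locked (snapshot-delete)".
# ct_lock_name is a hypothetical helper; the message format is assumed
# from the pct output quoted in this thread.
ct_lock_name() {
    # $1 = one line of pct output
    prefix='CT is locked ('
    case $1 in
        "$prefix"*")")
            name=${1#"$prefix"}   # drop the leading "CT is locked ("
            name=${name%')'}      # drop the trailing ")"
            printf '%s\n' "$name"
            ;;
        *)
            return 1
            ;;
    esac
}

msg='CT is locked (snapshot-delete)'
if lock=$(ct_lock_name "$msg"); then
    echo "container is locked by: $lock"
fi
```

In this case the lock was `snapshot-delete`, which fits the backup having died while removing the vzdump snapshot.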
 
