[SOLVED] How do I remove a Ceph vzdump snapshot?

RobFantini

A snapshot backup for one of many LXC containers fails with:
Code:
2446: Aug 23 21:13:43 ERROR: Backup of VM 2446 failed - rbd snapshot 'vm-2446-disk-1' error: rbd: failed to create snapshot: (17) File exists

I can list the snapshot:
Code:
sys3  ~ # rbd --pool ceph-lxc snap ls vm-2446-disk-1
SNAPID NAME       SIZE
   124 vzdump 12288 MB
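
A minimal sketch of how that listing could be checked from a script before a backup runs. `has_snap` is a hypothetical helper name, not part of rbd, and it assumes the snapshot name is the second column of the `rbd snap ls` output, as in the listing above.

```shell
#!/bin/sh
# has_snap NAME: reads `rbd snap ls`-style output on stdin and
# succeeds (exit 0) if a snapshot called NAME is listed.
# Hypothetical helper; assumes the name is in column 2.
has_snap() {
    awk -v name="$1" 'NR > 1 && $2 == name { found = 1 } END { exit !found }'
}

# Demo with the listing copied from this post (no cluster access needed).
if has_snap vzdump <<'EOF'
SNAPID NAME       SIZE
   124 vzdump 12288 MB
EOF
then
    echo "vzdump snapshot exists"
fi
```

Against a live cluster the same check would be fed from `rbd --pool ceph-lxc snap ls vm-2446-disk-1 | has_snap vzdump`.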

After reading the man page I still have not been able to figure out how to remove the snapshot.
Code:
sys3  ~ # rbd --pool ceph-lxc snap rm  vm-2446-disk-1 vzdump
rbd: too many arguments
sys3  ~ # rbd --pool ceph-lxc snap rm  vzdump
rbd: snap name was not specified
sys3  ~ # rbd --pool ceph-lxc snap rm  124
rbd: snap name was not specified

Could someone please suggest the correct syntax for 'snap rm'?


thanks
 
I had a similar issue yesterday.

As far as I recall, the host was restarted last week while backups were in progress.

Subsequent backups of that CT failed. I got it fixed; this info may or may not help someone else.

Code:
INFO: starting new backup job: vzdump 9001 --compress lzo --remove 0 --storage bkup-nfs --node sys8 --mode snapshot
INFO: Starting Backup of VM 9001 (lxc)
INFO: status = running
INFO: CT Name: p4test
INFO: found old vzdump snapshot (force removal)
rbd: sysfs write failed
can't unmap rbd volume vm-9001-disk-1: rbd: sysfs write failed
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
snapshot create failed: starting cleanup
no lock found trying to remove 'backup'  lock
ERROR: Backup of VM 9001 failed - rbd snapshot 'vm-9001-disk-1' error: rbd: failed to create snapshot: (17) File exists
INFO: Backup job finished with errors
TASK ERROR: job errors
So the snapshot already exists. I solved that by removing it, as the earlier part of the thread showed:
Code:
# rbd snap rm  ceph/vm-9001-disk-1@vzdump
Removing snap: 100% complete...done.
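
The command above takes a single `pool/image@snap` argument, which is why the attempts in the first post with the image and snapshot given as separate words were rejected. A small sketch that builds that spec for the container from the first post; `snap_spec` is a hypothetical helper name, and the command is only printed for review, not executed.

```shell
#!/bin/sh
# snap_spec POOL IMAGE SNAP: print the "pool/image@snap" spec
# in the form used by the `rbd snap rm` command above.
# Hypothetical helper, not part of rbd.
snap_spec() {
    printf '%s/%s@%s\n' "$1" "$2" "$3"
}

# Print the removal command for the first post's container
# so it can be checked before being run by hand.
echo "rbd snap rm $(snap_spec ceph-lxc vm-2446-disk-1 vzdump)"
```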
This time that did not completely solve the issue, as the mount point symlink also got in the way:
Code:
INFO: starting new backup job: vzdump 9001 --compress lzo --remove 0 --storage bkup-nfs --node sys8 --mode snapshot
INFO: Starting Backup of VM 9001 (lxc)
INFO: status = running
INFO: CT Name: p4test
INFO: found old vzdump snapshot (force removal)
rbd: sysfs write failed
can't unmap rbd volume vm-9001-disk-1: rbd: sysfs write failed
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
snapshot create failed: starting cleanup
no lock found trying to remove 'backup'  lock
ERROR: Backup of VM 9001 failed - rbd snapshot 'vm-9001-disk-1' error: rbd: failed to create snapshot: (17) File exists
INFO: Backup job finished with errors
TASK ERROR: job errors
I solved it by removing the snapshot again and then deleting the leftover symlink:
Code:
# ls -lRa /dev/rbd/ceph
/dev/rbd/ceph:
total 0
drwxr-xr-x 2 root root 120 Sep  1 15:13 ./
drwxr-xr-x 3 root root  60 Aug 25 04:51 ../
lrwxrwxrwx 1 root root  10 Aug 25 04:54 vm-213-disk-1 -> ../../rbd1
lrwxrwxrwx 1 root root  10 Aug 25 15:14 vm-7520-disk-1 -> ../../rbd2
lrwxrwxrwx 1 root root  10 Aug 23 19:18 vm-9001-disk-1 -> ../../rbd0
lrwxrwxrwx 1 root root  10 Aug 25 15:14 vm-9001-disk-1\@vzdump -> ../../rbd3

# rm /dev/rbd/ceph/vm-9001-disk-1\@vzdump
/bin/rm: remove symbolic link '/dev/rbd/ceph/vm-9001-disk-1@vzdump'? y
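
A rough sketch for spotting such leftovers without reading the whole `ls` listing. `list_snap_links` is a hypothetical helper name; it only prints symlinks whose name ends in `@vzdump` (or another snapshot name passed as the second argument) and removes nothing.

```shell
#!/bin/sh
# list_snap_links DIR [SNAP]: print every symlink in DIR whose name
# ends in "@SNAP" (default: vzdump), together with its target.
# Hypothetical helper; read-only, it does not delete anything.
list_snap_links() {
    dir=$1
    snap=${2:-vzdump}
    for link in "$dir"/*@"$snap"; do
        # If nothing matches, the pattern stays literal and -L fails.
        if [ -L "$link" ]; then
            printf '%s -> %s\n' "$link" "$(readlink "$link")"
        fi
    done
    return 0
}

# Example: check the pool directory from this post.
list_snap_links /dev/rbd/ceph
```

On a node with no leftover mapping, or where the directory does not exist, this prints nothing.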

I am not certain these notes are perfect. They are at least hints of things to check if a system reboot interrupted a backup and left behind a snapshot and mount point.
 
It turns out the backup did not work.
Code:
INFO: starting new backup job: vzdump 9001 --remove 0 --compress lzo --mode snapshot --storage bkup-nfs --node sys8
INFO: Starting Backup of VM 9001 (lxc)
INFO: status = running
INFO: CT Name: p4test
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
/dev/rbd4
INFO: creating archive '/mnt/pve/bkup-nfs/dump/vzdump-lxc-9001-2018_09_01-21_27_31.tar.lzo'
INFO: remove vzdump snapshot
rbd: sysfs write failed
can't unmap rbd volume vm-9001-disk-1: rbd: sysfs write failed
ERROR: Backup of VM 9001 failed - command 'set -o pipefail && tar cpf - --totals --one-file-system -p --sparse --numeric-owner --acls --xattrs '--xattrs-include=user.*'
'--xattrs-include=security.capability' '--warning=no-file-ignored' '--warning=no-xattr-write' --one-file-system '--warning=no-file-ignored' '--directory=/bkup/vzdumptmp1154657' ./etc/vzdump/pct.conf
'--directory=/mnt/vzsnap0' --no-anchored '--exclude=lost+found' --anchored ./ | lzop
>/mnt/pve/bkup-nfs/dump/vzdump-lxc-9001-2018_09_01-21_27_31.tar.dat' failed:
interrupted by signal
INFO: Backup job finished with errors
TASK ERROR: job errors
Sun Sep 2 11:09:39 EDT 2018
 
This appeared at the console on the node. It may be related to 'operator doing something wrong' or to a bug.
Code:
[1416690.813983]
                 Assertion failure in rbd_queue_workfn() at line 4035:
                 
                        rbd_assert(op_type == OBJ_OP_READ || rbd_dev->spec->snap_id == CEPH_NOSNAP);

[1416690.815887] ------------[ cut here ]------------
[1416690.816243] kernel BUG at drivers/block/rbd.c:4035!
[1416690.816677] invalid opcode: 0000 [#2] SMP PTI
[1416690.817066] Modules linked in: udp_diag tcp_diag inet_diag ipt_REJECT nf_reject_ipv4 xt_multiport veth rbd libceph nfsv3 nfs_acl nfs lockd grace fscache ip_set ip6table_filter ip6_tables xfs libcrc32c
iptable_filter bonding lz4 lz4_compress softdog binfmt_misc nfnetlink_log nfnetlink intel_rapl sb_edac ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul
crc32_pclmul ghash_clmulni_intel pcbc mxm_wmi mgag200 ttm aesni_intel drm_kms_helper aes_x86_64 crypto_simd glue_helper drm cryptd snd_pcm snd_timer snd intel_cstate i2c_algo_bit soundcore fb_sys_fops
syscopyarea input_leds joydev sysfillrect intel_rapl_perf pcspkr sysimgblt lpc_ich mei_me mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi mac_hid sch_fq_codel vhost_net vhost tap ib_iser
rdma_cm
[1416690.820633]  iw_cm ib_cm ib_core sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) btrfs xor
zstd_compress raid6_pq hid_generic usbmouse usbkbd usbhid hid ixgbe mdio ahci i2c_i801 libahci isci libsas mpt3sas igb(O) raid_class dca scsi_transport_sas ptp pps_core
[1416690.822443] CPU: 29 PID: 478894 Comm: kworker/29:0 Tainted: P      D    O     4.15.18-1-pve #1
[1416690.823064] Hardware name: Supermicro X9DR3-F/X9DR3-F, BIOS 3.2a 07/09/2015
[1416690.823683] Workqueue: rbd rbd_queue_workfn [rbd]
[1416690.824313] RIP: 0010:rbd_queue_workfn+0x462/0x4f0 [rbd]
[1416690.824980] RSP: 0018:ffffbb66ee87be18 EFLAGS: 00010286
[1416690.825669] RAX: 0000000000000086 RBX: ffff9b58805c2800 RCX: 0000000000000006
[1416690.826396] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff9b511fcd6490
[1416690.827104] RBP: ffffbb66ee87be60 R08: 0000000000000000 R09: 000000000000085e
[1416690.827765] R10: 0000000000000254 R11: 00000000ffffffff R12: ffff9b508adc1140
[1416690.828435] R13: ffff9b5322897080 R14: 0000000000000000 R15: 0000000000001000
[1416690.829133] FS:  0000000000000000(0000) GS:ffff9b511fcc0000(0000) knlGS:0000000000000000
[1416690.829874] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[1416690.830592] CR2: 000055921bdc5448 CR3: 0000000929e0a003 CR4: 00000000001626e0
[1416690.831317] Call Trace:
[1416690.832026]  ? __schedule+0x3e8/0x870
[1416690.832712]  process_one_work+0x1e0/0x400
[1416690.833444]  worker_thread+0x4b/0x420
[1416690.834195]  kthread+0x105/0x140
[1416690.834888]  ? process_one_work+0x400/0x400
[1416690.835594]  ? kthread_create_worker_on_cpu+0x70/0x70
[1416690.836311]  ret_from_fork+0x35/0x40
[1416690.836998] Code: 00 48 83 78 20 fe 0f 84 6a fc ff ff 48 c7 c1 a8 18 de c0 ba c3 0f 00 00 48 c7 c6 b0 2c de c0 48 c7 c7 90 0d de c0 e8 ae 1c f1 cb <0f> 0b 48 8b 75 d0 4d 89 d0 44 89 f1 4c 89 fa 48 89
df 4c 89 55
[1416690.838481] RIP: rbd_queue_workfn+0x462/0x4f0 [rbd] RSP: ffffbb66ee87be18
[1416690.839258] ---[ end trace e2df66044f68ca99 ]---
From the PVE web interface (https) I tried to restart the node. That did not work, so I had to use an IPMI reset.

After that:
Code:
# pct start 9001
CT is locked (snapshot-delete)

sys8  ~ # pct unlock 9001
sys8  ~ # pct start 9001

* It took 1-2 minutes for the start to work.
* Then I tried a backup. It worked.
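
The unlock and start steps above can be sketched as a small script. `run` and `recover_ct` are hypothetical helper names, and by default the script only prints the commands (dry run), on the assumption that they should be reviewed before being executed on a node.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN is 1 (the default),
# otherwise execute it. Hypothetical helper.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recover_ct VMID: clear a stale lock, then start the container.
# The start is skipped if the unlock fails.
recover_ct() {
    run pct unlock "$1" && run pct start "$1"
}

recover_ct 9001
```

Setting `DRY_RUN=0` would execute the two `pct` commands instead of printing them.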