[SOLVED] Backup (vzsnap) fails after Update to ceph 17.2.6

ubu

Since I updated my cluster yesterday, backups fail with:

INFO: create storage snapshot 'vzdump'
Creating snap: 10% complete...2023-06-01T11:44:38.520+0200 7f7b86ffd700 -1 librbd::SnapshotCreateRequest: failed to allocate snapshot id: (95) Operation not supported
Creating snap: 10% complete...failed.
snapshot create failed: starting cleanup

journal:
Jun 01 11:47:18 g1prox3 pvedaemon[16156]: <root@pam> starting task UPID:g1prox3:00094E7A:0002C413:64786926:vzdump:202191:root@pam:
Jun 01 11:47:18 g1prox3 pvedaemon[609914]: INFO: starting new backup job: vzdump 202191 --notes-template '{{guestname}}' --mode snapshot --remove 0 --node g1prox3 --storage pbs-backup
Jun 01 11:47:18 g1prox3 pvedaemon[609914]: INFO: Starting Backup of VM 202191 (lxc)
Jun 01 11:47:19 g1prox3 pvedaemon[609914]: snapshot create failed: starting cleanup
Jun 01 11:47:19 g1prox3 kernel: rbd: rbd0: object map is invalid
Jun 01 11:47:19 g1prox3 pvedaemon[609914]: no lock found trying to remove 'backup' lock
Jun 01 11:47:19 g1prox3 pvedaemon[609914]: ERROR: Backup of VM 202191 failed - rbd snapshot 'vm-202191-disk-0' error: Creating snap: 10% complete...failed.
Jun 01 11:47:19 g1prox3 pvedaemon[609914]: INFO: Backup job finished with errors
Jun 01 11:47:19 g1prox3 pvedaemon[609914]: job errors
Jun 01 11:47:19 g1prox3 pvedaemon[16156]: <root@pam> end task UPID:g1prox3:00094E7A:0002C413:64786926:vzdump:202191:root@pam: job errors

Update Log:
Start-Date: 2023-05-31 18:20:48
Commandline: apt full-upgrade
Install: proxmox-kernel-helper:amd64 (7.4-1, automatic), pve-kernel-5.15.107-2-pve:amd64 (5.15.107-2, automatic)
Upgrade: librados2:amd64 (17.2.5-pve1, 17.2.6-pve1), libwebpmux3:amd64 (0.6.1-2.1, 0.6.1-2.1+deb11u1), ceph-fuse:amd64 (17.2.5-pve1, 17.2.6-pve1), ceph-volume:amd64 (17.2.5-pve1, 17.2.6-pve1), pve-firmware:amd64 (3.6-4, 3.6-5), ceph-mgr-modules-core:amd64 (17.2.5-pve1, 17.2.6-pve1), zfs-zed:amd64 (2.1.9-pve1, 2.1.11-pve1), libnode72:amd64 (12.22.12~dfsg-1~deb11u3, 12.22.12~dfsg-1~deb11u4), libnvpair3linux:amd64 (2.1.9-pve1, 2.1.11-pve1), ceph-base:amd64 (17.2.5-pve1, 17.2.6-pve1), python3-ceph-common:amd64 (17.2.5-pve1, 17.2.6-pve1), librbd1:amd64 (17.2.5-pve1, 17.2.6-pve1), pve-ha-manager:amd64 (3.6.0, 3.6.1), librgw2:amd64 (17.2.5-pve1, 17.2.6-pve1), libuutil3linux:amd64 (2.1.9-pve1, 2.1.11-pve1), ceph-common:amd64 (17.2.5-pve1, 17.2.6-pve1), libwebpdemux2:amd64 (0.6.1-2.1, 0.6.1-2.1+deb11u1), libzpool5linux:amd64 (2.1.9-pve1, 2.1.11-pve1), libssh-gcrypt-4:amd64 (0.9.5-1+deb11u1, 0.9.7-0+deb11u1), libpostproc55:amd64 (7:4.3.5-0+deb11u1, 7:4.3.6-0+deb11u1), proxmox-ve:amd64 (7.3-1, 7.4-1), ceph-mds:amd64 (17.2.5-pve1, 17.2.6-pve1), ceph-mgr:amd64 (17.2.5-pve1, 17.2.6-pve1), ceph-mon:amd64 (17.2.5-pve1, 17.2.6-pve1), ceph-osd:amd64 (17.2.5-pve1, 17.2.6-pve1), python3-cephfs:amd64 (17.2.5-pve1, 17.2.6-pve1), libcephfs2:amd64 (17.2.5-pve1, 17.2.6-pve1), nodejs:amd64 (12.22.12~dfsg-1~deb11u3, 12.22.12~dfsg-1~deb11u4), libavcodec58:amd64 (7:4.3.5-0+deb11u1, 7:4.3.6-0+deb11u1), libavutil56:amd64 (7:4.3.5-0+deb11u1, 7:4.3.6-0+deb11u1), libwebp6:amd64 (0.6.1-2.1, 0.6.1-2.1+deb11u1), libradosstriper1:amd64 (17.2.5-pve1, 17.2.6-pve1), libswscale5:amd64 (7:4.3.5-0+deb11u1, 7:4.3.6-0+deb11u1), distro-info-data:amd64 (0.51+deb11u3, 0.51+deb11u4), python3-rbd:amd64 (17.2.5-pve1, 17.2.6-pve1), python3-rgw:amd64 (17.2.5-pve1, 17.2.6-pve1), libssl1.1:amd64 (1.1.1n-0+deb11u4, 1.1.1n-0+deb11u5), ceph:amd64 (17.2.5-pve1, 17.2.6-pve1), libswresample3:amd64 (7:4.3.5-0+deb11u1, 7:4.3.6-0+deb11u1), nodejs-doc:amd64 (12.22.12~dfsg-1~deb11u3, 12.22.12~dfsg-1~deb11u4), pve-kernel-5.15:amd64 (7.4-1, 7.4-3), libzfs4linux:amd64 (2.1.9-pve1, 2.1.11-pve1), libavformat58:amd64 (7:4.3.5-0+deb11u1, 7:4.3.6-0+deb11u1), pve-firewall:amd64 (4.3-1, 4.3-2), libsqlite3-mod-ceph:amd64 (17.2.5-pve1, 17.2.6-pve1), python3-ceph-argparse:amd64 (17.2.5-pve1, 17.2.6-pve1), libpq5:amd64 (13.10-0+deb11u1, 13.11-0+deb11u1), zfsutils-linux:amd64 (2.1.9-pve1, 2.1.11-pve1), openssl:amd64 (1.1.1n-0+deb11u4, 1.1.1n-0+deb11u5), python3-rados:amd64 (17.2.5-pve1, 17.2.6-pve1), linux-libc-dev:amd64 (5.10.178-3, 5.10.179-1), libavfilter7:amd64 (7:4.3.5-0+deb11u1, 7:4.3.6-0+deb11u1)
Remove: pve-kernel-helper:amd64 (7.3-3)
End-Date: 2023-05-31 18:22:19
 
Can you create snapshots manually?
Code:
rbd -p {pool} snap create vm-202191-disk-0@testsnap
Any errors that might give more details?
 
Same error:

root@g1prox2:~# rbd -p ceph_pool_ssd snap create vm-202179-disk-0@testsnap
Creating snap: 10% complete...2023-06-01T15:09:40.864+0200 7fd577fff700 -1 librbd::SnapshotCreateRequest: failed to allocate snapshot id: (95) Operation not supported
Creating snap: 10% complete...failed.
rbd: failed to create snapshot: (95) Operation not supported
 
Could the excl LOCK be important?

root@g1prox2:~# rbd ls $POOL -l
NAME              SIZE     PARENT  FMT  PROT  LOCK
vm-202083-disk-0  300 GiB          2
vm-202084-disk-0  300 GiB          2
vm-202085-disk-0  300 GiB          2
vm-202178-disk-0   40 GiB          2          excl
vm-202179-disk-0   30 GiB          2          excl
 
"Could the excl LOCK be important?"
That lock should be active whenever the VM is running, and I can still create snapshots with it.

Does it only affect that one image or all of them?
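To answer that question for a whole pool, a rough sketch like the following could try a throwaway snapshot on every image. It assumes the image names arrive on stdin (for example from `rbd ls`); the helper names `run` and `probe_images` and the snapshot name `probe` are made up for illustration. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: try a throwaway snapshot on each image of a pool.
# "probe" is a hypothetical snapshot name; pick one that does not exist yet.
# Commands are only printed unless APPLY=1 is set.

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@" < /dev/null    # keep the command from eating the image list on stdin
    else
        echo "$*"
    fi
}

# Reads image names from stdin, one per line.
probe_images() {
    pool="$1"
    while IFS= read -r image; do
        [ -n "$image" ] || continue
        if run rbd snap create "$pool/$image@probe"; then
            run rbd snap rm "$pool/$image@probe"
        else
            echo "snapshot failed: $pool/$image" >&2
        fi
    done
}

# Dry run with two image names from this thread:
printf '%s\n' vm-202178-disk-0 vm-202179-disk-0 | probe_images ceph_pool_ssd
```

On a cluster node the image list would come from something like `rbd ls ceph_pool_ssd | probe_images ceph_pool_ssd`, with `APPLY=1` exported beforehand once the printed commands look right.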
 
None of the images on Ceph can be snapshotted; ZFS-based systems can still be backed up.
 
Did this happen right after / very quickly after you updated to 17.2.6?

I did find something similar on the Ceph User list, but there it affected an ec-pool. Could you try to create a new pool and see if snapshots work there?
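The new-pool test suggested here could look roughly like the sketch below. The pool name `snaptest` and image name `probe` are placeholders, and the plain `ceph`/`rbd` commands are one way to do it; on Proxmox VE the pool could just as well be created through the GUI or `pveceph`. The script only prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: check whether RBD snapshots work on a freshly created pool.
# "snaptest" and "probe" are placeholder names, not taken from the thread.
# Commands are only printed unless APPLY=1 is set.

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

snapshot_probe() {
    pool="$1"
    image="$2"
    run ceph osd pool create "$pool" &&              # new replicated pool
    run rbd pool init "$pool" &&                     # tag it for RBD use
    run rbd create --size 1G "$pool/$image" &&       # small test image
    run rbd snap create "$pool/$image@testsnap" &&   # the step that fails on the old pool
    run rbd snap ls "$pool/$image"                   # should list "testsnap"
}

snapshot_probe snaptest probe
```

If the snapshot succeeds here but fails on the old pool, the problem is tied to the pool rather than to the cluster as a whole. Cleaning up the test pool afterwards is left out of the sketch.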

I am also a bit confused, you have a Ceph FS running? On that same Ceph cluster? Did you create it manually? Because I don't see the *_data and *_metadata pools that should be there if you created it with the PVE tooling.
 
1. Yes, it appeared right after the update; I found out the next morning when the nightly backup had failed.

2. I found that as well.

3. Yes, the CephFS pool was created manually (a long time ago; I think back then the GUI did not support creating CephFS).

4. I can try to create a new pool.
 
Snapshots on the new Ceph pool work (WTF???).

I created a new pool and moved all disks.

Thank you for the help.

How do I close the thread?
 
Okay, so AFAIU the pools are already a bit older? Looks like there might be a bug in 17.2.6 that triggers only sometimes.

Do you still have the old pool(s)? What if you created a new RBD image? Still the same problem?

If the pools are still there, you could consider opening a bug report on the Ceph bug tracker. They will have more of an idea what data to gather to narrow down the cause.
I haven't found any issue there that looks like this problem.

I marked the thread as solved.
 
@ubu did you have CephFS enabled on the pool by any chance?
I had and disabling this on that pool fixed it for me.
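For anyone wanting to check this on their own cluster, a minimal sketch is below. It assumes that "CephFS enabled on the pool" refers to the pool's application tag, which is only one possible reading of the post; the helper names are invented, and `ceph_pool_ssd` is the pool name from earlier in the thread. Commands are printed, not executed, unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: see whether a pool is tagged for, or used by, CephFS.
# Commands are only printed unless APPLY=1 is set.

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

inspect_pool() {
    pool="$1"
    run ceph osd pool application get "$pool"   # application tags (rbd, cephfs, ...)
    run ceph fs ls                              # metadata/data pools of each filesystem
}

# ASSUMPTION: "disabling" means dropping the cephfs application tag.
# Do not apply this to a pool that a filesystem still stores data in.
disable_cephfs_tag() {
    run ceph osd pool application disable "$1" cephfs --yes-i-really-mean-it
}

inspect_pool ceph_pool_ssd
```

`disable_cephfs_tag` is deliberately not called in the sketch; whether it matches what fixed the problem in the post above is not confirmed in this thread.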
 
Thanks, yes, I can try that; there is still a CephFS on that pool.

I did try: I created a new LXC container on the old pool, and snapshots on the old pool still do not work.

@Florius Yes, there is a CephFS on the old pool. Thanks for the info.
 
Looking at the bug tracker, there hasn't been any update on this for 22 days.
Is there a walkthrough on how to create new pools and move data to the new pool in a production environment? I was hoping there would be a Proxmox update that fixes this, but it seems some manual intervention is needed.

Or should I try the v8 upgrade to see if the issue still persists?
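No official walkthrough is linked in this thread, but the "create a new pool and move all disks" approach described earlier could be scripted roughly as below. This is a sketch under assumptions: the target storage `ceph_pool_new` is a hypothetical name for a PVE storage backed by the new pool, the volume names (`scsi0`, `rootfs`) depend on the guest config, and the `qm move-disk` / `pct move-volume` spellings may differ between PVE versions. As far as I know, containers usually have to be stopped before their volumes can be moved. Commands are only printed unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: move one guest volume to a storage backed by the new pool.
# "ceph_pool_new" is a hypothetical storage name.
# Commands are only printed unless APPLY=1 is set.

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# kind: "qemu" for VMs, "lxc" for containers
move_guest_volume() {
    kind="$1"; vmid="$2"; volume="$3"; target="$4"
    case "$kind" in
        qemu) run qm move-disk "$vmid" "$volume" "$target" --delete 1 ;;
        lxc)  run pct move-volume "$vmid" "$volume" "$target" --delete 1 ;;
        *)    echo "unknown guest type: $kind" >&2; return 1 ;;
    esac
}

# Container 202191 from the first post, root filesystem:
move_guest_volume lxc 202191 rootfs ceph_pool_new
```

`--delete 1` removes the source volume after a successful copy; leaving it out keeps the old volume as an unused disk, which is the more cautious choice for a first run.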
 
Hi, I seem to be encountering the same issue after bumping from V7 to V8. Have there been any updates or should I work towards migrating my data?
 
I have the same problem after upgrading Ceph from 16.x to 16.2.13, so without a major update.

Is there another solution to fix this problem without recreating the Ceph pool?

I have around 30 TB of data and 100+ VMs in that pool and want to avoid migrating all of them to a new pool.

(Tested creating a new pool and migrated one VM, on the new pool everything works as expected.)
 
