[SOLVED] PVE 5.1, Ceph Luminous - Linked clone fails

We are running Proxmox VE 5.1 with Ceph Luminous and are booted off kernel 4.13.4-1-pve.

We have templates, from which we typically create linked clones. This consolidates disc usage and speeds up access, as there is a higher probability of the common base being cached. When attempting to create linked clones, after upgrading everything, we receive the following error:

Code:
create linked clone of drive scsi0 (virtuals:base-191-disk-1)
clone base-191-disk-1: base-191-disk-1 snapname __base__ to vm-234-disk-1
rbd: failed to update image features: 2017-11-22 13:11:13.010595 7f742bba8d00 -1 librbd::Operations: one or more requested features are already disabled(22) Invalid argument
TASK ERROR: clone failed: could not disable krbd-incompatible image features of rbd volume vm-234-disk-1: rbd: failed to update image features: 2017-11-22 13:11:13.010595 7f742bba8d00 -1 librbd::Operations: one or more requested features are already disabled(22) Invalid argument

Looks like PVE is attempting to disable a feature which is already disabled...
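For illustration only, here is a minimal sketch of the kind of guard that would avoid this error: work out which krbd-incompatible features are actually enabled before asking rbd to disable anything. The helper name `features_to_disable` is hypothetical, and the list of krbd-incompatible features is an assumption, not taken from the PVE source. The input is the value of the "features:" line as printed by `rbd info`.

```shell
#!/bin/sh
# Hypothetical helper (not PVE code): given the "features:" value from
# `rbd info`, print only those features that are currently enabled AND
# appear in an assumed list of krbd-incompatible features.
features_to_disable() {
    enabled="$1"   # e.g. "layering, exclusive-lock, object-map"
    # Assumption: features the kernel client in use cannot handle.
    incompatible="deep-flatten fast-diff object-map exclusive-lock"
    result=""
    for feature in $incompatible; do
        # Wrap both sides in ", " and "," so only whole names match.
        case ", $enabled," in
            *", $feature,"*) result="${result:+$result }$feature" ;;
        esac
    done
    printf '%s\n' "$result"
}

# The images in this thread only have "layering", so nothing is printed
# apart from an empty line:
features_to_disable "layering"
```

If the result is empty, the call to `rbd feature disable` could simply be skipped, which is the case that seems to trigger the "already disabled" error above.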


We have working VMs which are currently running, using KRBD, off linked clones:
Code:
[admin@kvm5c ~]# rbd ls -l
NAME                                     SIZE PARENT                       FMT PROT LOCK
base-191-disk-1                        40960M                                2
base-191-disk-1@__base__               40960M                                2 yes
<snip>
vm-223-disk-1                          40960M rbd/base-191-disk-1@__base__   2

Our /etc/pve/storage.cfg file:
Code:
rbd: virtuals
        monhost 10.254.1.3;10.254.1.4;10.254.1.5
        content images,rootdir
        pool rbd
        krbd 1
        username admin

RBD image information, for the template, the protected clone source image and a working VM:
Code:
[admin@kvm5c ~]# rbd info base-191-disk-1
rbd image 'base-191-disk-1':
        size 40960 MB in 10240 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.0849e42ae8944a
        format: 2
        features: layering
        flags:
[admin@kvm5c ~]# rbd info base-191-disk-1@__base__
rbd image 'base-191-disk-1':
        size 40960 MB in 10240 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.0849e42ae8944a
        format: 2
        features: layering
        flags:
        protected: True
[admin@kvm5c ~]# rbd info vm-223-disk-1
rbd image 'vm-223-disk-1':
        size 40960 MB in 10240 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.5c1e8e3d1b58ba
        format: 2
        features: layering
        flags:
        parent: rbd/base-191-disk-1@__base__
        overlap: 40960 MB

PS: When running 'apt-get update; apt-get -y dist-upgrade;' we observe an odd notification which we haven't seen before:
Code:
The following packages have been kept back:
  pve-qemu-kvm
 
I worked around the bug in the GUI by creating a full clone, then deleting the RBD image (rbd rm vm-234-disk-1) and then manually creating a linked clone (rbd clone rbd/base-191-disk-1@__base__ rbd/vm-234-disk-1). I finally edited the 234.conf file in the /etc/pve/local/qemu-server location and updated the virtual disc definition:
scsi0: virtuals:base-191-disk-1/vm-234-disk-1,discard=on,size=40G
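The manual steps can be sketched as a small dry-run script. It only prints the two rbd commands quoted above rather than running them, so the output can be checked before anything is deleted. The function name `linked_clone_commands` is made up for this example; the pool, template and VM image names are the ones from this post.

```shell
#!/bin/sh
# Dry-run sketch: prints the rbd commands instead of executing them.
# linked_clone_commands is a hypothetical helper name.
linked_clone_commands() {
    pool="$1"; base="$2"; target="$3"
    # 1. Remove the full-clone image that the GUI created.
    printf 'rbd rm %s/%s\n' "$pool" "$target"
    # 2. Re-create it as a clone of the protected template snapshot.
    printf 'rbd clone %s/%s@__base__ %s/%s\n' "$pool" "$base" "$pool" "$target"
}

linked_clone_commands rbd base-191-disk-1 vm-234-disk-1
```

Once the printed commands have been checked and run by hand, the scsi0 line in the VM's .conf file still has to be edited as shown above.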

Please fix this bug. I really don't want to give other staff the access this workaround requires, as that opens up the possibility of them accidentally deleting the wrong image...
 
Please include your 'pveversion -v' output and the content of /etc/pve/storage.cfg, as well as any custom Ceph settings you are using.
 
Thanks Fabian, hope you spot something that I've overlooked...


/etc/pve/storage.cfg:
Code:
dir: local
        path /var/lib/vz
        maxfiles 0
        shared
        content backup,iso,vztmpl

rbd: virtuals
        monhost 10.254.1.3;10.254.1.4;10.254.1.5
        content images,rootdir
        pool rbd
        krbd 1
        username admin


/etc/ceph/ceph.conf:
Code:
[global]
         debug ms = 0/0
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = 10.254.1.0/24
         mon allow pool delete = true
         filestore xattr use omap = true
         fsid = a3f1c21f-f883-48e0-9bd2-4f869c72b17d
         keyring = /etc/pve/priv/$cluster.$name.keyring
         osd journal size = 20480
         osd pool default min size = 1
         public network = 10.254.1.0/24

[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.1]
         host = kvm5b
         mon addr = 10.254.1.3:6789

[mon.2]
         host = kvm5c
         mon addr = 10.254.1.4:6789

[mon.3]
         host = kvm5d
         mon addr = 10.254.1.5:6789
[mds]
         mds data = /var/lib/ceph/mds/$cluster-$id
         keyring = /var/lib/ceph/mds/$cluster-$id/keyring

[mds.kvm5b]
         host = kvm5b

[mds.kvm5c]
         host = kvm5c

[mds.kvm5d]
         host = kvm5d


pveversion -v:
Code:
proxmox-ve: 5.1-26 (running kernel: 4.13.4-1-pve)
pve-manager: 5.1-36 (running version: 5.1-36/131401db)
pve-kernel-4.13.4-1-pve: 4.13.4-26
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-15
qemu-server: 5.0-17
pve-firmware: 2.0-3
libpve-common-perl: 5.0-20
libpve-guest-common-perl: 2.0-13
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-16
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.1-12
pve-qemu-kvm: 2.9.0-5~pve4
pve-container: 2.0-17
pve-firewall: 3.0-3
pve-ha-manager: 2.0-3
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.0-2
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.7.3-pve1~bpo9
ceph: 12.2.1-1~bpo80+1
 
Is there any workaround for this other than re-creating all of the base template images?

I did a Proxmox 4.4 to 5.1 upgrade, which involved upgrading Ceph from Hammer through Jewel to Luminous. None of my templates, which serve as the bases for a number of linked clones, can be used for linked clones any more because this error pops up. I don't want to have to re-create all of my templates if I can avoid it, but at the moment I have to create full clones, which is seriously reducing my available storage capacity.

Is a fix coming in the near future, or is there a workaround that anyone can recommend?

Thanks in advance
 
You don't need to convert anything, my 2nd post provides the necessary steps on how to delete the full clone disc and how to subsequently manually create a cloned image from the protected template base image.

Would have expected Proxmox to have squashed this bug by now. Either not a lot of people use templates when running Ceph, or people have held back on upgrading to PVE 5.1 and/or Ceph Luminous...
 
Any updates on this? I get the same error when trying to create a linked clone.

Code:
create linked clone of drive scsi0 (CEPH-SSD-Pool:base-101-disk-1)
clone base-101-disk-1: base-101-disk-1 snapname __base__ to vm-102-disk-1
rbd: failed to update image features: (22) Invalid argument
TASK ERROR: clone failed: error with cfs lock 'storage-CEPH-SSD-Pool': could not disable krbd-incompatible image features of rbd volume vm-102-disk-1: rbd: failed to update image features: (22) Invalid argument

Code:
root@pve2:~# pveversion -v
proxmox-ve: 5.1-38 (running kernel: 4.13.13-5-pve)
pve-manager: 5.1-43 (running version: 5.1-43/bdb08029)
pve-kernel-4.13.13-5-pve: 4.13.13-38
libpve-http-server-perl: 2.0-8
lvm2: 2.02.168-pve6
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-19
qemu-server: 5.0-20
pve-firmware: 2.0-3
libpve-common-perl: 5.0-25
libpve-guest-common-perl: 2.0-14
libpve-access-control: 5.0-7
libpve-storage-perl: 5.0-17
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-3
pve-docs: 5.1-16
pve-qemu-kvm: 2.9.1-6
pve-container: 2.0-18
pve-firewall: 3.0-5
pve-ha-manager: 2.0-4
ksm-control-daemon: not correctly installed
glusterfs-client: 3.8.8-1
lxc-pve: 2.1.1-2
lxcfs: 2.0.8-1
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
openvswitch-switch: 2.7.0-2
ceph: 12.2.2-pve1
 
We're inching towards this bug report having been open for 3 months now. A little concerning as this appears to relate to pure Proxmox code and would not appear to be something that's waiting on 'upstream'.

Agree & <bump> for the fix request. Very surprised more Proxmox + Ceph users haven't run into and reported this.
 
It is on our TODO list. Once there is progress, the bug will be updated.
 
Using the multi-threaded kernel RBD module is necessary to provide an acceptable level of performance for guests.

Proxmox could consider setting the required image features explicitly at creation time, as an integer obtained by summing the feature IDs:
Code:
rbd create bar -s 1024 --image-format=2 --image-features=3


or you can set the default in the Ceph configuration file:
Code:
pico /etc/ceph/ceph.conf
[global]
         rbd default features = 3
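To show where the value 3 comes from, here is a small sketch that combines RBD feature bits by name. The helper name `feature_mask` is made up for this example; the bit values are the ones librbd assigns to each feature.

```shell
#!/bin/sh
# Hypothetical helper: combine RBD feature bit values by name.
feature_mask() {
    mask=0
    for name in "$@"; do
        case "$name" in
            layering)       bit=1 ;;
            striping)       bit=2 ;;
            exclusive-lock) bit=4 ;;
            object-map)     bit=8 ;;
            fast-diff)      bit=16 ;;
            deep-flatten)   bit=32 ;;
            journaling)     bit=64 ;;
            *) echo "unknown feature: $name" >&2; return 1 ;;
        esac
        # Each feature is a distinct bit, so OR-ing equals summing.
        mask=$((mask | bit))
    done
    echo "$mask"
}

feature_mask layering striping   # prints 3
```

So 3 is layering (1) plus striping (2), which leaves out the newer features such as exclusive-lock and object-map.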
 
