rbd map: "RBD image feature set mismatch"

Hello,

I am testing PVE 5 with Ceph (12.1) and wanted to "map" a ceph volume, but I get an error. Is this a bug? Did that work with other versions of PVE or Ceph?

Thanks,
esco

Code:
# rbd map <ceph-pool>/foo
rbd: sysfs write failed
RBD image feature set mismatch. Try disabling features unsupported by the kernel with "rbd feature disable".
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (6) No such device or address
# dmesg |tail -n1
[1355258.253726] rbd: image foo: image uses unsupported features: 0x38
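
For reference (a decoding sketch, not part of the original output): the hex value in the dmesg line maps onto the standard RBD feature bits, and the image's current features can be listed with rbd info:

Code:
# 0x38 = 0x08 (object-map) + 0x10 (fast-diff) + 0x20 (deep-flatten)
rbd info <ceph-pool>/foo | grep features
# expected to show something like: features: layering, exclusive-lock, object-map, fast-diff, deep-flatten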
 
how did you create that image? the rbd kernel driver does not support all the image features and tunables of the newer releases.
 
Just plain installation of PVE 5 with Ceph 12.1 (pveceph install..) and VMs (with volumes) created from the PVE web interface. Nothing special. So you can't reproduce this?
 
volumes created for VMs are not created with the rbd map limitations in mind - Qemu accesses them using librbd, which supports all the new features. volumes created for containers should be created with a reduced feature set, because PVE maps them using rbd map and mounts them before handing the mounted volume over to the container. if you want VMs to use the kernel rbd driver, you can set the krbd flag on the storage. but note that this will potentially reduce performance compared to the librbd-based configuration.
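
As an illustration (the storage name "ceph-vm" is a placeholder, not from this thread), the krbd flag should be settable on an existing RBD storage from the CLI, or via Datacenter -> Storage in the GUI:

Code:
# hypothetical storage name; adjust to your setup
pvesm set ceph-vm --krbd 1   # VM disks on this storage get mapped via the kernel driver
pvesm set ceph-vm --krbd 0   # back to librbd access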
 
I don't "want" VMs to use the kernel rbd driver. The question was if this is a bug? So, if I understand this right the rbd kernel module is not up to date? Will this stay this way?

I just want easy direct access from the host to the volumes. If this will stay this way I could script some workaround with "rbd-nbd" and "ln".
 
I don't "want" VMs to use the kernel rbd driver. The question was if this is a bug? So, if I understand this right the rbd kernel module is not up to date? Will this stay this way?

I just want easy direct access from the host to the volumes. If this will stay this way I could script some workaround with "rbd-nbd" and "ln".

no, this is not a bug. the rbd kernel module is up to date - it's just always a bit behind librbd regarding newly introduced features, because it is not directly maintained by the ceph project but needs to go through the regular kernel development process. if you want your VM volumes to support rbd map, you need to disable certain features on the associated rbd images (like PVE does for container volumes).
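
A minimal sketch of that, using the pool/image placeholders from the first post:

Code:
# drop the features the kernel client complained about; layering and exclusive-lock can stay
rbd feature disable <ceph-pool>/foo object-map fast-diff deep-flatten
rbd map <ceph-pool>/foo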
 
OK, so I added "rbd default features = 5" to the ceph.conf. The default was 61.

So I only have "layering" (1) and "exclusive-lock" (4). "object-map" (8), "fast-diff" (16) and "deep-flatten" (32) are now disabled by default.
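
For reference, a sketch of the resulting ceph.conf snippet (placing it under [global] is one option; a [client] section also works for client-side defaults):

Code:
# /etc/ceph/ceph.conf (on a PVE-managed cluster usually a symlink to /etc/pve/ceph.conf)
[global]
    # 1 (layering) + 4 (exclusive-lock) = 5
    rbd default features = 5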
 
I'm bringing this thread back because I just experienced this same sort of behavior, and wanted to elaborate on what happened so the devs can take notice.

I have a testing cluster of 4 nodes in my home lab. I have Ceph set up across 3 nodes with different hard drives. I have a file storage VM running Nas4Free in HA with its storage on Ceph.

We had a power outage, and when the cluster came back up I had to reboot each of the nodes again because the cluster ended up in an inconsistent state. After the reboot of each node, the cluster was working properly. However, the HA filestore wasn't coming up, and on further examination there were log entries showing a problem with mounting the hard drives from Ceph. I detached the Ceph hard drives and attempted to rbd map one of them and got this error:

Code:
RBD image feature set mismatch. You can disable features unsupported by the kernel with "rbd feature disable test/vm-100-disk-1 object-map fast-diff deep-flatten".

The Ceph cluster reports that everything is working fine and there are no errors.

This was working before the power outage, and I realize this is an edge case... but I want to learn: what led to RBD seeing an image feature set mismatch? Is there corruption in the configuration somewhere that I should look for?
 
Forgot to mention: following the instructions in the original error, running this command fixes the issue:

Code:
rbd feature disable test/vm-100-disk-1 object-map fast-diff deep-flatten
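
A quick verification sketch (same pool/image as above):

Code:
rbd info test/vm-100-disk-1 | grep features
# should now list only: layering, exclusive-lock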

So what would have caused that image to report those unsupported features?
 
Hi,

Use NBD for mapping. It's slower than KRBD, but it works with all the new, modern features:

Code:
modprobe nbd
rbd-nbd  -m mon1,mon2,mon3 --user CEPHUSER -k /etc/pve/priv/ceph/whatever.keyring map pool/image
rbd-nbd unmap /dev/nbd0
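
For completeness, a usage sketch between map and unmap (the device node and mount point are assumptions; rbd-nbd map prints the device it actually attached, e.g. /dev/nbd0):

Code:
mount /dev/nbd0 /mnt/test     # or /dev/nbd0p1 if the image carries a partition table
umount /mnt/test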
 
Are you indicating that following your suggestion will get Proxmox to switch to using nbd? At what point did my installation change to needing this?

I started out with the latest version 5 download and installed each node with the no-subscription repository and have kept them updated... so was there an update at some point that I applied that could have caused this situation on the next boot?
 
No. I simply meant that if you want to map an RBD image with the extra feature sets, you can do it this way. QEMU uses librbd for a direct connection to RBD images, and LXC doesn't have these features enabled by default, because it uses KRBD.
 
Ok, I get that, and thank you.

However, what I'm after is why I might have received these errors in the first place when I was not attempting to use any extra features.
 
Sure...

So I already fixed the VM disk in question using the command I mentioned before; here's the fixed image info:

Code:
rbd image 'vm-100-disk-1':
        size 8192 MB in 2048 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.2beb5b643c9869
        format: 2
        features: layering, exclusive-lock
        flags:
        create_timestamp: Sun May 13 17:25:46 2018
But I have a couple of other, less important VM disks that have the same issue, and I can see from the output that those images still show the unsupported features:

Code:
rbd image 'vm-110-disk-1':
        size 40960 MB in 10240 objects
        order 22 (4096 kB objects)
        block_name_prefix: rbd_data.24208174b0dc51
        format: 2
        features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
        flags:
        create_timestamp: Wed May 9 12:50:14 2018

So, thinking back, I believe I had used the "Move disk" button on the VM Hardware page to migrate these from a local ZFS storage to the Ceph storage I had set up later. In addition, I don't think I had taken the whole cluster down between that time and now...

So, would the "Move disk" command possibly have resulted in this inconsistency issue?
 
KRBD doesn't support all features yet; this is a hard constraint for LXC containers, as those can't use librbd. For VMs this is no problem, as QEMU can use librbd and access the images directly. On a "Move disk", the features are disabled, at least in newer versions of PVE.
 
Interestingly, the VMs that were affected were all qemu. I have one lxc, and it came up without error.

I can certainly run the command on the affected disks and resolve the issue, but I'm a little concerned that this error only showed up after a reboot and not before.
 
Interestingly, the VMs that were affected were all qemu. I have one lxc, and it came up without error.
QEMU can use both ways, so it depends on how your storage is configured. The checkbox "krbd", when ticked, activates the use of mapped images through the kernel. To use a Ceph pool both ways, there should be two storages configured, one that uses krbd and one that doesn't. Both can point to the same pool.
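
As a sketch, /etc/pve/storage.cfg with two entries pointing at the same pool could look like this (the storage names "ceph-vm" and "ceph-ct" are placeholders):

Code:
rbd: ceph-vm
        pool rbd
        content images
        krbd 0

rbd: ceph-ct
        pool rbd
        content rootdir
        krbd 1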

I can certainly run the command on the affected disks and resolve the issue, but I'm a little concerned that this error only showed up after a reboot and not before.
Something must have happened before that copied an image or configured a VM differently. In any case, please update to the latest version, as not only PVE but also Ceph has newer packages available.
 
After upgrading a Ceph cluster tonight, creating new images (qemu & lxc) creates volumes with the object-map, deep-flatten and layering features enabled. This prevents the VM from starting, with the message described above. Today I upgraded Ceph again (a new update was available), but with no result.
After removing the features on the disk image with rbd feature disable pool/image, I was able to start it. The problem we will face is the automatic creation of images. I don't care about the (possibly missing) features on the disk image, but about not being able to automate image creation. How can I override the default image settings for rbd devices?

Thanks
 
