Unable to access ceph-based disks with krbd option enabled

Waschbüsch

Hi all,

I just upgraded my cluster to Proxmox VE 6.1 and wanted to give the updated krbd integration a spin.
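Enabling it was just a matter of flipping the krbd flag on the existing RBD storage (the storage ID 'rbd' here is mine):

Code:
pvesm set rbd --krbd 1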

With krbd enabled, I then tried to migrate a VM to a different node. This is what I got:

Code:
2019-12-08 11:49:22 starting migration of VM 211 to node 'srv01' (192.168.1.1)
2019-12-08 11:49:22 starting VM 211 on remote node 'srv01'
2019-12-08 11:49:27 can't map rbd volume vm-211-disk-1: rbd: sysfs write failed
2019-12-08 11:49:27 ERROR: online migrate failure - command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=srv01' root@192.168.1.1 qm start 211 --skiplock --migratedfrom srv05 --migration_type secure --stateuri unix --machine pc-i440fx-2.11' failed: exit code 255
2019-12-08 11:49:27 aborting phase 2 - cleanup resources
2019-12-08 11:49:27 migrate_cancel
2019-12-08 11:49:28 ERROR: migration finished with problems (duration 00:00:06)
TASK ERROR: migration problems

The same happens if I add a new rbd storage (for the same cluster, just with the krbd switch turned on) and try to move the disk from rbd to krbd storage:

Code:
can't map rbd volume vm-211-disk-1: rbd: sysfs write failed (500)
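For the record, the move was the CLI equivalent of the following (storage ID 'kdisks' and the disk slot are just examples):

Code:
qm move_disk 211 scsi0 kdisks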

Any and all operations work if krbd is not involved.
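Side note: 'rbd: sysfs write failed' comes from the kernel RBD client, so mapping the image by hand and checking the kernel log should show the underlying error (the pool name 'rbd' is an assumption on my part):

Code:
rbd map rbd/vm-211-disk-1
dmesg | tail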

Are there any prerequisites to using krbd I need to be aware of?
 
Update:
The same is true when adding a new HDD to an existing VM or when creating an altogether new VM.
So, this is not limited to migration.
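For example, allocating a new disk from the CLI fails the same way (VM ID, slot and size are just examples):

Code:
qm set 211 --scsi1 rbd:32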

If (as the error message suggests it might be) this has to do with wrong permissions, where would I start looking?

ceph auth is set to optional:

Code:
     auth_client_required = none
     auth_cluster_required = none
     auth_service_required = none
     auth_supported = cephx

I don't know what other permissions might be involved here.
Specifically: which ceph and/or system user is used, and what permissions does it need?
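In case it matters, this is how the existing client users and their capabilities can be inspected (that PVE uses client.admin here is an assumption on my part):

Code:
ceph auth ls
ceph auth get client.admin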
 
OK. Resetting ceph.conf to the default auth stuff:

Code:
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx

solved the problem, but I would still be very glad if someone could explain why that is.
AFAIK cephx auth does incur a (small) performance hit, and seeing as this is a private, separate network, I really don't need the added security.
 
Is there a keyring with the storage name under /etc/pve/priv/ceph?
 
When you disable cephx, this keyring needs to be deleted. Our tooling will use it if it finds one.
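That is, something along these lines (substitute your storage ID for <storage>):

Code:
ls /etc/pve/priv/ceph/
rm /etc/pve/priv/ceph/<storage>.keyring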
 
This results in a new / different error. :-(

I tried the following:

- Set up a new ceph rbd storage with the krbd flag, named 'kdisks' (roughly as sketched below)
- Removed /etc/pve/priv/ceph/kdisks.keyring
- Tried to add a new disk to a VM using the new storage
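The storage creation via CLI, for reference (that the pool is also named 'kdisks' is an assumption):

Code:
pvesm add rbd kdisks --pool kdisks --krbd 1 --content images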

/etc/pve/ceph.conf auth stuff reads like this:

Code:
     auth_client_required = none
     auth_cluster_required = none
     auth_service_required = none
     auth_supported = cephx

This resulted in the following error:

Code:
update VM 199: -scsi4 kdisks:32
TASK ERROR: error with cfs lock 'storage-kdisks': rbd error: rbd: listing images failed: (95) Operation not supported
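Trying the listing directly with the rbd CLI might narrow this down (again assuming the pool is named 'kdisks'):

Code:
rbd ls -p kdisks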
 
Had the same problem with 6.2.
Removed /etc/pve/priv/ceph/<pool-name>.keyring and it works now.
 
