RBD storage is not active after Ceph update from 16.2.15 to 17.2.7

sikipiki

New Member
Mar 25, 2024
I have Proxmox VE 7.4 on 3 physical servers.
After updating Ceph from 16.2.15 to 17.2.7, the RBD storage is inactive but enabled. Ceph health is OK.
All VMs are working, but I can't migrate them or create new ones, and if I turn them off, they can't be turned on again.

For example, when migrating or taking a snapshot I get these errors:
Code:
TASK ERROR: rbd error: rbd: error opening pool 'device_health_metrics': (2) No such file or directory
Code:
rbd snapshot 'vm-200-disk-0' error: rbd: error opening pool 'device_health_metrics': (2) No such file or directory

When I enter the command "pve7to8 --full | more", I get the following error
Code:
could not get usage stats for pool 'device_health_metrics' mounting container failed

This is a summary of the RBD storage (see the attached screenshot).


How can I activate RBD storage?
 
The pool device_health_metrics is renamed to .mgr during the upgrade process.

This is an internal pool that is used by Ceph MGR to store data about the cluster. Do not use it for RBD.

To access the VM images again, edit /etc/pve/storage.cfg and change the pool name.

Then create a new pool (call it rbd, for example) and migrate the VM images to it.
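A minimal sketch of what that first step could look like on the CLI, assuming the existing storage entry is called rbdfs (adjust to your actual storage name; instead of editing /etc/pve/storage.cfg by hand, pvesm set should be able to change the pool option too):
Code:
# point the existing storage entry at the renamed internal pool,
# only to regain access to the images; do not keep them on .mgr
pvesm set rbdfs --pool .mgr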
 
Thank you for your answer.

To access the VM images again, edit /etc/pve/storage.cfg and change the pool name.

Now I have these settings:
Code:
rbd: rbdfs
        content images,rootdir
        krbd 0
        pool device_health_metrics

So now I have to change the pool from device_health_metrics to .mgr in this config? The new settings will then be:
Code:
rbd: rbdfs
        content images,rootdir
        krbd 0
        pool .mgr

Is it right?

And then create a new pool and storage named "rbd", and then migrate the VM disks to the new storage (the rbd storage).

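So roughly these two steps on the CLI, I assume (the pool/storage name rbd, VM ID 800 and disk scsi0 are only examples; presumably the same can be done via the GUI):
Code:
# create a new Ceph pool and let PVE add a matching RBD storage entry for it
pveceph pool create rbd --add_storages

# move one VM disk to the new storage
qm move_disk 800 scsi0 rbd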
 
I changed the pool to .mgr in the config and now the storage named rbdfs is active again.

But I can't move a disk to the new rbd storage.

I get this error
Code:
create full clone of drive scsi0 (rbdfs:vm-800-disk-0)
TASK ERROR: storage migration failed: rbd error: rbd: error opening image vm-800-disk-0: (2) No such file or directory

And when I start a CT, it fails with this error message:
Code:
()
task started by HA resource agent
run_buffer: 322 Script exited with status 2
lxc_init: 844 Failed to run lxc.hook.pre-start for container "403"
__lxc_start: 2027 Failed to initialize container "403"
TASK ERROR: startup for container '403' failed
 
Code:
root@server2:/# rbd -p .mgr ls -l
rbd: error opening vm-100-disk-0: (2) No such file or directory
rbd: error opening vm-101-disk-0: (2) No such file or directory
rbd: error opening vm-101-disk-1: (2) No such file or directory
rbd: error opening vm-102-disk-0: (2) No such file or directory
rbd: error opening vm-102-disk-1: (2) No such file or directory
rbd: error opening vm-103-disk-0: (2) No such file or directory
rbd: error opening vm-103-disk-1: (2) No such file or directory
rbd: error opening vm-103-state-test: (2) No such file or directory
rbd: error opening vm-104-disk-2: (2) No such file or directory
rbd: error opening vm-105-disk-1: (2) No such file or directory
rbd: error opening vm-106-disk-2: (2) No such file or directory
rbd: error opening vm-107-disk-0: (2) No such file or directory
rbd: error opening vm-107-disk-1: (2) No such file or directory
rbd: error opening vm-108-disk-0: (2) No such file or directory
rbd: error opening vm-108-disk-1: (2) No such file or directory
rbd: error opening vm-200-disk-0: (2) No such file or directory
rbd: error opening vm-201-disk-1: (2) No such file or directory
rbd: error opening vm-220-disk-1: (2) No such file or directory
rbd: error opening vm-400-disk-0: (2) No such file or directory
rbd: error opening vm-401-disk-0: (2) No such file or directory
rbd: error opening vm-401-disk-1: (2) No such file or directory
rbd: error opening vm-402-disk-0: (2) No such file or directory
rbd: error opening vm-403-disk-0: (2) No such file or directory
rbd: error opening vm-800-disk-0: (2) No such file or directory
rbd: error opening vm-901-disk-0: (2) No such file or directory
rbd: error opening vm-903-disk-0: (2) No such file or directory
rbd: error opening vm-903-disk-1: (2) No such file or directory
rbd: error opening vm-907-disk-0: (2) No such file or directory
rbd: error opening vm-907-disk-1: (2) No such file or directory
NAME  SIZE  PARENT  FMT  PROT  LOCK
rbd: listing images failed: (2) No such file or directory

Code:
root@server2:/# ceph df
--- RAW STORAGE ---
CLASS    SIZE    AVAIL     USED  RAW USED  %RAW USED
ssd    10 TiB  6.3 TiB  4.2 TiB   4.2 TiB      39.68
TOTAL  10 TiB  6.3 TiB  4.2 TiB   4.2 TiB      39.68

--- POOLS ---
POOL             ID  PGS   STORED  OBJECTS     USED  %USED  MAX AVAIL
.mgr              1  256  1.4 TiB  370.19k  4.1 TiB  42.69    1.8 TiB
cephfs_data       3   32  5.3 GiB    1.35k   16 GiB   0.28    1.8 TiB
cephfs_metadata   4   16   16 MiB       27   49 MiB      0    1.8 TiB
rbd               5   32      0 B        0      0 B      0    1.8 TiB


And lots of rows like this (see the attached screenshot).
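As a further check (taking vm-100-disk-0 from the listing above as an example), I can try to verify whether the RBD header objects still exist in the pool at all; if only the directory entries are left, the image data is probably gone:
Code:
# does the image metadata still exist?
rbd info -p .mgr vm-100-disk-0

# list the raw objects in the pool and look for RBD headers
rados -p .mgr ls | grep rbd_header | head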
 
But why was the device_health_metrics pool used for RBDs in the first place?
If the cluster is old enough. Back then, it was possible to select that pool as RBD storage.

And how do I delete VM images from .mgr?
Or delete the pool and restart the active MGR. It will create it from scratch. You will lose the SMART data history that Ceph collected though.
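Roughly, once the images have been migrated or restored elsewhere, something like this (pool deletion has to be allowed explicitly first; only a sketch, double-check against your setup):
Code:
# temporarily allow pool deletion
ceph config set mon mon_allow_pool_delete true

# delete the internal pool; this throws away the data the MGR stored there (e.g. SMART history)
ceph osd pool delete .mgr .mgr --yes-i-really-really-mean-it

# fail the active MGR so a standby takes over and recreates the pool
ceph mgr fail

# optionally disallow pool deletion again
ceph config set mon mon_allow_pool_delete false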
 
If the cluster is old enough. Back then, it was possible to select that pool as RBD storage.
Yes, the cluster is old.

Or delete the pool and restart the active MGR. It will create it from scratch. You will lose the SMART data history that Ceph collected though.
After restoring from backup (to an external disk) I will try deleting the pool and restarting the active MGR. The SMART data history is no longer important.

Thank you.
 
I have the same problem.
I changed the pool to .mgr and now the storage named cy is active again, but the VMs cannot be found.
Please help me.
 
I have the same problem.
I changed the pool to .mgr and now the storage named cy is active again, but the VMs cannot be found.
Please help me.
https://pve.proxmox.com/wiki/Ceph_P...ages_are_stored_on_pool_device_health_metrics

After the upgrade, the images will most likely be broken; at least that was the case when we ran into this situation the last time -> restore from backup :-/
Well, not much that can be done now. Please do check the upgrade guides and the known issues section in the future. Then you can act before the update :)
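For the restore itself, something along these lines should work once a new pool/storage exists (<archive>, the IDs and the storage name are only placeholders; --force is likely needed because the guests still exist in the config):
Code:
# restore a VM backup onto the new RBD storage
qmrestore <archive> 800 --storage rbd --force

# restore a container backup onto the new RBD storage
pct restore 403 <archive> --storage rbd --force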
 
Well, not much that can be done now. Please do check the upgrade guides and the known issues section in the future. Then you can act before the update :)
Maybe the Known issues section should be above the Upgrade section. :)
 
