Ceph replace NVMe SSDs used for db/wal

Mar 4, 2026
I have a Ceph cluster with 5 nodes. Each node has 8 HDDs, and each node also has 2 NVMe SSDs used for db/wal. I need to replace these NVMe SSDs on two nodes because they are nearing the end of their lifespan. Is it possible to replace these SSDs without losing the db/wal of each OSD? I saw that there's a command that exports and another that performs a kind of db/wal migration. The problem is that I can't have the new and old NVMe SSDs installed at the same time because I don't have free M.2 connectors. What would be the best way to avoid cluster recovery/rebalancing as much as possible? I can only run recovery/rebalancing at night, because an email service uses this storage and becomes extremely slow if recovery/rebalancing runs during the day.
 
@licenciamento So you’re looking to remove an existing drive, replace it with a blank one, and then recreate the missing WAL/DB? At that point, AFAIK, the 4 OSDs backed by that drive are not usable. I would guess you’d need to recreate them, but I’ll follow for other comments.
 
Given that you can't migrate directly from the old disk to the new one, I think your best bet would be to move the db/wal to the data disk (the HDD), then swap the drive, and move the db/wal to the new drive.

See
https://docs.ceph.com/en/latest/ceph-volume/lvm/migrate/
https://docs.ceph.com/en/latest/ceph-volume/lvm/newdb/


Sometimes those Ceph docs can be hard to interpret, so hopefully this helps.


The basic process would be to set noout (optional, but probably a good idea). Then stop the OSD and run the migrate command, like:
Code:
ceph-volume lvm migrate --osd-id 1 --osd-fsid <uuid> --from db wal --target vgname/data
- osd-id is the OSD number. Example: 1
- osd-fsid is the OSD's UUID. That's the "LV Name" you can see in the Proxmox UI with the "osd-block-" prefix removed. Example:
3927803c-c083-47d7-8782-0591f599f181
- target is the VG/LV of the data disk; from the Proxmox UI that's the "LV Path" with "/dev/" removed from the front. Example: ceph-37d9939b-55ea-463a-a9bf-acd84aea03f4/osd-block-3927803c-c083-47d7-8782-0591f599f181
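If you'd rather not dig those values out of the Proxmox UI, you can also read them from the CLI (just a convenience, the UI works fine too):
Code:
ceph-volume lvm list               # prints the osd id, osd fsid and devices for every OSD on the node
lvs -o lv_name,vg_name,devices     # shows which VG/LV lives on which physical disk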


When that is done, you can start the OSD again.
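Putting that together for a single OSD, a rough sketch using the example values above (your OSD ids, fsids and VG/LV names will differ):
Code:
ceph osd set noout      # stop CRUSH from marking the OSD out while it is down
systemctl stop ceph-osd@1
ceph-volume lvm migrate --osd-id 1 --osd-fsid 3927803c-c083-47d7-8782-0591f599f181 \
    --from db wal --target ceph-37d9939b-55ea-463a-a9bf-acd84aea03f4/osd-block-3927803c-c083-47d7-8782-0591f599f181
systemctl start ceph-osd@1
ceph osd unset noout    # or leave noout set until all OSDs on that node are done
Repeat for each OSD whose db/wal sits on the NVMe you're about to pull.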

After all have been moved, you can replace the NVMe disk. You then need to create the LVM volumes that will hold the db/wal on the new disk: make the PV, then a VG named whatever you like (for example osd_dbs), and inside that VG make one LV per OSD, again named whatever you like (for example osd1).
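A minimal sketch of that LVM prep, assuming the new NVMe shows up as /dev/nvme0n1 (the device name and the 60G size are placeholders, adjust to your hardware and DB sizing):
Code:
pvcreate /dev/nvme0n1
vgcreate osd_dbs /dev/nvme0n1
lvcreate -L 60G -n osd1 osd_dbs     # one LV per OSD that will use this NVMe
With the LVs in place, set noout and stop the OSD again, then attach the new DB volume to the OSD with a command like: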
Code:
ceph-volume lvm new-db --osd-id 1 --osd-fsid <uuid> --target vgname/new_db
Where target is the LV you created, like osd_dbs/osd1 using the example names above.

Then migrate the DB to that new device:
Code:
ceph-volume lvm migrate --osd-id 1 --osd-fsid <uuid> --from data --target vgname/new_db

And then you can start the OSD again.
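Putting the second half together for one OSD, again just a sketch using the example names from above:
Code:
ceph osd set noout
systemctl stop ceph-osd@1
ceph-volume lvm new-db  --osd-id 1 --osd-fsid 3927803c-c083-47d7-8782-0591f599f181 --target osd_dbs/osd1
ceph-volume lvm migrate --osd-id 1 --osd-fsid 3927803c-c083-47d7-8782-0591f599f181 --from data --target osd_dbs/osd1
systemctl start ceph-osd@1
ceph osd unset noout
Repeat per OSD. Since only the db/wal moves and the data stays on the HDDs, the OSDs keep their IDs and nothing is backfilled, so noout plus a short per-OSD downtime should be all the cluster ever sees.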