Replace Journal / WAL SSD drive

lweidig

Active Member
Oct 20, 2011
Sheboygan, WI
We have a four node Proxmox cluster with all of the nodes also providing Ceph storage services. One of the nodes is having issues with the SSD that we are using for the journal / WAL drives (this is 5.1 / bluestore). We use a command like:

Code:
pveceph createosd /dev/sdc --journal_dev /dev/sdr --wal_dev /dev/sdr

to create each OSD. In this example, /dev/sdc is the mechanical drive and /dev/sdr is the SSD. It is /dev/sdr that needs replacing, and it holds the journal / WAL for multiple other drives in the setup. All drives are in hot swap bays, so we are hoping this can be done cleanly on a running system, but we can of course bring the node down if there are no other options.

Appreciate any advice on making this a smooth (and hopefully no downtime) replacement.
 

Alwin

Proxmox Retired Staff
Retired Staff
Aug 1, 2017
If the '/dev/sdr' drive is still OK, then it might work to clone the drive. If not, you would need to remove all the bluestore OSDs that use it first and re-create them after the replacement. On re-creation, you can skip the wal_dev parameter, as the WAL will be placed on the fastest device of the OSD (e.g. wal/db -> sdr, data -> sdc).
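As a rough sketch of the remove-and-re-create route for a single OSD, the helper below only prints the commands so they can be reviewed first; it does not run anything. The function name `plan_osd_rebuild` and the OSD id are hypothetical, and the `pveceph destroyosd` / `pveceph createosd` spellings assume the PVE 5.x CLI used earlier in this thread. It also assumes you already know which OSD ids have their DB/WAL on the SSD.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for rebuilding one OSD, executes nothing.
# Assumptions: PVE 5.x command names; OSD id and device paths are placeholders.
plan_osd_rebuild() {
    osd_id="$1"    # numeric OSD id
    data_dev="$2"  # mechanical data disk, e.g. /dev/sdc
    ssd_dev="$3"   # replacement SSD for DB/WAL, e.g. /dev/sdr

    # Take the OSD out so Ceph re-replicates its data elsewhere.
    # Check `ceph -s` and wait for recovery before destroying anything.
    echo "ceph osd out ${osd_id}"
    echo "systemctl stop ceph-osd@${osd_id}"
    echo "pveceph destroyosd ${osd_id}"
    # After swapping the SSD: wal_dev is omitted, so the WAL follows the DB device.
    echo "pveceph createosd ${data_dev} --journal_dev ${ssd_dev}"
}
```

Calling `plan_osd_rebuild 12 /dev/sdc /dev/sdr` prints four lines, ending with `pveceph createosd /dev/sdc --journal_dev /dev/sdr`. Repeat it for every OSD that depends on the SSD.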
 

Hi there!

I'm facing the same problem here...
If I do this, i.e. destroy the OSD and re-create it, will I lose data?
Thanks for any help.
 

Alwin

If I do this, i.e. destroy the OSD and re-create it, will I lose data?
This depends on your setup. If the pool uses the default size/min_size of 3/2 and replication is at the host level, with at least three hosts, then probably not: the other hosts still hold the remaining copies of each object.
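To make the 3/2 reasoning concrete, here is a small sketch of the arithmetic. It assumes host-level replication, so each host holds at most one copy of an object, and it takes the worst case where every failed host held one. The function name `copies_after_host_loss` is made up for illustration; the real values for a pool can be read with `ceph osd pool get <pool> size` and `ceph osd pool get <pool> min_size`.

```shell
#!/bin/sh
# Worst-case copy count after losing whole hosts, assuming one copy per host.
copies_after_host_loss() {
    size="$1"        # pool size (number of replicas)
    min_size="$2"    # minimum replicas required for I/O
    hosts_down="$3"  # hosts whose OSDs are all gone
    remaining=$((size - hosts_down))
    if [ "$remaining" -ge "$min_size" ]; then
        echo "ok: ${remaining} copies left, I/O continues"
    elif [ "$remaining" -ge 1 ]; then
        echo "degraded: ${remaining} copies left, I/O pauses until recovery"
    else
        echo "lost: no copies left"
    fi
}
```

With the defaults, `copies_after_host_loss 3 2 1` reports that two copies remain and I/O continues, which is the case of rebuilding all OSDs on one node.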
 

Alwin

I'm not sure what this means... What do you mean by "replication is on host level"?
If you didn't change the crush rules, then by default the copies are distributed at the host level. This means that copies of the same object will not be placed on the same node.
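One way to check this, as a sketch: `ceph osd crush rule dump` prints the rules as JSON, and the choose/chooseleaf step of each rule names the bucket type the copies are spread across. The helper below (the name `failure_domains` is hypothetical) reads that dump on stdin and lists the bucket types it finds. It assumes the type names are plain lowercase words such as `host` or `osd`, and it relies on `grep -o`, which GNU and BSD grep provide.

```shell
#!/bin/sh
# Lists the bucket types used by choose/chooseleaf steps in a CRUSH rule dump.
# Input: JSON from `ceph osd crush rule dump` on stdin.
failure_domains() {
    # Only quoted type values are matched, so a numeric rule "type" field is skipped.
    grep -o '"type": *"[a-z]*"' | sed 's/.*"\([a-z]*\)"$/\1/' | sort -u
}
```

Usage would be `ceph osd crush rule dump | failure_domains`. If it prints `host`, copies of the same object land on different nodes; `osd` would mean they may share a node.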
 
