[SOLVED] How to replace a 500GB HDD (Ceph OSD, RBD) with a 1TB one? (safely!)

2 replicas is never safe; use at least 3.
When I shut down one node (out of 3) everything is OK, no data loss, so why do you say it's "never safe"?
I don't have 3 copies of the data; I think 2 is enough for me (like a RAID 1).
 
There are situations where you can lose data. Please read the reports on the Ceph users mailing list for details, e.g. read here:

https://de.slideshare.net/ShapeBlue/wido-den-hollander-10-ways-to-break-your-ceph-cluster

You're right, but:

2x replication only becomes an issue if you are ALREADY in a delicate situation, i.e. when you lose a node or already have a few OSDs down. In that case you MUST rebuild a new node, replace the disk, etc., as soon as possible ... unless you consider that running with a node down is healthy ... not me.

But you are right, 3 replicas is better than 2 ... and 4 replicas is better than 3 ... and so on!

I'm not a Ceph expert (a newbie in fact !!!), so I will pay attention to this CAREFULLY (yep).
 
https://forum.proxmox.com/threads/change-num-replicas-on-ceph-rool-online.25904/

You will get a lot of traffic on your cluster, so be warned. And make sure that you have enough space on your OSDs before starting this.
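For example, one quick way to check the headroom first (standard ceph CLI, nothing pool-specific):

Code:
ceph df            # raw vs. used capacity and per-pool usage
ceph osd df tree   # per-OSD utilisation, to spot OSDs already close to nearfull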
This is weird. I took a look at the Ceph conf:

Code:
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.90.80.0/24
fsid = 23f454517-238f-4336-b9d3-b9d12b2edfaf
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
public network = 10.90.80.0/24

I can read that I already have a min size of 2 and a default size of 3 ...
BUT when I go to the "Pools" section, my cephPool1 shows size/min = 2/2?

What is the meaning of these two contradictory pieces of information?

And when I do:
ceph osd pool get cephPool1 size
size: 2
ceph osd pool get cephPool1 min_size
min_size: 2

I find the configuration file displayed by Proxmox very strange ...

BUT I increased the pool size to 3 with:
ceph osd pool set cephPool1 size 3

and it now reports the correct size: 3

Note: yes, I get a warning ... but the cluster storage keeps working!
Degraded data redundancy: 60770/191583 objects degraded (31.720%), 124 pgs degraded, 124 pgs undersized
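As far as I understand, that ~32% is roughly what you would expect after going from size 2 to size 3: the second figure in that message counts object copies at the new size, so about one copy in three is missing right after the change. My rough arithmetic (not cluster output):

Code:
echo $(( 191583 / 3 ))   # 63861 objects, i.e. roughly the copies that still have to be created
# 63861 missing copies out of 191583 wanted ~= 33%; the reported 31.72% is a bit
# lower because backfill had already started when the status was taken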
 
The config file just sets defaults; when you create a pool you can override these, as you must have done to create a 2/2 pool.

Doing what you did was correct, and once the sync is finished you'll get a healthy status message again.

On the upgrade of the disks from 500GB -> 1TB: how much data does each disk currently have? Are they nearly full, hence you wanting to move to 1TB, or is there only a small amount of data currently?
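To make the defaults-vs-pool point concrete: the "osd pool default size" / "osd pool default min size" settings only apply when a pool is created, and an existing pool keeps its own values until you change them. Something like this, using the cephPool1 name from this thread:

Code:
ceph osd pool get cephPool1 size        # what the pool actually uses right now
ceph osd pool get cephPool1 min_size
ceph osd pool set cephPool1 size 3      # per-pool override, triggers rebalancing
ceph osd pool set cephPool1 min_size 2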
 
To replace older drives with newer ones in a Ceph cluster, add the new one, wait for the rebalancing to finish. Then stop and out the old drive. Wait for the rebalancing to finish and remove the drive. Repeat till all the drives have been replaced. This is a fairly common scenario.

As for replica count, I too suggest using a replica of 3 although it will cut down usable storage space.
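Roughly, on a Proxmox node that flow looks like the sketch below; osd.7 and /dev/sdX are just placeholders, and the pveceph syntax varies a bit between PVE versions, so double-check against your release:

Code:
pveceph osd create /dev/sdX      # add the new disk as an OSD (or: ceph-volume lvm create --data /dev/sdX)
ceph -s                          # wait until the rebalancing onto it has finished
ceph osd out osd.7               # drain the old OSD
ceph -s                          # wait for the rebalance to finish again
systemctl stop ceph-osd@7
pveceph osd destroy 7            # remove it from the CRUSH map and auth
# repeat for the next drive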
 

I upgraded to 3 replicas today, it's OK. But less space ... ;-(

I will follow your advice, but can I put:
node 1: 2 x 2TB (HDD)
node 2: 2 x 2TB (HDD)
node 3: 2 x 1TB (HDD)
?
 
I want to have 2TB to 3TB available for our users.
I think I must put in 6 x 2TB ... as Ceph eats 2/3 of it for its replicas ...
We plan to have 200 users with 10GB to 50GB each.

(We don't earn money; we help people to have non-Google alternatives and such!)
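My rough calculation (assuming 6 x 2TB OSDs, 3 replicas, and keeping some margin below the nearfull ratio):

Code:
echo $(( 6 * 2 ))        # 12 TB raw
echo $(( 6 * 2 / 3 ))    # 4 TB usable at 3 replicas (~3.4 TB if we stay under ~85% full)
echo $(( 200 * 10 ))     # 2000 GB if every user stores 10 GB -> fits
echo $(( 200 * 50 ))     # 10000 GB if every user stores 50 GB -> does not fit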
 
I followed this tutorial:
https://access.redhat.com/documenta...ml/administration_guide/changing_an_osd_drive

It's working, but it takes TOO long for the degraded data redundancy percentage to go down.
I need to replace 6 drives of 500GB with 2TB ones.
(All my SATA slots are full.)
Is this the only way to upgrade?
 
It is probably too late for a reply, but in case someone else is looking for info:
As far as I know, this is the safest way to upgrade without impacting cluster performance to the point where it becomes unusable during rebalancing. You can speed up recovery greatly with tweaks such as increasing the backfill and active-recovery options, if your network and hardware can take it. On a large cluster with enough resources, recovery is fast and almost unnoticeable to users.

Good practice is to tweak it and see what fits your environment. Make small incremental changes and take notes. Almost all of these tweaks can be applied at runtime, so you can monitor the effect as you go.
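For instance, on a Ceph release with the centralized config database (Mimic or later), one way to bump the standard OSD recovery settings in small steps at runtime and watch the effect; pick values that match your hardware:

Code:
ceph config get osd osd_max_backfills          # current value
ceph config get osd osd_recovery_max_active
ceph config set osd osd_max_backfills 2        # small incremental bump
ceph config set osd osd_recovery_max_active 4
ceph -s                                        # watch recovery speed and client impact, then adjust again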
 
Replacing a disk is super simple, and can even be performed from the GUI (a rough CLI equivalent is sketched after the steps):

1. down (stop) and out the OSD (it will probably already be in this state for a failed drive)
2. remove it from the tree and the CRUSH map ("destroy" without removing the partition, in the GUI)
2.bis: put the Ceph cluster in "noout" mode
3. replace the disk, reboot (verify the disk is visible in the BIOS and in fdisk -l!)
4. create the new OSD
4.bis: take the cluster out of "noout" mode ("unset noout")
5. profit ==> ceph -w to watch the rebuild in real time!
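And roughly the same thing from the CLI, with osd.7 and /dev/sdX as placeholders (pveceph syntax differs slightly between Proxmox versions, so treat it as a sketch):

Code:
ceph osd set noout                 # keep data in place while the OSD is down
systemctl stop ceph-osd@7
ceph osd out osd.7
pveceph osd destroy 7              # removes it from the CRUSH map and auth
# swap the physical disk, reboot if needed, check it with fdisk -l
pveceph osd create /dev/sdX        # create the new OSD on the new disk
ceph osd unset noout
ceph -w                            # watch the backfill in real time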


TIPS: in order to speed up the rebuild (backfill/recovery):
ceph tell 'osd.*' injectargs '--osd-max-backfills 16'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 4'

To restore the defaults:

ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
ceph tell 'osd.*' injectargs '--osd-recovery-max-active 3'
 
I'm in the same situation, but my disks are about 90% full. I've got a 3-node cluster with 6 x 1TB disks on each node, and I have to replace one disk on each node with a 4TB disk.

Has anyone experienced removing a disk on a cluster that is 90% full and with the 'backfill_toofull' flag active for 4 PGs out of 301?

Thanks
 
@stefanobertoli, removing one OSD to replace it will not increase the usage of the other ones when using the noout flag.
The missing PGs will be replicated to the new OSD. You will be in a warning state during the recovery, but the cluster stays functional.

But waiting until 90% full is very dangerous. If you lose one or more OSDs, you will break your cluster.

I am not sure the rebalance will be enough to put more data on the 4TB OSD and clear the toofull flag. Let us know what happens.
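If it helps, the situation can be watched with the standard commands below; they only show which PGs are backfill_toofull, how full each OSD is, and the ratios the flag is based on (monitoring only, I would not touch the ratios on a cluster that full):

Code:
ceph health detail | grep -i toofull   # which PGs are currently backfill_toofull
ceph osd df tree                       # per-OSD %USE, spot the ones near the limit
ceph osd dump | grep ratio             # full_ratio / backfillfull_ratio / nearfull_ratio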
 