Impact of Changing Ceph Pool hdd-pool size from 2/2 to 3/2

rtsx

Feb 19, 2025

Scenario


I have a Proxmox VE 8.3.1 cluster with 12 nodes, using Ceph as distributed storage. The cluster consists of 96 OSDs, distributed across 9 servers with SSDs and 3 with HDDs. Initially, my setup had only two servers with HDDs; I now need to add a third HDD node so the hdd-pool can be moved from 2/2 to 3/2 and stay consistent.


However, I’m not sure about the impact of this change; my google-fu wasn’t strong enough to make me feel confident.
(Note: I have production VMs running.)

Questions and Help Request:
  1. What is the impact of this change?
  2. What would be the recommendation for my scenario?

Pool:
Bash:
Pool # | Name     | Size | # of PGs | Optimal # of PGs | Autoscaler Mode | CRUSH Rule (ID)         | Used (%)
3      | ssd-pool | 3/2  | 1024     | N/A              | On              | ssd-replicated-rule (1) | 20.78 TiB (36.21%)
4      | hdd-pool | 2/2  | 512      | 512              | On              | hdd-replicated-rule (1) | 114.83 TiB (48.47%)
6      | .mgr     | 3/2  | 1        | N/A              | On              | replicated_rule (0)     | 444.39 MiB (0.00%)

OSD Tree and Crushmap in attachment.
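For anyone who wants to cross-check these figures on the CLI, roughly the following should reproduce them (standard Ceph commands, nothing here changes anything):
Bash:
# Per-pool size/min_size, PG count and usage:
ceph df detail
ceph osd pool get hdd-pool size
ceph osd pool get hdd-pool min_size
ceph osd pool get hdd-pool pg_num

# Per-OSD fill level ("ceph osd df tree class hdd" filters to the HDD class on newer releases):
ceph osd df tree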

Configuration:
Bash:
[global]
    auth_client_required = cephx
    auth_cluster_required = cephx
    auth_service_required = cephx
    cluster_network = X.X.X.X/24
    fsid = 52d10d07-2f32-41e7-b8cf-7d7282af69a2
    mon_allow_pool_delete = true
    mon_host = X.X.X.X X.X.X.X X.X.X.X X.X.X.X X.X.X.X X.X.X.X
    ms_bind_ipv4 = true
    ms_bind_ipv6 = false
    osd_pool_default_min_size = 2
    osd_pool_default_size = 3
    public_network = X.X.X.X/24

[client]
    keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
    keyring = /etc/pve/ceph/$cluster.$name.keyring

[mon.pve114]
    public_addr = X.X.X.X

[mon.pve115]
    public_addr = X.X.X.X

[mon.pve117]
    public_addr = X.X.X.X

[mon.pve118]
    public_addr = X.X.X.X

[mon.pve119]
    public_addr = X.X.X.X

[mon.pve142]
    public_addr = X.X.X.X
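Side note on the config above: as far as I understand, the osd_pool_default_* values only apply to pools created after they are set; the hdd-pool keeps its explicit 2/2 until it is changed. To double-check what a monitor is actually running with (run on the node hosting that mon):
Bash:
ceph daemon mon.pve114 config get osd_pool_default_size
ceph daemon mon.pve114 config get osd_pool_default_min_size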

Configuration Database: (screenshot attached)

Used (%) HDDs: (screenshot attached)

Used (%) SSDs: (screenshots attached)

References:
https://docs.ceph.com/en/reef/rados/operations/pools/#setting-the-number-of-rados-object-replicas

Any help from the community would be greatly appreciated!

I can provide logs or additional command outputs if needed.

Thanks in advance for your support!
 

Changing the HDD pool to 3/2 will create a third replica of every object, so used disk space will increase by roughly 50%. With ~114 TiB currently used, copying the "additional" ~57 TiB will probably take a while to complete.
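Back-of-envelope with the numbers from your pool table (rough estimate only):
Bash:
# Live STORED vs USED figures per pool, for redoing the estimate:
ceph df detail
# hdd-pool: ~114.83 TiB used at size 2  =>  ~57.4 TiB actually stored
# at size 3: 57.4 TiB * 3 = ~172 TiB raw, i.e. ~57 TiB extra to backfill onto the HDD OSDs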

> 12 nodes
Do you have another node available to make that an odd number, or can you add a QDevice? An odd number is recommended: otherwise, if 6 nodes go down (dead switch, etc.), neither side has more than 50% of the votes for quorum and all 12 will reboot trying to recover.
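To see the current vote situation (and to re-check it after adding a node or QDevice), the standard tooling is enough; a quick sketch:
Bash:
# Corosync membership, votes and quorum (run on any cluster node):
pvecm status
# Ceph monitor quorum is tracked separately from corosync:
ceph quorum_status --format json-pretty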
 
1 - I did a quick test in a lab environment with a small 3-node cluster (each node having 3 OSDs).
When I changed the pool from 2/2 to 3/2, the cluster froze for a short moment right after applying the new size.

I’m pretty sure it happened because of poor SSD performance; I had sliced one SSD into small pieces across the 3 VMs just for testing.

After a while, the cluster recovered by itself and the VM continued to run normally.

I’m now setting up a more reliable lab (no slow OSD alerts) to compare results and will also fix the quorum setup before making any changes in production.
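For reference, the change I'm testing is just a single command on the pool, followed by watching the recovery from the CLI (pool name as in production; in the lab it was a throwaway test pool):
Bash:
# Raise the replica count; min_size stays at 2:
ceph osd pool set hdd-pool size 3
# Follow recovery/backfill progress and the cluster log:
ceph -s
ceph -w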

Do you have another node available to make that an odd number, or can you add a QDevice?
2 - Yes, I do have another node.
About the QDevice, can it be just a simple Linux host running the quorum service only (to make the cluster odd = 13) and nothing else?
I’d like to keep things as simple as possible.

Thanks again for the great insights!
 
About the QDevice, can it be just a simple Linux host running the quorum service only (to make the cluster odd = 13) and nothing else?
This would work for corosync:
https://pve.proxmox.com/wiki/Cluster_Manager#_corosync_external_vote_support

I'm not sure about Ceph, though; its quorum is independent from corosync.

The easiest route would probably be to add the Linux host as another Proxmox VE node, but without any VMs or OSDs on it, so basically just for maintaining quorum.
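If you do go the plain QDevice route, the wiki steps boil down to roughly this (the IP is a placeholder):
Bash:
# On the external QDevice host (a plain Debian box is fine):
apt install corosync-qnetd

# On every cluster node:
apt install corosync-qdevice

# Then, from one cluster node:
pvecm qdevice setup <QDEVICE-IP>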
 
The QDevice can be anything. Note adding it allows passwordless SSH to it from cluster nodes, so maybe not a PBS server. It doesn't have to be local either IIRC. Note the doc bits about removing it before adding or removing cluster nodes.

Proxmox recommends 3 Ceph Monitors, with Managers on the Monitor nodes. I would keep an odd number, so 3 or 5.
https://pve.proxmox.com/wiki/Deploy_Hyper-Converged_Ceph_Cluster#pve_ceph_monitors
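Checking and trimming the monitor count is straightforward from any node; a sketch (the mon ID is a placeholder, pick which ones to drop based on your topology):
Bash:
# List current monitors and their quorum state:
ceph mon stat
# Remove one via the Proxmox tooling:
pveceph mon destroy <mon-id>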
 
Note adding it allows passwordless SSH to it from cluster nodes, so maybe not a PBS server. It doesn't have to be local either IIRC. Note the doc bits about removing it before adding or removing cluster nodes.
@aaron described the setup for his private infrastructure here: he installs PBS in parallel with a single-node PVE that is NOT added to the cluster, so he can run the QDevice in a Debian container.

The result is that passwordless SSH from the cluster can only reach the QDevice LXC, not the PBS host. I think that's quite an elegant solution.
 
I’m pleased to update this topic by confirming that the resize to 3/2 was successfully completed.
When applying the configuration, the behavior was the same as in the lab scenario: the Ceph cluster entered recovery/rebalance mode and, after a long process (around 5 days, mainly due to some OSDs showing latency above 500 ms), it finally completed successfully.


The utilization of the HDD OSDs increased from 40–49% to around 60–69%.
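For anyone doing the same, these were enough to keep an eye on the slow OSDs and the fill level while it ran:
Bash:
# Per-OSD commit/apply latency (this is where the >500 ms OSDs showed up):
ceph osd perf
# Fill level per OSD once the extra replica is written:
ceph osd df tree
# Overall recovery/backfill state:
ceph -s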


I’m also sharing a useful calculator to help estimate the required space (always keep at least an additional 8 TB of headroom for safety):
https://www.virtualizationhowto.com/2024/09/ceph-storage-calculator-to-find-capacity-and-cost/
As a rule of thumb, never apply any changes if the OSD utilization (even on a single node) exceeds 80–85% — you’ll thank yourself later.
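And a quick back-of-envelope check for that rule, with this thread's numbers (adjust to your own ceph df output):
Bash:
# ~57.4 TiB stored in hdd-pool * 3 replicas = ~172 TiB raw needed on the HDD OSDs
# Compare against total HDD capacity and the fullest single OSD before committing:
ceph df detail
ceph osd df tree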