Ceph with 2 OSD nodes and 1 node only for quorum

eurowerfr

Renowned Member
Jun 25, 2015
Hi,

This is not really a Proxmox question, but rather about the Ceph support available in Proxmox.

I know this is not the ideal setup, but here is my question:

I have 3 servers:
- server A: Proxmox with Ceph: this server has OSDs
- server B: Proxmox with Ceph: this server has OSDs
- server C: Proxmox with Ceph: this server is only used for cluster quorum (no OSDs on it).
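
For reference, this is roughly how such a layout is created with the Proxmox pveceph tooling (a sketch; the network and disk path below are just placeholders):

# on every node: install the Ceph packages
pveceph install

# on one node only: initialise the Ceph configuration (placeholder network)
pveceph init --network 10.0.0.0/24

# on each of the three nodes: create a monitor, so all three vote for quorum
pveceph mon create

# only on server A and server B: create OSDs (placeholder disk)
pveceph osd create /dev/sdb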

The whole installation is OK.
The replication is 2/1, i.e. osd_pool_default_size = 2 and osd_pool_default_min_size = 1.

The goal is a replication of 2 (1 copy per server), but if one OSD server goes down (e.g. B), I want to keep using the data on A.
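
For reference, a sketch of how this 2/1 replication can be set, either as a default in ceph.conf (on Proxmox: /etc/pve/ceph.conf) or per pool; the pool name below is just a placeholder:

# defaults for newly created pools, [global] section of ceph.conf
osd_pool_default_size = 2
osd_pool_default_min_size = 1

# or per existing pool (replace cephfs_data with your pool name)
ceph osd pool set cephfs_data size 2
ceph osd pool set cephfs_data min_size 1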

I write data on the CephFS: it works!
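
(For reference, the CephFS itself can be created with the pveceph tooling roughly like this; a sketch, assuming one MDS per OSD node and the default filesystem name:)

# on server A and server B: create a metadata server
pveceph mds create

# on one node: create the CephFS and register it as Proxmox storage
pveceph fs create --add-storage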

But if I power down server B, I have to wait 1800 seconds (30 minutes) until my cluster becomes available again.
In ceph.log I see, after 1800 seconds:

2024-07-19T18:44:14.967069+0200 mon.nas1 (mon.0) 119376 : cluster 3 Health check update: 369 slow ops, oldest one blocked for 1803 sec, daemons [osd.21,osd.23,mon.hypernas1] have slow ops. (SLOW_OPS)
2024-07-19T18:44:14.967198+0200 mon.nas1 (mon.0) 119377 : cluster 1 osd.1 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.967208+0200 mon.nas1 (mon.0) 119378 : cluster 1 osd.3 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.967215+0200 mon.nas1 (mon.0) 119379 : cluster 1 osd.5 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.967219+0200 mon.nas1 (mon.0) 119380 : cluster 1 osd.7 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.967223+0200 mon.nas1 (mon.0) 119381 : cluster 1 osd.9 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.967228+0200 mon.nas1 (mon.0) 119382 : cluster 1 osd.11 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.967234+0200 mon.nas1 (mon.0) 119383 : cluster 1 osd.13 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.967239+0200 mon.nas1 (mon.0) 119384 : cluster 1 osd.15 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.967243+0200 mon.nas1 (mon.0) 119385 : cluster 1 osd.17 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.967249+0200 mon.nas1 (mon.0) 119386 : cluster 1 osd.19 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.967255+0200 mon.nas1 (mon.0) 119387 : cluster 1 osd.20 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.967261+0200 mon.nas1 (mon.0) 119388 : cluster 1 osd.22 marked down after no beacon for 900.615915 seconds
2024-07-19T18:44:14.969288+0200 mon.nas1 (mon.0) 119389 : cluster 3 Health check failed: 12 osds down (OSD_DOWN)
2024-07-19T18:44:14.969298+0200 mon.nas1 (mon.0) 119390 : cluster 3 Health check failed: 1 host (12 osds) down (OSD_HOST_DOWN)



I have changed mon_osd_report_timeout from 900 seconds to 60 seconds, but it did not help.
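
For reference, a sketch of one way such a change can be applied through the monitors' centralized configuration (and verified afterwards):

# lower the beacon timeout at runtime (the value is the one mentioned above)
ceph config set mon mon_osd_report_timeout 60

# check the value that is actually active
ceph config get mon mon_osd_report_timeout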

Do you know how to reduce this 1800 second timeout before my cluster becomes available again with only one OSD node?

Thanks very much ...
 
Do not use Ceph with only two nodes. Use DRBD or Gluster in such small setups.
Thank you, but why? If quorum is OK, why not use Ceph with only two OSD nodes (my Ceph cluster has 3 nodes: 2 with OSDs and 1 for quorum only)?

Ceph allows a min_size of 1, but is availability in this case really only restored after 30 minutes, compared to a cluster with 3 OSD nodes? That is the question :)
 
Yes, you can set size=2/min_size=1 and Ceph will run. It is not recommended as it is error prone. If you try to go this way: please document here in this thread what you did and how the system behaves when (not: if) any error (hardware or software) occurs.
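
A few standard commands that are handy for documenting such a test while one OSD node is down (a sketch; run them from a surviving node):

ceph -s              # overall cluster state, degraded/undersized PG counts
ceph health detail   # expanded warnings such as OSD_DOWN or SLOW_OPS
ceph osd tree        # which OSDs and hosts are currently up or down
ceph pg stat         # PG state summary while peering/recovery happens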

Redhat:
“ Number of nodes | Minimum of 3 nodes required. “

Suse:
“ 2.3.1 Minimum cluster configuration - At least four physical nodes (OSD nodes) ”

Thomas Krenn:
“ Nodes: the minimum number of nodes for running Ceph is 3. Disks: each of these nodes requires at least 4 storage drives (OSDs). ”

And Proxmox:
" To build a hyper-converged Proxmox + Ceph Cluster, you must use at least three(preferably) identical servers for the setup. "


Basically everybody says the same: two nodes are not sufficient.
 
