Ceph and node reboot

lweidig

Member
Oct 20, 2011
Sheboygan, WI
EVERY time we need to restart one of our nodes in our cluster we are faced with this HORRIFIC impact on disk I/O while the Ceph pools need to be "rebuilt". It virtually consumes all of the resources and we need to know how to prevent this. I am simply talking about issuing a 'reboot' after applying a new kernel or similar change.

After the system reports stuff like:

PGs
activating:8
activating+degraded:64
active+clean:386
active+clean+inconsistent:1
peering:53

It sometimes takes HOURS to clear up, and the entire cluster performs horribly the whole time. This cannot be the way things are supposed to work. Is something not being handled correctly by the system? We have a pool with size 3, min_size 2, and we NEVER reboot more than a single node at a time in our 4-node cluster.
 

tom

Proxmox Staff Member
Staff member
Aug 29, 2006
My Ceph Clusters are healthy again in a few seconds (after rebooting one node).

Please tell more/all details about your Ceph cluster.
 

lweidig

Member
Oct 20, 2011
Sheboygan, WI
Four nodes with 8-10 OSDs per node. Two of the OSDs per node are SSDs and the others are 10K 600GB SAS drives. The SSDs are being used for the DB/WAL. Storage is about 30% utilized at this point. The nodes are interconnected with dual 10Gbps Intel adapters running LACP for the storage network. They have multiple 1Gbps adapters front-facing for access. RAM is anywhere from 64GB to 192GB per node. Running Proxmox 5.1, and every node has a monitor running.

Other information?
 

udo

Famous Member
Apr 22, 2009
Ahrensburg; Germany
Four nodes with 8-10 osd's per node. [...] Other information?
Hi,
you can limit the rebuild impact with the right settings in ceph.conf (they can also be set on the fly with injectargs):
Code:
[osd]
osd max backfills = 1
osd recovery max active = 1
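The same values can be applied at runtime, without restarting the OSDs, via the standard Ceph admin CLI. A sketch (the exact flag spelling with dashes is what injectargs expects):

```shell
# Apply to all OSDs on the fly; takes effect until the next OSD restart,
# so keep the ceph.conf entries as well for persistence.
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
```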
Udo
 

udo

Famous Member
Apr 22, 2009
Ahrensburg; Germany
Four nodes ...

and every node has a monitor running.
Hi,
that's not a good choice (though it's not the cause of your issue).
You should use an odd number of mons; for a normally sized Ceph cluster, three mons are the right choice (and enough).
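To illustrate why an even monitor count buys nothing (my example, not from the thread): a monitor quorum needs a strict majority, so 4 mons tolerate exactly as many failures as 3, while adding one more thing that can break.

```python
# Quorum arithmetic for Ceph monitors (or any majority-vote system).

def majority(n_mons: int) -> int:
    """Smallest number of monitors that still forms a quorum."""
    return n_mons // 2 + 1

def tolerated_failures(n_mons: int) -> int:
    """How many monitors can fail while a quorum survives."""
    return n_mons - majority(n_mons)

for n in (3, 4, 5):
    print(f"{n} mons: quorum={majority(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

With 3 mons you tolerate 1 failure; with 4 mons you still tolerate only 1. Only 5 mons gets you to 2.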

Perhaps the SSDs are not the right ones? What kind of SSDs do you use? And why 2 per node and not 4? And why is the SSD an OSD itself, rather than only providing partitions for the DB/WAL of the HDD OSDs?

Udo
 

fabian

Proxmox Staff Member
Staff member
Jan 7, 2016
how long do your nodes take to reboot? Does Ceph think the OSDs are gone for good before the node returns? A short downtime of an OSD/host should not trigger rebalancing.
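If a reboot takes longer than the down-out interval (the mon_osd_down_out_interval option, 600 seconds by default), Ceph marks the OSDs out and starts rebalancing. For planned maintenance the usual way to avoid that is the noout flag, a sketch using the standard Ceph CLI:

```shell
# Before the planned reboot: stop Ceph from marking down OSDs "out"
ceph osd set noout

# ... reboot the node, wait for its OSDs to come back up ...

# After the node has rejoined, remove the flag again
ceph osd unset noout
```

Note that PGs on the rebooted node will still show as degraded while it is down; noout only prevents the full rebalance from kicking in.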
 
