Ceph and node reboot

lweidig

EVERY time we need to restart one of the nodes in our cluster, we are faced with a HORRIFIC impact on disk I/O while the Ceph pools "rebuild". It virtually consumes all of the resources, and we need to know how to prevent this. I am simply talking about issuing a 'reboot' after applying a new kernel or a similar change.

After the system reports stuff like:

Code:
PGs
activating: 8
activating+degraded: 64
active+clean: 386
active+clean+inconsistent: 1
peering: 53

It sometimes takes HOURS to clear up from this, and all the while the entire cluster performs horribly. This cannot be the way things are supposed to be. Is something not being handled correctly by the system? We have a pool with size 3, min_size 2, and NEVER reboot more than a single node at a time in our 4-node cluster.
 
My Ceph clusters are healthy again within a few seconds after rebooting one node.

Please share more details about your Ceph cluster.
 
Four nodes with 8-10 OSDs per node. Two of the OSDs are SSDs and the others are 10K 600GB SAS drives. The SSDs are being used for the DB/WAL. Storage capacity is at about 30% at this point. The nodes are interconnected with dual 10Gbps Intel adapters running LACP for the storage network. They have multiple 1Gbps adapters front facing for access. RAM is anywhere from 64GB to 192GB per node. Running Proxmox 5.1, and every node has a monitor running.

Other information?
 
Hi,
you can limit the rebuild impact with the right settings in ceph.conf (they can also be set on the fly with injectargs)
Code:
[osd]
osd max backfills = 1
osd recovery max active = 1
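These values can also be injected into the running OSDs without a restart, for example (standard Ceph admin command; adjust the values as needed):
Code:
# apply to all OSDs at runtime, no restart required
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'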
Udo
 
Four nodes ...

and every node has a monitor running.
Hi,
that's not a good choice (but not the issue).
You must use an odd number of mons, and for a normal-sized Ceph cluster three mons are the right choice (and enough).
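If you want to go down to three mons, the extra monitors can be removed again; a rough sketch, assuming the Proxmox 5.x pveceph tooling (the mon ID is normally the node name):
Code:
# on the node whose monitor should go away (Proxmox tooling)
pveceph destroymon <nodename>
# or with the plain Ceph tooling
ceph mon remove <mon-id>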

Perhaps the SSDs are not the right ones? What kind of SSD do you use? And why 2 and not 4 per node? And why is the SSD a full OSD, rather than holding only the DB/WAL partitions for the HDD OSDs?

Udo
 
how long do your nodes take to reboot? does Ceph think the OSDs are gone for good before your node returns? a short downtime of an OSD/host should not trigger rebalancing.
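For a planned reboot, one common way to avoid any rebalancing at all is to tell Ceph not to mark the down OSDs "out" while the node is away. A minimal sketch using the standard Ceph flags (run from any node with the admin keyring):
Code:
# before rebooting the node: prevent down OSDs from being marked out
ceph osd set noout
# reboot the node, wait until its OSDs are back up and the cluster is healthy
# afterwards: re-enable normal out-marking
ceph osd unset noout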
 
