ceph failure scenario

czechsys

Renowned Member
Nov 18, 2015
Hi,

we are testing some failure scenarios and noticed an unexpectedly long delay before disk access became available again:

Scenario: 3 x PVE hosts, each with mgr, mon and 2 OSDs, replica 3/2

1] hard power loss on one node
result: ~24 s before disks were available again (grace period >= 20 seconds)
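
For anyone wanting to reproduce this: the down/out transition can be watched from a surviving node with the standard ceph CLI, e.g.:

ceph -w          # streams cluster events, shows when the OSDs are reported down
ceph osd tree    # up/down state per OSD
ceph -s          # overall health and degraded PG counts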

I didn't expect a single node failure to have such an impact, so I looked up these defaults:

osd_heartbeat_grace = 20 #changed to 10
osd_heartbeat_interval = 6 #changed to 3
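
For completeness, this is roughly how the values can be applied (a sketch; on Proxmox the cluster-wide ceph.conf lives in /etc/pve/ceph.conf, and the runtime commands assume a Ceph release with the centralized config database, i.e. Mimic or newer):

# in ceph.conf:
[osd]
osd_heartbeat_grace = 10
osd_heartbeat_interval = 3

# or at runtime, without editing the file:
ceph config set osd osd_heartbeat_grace 10
ceph config set osd osd_heartbeat_interval 3

# verify what a running OSD actually uses:
ceph config show osd.0 | grep heartbeat

# note: the grace value is also consulted by the MONs when they decide to mark
# an OSD down, so [global] may be the safer place to set it.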

A new test again gives a result of more than 20 seconds, so apparently some other parameter needs tuning.

Has anybody tuned this? And how? Or is the majority running with the defaults? 20+ seconds of all cluster disks being unavailable looks like a huge gap...
 
I think you might also be suffering more because you only have a small number of test OSDs, so while the one node is down the other disks will be hit hard during the peering stage.

With more OSDs and nodes, the peering work is staggered across more hardware.


How many PGs do you have set in the pool?
 

Currently 2 pools, 128 PGs each, so 256 in total.
The disks are SSDs, so the hit isn't that hard.
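
If it helps, the per-pool numbers can be checked with the standard commands (replace <poolname> with the actual pool name):

ceph osd pool ls detail                # lists every pool with its pg_num/pgp_num
ceph osd pool get <poolname> pg_num    # single pool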