Ceph Cluster using RBD down

Bruno Emanuel

Member
Apr 29, 2016
We have 4 nodes running Ceph (OSDs and monitors) for storage.
When one of them is turned off, the storage goes down.
The network is working and the nodes are communicating, but the storage stops.
 
What is the status of Ceph when you shut down one node (ceph -s)? Is it really going into read-only mode? How many mons do you have and how many OSDs per node? Can you post your Ceph config and crush map?
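In case it helps, something like this should collect most of that information (a rough sketch; paths assume a standard Proxmox-managed Ceph setup):

Code:
ceph -s                                # overall cluster status
ceph mon stat                          # monitors and current quorum
ceph osd tree                          # OSDs per host
cat /etc/pve/ceph.conf                 # Ceph config as managed by Proxmox
ceph osd getcrushmap -o crush.bin      # dump the compiled crush map
crushtool -d crush.bin -o crush.txt    # decompile it into readable text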
 
Hi,
At first:
# ceph -s
cluster 901bdd67-0f28-4050-a0c9-68c45ee19dc1
health HEALTH_WARN
64 pgs degraded
64 pgs stuck degraded
64 pgs stuck unclean
64 pgs stuck undersized
64 pgs undersized
recovery 64450/127494 objects degraded (50.551%)
1 mons down, quorum 0,1,2 2,0,1
mon.0 low disk space
mon.1 low disk space
monmap e4: 4 mons at {0=192.168.xxx.x1:6789/0,1=192.168.xxx.x2:6789/0,2=192.168.xxx.x0:6789/0,3=192.168.xxx.x5:6789/0}
election epoch 2480, quorum 0,1,2 2,0,1
osdmap e845: 7 osds: 2 up, 2 in; 8 remapped pgs
pgmap v391070: 64 pgs, 1 pools, 164 GB data, 42498 objects
242 GB used, 2818 GB / 3060 GB avail
64450/127494 objects degraded (50.551%)
64 active+undersized+degraded
client io 178 kB/s wr, 44 op/s


Storage.cfg:
rbd: teste
        monhost 192.168.xxx.x2;192.168.xxx.x5;192.168.xxx.x1;192.168.xxx.x0
        krbd
        content images,rootdir
        pool rbd
        username admin

When I turn off the node 192.168.xxx.x5:
#ceph -s
cluster 901bdd67-0f28-4050-a0c9-68c45ee19dc1
health HEALTH_WARN
64 pgs degraded
64 pgs stuck degraded
64 pgs stuck unclean
64 pgs stuck undersized
64 pgs undersized
4 requests are blocked > 32 sec
recovery 64450/127494 objects degraded (50.551%)
1 mons down, quorum 0,1,2 2,0,1
mon.0 low disk space
mon.1 low disk space
monmap e4: 4 mons at {0=192.168.xxx.x1:6789/0,1=192.168.xxx.x2:6789/0,2=192.168.xxx.x0:6789/0,3=192.168.xxx.x5:6789/0}
election epoch 2480, quorum 0,1,2 2,0,1
osdmap e845: 7 osds: 2 up, 2 in; 8 remapped pgs
pgmap v391130: 64 pgs, 1 pools, 164 GB data, 42498 objects
242 GB used, 2818 GB / 3060 GB avail
64450/127494 objects degraded (50.551%)
64 active+undersized+degraded

#ceph osd pool get rbd size
size: 3

# ceph osd pool get rbd min_size
min_size: 1
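With size 3 and min_size 1, a PG stays active as long as at least one replica is available, so I/O should not block just because a single node is off, provided the monitors keep quorum and enough OSDs stay up. If you later want the more common safer setting of min_size 2, that would be (just a sketch of the standard commands, not something you must do):

Code:
ceph osd pool get rbd size        # currently 3
ceph osd pool get rbd min_size    # currently 1
ceph osd pool set rbd min_size 2  # optional: require 2 replicas for client I/O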

I restarted node 192.168.xxx.x5, waited some time, and then shut down node 192.168.xxx.x0:

# ceph -s
2016-05-02 10:55:16.389257 7f1a382d7700 0 -- :/2271373730 >> 192.168.xxx.x0:6789/0 pipe(0x7f1a3405e120 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1a3405a7f0).fault
2016-05-02 10:55:34.229353 7f1a317fa700 0 -- 192.168.xxx.x1:0/2271373730 >> 192.168.xxx.x0:6789/0 pipe(0x7f1a28006e20 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1a2800b0c0).fault
2016-05-02 10:55:39.517966 7f1a381d6700 0 -- 192.168.xxx.x1:0/2271373730 >> 192.168.xxx.x5:6789/0 pipe(0x7f1a28006e20 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1a2800b0c0).fault
2016-05-02 10:55:46.241242 7f1a382d7700 0 -- 192.168.xxx.x1:0/2271373730 >> 192.168.xxx.x0:6789/0 pipe(0x7f1a28006e20 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1a2800b0c0).fault
2016-05-02 10:55:55.265325 7f1a381d6700 0 -- 192.168.xxx.x1:0/2271373730 >> 192.168.xxx.x0:6789/0 pipe(0x7f1a28000da0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1a28005040).fault
2016-05-02 10:55:57.519295 7f1a382d7700 0 -- 192.168.xxx.x1:0/2271373730 >> 192.168.xxx.x5:6789/0 pipe(0x7f1a28006e20 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1a2800b0c0).fault
2016-05-02 10:56:01.265276 7f1a381d6700 0 -- 192.168.xxx.x1:0/2271373730 >> 192.168.xxx.x0:6789/0 pipe(0x7f1a28000da0 sd=4 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1a28005040).fault
2016-05-02 10:56:07.265245 7f1a317fa700 0 -- 192.168.xxx.x1:0/2271373730 >> 192.168.xxx.x0:6789/0 pipe(0x7f1a28000da0 sd=3 :0 s=1 pgs=0 cs=0 l=1 c=0x7f1a28005040).fault
 
Crush and Ceph configuration attached.


 

Attachments

  • crush.txt (1.4 KB)
  • ceph.txt (674 bytes)
You already have an unhealthy cluster before you do the reboot. The status should be HEALTH_OK before you start any maintenance work.

Code:
osdmap e845: 7 osds: 2 up, 2 in; 8 remapped pgs

So you have 7 OSDs (HDDs) and only 2 of them are online/working!? Also, 1 mon is down. You have 4 mons, so when you reboot a node (with a running mon) you only have 2 mons left, which is not a majority anymore.
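A quick way to confirm the quorum situation (these only answer while a majority of mons is still reachable, so check before and after each shutdown):

Code:
ceph mon stat                            # how many mons exist and which are in quorum
ceph quorum_status --format json-pretty  # details of the current quorum
# with 4 monitors a majority means at least 3; one mon already down
# plus one rebooted node leaves only 2, so quorum (and client I/O) is lost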

Please first check on all nodes whether the OSD and mon processes are running; if they are not, check the log files (to see why they did not start) and try to start them manually.
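For example (a sketch; exact service names depend on the Ceph/Proxmox version, sysvinit vs. systemd, and the mon/OSD IDs below are just examples):

Code:
# check whether the daemons are running on each node
service ceph status                      # classic sysvinit script (Hammer era)
systemctl status ceph-mon@0 ceph-osd@1   # systemd-managed installs
# look at the logs if something is not running
less /var/log/ceph/ceph-mon.0.log
less /var/log/ceph/ceph-osd.1.log
# start manually
service ceph start mon.0
service ceph start osd.1
# a mon that runs but is out of quorum still answers on its admin socket
ceph daemon mon.0 mon_status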
 
Thanks, I noticed that the monitor on the 4th node had not been started. After removing and recreating it, it works. The folder /var/lib/ceph/osd/ceph-$id had not been created; once I recreated the folder it was OK.
After this I forced a start and now everything is OK; the pveceph status output is below.
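For anyone hitting the same thing, roughly what this amounts to (a sketch assuming the Proxmox 4.x pveceph tooling; mon.3 and $id are placeholders for the affected monitor/OSD):

Code:
# on the node whose monitor never started
pveceph destroymon 3                  # remove the broken monitor entry
pveceph createmon                     # recreate the monitor on this node
# the OSD data directory has to exist before the OSD can start
mkdir -p /var/lib/ceph/osd/ceph-$id
service ceph start osd.$id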
# pveceph status
{
"quorum_names" : [
"2",
"0",
"1",
"3"
],
"fsid" : "901bdd67-0f28-4050-a0c9-68c45ee19dc1",
"quorum" : [
0,
1,
2,
3
],
"health" : {
"detail" : [
"mon.3 addr 192.168.xxx.x5:6789/0 clock skew 5.10404s > max 0.05s (latency 0.263264s)"
],
"health" : {
"health_services" : [
{
"mons" : [
{
"last_updated" : "2016-05-02 17:31:52.899567",
"name" : "2",
"kb_total" : 3546848,
"kb_used" : 2115984,
"kb_avail" : 1230980,
"health" : "HEALTH_OK",
"store_stats" : {
"bytes_total" : 30793980,
"bytes_log" : 2980886,
"bytes_sst" : 0,
"bytes_misc" : 27813094,
"last_updated" : "0.000000"
},
"avail_percent" : 34
},
{
"health" : "HEALTH_WARN",
"kb_avail" : 782408,
"kb_used" : 2564556,
"avail_percent" : 22,
"store_stats" : {
"last_updated" : "0.000000",
"bytes_misc" : 34191260,
"bytes_log" : 3143570,
"bytes_total" : 37334830,
"bytes_sst" : 0
},
"kb_total" : 3546848,
"health_detail" : "low disk space",
"last_updated" : "2016-05-02 17:31:23.090856",
"name" : "0"
},
{
"health_detail" : "low disk space",
"kb_total" : 3546848,
"name" : "1",
"last_updated" : "2016-05-02 17:31:27.150741",
"kb_avail" : 1009388,
"health" : "HEALTH_WARN",
"kb_used" : 2337576,
"avail_percent" : 28,
"store_stats" : {
"bytes_sst" : 0,
"bytes_total" : 35662769,
"bytes_log" : 3369694,
"bytes_misc" : 32293075,
"last_updated" : "0.000000"
}
},
{
"kb_used" : 2071456,
"health" : "HEALTH_OK",
"kb_avail" : 91831792,
"store_stats" : {
"last_updated" : "0.000000",
"bytes_misc" : 25529912,
"bytes_sst" : 0,
"bytes_log" : 3422445,
"bytes_total" : 28952357
},
"avail_percent" : 92,
"last_updated" : "2016-05-02 17:31:23.489128",
"name" : "3",
"kb_total" : 98952796
}
]
}
]
},
"overall_status" : "HEALTH_WARN",
"summary" : [
{
"severity" : "HEALTH_WARN",
"summary" : "mon.0 low disk space"
},
{
"summary" : "mon.1 low disk space",
"severity" : "HEALTH_WARN"
},
{
"summary" : "Monitor clock skew detected ",
"severity" : "HEALTH_WARN"
}
],
"timechecks" : {
"mons" : [
{
"skew" : 0,
"latency" : 0,
"name" : "2",
"health" : "HEALTH_OK"
},
{
"skew" : 0.000372,
"name" : "0",
"latency" : 0.263403,
"health" : "HEALTH_OK"
},
{
"health" : "HEALTH_OK",
"skew" : 0.00036,
"latency" : 0.263221,
"name" : "1"
},
{
"health" : "HEALTH_WARN",
"name" : "3",
"latency" : 0.263264,
"details" : "clock skew 5.10404s > max 0.05s",
"skew" : -5.104037
}
],
"epoch" : 2602,
"round" : 4,
"round_status" : "finished"
}
},
"monmap" : {
"fsid" : "901bdd67-0f28-4050-a0c9-68c45ee19dc1",
"modified" : "2016-05-02 16:59:29.970959",
"mons" : [
{
"rank" : 0,
"addr" : "192.168.xxx.x0:6789/0",
"name" : "2"
},
{
"rank" : 1,
"name" : "0",
"addr" : "192.168.xxx.x1:6789/0"
},
{
"name" : "1",
"addr" : "192.168.xxx.x2:6789/0",
"rank" : 2
},
{
"addr" : "192.168.xxx.x5:6789/0",
"name" : "3",
"rank" : 3
}
],
"epoch" : 6,
"created" : "2016-04-18 17:37:54.143234"
},
"mdsmap" : {
"by_rank" : [],
"up" : 0,
"in" : 0,
"max" : 0,
"epoch" : 1
},
"osdmap" : {
"osdmap" : {
"num_remapped_pgs" : 0,
"num_osds" : 7,
"epoch" : 1361,
"full" : false,
"num_in_osds" : 6,
"nearfull" : false,
"num_up_osds" : 6
}
},
"pgmap" : {
"version" : 399841,
"read_bytes_sec" : 16389,
"pgs_by_state" : [
{
"state_name" : "active+clean",
"count" : 64
}
],
"write_bytes_sec" : 126863,
"bytes_used" : 529655365632,
"bytes_total" : 4464561246208,
"op_per_sec" : 27,
"data_bytes" : 176576141414,
"bytes_avail" : 3934905880576,
"num_pgs" : 64
},
"election_epoch" : 2602
}
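One thing still visible in that output is the clock skew on mon.3 (~5 s against a 0.05 s limit). That usually clears once time synchronization on that node is fixed, e.g. (assuming ntpd is used for time sync):

Code:
ntpq -p                # check NTP peers/offset on the skewed node
service ntp restart    # or: systemctl restart ntp
ceph health detail     # skew warning should disappear once the clocks converge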
 
# ceph osd stat
osdmap e1410: 7 osds: 7 up, 4 in; 38 remapped pgs


# ceph osd tree
ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY
-1 4.33992 root default
-2 0.53998 host srvl00pmx002
0 0.26999 osd.0 up 1.00000 1.00000
1 0.26999 osd.1 up 0 1.00000
-3 0.53998 host srvl00pmx004
2 0.26999 osd.2 up 1.00000 1.00000
3 0.26999 osd.3 up 0 1.00000
-4 0.53998 host srvl00pmx001
4 0.26999 osd.4 up 1.00000 1.00000
5 0.26999 osd.5 up 0 1.00000
-5 2.71999 host srvl00pmx005
7 2.71999 osd.7 up 1.00000 1.00000
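Note that in this tree osd.1, osd.3 and osd.5 are up but have a reweight of 0, i.e. they are still marked out (which matches the "7 up, 4 in" above), so no data is placed on them. If that is not intentional, they can be marked in again (a sketch; only do this if you actually want them back in the data placement):

Code:
ceph osd in osd.1 osd.3 osd.5   # mark the OSDs in again
ceph osd tree                   # their reweight should go back to 1.00000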
 
