CEPH no longer healthy!

proxtest

After 5 months in production I did the upgrade last weekend, and now I'm stuck with errors on the Ceph PGs!

HEALTH_ERR 8 pgs inconsistent; 42 scrub errors
pg 11.56d is active+clean+inconsistent, acting [25,0,22]
pg 11.55b is active+clean+inconsistent, acting [19,10,4]
pg 11.53e is active+clean+inconsistent, acting [28,0,20]
pg 11.4a5 is active+clean+inconsistent, acting [8,20,13]
pg 11.43f is active+clean+inconsistent, acting [19,17,10]
pg 11.37c is active+clean+inconsistent, acting [5,19,11]
pg 11.296 is active+clean+inconsistent, acting [6,29,20]
pg 11.21f is active+clean+inconsistent, acting [26,4,21]

If I run ceph pg repair 11.56d I get this:

osd.25 [ERR] 11.56d shard 22: soid 11/6c16b56d/rbd_data.61777238e1f29.0000000000002bf7/head data_digest 0xafc360ef != known data_digest 0xe11c96cb from auth shard 0

But the PG is repaired anyway?

HEALTH_ERR 7 pgs inconsistent; 41 scrub errors
pg 11.55b is active+clean+inconsistent, acting [19,10,4]
pg 11.53e is active+clean+inconsistent, acting [28,0,20]
pg 11.4a5 is active+clean+inconsistent, acting [8,20,13]
pg 11.43f is active+clean+inconsistent, acting [19,17,10]
pg 11.37c is active+clean+inconsistent, acting [5,19,11]
pg 11.296 is active+clean+inconsistent, acting [6,29,20]
pg 11.21f is active+clean+inconsistent, acting [26,4,21]
41 scrub errors

But after each new deep scrub, another inconsistent PG shows up. :-(
There is no disk error in the message log, and it happens on different nodes.

Before I did the upgrade there were no such errors for 5 months!

Can you take a look at this issue, please?
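
For reference, the list of PGs currently flagged inconsistent can be pulled out of ceph health detail like this (just a sketch, assuming the hammer-style output format shown above, where the PG id is the second field):

# print only the PG ids of all PGs reported as active+clean+inconsistent
ceph health detail | grep 'active+clean+inconsistent' | awk '{print $2}'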

pveversion -v
proxmox-ve: 4.2-64 (running kernel: 4.4.16-1-pve)
pve-manager: 4.2-18 (running version: 4.2-18/158720b9)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.16-1-pve: 4.4.16-64
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-44
qemu-server: 4.0-86
pve-firmware: 1.1-9
libpve-common-perl: 4.0-72
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-57
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.1-2
pve-container: 1.0-73
pve-firewall: 2.0-29
pve-ha-manager: 1.0-33
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.4-1
lxcfs: 2.0.3-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
zfsutils: 0.6.5.7-pve10~bpo80
ceph: 0.94.9-1~bpo80+1

2016-09-24 07:00:00.000332 mon.0 10.11.12.1:6789/0 451636 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 39 scrub errors
2016-09-24 08:00:00.000285 mon.0 10.11.12.1:6789/0 454999 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 39 scrub errors
2016-09-24 09:00:00.000284 mon.0 10.11.12.1:6789/0 458322 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 39 scrub errors
2016-09-24 09:38:34.085968 osd.28 10.11.12.5:6804/2680 700 : cluster [ERR] 11.53e shard 20: soid 11/72be2d3e/rbd_data.61777238e1f29.000000000002686c/head data_digest 0xe84d5b90 != known data_digest 0xaa9daaf7 from auth shard 0
2016-09-24 09:38:34.086013 osd.28 10.11.12.5:6804/2680 701 : cluster [ERR] 11.53e shard 20: soid 11/53693d3e/rbd_data.60954238e1f29.000000000003f3cd/head data_digest 0x8e16c3db != known data_digest 0xaf95bf97 from auth shard 0
2016-09-24 09:38:42.798446 osd.28 10.11.12.5:6804/2680 702 : cluster [ERR] 11.53e deep-scrub 0 missing, 2 inconsistent objects
2016-09-24 09:38:42.798450 osd.28 10.11.12.5:6804/2680 703 : cluster [ERR] 11.53e deep-scrub 2 errors
2016-09-24 09:38:54.013721 mon.0 10.11.12.1:6789/0 460401 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 41 scrub errors
2016-09-24 10:00:00.000357 mon.0 10.11.12.1:6789/0 461480 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 41 scrub errors
2016-09-24 10:27:27.185377 osd.5 10.11.12.1:6820/3894 740 : cluster [ERR] 11.108 shard 19: soid 11/4e747108/rbd_data.60954238e1f29.000000000005b7ee/head data_digest 0x84cb518f != known data_digest 0xa5e375f3 from auth shard 5
2016-09-24 10:27:43.860144 osd.5 10.11.12.1:6820/3894 741 : cluster [ERR] 11.108 repair 0 missing, 1 inconsistent objects
2016-09-24 10:27:43.860338 osd.5 10.11.12.1:6820/3894 742 : cluster [ERR] 11.108 repair 1 errors, 1 fixed
2016-09-24 10:27:54.034706 mon.0 10.11.12.1:6789/0 463003 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 40 scrub errors
2016-09-24 10:29:59.591817 osd.19 10.11.12.4:6820/3621 342 : cluster [ERR] 11.55b shard 10: soid 11/53428d5b/rbd_data.67194238e1f29.00000000000000d5/head data_digest 0x68ca262c != best guess data_digest 0x72f527e from auth shard 4
2016-09-24 10:29:59.591858 osd.19 10.11.12.4:6820/3621 343 : cluster [ERR] 11.55b shard 19: soid 11/53428d5b/rbd_data.67194238e1f29.00000000000000d5/head data_digest 0x68ca262c != best guess data_digest 0x72f527e from auth shard 4
2016-09-24 10:30:05.012307 osd.19 10.11.12.4:6820/3621 344 : cluster [ERR] 11.55b deep-scrub 0 missing, 1 inconsistent objects
2016-09-24 10:30:05.012386 osd.19 10.11.12.4:6820/3621 345 : cluster [ERR] 11.55b deep-scrub 2 errors
2016-09-24 10:30:54.035562 mon.0 10.11.12.1:6789/0 463171 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 42 scrub errors
2016-09-24 11:00:00.000346 mon.0 10.11.12.1:6789/0 464788 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 42 scrub errors
2016-09-24 11:11:07.758065 osd.25 10.11.12.5:6812/3135 806 : cluster [ERR] 11.56d shard 22: soid 11/6c16b56d/rbd_data.61777238e1f29.0000000000002bf7/head data_digest 0xafc360ef != known data_digest 0xe11c96cb from auth shard 0
2016-09-24 11:11:19.922190 osd.25 10.11.12.5:6812/3135 807 : cluster [ERR] 11.56d repair 0 missing, 1 inconsistent objects
2016-09-24 11:11:19.922201 osd.25 10.11.12.5:6812/3135 808 : cluster [ERR] 11.56d repair 1 errors, 1 fixed
2016-09-24 11:11:54.046966 mon.0 10.11.12.1:6789/0 465449 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 41 scrub errors
 
I caught one root cause on osd.20 while I was watching the console (it happened while I was repairing PGs):

2016-09-24 15:40:00.354459 7f7b8c0d0700 0 filestore(/var/lib/ceph/osd/ceph-20) error (1) Operation not permitted not handled on operation 0x4b33600 (21965951.0.0, or op 0, counting from 0)
-9> 2016-09-24 15:40:00.354474 7f7b8c0d0700 0 filestore(/var/lib/ceph/osd/ceph-20) unexpected error code
.......
2016-09-24 15:40:00.362629 7f7b8c0d0700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f7b8c0d0700 time 2016-09-24 15:40:00.358207
os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")
.......
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
.......
log_file /var/log/ceph/ceph-osd.20.log
--- end dump of recent events ---
2016-09-24 15:40:00.406984 7f7b8c0d0700 -1 *** Caught signal (Aborted) **
in thread 7f7b8c0d0700
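
Side note: the NOTE in the dump asks for a disassembly of the binary to interpret the backtrace. If this ever needs to go upstream, something like the following should be enough (the path /usr/bin/ceph-osd is an assumption, the standard location on a PVE/Debian install):

# disassembly of the OSD binary plus the matching log, for a bug report
objdump -rdS /usr/bin/ceph-osd > /tmp/ceph-osd.objdump
cp /var/log/ceph/ceph-osd.20.log /tmp/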

I started the OSD again:

2016-09-24 15:41:20.621249 7fcd0f32b880 1 journal _open /dev/disk/by-partlabel/journal-20 fd 20: 14999879680 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-09-24 15:41:20.635886 7fcd0f32b880 0 filestore(/var/lib/ceph/osd/ceph-20) error (1) Operation not permitted not handled on operation 0x4d3c01b (21965951.0.0, or op 0, counting from 0)
2016-09-24 15:41:20.635896 7fcd0f32b880 0 filestore(/var/lib/ceph/osd/ceph-20) unexpected error code
2016-09-24 15:41:20.635898 7fcd0f32b880 0 filestore(/var/lib/ceph/osd/ceph-20) transaction dump:

So I restarted the node and did a deep-scrub on the whole cluster, which reported 40 inconsistent PGs. The errors are spread over the whole cluster, so it was not only one node that was affected.

HEALTH_ERR 40 pgs inconsistent; 119 scrub errors
pg 11.178 is active+clean+inconsistent, acting [22,17,4]
pg 11.149 is active+clean+inconsistent, acting [15,1,23]
pg 11.4f is active+clean+inconsistent, acting [29,13,21]
pg 11.3f is active+clean+inconsistent, acting [22,8,14]
pg 11.16 is active+clean+inconsistent, acting [1,21,14]
pg 11.1c is active+clean+inconsistent, acting [21,7,1]
pg 11.1e is active+clean+inconsistent, acting [22,25,3]
pg 11.1a is active+clean+inconsistent, acting [21,29,7]
pg 11.1b is active+clean+inconsistent, acting [17,22,3]
pg 11.5d3 is active+clean+inconsistent, acting [17,10,21]
pg 11.5ae is active+clean+inconsistent, acting [29,9,21]
pg 11.589 is active+clean+inconsistent, acting [13,26,21]
pg 11.58a is active+clean+inconsistent, acting [6,22,13]
pg 11.531 is active+clean+inconsistent, acting [24,19,13]
pg 11.527 is active+clean+inconsistent, acting [1,23,14]
pg 11.4fc is active+clean+inconsistent, acting [8,13,21]
pg 11.4a2 is active+clean+inconsistent, acting [6,15,21]
pg 11.3f0 is active+clean+inconsistent, acting [20,24,8]
pg 11.3f9 is active+clean+inconsistent, acting [20,17,11]
pg 11.3ef is active+clean+inconsistent, acting [21,2,10]
pg 11.3de is active+clean+inconsistent, acting [21,11,4]
pg 11.3c2 is active+clean+inconsistent, acting [23,26,9]
pg 11.3b5 is active+clean+inconsistent, acting [19,2,28]
pg 11.3b7 is active+clean+inconsistent, acting [22,27,2]
pg 11.36d is active+clean+inconsistent, acting [21,4,10]
pg 11.334 is active+clean+inconsistent, acting [23,12,0]
pg 11.336 is active+clean+inconsistent, acting [21,2,25]
pg 11.331 is active+clean+inconsistent, acting [13,5,19]
pg 11.30c is active+clean+inconsistent, acting [9,20,17]
pg 11.2b1 is active+clean+inconsistent, acting [12,20,27]
pg 11.296 is active+clean+inconsistent, acting [6,29,20]
pg 11.299 is active+clean+inconsistent, acting [8,14,22]
pg 11.288 is active+clean+inconsistent, acting [20,10,29]
pg 11.289 is active+clean+inconsistent, acting [24,17,21]
pg 11.264 is active+clean+inconsistent, acting [21,11,14]
pg 11.253 is active+clean+inconsistent, acting [19,7,16]
pg 11.22a is active+clean+inconsistent, acting [19,10,28]
pg 11.1ff is active+clean+inconsistent, acting [9,29,19]
pg 11.1f9 is active+clean+inconsistent, acting [9,23,17]
pg 11.18b is active+clean+inconsistent, acting [19,9,24]
119 scrub errors

So I did a repair on all PGs and they all went back to active+clean!
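
In case it helps someone else, repairing all inconsistent PGs boils down to a loop like this (a sketch, using the same health detail parsing as above; repair overwrites the mismatching shard with the copy from the auth shard, so have a look at the scrub log lines first):

# repair every PG currently reported as inconsistent
for pg in $(ceph health detail | grep 'active+clean+inconsistent' | awk '{print $2}'); do
    ceph pg repair "$pg"
done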

After that I did another deep-scrub, and today I did one again. No more errors, and Ceph is healthy now. But I don't clearly understand what happened; I think there are filesystem errors, and some of them bring the OSD to crash. (Some others have the same problem with crashing OSDs.)

It's not clear to me why they use XFS, where you are not able to do an online filesystem check. I want to check the filesystems, but this is a lot of work with XFS if you have to unmount every OSD for a simple check. :-(
I have to look into a forced fs check on node startup over all OSDs, but I'm not sure this will work with unflushed SSD journals.
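
For a single OSD, the manual offline check would look roughly like this (just a sketch, not tested here; the service calls assume the sysvinit-style ceph script shipped with hammer on PVE 4.x, and /dev/sdX1 stands for the OSD's data partition):

ceph osd set noout                        # keep CRUSH from rebalancing while the OSD is down
service ceph stop osd.20
ceph-osd -i 20 --flush-journal            # flush the SSD journal before touching the filesystem
umount /var/lib/ceph/osd/ceph-20
xfs_repair -n /dev/sdX1                   # -n = check only, nothing is written
mount /dev/sdX1 /var/lib/ceph/osd/ceph-20
service ceph start osd.20
ceph osd unset noout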

I will do the deep-scrub on all active PGs weekly now! It looks like it has to be done regularly.
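
The weekly run is basically a loop like this (a sketch; the pg dump parsing assumes the hammer column layout, where the PG id is the first field). Put into a small script, it can be called from cron once a week:

# flag every PG in the cluster for a deep scrub; the OSDs then work through
# their queues on their own, limited by osd_max_scrubs per OSD
for pg in $(ceph pg dump pgs_brief 2>/dev/null | awk '/^[0-9]+\./ {print $1}'); do
    ceph pg deep-scrub "$pg"
done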

Another crash cause, this time on osd.18, was this:

2016-09-23 07:39:02.163001 7f3d51169700 5 -- op tracker -- seq: 446343, time: 2016-09-23 07:39:02.163000, event: commit_sent, op: osd_op(client.12733432.0:139191 rbd_data.61732238e1f29.0000000000009806 [set-alloc-hint object_size 4194304 write_size 4194304,write 2022912~512] 11.8ce6af7a snapc 55=[] ack+ondisk+write+known_if_redirected e46795)
-1> 2016-09-23 07:39:02.163088 7f3d51169700 5 -- op tracker -- seq: 446346, time: 2016-09-23 07:39:02.163087, event: done, op: osd_repop_reply(client.12733432.0:139191 11.37a ondisk, result = 0)
0> 2016-09-23 07:39:02.179778 7f3d61a2a700 -1 *** Caught signal (Aborted) **
in thread 7f3d61a2a700

All VMs are running, and I did a fs check inside them as well, with no errors. Happy not to have lost data! :)

Oh, for the record:
Two scripts to scrub, find, and repair PGs:

http://eturnerx.blogspot.co.at/2015/02/howto-deep-scrub-on-all-ceph-placement.html

Be careful with the deep-scrub! You get heavy IO on your OSDs!
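
To soften the impact a bit, there are two OSD options worth a look (option names as in the hammer-era docs, so please verify they exist on your build; persist them in /etc/pve/ceph.conf under [osd] if they help):

ceph tell osd.* injectargs '--osd_max_scrubs 1'                               # at most one (deep-)scrub per OSD at a time
ceph tell osd.* injectargs '--osd_scrub_begin_hour 1 --osd_scrub_end_hour 6'  # limit automatic scrubs to a night window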