CEPH no longer healthy!

proxtest

After 5 months in production I did the upgrade last weekend and now I'm stuck with errors on the Ceph pg's!

HEALTH_ERR 8 pgs inconsistent; 42 scrub errors
pg 11.56d is active+clean+inconsistent, acting [25,0,22]
pg 11.55b is active+clean+inconsistent, acting [19,10,4]
pg 11.53e is active+clean+inconsistent, acting [28,0,20]
pg 11.4a5 is active+clean+inconsistent, acting [8,20,13]
pg 11.43f is active+clean+inconsistent, acting [19,17,10]
pg 11.37c is active+clean+inconsistent, acting [5,19,11]
pg 11.296 is active+clean+inconsistent, acting [6,29,20]
pg 11.21f is active+clean+inconsistent, acting [26,4,21]

If I do a ceph pg repair 11.56d I get this:

osd.25 [ERR] 11.56d shard 22: soid 11/6c16b56d/rbd_data.61777238e1f29.0000000000002bf7/head data_digest 0xafc360ef != known data_digest 0xe11c96cb from auth shard 0

But the pg gets repaired anyway?

HEALTH_ERR 7 pgs inconsistent; 41 scrub errors
pg 11.55b is active+clean+inconsistent, acting [19,10,4]
pg 11.53e is active+clean+inconsistent, acting [28,0,20]
pg 11.4a5 is active+clean+inconsistent, acting [8,20,13]
pg 11.43f is active+clean+inconsistent, acting [19,17,10]
pg 11.37c is active+clean+inconsistent, acting [5,19,11]
pg 11.296 is active+clean+inconsistent, acting [6,29,20]
pg 11.21f is active+clean+inconsistent, acting [26,4,21]
41 scrub errors
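
For reference, this is the cycle I repeat per pg (plain ceph CLI, the pg id is just the example from above):

ceph health detail | grep inconsistent     # list the inconsistent pg's
ceph pg repair 11.56d                      # repair one of them
ceph pg deep-scrub 11.56d                  # re-check it afterwards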

But after a new deep scrub starts, new pg errors appear. :-(
There are no disk errors in the message log, and it happens on different nodes.

Before the upgrade there were no such errors for 5 months!

Can you take a look at this issue please?

pveversion -v
proxmox-ve: 4.2-64 (running kernel: 4.4.16-1-pve)
pve-manager: 4.2-18 (running version: 4.2-18/158720b9)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.16-1-pve: 4.4.16-64
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-44
qemu-server: 4.0-86
pve-firmware: 1.1-9
libpve-common-perl: 4.0-72
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-57
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.1-2
pve-container: 1.0-73
pve-firewall: 2.0-29
pve-ha-manager: 1.0-33
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.4-1
lxcfs: 2.0.3-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
zfsutils: 0.6.5.7-pve10~bpo80
ceph: 0.94.9-1~bpo80+1

2016-09-24 07:00:00.000332 mon.0 10.11.12.1:6789/0 451636 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 39 scrub errors
2016-09-24 08:00:00.000285 mon.0 10.11.12.1:6789/0 454999 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 39 scrub errors
2016-09-24 09:00:00.000284 mon.0 10.11.12.1:6789/0 458322 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 39 scrub errors
2016-09-24 09:38:34.085968 osd.28 10.11.12.5:6804/2680 700 : cluster [ERR] 11.53e shard 20: soid 11/72be2d3e/rbd_data.61777238e1f29.000000000002686c/head data_digest 0xe84d5b90 != known data_digest 0xaa9daaf7 from auth shard 0
2016-09-24 09:38:34.086013 osd.28 10.11.12.5:6804/2680 701 : cluster [ERR] 11.53e shard 20: soid 11/53693d3e/rbd_data.60954238e1f29.000000000003f3cd/head data_digest 0x8e16c3db != known data_digest 0xaf95bf97 from auth shard 0
2016-09-24 09:38:42.798446 osd.28 10.11.12.5:6804/2680 702 : cluster [ERR] 11.53e deep-scrub 0 missing, 2 inconsistent objects
2016-09-24 09:38:42.798450 osd.28 10.11.12.5:6804/2680 703 : cluster [ERR] 11.53e deep-scrub 2 errors
2016-09-24 09:38:54.013721 mon.0 10.11.12.1:6789/0 460401 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 41 scrub errors
2016-09-24 10:00:00.000357 mon.0 10.11.12.1:6789/0 461480 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 41 scrub errors
2016-09-24 10:27:27.185377 osd.5 10.11.12.1:6820/3894 740 : cluster [ERR] 11.108 shard 19: soid 11/4e747108/rbd_data.60954238e1f29.000000000005b7ee/head data_digest 0x84cb518f != known data_digest 0xa5e375f3 from auth shard 5
2016-09-24 10:27:43.860144 osd.5 10.11.12.1:6820/3894 741 : cluster [ERR] 11.108 repair 0 missing, 1 inconsistent objects
2016-09-24 10:27:43.860338 osd.5 10.11.12.1:6820/3894 742 : cluster [ERR] 11.108 repair 1 errors, 1 fixed
2016-09-24 10:27:54.034706 mon.0 10.11.12.1:6789/0 463003 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 40 scrub errors
2016-09-24 10:29:59.591817 osd.19 10.11.12.4:6820/3621 342 : cluster [ERR] 11.55b shard 10: soid 11/53428d5b/rbd_data.67194238e1f29.00000000000000d5/head data_digest 0x68ca262c != best guess data_digest 0x72f527e from auth shard 4
2016-09-24 10:29:59.591858 osd.19 10.11.12.4:6820/3621 343 : cluster [ERR] 11.55b shard 19: soid 11/53428d5b/rbd_data.67194238e1f29.00000000000000d5/head data_digest 0x68ca262c != best guess data_digest 0x72f527e from auth shard 4
2016-09-24 10:30:05.012307 osd.19 10.11.12.4:6820/3621 344 : cluster [ERR] 11.55b deep-scrub 0 missing, 1 inconsistent objects
2016-09-24 10:30:05.012386 osd.19 10.11.12.4:6820/3621 345 : cluster [ERR] 11.55b deep-scrub 2 errors
2016-09-24 10:30:54.035562 mon.0 10.11.12.1:6789/0 463171 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 42 scrub errors
2016-09-24 11:00:00.000346 mon.0 10.11.12.1:6789/0 464788 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 42 scrub errors
2016-09-24 11:11:07.758065 osd.25 10.11.12.5:6812/3135 806 : cluster [ERR] 11.56d shard 22: soid 11/6c16b56d/rbd_data.61777238e1f29.0000000000002bf7/head data_digest 0xafc360ef != known data_digest 0xe11c96cb from auth shard 0
2016-09-24 11:11:19.922190 osd.25 10.11.12.5:6812/3135 807 : cluster [ERR] 11.56d repair 0 missing, 1 inconsistent objects
2016-09-24 11:11:19.922201 osd.25 10.11.12.5:6812/3135 808 : cluster [ERR] 11.56d repair 1 errors, 1 fixed
2016-09-24 11:11:54.046966 mon.0 10.11.12.1:6789/0 465449 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 41 scrub errors
 
I got one root cause on osd.20 while I was watching the console (it happened while I was repairing pg's):

2016-09-24 15:40:00.354459 7f7b8c0d0700 0 filestore(/var/lib/ceph/osd/ceph-20) error (1) Operation not permitted not handled on operation 0x4b33600 (21965951.0.0, or op 0, counting from 0)
-9> 2016-09-24 15:40:00.354474 7f7b8c0d0700 0 filestore(/var/lib/ceph/osd/ceph-20) unexpected error code
.......
2016-09-24 15:40:00.362629 7f7b8c0d0700 -1 os/FileStore.cc: In function 'unsigned int FileStore::_do_transaction(ObjectStore::Transaction&, uint64_t, int, ThreadPool::TPHandle*)' thread 7f7b8c0d0700 time 2016-09-24 15:40:00.358207
os/FileStore.cc: 2761: FAILED assert(0 == "unexpected error")
.......
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
.......
log_file /var/log/ceph/ceph-osd.20.log
--- end dump of recent events ---
2016-09-24 15:40:00.406984 7f7b8c0d0700 -1 *** Caught signal (Aborted) **
in thread 7f7b8c0d0700

I started the osd again:

2016-09-24 15:41:20.621249 7fcd0f32b880 1 journal _open /dev/disk/by-partlabel/journal-20 fd 20: 14999879680 bytes, block size 4096 bytes, directio = 1, aio = 1
2016-09-24 15:41:20.635886 7fcd0f32b880 0 filestore(/var/lib/ceph/osd/ceph-20) error (1) Operation not permitted not handled on operation 0x4d3c01b (21965951.0.0, or op 0, counting from 0)
2016-09-24 15:41:20.635896 7fcd0f32b880 0 filestore(/var/lib/ceph/osd/ceph-20) unexpected error code
2016-09-24 15:41:20.635898 7fcd0f32b880 0 filestore(/var/lib/ceph/osd/ceph-20) transaction dump:

So I restarted the node and did a deep-scrub on the whole cluster (see the sketch after the output below), and got 40 inconsistent pg's. The errors are spread over the whole cluster, so it was not only one node that was affected.

HEALTH_ERR 40 pgs inconsistent; 119 scrub errors
pg 11.178 is active+clean+inconsistent, acting [22,17,4]
pg 11.149 is active+clean+inconsistent, acting [15,1,23]
pg 11.4f is active+clean+inconsistent, acting [29,13,21]
pg 11.3f is active+clean+inconsistent, acting [22,8,14]
pg 11.16 is active+clean+inconsistent, acting [1,21,14]
pg 11.1c is active+clean+inconsistent, acting [21,7,1]
pg 11.1e is active+clean+inconsistent, acting [22,25,3]
pg 11.1a is active+clean+inconsistent, acting [21,29,7]
pg 11.1b is active+clean+inconsistent, acting [17,22,3]
pg 11.5d3 is active+clean+inconsistent, acting [17,10,21]
pg 11.5ae is active+clean+inconsistent, acting [29,9,21]
pg 11.589 is active+clean+inconsistent, acting [13,26,21]
pg 11.58a is active+clean+inconsistent, acting [6,22,13]
pg 11.531 is active+clean+inconsistent, acting [24,19,13]
pg 11.527 is active+clean+inconsistent, acting [1,23,14]
pg 11.4fc is active+clean+inconsistent, acting [8,13,21]
pg 11.4a2 is active+clean+inconsistent, acting [6,15,21]
pg 11.3f0 is active+clean+inconsistent, acting [20,24,8]
pg 11.3f9 is active+clean+inconsistent, acting [20,17,11]
pg 11.3ef is active+clean+inconsistent, acting [21,2,10]
pg 11.3de is active+clean+inconsistent, acting [21,11,4]
pg 11.3c2 is active+clean+inconsistent, acting [23,26,9]
pg 11.3b5 is active+clean+inconsistent, acting [19,2,28]
pg 11.3b7 is active+clean+inconsistent, acting [22,27,2]
pg 11.36d is active+clean+inconsistent, acting [21,4,10]
pg 11.334 is active+clean+inconsistent, acting [23,12,0]
pg 11.336 is active+clean+inconsistent, acting [21,2,25]
pg 11.331 is active+clean+inconsistent, acting [13,5,19]
pg 11.30c is active+clean+inconsistent, acting [9,20,17]
pg 11.2b1 is active+clean+inconsistent, acting [12,20,27]
pg 11.296 is active+clean+inconsistent, acting [6,29,20]
pg 11.299 is active+clean+inconsistent, acting [8,14,22]
pg 11.288 is active+clean+inconsistent, acting [20,10,29]
pg 11.289 is active+clean+inconsistent, acting [24,17,21]
pg 11.264 is active+clean+inconsistent, acting [21,11,14]
pg 11.253 is active+clean+inconsistent, acting [19,7,16]
pg 11.22a is active+clean+inconsistent, acting [19,10,28]
pg 11.1ff is active+clean+inconsistent, acting [9,29,19]
pg 11.1f9 is active+clean+inconsistent, acting [9,23,17]
pg 11.18b is active+clean+inconsistent, acting [19,9,24]
119 scrub errors
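
The deep-scrub over the whole cluster was basically this loop (treat it as a sketch; the sleep is only there so the osd's don't get hammered all at once, adjust it to your cluster):

# deep-scrub every pg in the cluster, one after the other (heavy io!)
for pg in $(ceph pg dump pgs_brief 2>/dev/null | awk '/^[0-9]+\./ {print $1}'); do
    ceph pg deep-scrub "$pg"
    sleep 10      # short pause so not everything gets kicked off at the same time
done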

So I did a repair on all of the pg's, roughly with the loop below, and they all went back to active+clean!
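
(Again just a sketch; it takes the pg ids straight out of ceph health detail.)

# repair every pg the cluster currently reports as inconsistent
for pg in $(ceph health detail | awk '/^pg .*inconsistent/ {print $2}'); do
    ceph pg repair "$pg"
done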

After that I did another deep-scrub, and today I did one again. No more errors and Ceph is healthy now. But I don't fully understand what happened; I think there are filesystem errors and some of them bring the osd to crash. (Some others have the same problem with crashing osds.)

It's not clear to me why they use XFS, where you can't do an online filesystem check. I want to check the filesystems, but that's a lot of work with XFS if you have to unmount every osd for a simple check. :-(
I have to look into a forced fs check on node startup over all osd's, but I'm not sure that will work with unflushed SSD journals.
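
If somebody wants to do the offline check, this is roughly the procedure per osd (example with osd.20; the init command and the /dev/sdX1 data partition are assumptions for my setup, adjust them to yours):

service ceph stop osd.20                    # stop the osd first (sysvinit style on hammer; use your init/systemd unit)
ceph-osd -i 20 --flush-journal              # write the ssd journal out to the filestore while the osd is down
umount /var/lib/ceph/osd/ceph-20
xfs_repair -n /dev/sdX1                     # -n = check only, nothing gets changed; /dev/sdX1 = the osd's data partition
mount /dev/sdX1 /var/lib/ceph/osd/ceph-20   # mount it again (adjust device/mount options to your setup)
service ceph start osd.20

If xfs_repair complains about a dirty log, mount and unmount the partition once so the log gets replayed, then run the check again.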

I will do the deep-scrub on all active pg's weekly now! It looks like it has to be done regularly.
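
For the weekly run I'll probably just save the scrub loop from above as a script and call it from cron, something like this (path and script name are of course made up):

# /etc/cron.d/ceph-deep-scrub -- every sunday 01:00, log to a file
0 1 * * 0  root  /usr/local/sbin/deep-scrub-all-pgs.sh >> /var/log/ceph-deep-scrub.log 2>&1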

Another crash, this time on osd.18, was this:

2016-09-23 07:39:02.163001 7f3d51169700 5 -- op tracker -- seq: 446343, time: 2016-09-23 07:39:02.163000, event: commit_sent, op: osd_op(client.12733432.0:139191 rbd_data.61732238e1f29.0000000000009806 [set-alloc-hint object_size 4194304 write_size 4194304,write 2022912~512] 11.8ce6af7a snapc 55=[] ack+ondisk+write+known_if_redirected e46795)
-1> 2016-09-23 07:39:02.163088 7f3d51169700 5 -- op tracker -- seq: 446346, time: 2016-09-23 07:39:02.163087, event: done, op: osd_repop_reply(client.12733432.0:139191 11.37a ondisk, result = 0)
0> 2016-09-23 07:39:02.179778 7f3d61a2a700 -1 *** Caught signal (Aborted) **
in thread 7f3d61a2a700

All VMs are running and I also did a fs check inside them, with no errors. Happy not to have lost data! :)

Oh for the record:
2 scripts to scrub, find and repair pg's:

http://eturnerx.blogspot.co.at/2015/02/howto-deep-scrub-on-all-ceph-placement.html

Be careful with the deep-scrub! You get heavy IO on your osd's!
 
