After five months in production, I performed the upgrade last weekend, and now I'm stuck with errors on Ceph PGs!
HEALTH_ERR 8 pgs inconsistent; 42 scrub errors
pg 11.56d is active+clean+inconsistent, acting [25,0,22]
pg 11.55b is active+clean+inconsistent, acting [19,10,4]
pg 11.53e is active+clean+inconsistent, acting [28,0,20]
pg 11.4a5 is active+clean+inconsistent, acting [8,20,13]
pg 11.43f is active+clean+inconsistent, acting [19,17,10]
pg 11.37c is active+clean+inconsistent, acting [5,19,11]
pg 11.296 is active+clean+inconsistent, acting [6,29,20]
pg 11.21f is active+clean+inconsistent, acting [26,4,21]
If I run a ceph pg repair 11.56d, I get this:
osd.25 [ERR] 11.56d shard 22: soid 11/6c16b56d/rbd_data.61777238e1f29.0000000000002bf7/head
data_digest 0xafc360ef != known data_digest 0xe11c96cb from auth shard 0
But the PG is then reported as repaired:
HEALTH_ERR 7 pgs inconsistent; 41 scrub errors
pg 11.55b is active+clean+inconsistent, acting [19,10,4]
pg 11.53e is active+clean+inconsistent, acting [28,0,20]
pg 11.4a5 is active+clean+inconsistent, acting [8,20,13]
pg 11.43f is active+clean+inconsistent, acting [19,17,10]
pg 11.37c is active+clean+inconsistent, acting [5,19,11]
pg 11.296 is active+clean+inconsistent, acting [6,29,20]
pg 11.21f is active+clean+inconsistent, acting [26,4,21]
41 scrub errors
But as soon as a new deep scrub starts, a new PG error appears. :-(
There are no disk errors in the message log, and it happens on different nodes.
Before I did the upgrade, there were no such errors for five months!
Can you take a look at this issue, please?
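In case it helps to see what I'm doing: I've been kicking off the repairs by pulling the PG IDs out of the ceph health detail output and looping ceph pg repair over them. A rough sketch of that (the sample lines are hardcoded here for illustration; on a live cluster you would pipe the real command output instead):

```shell
# Sample lines as ceph health detail prints them for inconsistent PGs;
# on a real cluster use:  health_detail=$(ceph health detail)
health_detail='pg 11.56d is active+clean+inconsistent, acting [25,0,22]
pg 11.55b is active+clean+inconsistent, acting [19,10,4]'

# Second whitespace-separated field of each "inconsistent" line is the PG ID.
pgs=$(echo "$health_detail" | awk '/inconsistent/ {print $2}')
echo "$pgs"

# Then repair each one (uncomment on a live cluster):
# for pg in $pgs; do ceph pg repair "$pg"; done
```

This only re-runs the repair, of course; it doesn't explain why the digests keep going inconsistent in the first place.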
pveversion -v
proxmox-ve: 4.2-64 (running kernel: 4.4.16-1-pve)
pve-manager: 4.2-18 (running version: 4.2-18/158720b9)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.16-1-pve: 4.4.16-64
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-44
qemu-server: 4.0-86
pve-firmware: 1.1-9
libpve-common-perl: 4.0-72
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-57
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.1-2
pve-container: 1.0-73
pve-firewall: 2.0-29
pve-ha-manager: 1.0-33
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.4-1
lxcfs: 2.0.3-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
zfsutils: 0.6.5.7-pve10~bpo80
ceph: 0.94.9-1~bpo80+1
2016-09-24 07:00:00.000332 mon.0 10.11.12.1:6789/0 451636 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 39 scrub errors
2016-09-24 08:00:00.000285 mon.0 10.11.12.1:6789/0 454999 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 39 scrub errors
2016-09-24 09:00:00.000284 mon.0 10.11.12.1:6789/0 458322 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 39 scrub errors
2016-09-24 09:38:34.085968 osd.28 10.11.12.5:6804/2680 700 : cluster [ERR] 11.53e shard 20: soid 11/72be2d3e/rbd_data.61777238e1f29.000000000002686c/head data_digest 0xe84d5b90 != known data_digest 0xaa9daaf7 from auth shard 0
2016-09-24 09:38:34.086013 osd.28 10.11.12.5:6804/2680 701 : cluster [ERR] 11.53e shard 20: soid 11/53693d3e/rbd_data.60954238e1f29.000000000003f3cd/head data_digest 0x8e16c3db != known data_digest 0xaf95bf97 from auth shard 0
2016-09-24 09:38:42.798446 osd.28 10.11.12.5:6804/2680 702 : cluster [ERR] 11.53e deep-scrub 0 missing, 2 inconsistent objects
2016-09-24 09:38:42.798450 osd.28 10.11.12.5:6804/2680 703 : cluster [ERR] 11.53e deep-scrub 2 errors
2016-09-24 09:38:54.013721 mon.0 10.11.12.1:6789/0 460401 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 41 scrub errors
2016-09-24 10:00:00.000357 mon.0 10.11.12.1:6789/0 461480 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 41 scrub errors
2016-09-24 10:27:27.185377 osd.5 10.11.12.1:6820/3894 740 : cluster [ERR] 11.108 shard 19: soid 11/4e747108/rbd_data.60954238e1f29.000000000005b7ee/head data_digest 0x84cb518f != known data_digest 0xa5e375f3 from auth shard 5
2016-09-24 10:27:43.860144 osd.5 10.11.12.1:6820/3894 741 : cluster [ERR] 11.108 repair 0 missing, 1 inconsistent objects
2016-09-24 10:27:43.860338 osd.5 10.11.12.1:6820/3894 742 : cluster [ERR] 11.108 repair 1 errors, 1 fixed
2016-09-24 10:27:54.034706 mon.0 10.11.12.1:6789/0 463003 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 40 scrub errors
2016-09-24 10:29:59.591817 osd.19 10.11.12.4:6820/3621 342 : cluster [ERR] 11.55b shard 10: soid 11/53428d5b/rbd_data.67194238e1f29.00000000000000d5/head data_digest 0x68ca262c != best guess data_digest 0x72f527e from auth shard 4
2016-09-24 10:29:59.591858 osd.19 10.11.12.4:6820/3621 343 : cluster [ERR] 11.55b shard 19: soid 11/53428d5b/rbd_data.67194238e1f29.00000000000000d5/head data_digest 0x68ca262c != best guess data_digest 0x72f527e from auth shard 4
2016-09-24 10:30:05.012307 osd.19 10.11.12.4:6820/3621 344 : cluster [ERR] 11.55b deep-scrub 0 missing, 1 inconsistent objects
2016-09-24 10:30:05.012386 osd.19 10.11.12.4:6820/3621 345 : cluster [ERR] 11.55b deep-scrub 2 errors
2016-09-24 10:30:54.035562 mon.0 10.11.12.1:6789/0 463171 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 42 scrub errors
2016-09-24 11:00:00.000346 mon.0 10.11.12.1:6789/0 464788 : cluster [INF] HEALTH_ERR; 8 pgs inconsistent; 42 scrub errors
2016-09-24 11:11:07.758065 osd.25 10.11.12.5:6812/3135 806 : cluster [ERR] 11.56d shard 22: soid 11/6c16b56d/rbd_data.61777238e1f29.0000000000002bf7/head data_digest 0xafc360ef != known data_digest 0xe11c96cb from auth shard 0
2016-09-24 11:11:19.922190 osd.25 10.11.12.5:6812/3135 807 : cluster [ERR] 11.56d repair 0 missing, 1 inconsistent objects
2016-09-24 11:11:19.922201 osd.25 10.11.12.5:6812/3135 808 : cluster [ERR] 11.56d repair 1 errors, 1 fixed
2016-09-24 11:11:54.046966 mon.0 10.11.12.1:6789/0 465449 : cluster [INF] HEALTH_ERR; 7 pgs inconsistent; 41 scrub errors