CEPH Scrub error

Jul 3, 2019
After upgrading to Proxmox 5.4.13 we have the following errors:

2019-08-09 08:44:37.335346 osd.2 osd.2 10.10.3.153:6800/2479 22 : cluster [ERR] 1.5f shard 1 soid 1:fbdcc12c:::rbd_data.5cffec6b8b4567.0000000000003ac5:head : candidate had a read error
2019-08-09 08:44:42.872941 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16087 : cluster [DBG] pgmap v16091: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 60.1KiB/s rd, 204KiB/s wr, 25op/s
2019-08-09 08:44:44.893407 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16088 : cluster [DBG] pgmap v16092: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 83.2KiB/s rd, 291KiB/s wr, 34op/s
2019-08-09 08:44:46.913265 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16089 : cluster [DBG] pgmap v16093: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 78.9KiB/s rd, 269KiB/s wr, 30op/s
2019-08-09 08:44:48.932986 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16090 : cluster [DBG] pgmap v16094: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 77.2KiB/s rd, 249KiB/s wr, 27op/s
2019-08-09 08:44:50.953684 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16091 : cluster [DBG] pgmap v16095: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 91.7KiB/s rd, 308KiB/s wr, 38op/s
2019-08-09 08:45:01.104992 mon.prx01 mon.0 10.10.3.151:6789/0 3068 : cluster [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 1 scrub errors)
2019-08-09 08:45:01.105055 mon.prx01 mon.0 10.10.3.151:6789/0 3069 : cluster [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2019-08-09 08:45:01.105103 mon.prx01 mon.0 10.10.3.151:6789/0 3070 : cluster [INF] Cluster is now healthy
2019-08-09 08:44:52.972941 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16092 : cluster [DBG] pgmap v16096: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 44.9KiB/s rd, 156KiB/s wr, 23op/s
2019-08-09 08:44:54.993444 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16093 : cluster [DBG] pgmap v16097: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 64.4KiB/s rd, 233KiB/s wr, 32op/s
2019-08-09 08:44:57.013227 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16094 : cluster [DBG] pgmap v16098: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 45.5KiB/s rd, 173KiB/s wr, 29op/s
2019-08-09 08:44:59.032939 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16095 : cluster [DBG] pgmap v16099: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 38.3KiB/s rd, 163KiB/s wr, 26op/s
2019-08-09 08:45:01.053997 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16096 : cluster [DBG] pgmap v16100: 128 pgs: 128 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 410KiB/s rd, 337KiB/s wr, 119op/s; 338KiB/s, 0objects/s recovering
2019-08-09 08:44:56.412409 osd.2 osd.2 10.10.3.153:6800/2479 23 : cluster [ERR] 1.5f repair 0 missing, 1 inconsistent objects
2019-08-09 08:44:56.412427 osd.2 osd.2 10.10.3.153:6800/2479 24 : cluster [ERR] 1.5f repair 1 errors, 1 fixed

We have already repaired a lot of errors, but new ones keep coming continuously.
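For reference, this is roughly how we trigger the repairs (standard Ceph commands; PG 1.5f taken from the log above as an example):

ceph health detail                                        # shows which PGs are inconsistent
rados list-inconsistent-obj 1.5f --format=json-pretty     # inspect the broken objects of that PG
ceph pg repair 1.5f                                       # let Ceph repair the PG
ceph pg deep-scrub 1.5f                                   # verify with a fresh deep-scrub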

Any idea?
 
What version of ceph are you running (ceph versions)? Are any nodes under resource pressure?
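For example (standard commands, just to collect that information):

ceph versions       # Ceph release per daemon type
ceph -s             # overall cluster health
ceph osd df tree    # per-OSD utilisation, helps spot a struggling disk or node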
 
Ceph version is 12.2.12 on all OSDs.
No nodes are under resource pressure: resource usage on the nodes is CPU between 2-3%, RAM between 39-45%, HD between 20-27%.
 
RAM between 39-45%
Is SWAP activated? Could be a possible cause for the scrub issues.

As a deep scrub does an object-by-object comparison, it takes a while and more errors might pop up. Please check if there is anything in the OSD and kernel logs that might indicate a hardware malfunction.
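For example, for osd.2 from the log above (the device name is only a placeholder):

journalctl -u ceph-osd@2 --since "2019-08-09"    # OSD daemon log from the systemd journal
less /var/log/ceph/ceph-osd.2.log                # or the OSD log file directly
dmesg -T | grep -iE 'error|ata|scsi|nvme'        # kernel messages hinting at disk/controller trouble
smartctl -a /dev/sdX                             # SMART data of the OSD's disk, replace /dev/sdX with the actual device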

2019-08-09 08:44:37.335346 osd.2 osd.2 10.10.3.153:6800/2479 22 : cluster [ERR] 1.5f shard 1 soid 1:fbdcc12c:::rbd_data.5cffec6b8b4567.0000000000003ac5:head : candidate had a read error
It is also interesting to check whether these errors pop up only on a specific PG/OSD.
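A quick way to check that (assuming the default cluster log location on a monitor node and an example pool name 'rbd'; adjust both to your setup):

ceph health detail               # PGs currently flagged inconsistent
rados list-inconsistent-pg rbd   # inconsistent PGs of one pool
zgrep 'candidate had a read error' /var/log/ceph/ceph.log* | grep -oE 'osd\.[0-9]+' | sort | uniq -c   # past read errors per OSD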
 
Yes, the SWAP is activated:
Memory usage 39.58% of 125.78 GB, SWAP 18.01% of 8 GB
Memory usage 31.93% of 125.78 GB, SWAP 0.07% of 8 GB
Memory usage 36.80% of 125.78 GB, SWAP 79.79% of 8 GB
We didn't enable the SWAP ourselves; the nodes were installed with the default parameters.
The errors were on different OSDs.
We check for errors 4x a day; for the last 7 days we haven't had any, but there is no obvious reason for that, as we didn't change anything.
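For anyone who wants to check the same values on the CLI (standard Linux tools):

free -h          # RAM and SWAP usage
swapon --show    # active swap devices and how full they are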
 
If it happens again, try disabling SWAP and see if the scrub errors continue. I faintly recall that there was a bug regarding scrubbing and SWAP.
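A minimal sketch of that test, which is non-persistent and easy to undo (see the caveats in the next post before leaving it disabled):

swapoff -a        # disable all swap devices at runtime
swapon --show     # should now print nothing
swapon -a         # re-enable swap when the test is done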
 
May I suggest being cautious and not disabling swap completely on a Proxmox production server with Ceph.

It can have effects far more adverse than Ceph scrub errors...
Here is a quote from Thomas Lamprecht talking about the reasonable value to use for swappiness:
https://bugzilla.proxmox.com/show_bug.cgi?id=1323
""" ...
> A value of 0 instructs the kernel not to
> initiate swap until the amount of free and file-backed pages is less
> than the high water mark in a zone
-- https://www.kernel.org/doc/Documentation/sysctl/vm.txt

Normally, if this happens your system is already in a stressful situation, and swapping only then may easily result in OOM (out of memory) situations triggering the OOM killer, which may then kill a VM (or other important hypervisor tasks).
Swap is not evil or bad in general, and swapping out stuff which is currently just not (often) needed a bit early, once you get into territory where free and available memory becomes scarce, is actually better than doing it at the latest possible moment.

..."""


And now from here : https://github.com/torvalds/linux/blob/v4.15/Documentation/cgroup-v1/memory.txt
""" ...
0 swappiness really prevents from any swapping even if
there is a swap storage available. This might lead to memcg OOM killer
if there are no file pages to reclaim.

..."""

Conclusion: don't disable swap, or be ready to be hit by the OOM thunder!

Out Of Memory Management: https://www.kernel.org/doc/gorman/html/understand/understand016.html
How to: Configure Swappiness: https://kx.cloudingenium.com/linux/ubuntu/configure-swappiness-ubuntu/
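If you prefer tuning over disabling, a minimal sketch (the value 10 and the file name are only examples, pick what fits your workload):

cat /proc/sys/vm/swappiness                                     # current value, default 60
sysctl -w vm.swappiness=10                                      # change it at runtime
echo 'vm.swappiness = 10' > /etc/sysctl.d/90-swappiness.conf    # persist across reboots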
 
May I suggest being cautious and not disabling swap completely on a Proxmox production server with Ceph.
I did not suggest permanently disabling it. That is the admin's choice. ;)
Here is a quote from Thomas Lamprecht talking about the reasonable value to use for swappiness:
https://bugzilla.proxmox.com/show_bug.cgi?id=1323
Yes, if you don't disable SWAP, then choose the swappiness carefully. The bug report is about what level to set, not about deactivating it.
In his report there is also a link to an LWN post [0] that gives a deeper understanding of how the reclaiming process works. In general, as Thomas said, SWAP is not bad, you just have to consider the cost a write/read from SWAP has. Best is to use NVMe SSDs that can cope very well with small writes (e.g. Intel Optane).
Please post the whole section or don't post it. The section talks about swappiness in a cgroup, where a swappiness of 0 disables it for the group, contrary to the global swappiness.
5.3 swappiness

Overrides /proc/sys/vm/swappiness for the particular group. The tunable
in the root cgroup corresponds to the global swappiness setting.

Please note that unlike during the global reclaim, limit reclaim
enforces that 0 swappiness really prevents from any swapping even if
there is a swap storage available. This might lead to memcg OOM killer
if there are no file pages to reclaim.


[0] https://lwn.net/Articles/690079/
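For illustration, the per-group override described above lives in the cgroup-v1 memory controller (the group path below is only an example):

cat /sys/fs/cgroup/memory/machine.slice/memory.swappiness       # read the group's swappiness
echo 0 > /sys/fs/cgroup/memory/machine.slice/memory.swappiness  # 0 here really prevents swapping during limit reclaim, as quoted above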
 
