CEPH Scrub error

Jul 3, 2019
After upgrading to Proxmox 5.4.13 we have the following errors:

2019-08-09 08:44:37.335346 osd.2 osd.2 10.10.3.153:6800/2479 22 : cluster [ERR] 1.5f shard 1 soid 1:fbdcc12c:::rbd_data.5cffec6b8b4567.0000000000003ac5:head : candidate had a read error
2019-08-09 08:44:42.872941 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16087 : cluster [DBG] pgmap v16091: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 60.1KiB/s rd, 204KiB/s wr, 25op/s
2019-08-09 08:44:44.893407 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16088 : cluster [DBG] pgmap v16092: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 83.2KiB/s rd, 291KiB/s wr, 34op/s
2019-08-09 08:44:46.913265 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16089 : cluster [DBG] pgmap v16093: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 78.9KiB/s rd, 269KiB/s wr, 30op/s
2019-08-09 08:44:48.932986 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16090 : cluster [DBG] pgmap v16094: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 77.2KiB/s rd, 249KiB/s wr, 27op/s
2019-08-09 08:44:50.953684 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16091 : cluster [DBG] pgmap v16095: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 91.7KiB/s rd, 308KiB/s wr, 38op/s
2019-08-09 08:45:01.104992 mon.prx01 mon.0 10.10.3.151:6789/0 3068 : cluster [INF] Health check cleared: OSD_SCRUB_ERRORS (was: 1 scrub errors)
2019-08-09 08:45:01.105055 mon.prx01 mon.0 10.10.3.151:6789/0 3069 : cluster [INF] Health check cleared: PG_DAMAGED (was: Possible data damage: 1 pg inconsistent)
2019-08-09 08:45:01.105103 mon.prx01 mon.0 10.10.3.151:6789/0 3070 : cluster [INF] Cluster is now healthy
2019-08-09 08:44:52.972941 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16092 : cluster [DBG] pgmap v16096: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 44.9KiB/s rd, 156KiB/s wr, 23op/s
2019-08-09 08:44:54.993444 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16093 : cluster [DBG] pgmap v16097: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 64.4KiB/s rd, 233KiB/s wr, 32op/s
2019-08-09 08:44:57.013227 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16094 : cluster [DBG] pgmap v16098: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 45.5KiB/s rd, 173KiB/s wr, 29op/s
2019-08-09 08:44:59.032939 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16095 : cluster [DBG] pgmap v16099: 128 pgs: 1 active+clean+scrubbing+deep+inconsistent+repair, 127 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 38.3KiB/s rd, 163KiB/s wr, 26op/s
2019-08-09 08:45:01.053997 mgr.prx01 client.22276741 10.10.3.151:0/4173536751 16096 : cluster [DBG] pgmap v16100: 128 pgs: 128 active+clean; 1.82TiB data, 5.44TiB used, 14.2TiB / 19.6TiB avail; 410KiB/s rd, 337KiB/s wr, 119op/s; 338KiB/s, 0objects/s recovering
2019-08-09 08:44:56.412409 osd.2 osd.2 10.10.3.153:6800/2479 23 : cluster [ERR] 1.5f repair 0 missing, 1 inconsistent objects
2019-08-09 08:44:56.412427 osd.2 osd.2 10.10.3.153:6800/2479 24 : cluster [ERR] 1.5f repair 1 errors, 1 fixed

We have already repaired a lot of errors, but new ones keep coming continuously.
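For reference, this is roughly how we trigger the repairs (standard Ceph commands; PG 1.5f taken from the log above as an example):

ceph health detail                                        # shows which PGs are inconsistent
rados list-inconsistent-obj 1.5f --format=json-pretty     # inspect the broken objects of that PG
ceph pg repair 1.5f                                       # let Ceph repair the PG
ceph pg deep-scrub 1.5f                                   # verify with a fresh deep-scrub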

Any idea?
 
What version of ceph are you running (ceph versions)? Are any nodes under resource pressure?
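For example (standard commands, just to collect that information):

ceph versions       # Ceph release per daemon type
ceph -s             # overall cluster health
ceph osd df tree    # per-OSD utilisation, helps spot a struggling disk or node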
 
Ceph version is 12.2.12 on all OSDs.
No nodes are under resource pressure: resource usage on the nodes is CPU between 2-3%, RAM between 39-45%, HD between 20-27%.
 
RAM between 39-45%
Is SWAP activated? Could be a possible cause for the scrub issues.

As a deep scrub does an object-by-object comparison, it takes a while and more errors might pop up. Please check if there is anything in the OSD and kernel logs that might indicate a hardware malfunction.
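For example, for osd.2 from the log above (the device name is only a placeholder):

journalctl -u ceph-osd@2 --since "2019-08-09"    # OSD daemon log from the systemd journal
less /var/log/ceph/ceph-osd.2.log                # or the OSD log file directly
dmesg -T | grep -iE 'error|ata|scsi|nvme'        # kernel messages hinting at disk/controller trouble
smartctl -a /dev/sdX                             # SMART data of the OSD's disk, replace /dev/sdX with the actual device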

2019-08-09 08:44:37.335346 osd.2 osd.2 10.10.3.153:6800/2479 22 : cluster [ERR] 1.5f shard 1 soid 1:fbdcc12c:::rbd_data.5cffec6b8b4567.0000000000003ac5:head : candidate had a read error
It is also interesting to check whether these errors pop up only on a specific PG/OSD.
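A quick way to check that (assuming the default cluster log location on a monitor node and an example pool name 'rbd'; adjust both to your setup):

ceph health detail               # PGs currently flagged inconsistent
rados list-inconsistent-pg rbd   # inconsistent PGs of one pool
zgrep 'candidate had a read error' /var/log/ceph/ceph.log* | grep -oE 'osd\.[0-9]+' | sort | uniq -c   # past read errors per OSD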
 
Yes, the SWAP is activated:
Memory usage 39.58% of 125.78 GB, SWAP 18.01% of 8 GB
Memory usage 31.93% of 125.78 GB, SWAP 0.07% of 8 GB
Memory usage 36.80% of 125.78 GB, SWAP 79.79% of 8 GB
We didn't enable the SWAP ourselves; the nodes were installed with the default parameters.
The errors were on different OSDs.
We check for errors 4x a day; for the last 7 days we haven't had any, but there is no obvious reason for that, as we didn't change anything.
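For anyone who wants to check the same values on the CLI (standard Linux tools):

free -h          # RAM and SWAP usage
swapon --show    # active swap devices and how full they are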
 
If it happens again, try disabling SWAP and see if the scrub errors continue. I faintly recall that there was a bug regarding scrubbing and SWAP.
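A minimal sketch of that test, which is non-persistent and easy to undo (see the caveats in the next post before leaving it disabled):

swapoff -a        # disable all swap devices at runtime
swapon --show     # should now print nothing
swapon -a         # re-enable swap when the test is done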
 
May I suggest being cautious and not disabling swap completely on a Proxmox production server with Ceph.

It can have effects far more adverse than Ceph scrub errors...
Here is a quote from Thomas Lamprecht talking about the reasonable value to use for swappiness:
https://bugzilla.proxmox.com/show_bug.cgi?id=1323
""" ...
> A value of 0 instructs the kernel not to
> initiate swap until the amount of free and file-backed pages is less
> than the high water mark in a zone
-- https://www.kernel.org/doc/Documentation/sysctl/vm.txt

Normally, if this happens your system is already in a stressful situation, and swapping only then may easily result in OOM (out of memory) situations triggering the OOM killer, which may then kill a VM (or other important hypervisor tasks).
Swap is not evil or bad in general, and swapping out stuff which is currently just not (often) needed a bit early, once you get into territory where free and available memory becomes scarce, is actually better than doing it at the latest possible moment.

..."""


And now from here : https://github.com/torvalds/linux/blob/v4.15/Documentation/cgroup-v1/memory.txt
""" ...
0 swappiness really prevents from any swapping even if
there is a swap storage available. This might lead to memcg OOM killer
if there are no file pages to reclaim.

..."""

Conclusion: don't disable swap, or be ready to be hit by the OOM thunder!

Out Of Memory Management: https://www.kernel.org/doc/gorman/html/understand/understand016.html
How to: Configure Swappiness: https://kx.cloudingenium.com/linux/ubuntu/configure-swappiness-ubuntu/
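If you prefer tuning over disabling, a minimal sketch (the value 10 and the file name are only examples, pick what fits your workload):

cat /proc/sys/vm/swappiness                                     # current value, default 60
sysctl -w vm.swappiness=10                                      # change it at runtime
echo 'vm.swappiness = 10' > /etc/sysctl.d/90-swappiness.conf    # persist across reboots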
 
May I suggest being cautious and not disabling swap completely on a Proxmox production server with Ceph.
I did not suggest permanently disabling it. That is the admin's choice. ;)
Here is a quote from Thomas Lamprecht talking about the reasonable value to use for swappiness:
https://bugzilla.proxmox.com/show_bug.cgi?id=1323
Yes, if you don't disable SWAP, then choose the swappiness carefully. The bug report is about what level to set, not about deactivating it.
In his report there is also a link to an LWN post [0] that gives a deeper understanding of how the reclaiming process works. In general, as Thomas said, SWAP is not bad, you just have to consider the cost a write/read from SWAP has. Best is to use NVMe SSDs that can cope very well with small writes (e.g. Intel Optane).
Please post the whole section or don't post it. The section talks about swappiness in a cgroup, where a swappiness of 0 disables it for the group, contrary to the global swappiness.
5.3 swappiness

Overrides /proc/sys/vm/swappiness for the particular group. The tunable
in the root cgroup corresponds to the global swappiness setting.

Please note that unlike during the global reclaim, limit reclaim
enforces that 0 swappiness really prevents from any swapping even if
there is a swap storage available. This might lead to memcg OOM killer
if there are no file pages to reclaim.


[0] https://lwn.net/Articles/690079/
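For illustration, the per-group override described above lives in the cgroup-v1 memory controller (the group path below is only an example):

cat /sys/fs/cgroup/memory/machine.slice/memory.swappiness       # read the group's swappiness
echo 0 > /sys/fs/cgroup/memory/machine.slice/memory.swappiness  # 0 here really prevents swapping during limit reclaim, as quoted above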
 
