Ceph shows "slow requests are blocked" when creating / modifying CephFS

cmonty14

Hi,

I have noticed in the Ceph log (ceph -w) an increase of "slow requests are blocked" when I create a CephFS, e.g.
2019-10-14 16:41:32.083294 mon.ld5505 [INF] daemon mds.ld4465 assigned to filesystem cephfs as rank 0
2019-10-14 16:41:32.121895 mon.ld5505 [INF] daemon mds.ld4465 is now active in filesystem cephfs as rank 0
2019-10-14 16:41:33.090050 mon.ld5505 [WRN] Health check failed: 2 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-14 16:41:39.208907 mon.ld5505 [WRN] Health check update: 13 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-14 16:41:45.242038 mon.ld5505 [WRN] Health check update: 14 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-14 16:41:50.892561 mon.ld5505 [WRN] Health check update: 16 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-14 16:41:56.978641 mon.ld5505 [WRN] Health check update: 27 slow requests are blocked > 32 sec (REQUEST_SLOW)


Then I noticed the warning "1 pools have pg_num > pgp_num" and corrected it. Immediately PG_AVAILABILITY reported inactive PGs plus "slow requests are blocked":
2019-10-15 07:00:25.679400 mon.ld5505 [WRN] overall HEALTH_WARN 1 pools have too many placement groups; 1 pools have pg_num > pgp_num; mons ld5505,ld5506,ld5508 are low on available space
2019-10-15 07:48:43.053437 mon.ld5505 [INF] Health check cleared: SMALLER_PGP_NUM (was: 1 pools have pg_num > pgp_num)
2019-10-15 07:48:45.062434 mon.ld5505 [WRN] Health check failed: Reduced data availability: 1 pg inactive, 1 pg peering (PG_AVAILABILITY)
2019-10-15 07:48:52.233902 mon.ld5505 [WRN] Health check update: Reduced data availability: 3 pgs inactive, 2 pgs peering (PG_AVAILABILITY)
2019-10-15 07:49:00.020152 mon.ld5505 [WRN] Health check failed: 2 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:49:06.104454 mon.ld5505 [WRN] Health check update: 11 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:49:12.240239 mon.ld5505 [WRN] Health check update: 14 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:49:17.241812 mon.ld5505 [WRN] Health check update: 16 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:49:22.243300 mon.ld5505 [WRN] Health check update: 18 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:49:27.244978 mon.ld5505 [WRN] Health check update: 16 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:49:30.318849 mon.ld5505 [WRN] Health check update: Reduced data availability: 2 pgs inactive, 1 pg peering (PG_AVAILABILITY)
2019-10-15 07:49:32.246457 mon.ld5505 [WRN] Health check update: 13 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:49:32.379147 mon.ld5505 [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 2 pgs inactive, 1 pg peering)
2019-10-15 07:50:06.195773 mon.ld5505 [WRN] Health check update: 12 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:50:12.233853 mon.ld5505 [WRN] Health check update: 8 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:50:17.266094 mon.ld5505 [WRN] Health check update: 7 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:50:22.267746 mon.ld5505 [WRN] Health check update: 2 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:50:35.340271 mon.ld5505 [WRN] Health check update: 1 slow requests are blocked > 32 sec (REQUEST_SLOW)
2019-10-15 07:51:40.276440 mon.ld5505 [INF] Health check cleared: REQUEST_SLOW (was: 1 slow requests are blocked > 32 sec)
2019-10-15 08:00:00.000128 mon.ld5505 [WRN] overall HEALTH_WARN 1 pools have too many placement groups; mons ld5505,ld5506,ld5508 are low on available space


I don't believe this is a coincidence, because I have faced this situation many times in the past, causing my VMs (which use the same pool for their images) to enter a non-operable state while the number of slow requests was drastically increasing.

Here are the current pool stats:
root@ld3955:~# rados df
POOL_NAME USED OBJECTS CLONES COPIES MISSING_ON_PRIMARY UNFOUND DEGRADED RD_OPS RD WR_OPS WR USED COMPR UNDER COMPR
backup 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
cephfs_data 0 B 0 0 0 0 0 0 0 0 B 0 0 B 0 B 0 B
cephfs_metadata 2.2 KiB 22 0 66 0 0 0 0 0 B 45 13 KiB 0 B 0 B
hdb_backup 195 TiB 51077703 0 153233109 0 0 0 12576632 4.3 TiB 280999680 523 TiB 0 B 0 B
hdd 2.2 TiB 585051 0 1755153 0 0 0 4424081 69 GiB 8219453 1.2 TiB 0 B 0 B

total_objects 51662776
total_used 594 TiB
total_avail 887 TiB
total_space 1.4 PiB


Can you please share some info on what is causing this?

THX
 
Then I noticed the warning "1 pools have pg_num > pgp_num" and corrected it. Immediately PG_AVAILABILITY reported inactive PGs plus "slow requests are blocked":
Both the pg_num and pgp_num need to have the same value, otherwise not all PGs are created/distributed in the cluster.
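A quick way to verify and align the two values (a sketch; replace <pool> with the name of the affected pool):

ceph osd pool get <pool> pg_num
ceph osd pool get <pool> pgp_num
ceph osd pool set <pool> pgp_num <value of pg_num>

Note that raising pgp_num moves data around, so some rebalancing traffic right after the change is expected.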

How does the cluster health look now (since some time has passed)?
 
Hi Alwin,

after some time the number of blocked slow requests decreased, but very slowly.

In my opinion there's a correlation between the number of inactive PGs and the number of blocked slow requests.
I need to understand what is causing the blocked slow requests on the pools that use specific devices in my cluster: 75 locally attached 1 TB HDDs.

Any change to the affected pools runs into blocked slow requests.
E.g. I enabled balancer mode "upmap" on pool cephfs_metadata (which currently stores 300 MB) by setting
ceph config set mgr mgr/balancer/pool_ids 11,52,58

Since then the cluster has been reporting between 60 and 150 blocked slow requests.

Can you please advise?

THX
 
E.g. I enabled balancer mode "upmap" on pool cephfs_metadata (which currently stores 300 MB) by setting
If you activated that while the PGs were being created/distributed, it will cause extra rebalance traffic.
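To check whether PG creation/rebalancing is still in progress before touching the balancer again, something along these lines (a sketch; the second command assumes the pg_autoscaler module is enabled):

ceph -s
ceph osd pool autoscale-status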

I need to understand what is causing the blocked slow requests on the pools that use specific devices in my cluster: 75 locally attached 1 TB HDDs.
Check the ceph logs, there may be information on what is going on.
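A few places worth looking at (a sketch; osd.11 is just an example, default log paths are assumed, and the ceph daemon calls have to run on the host that carries that OSD):

ceph health detail
grep -i 'slow request' /var/log/ceph/ceph-osd.11.log
ceph daemon osd.11 dump_blocked_ops
ceph daemon osd.11 dump_historic_ops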
 
Indeed, I activated the balancer mode "upmap" when the Ceph health status was green, meaning there was no relevant activity.

Can you please advise which logs should be checked?

Based on the output of ceph health detail I can see which OSDs are affected, and to the best of my knowledge the OSDs are always the same:
root@ld3955:~# ceph health detail
HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg peering; 1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio; 1 pools have too few placement groups; 65 slow requests are blocked > 32 sec; mons ld5505,ld5506,ld5508 are low on available space
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg peering
pg 58.7 is stuck peering for 323.924561, current state peering, last acting [11,60,26]
POOL_TARGET_SIZE_BYTES_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_bytes
Pools ['hdb_backup'] overcommit available storage by 1.288x due to target_size_bytes 0 on pools []
POOL_TARGET_SIZE_RATIO_OVERCOMMITTED 1 subtrees have overcommitted pool target_size_ratio
Pools ['hdb_backup'] overcommit available storage by 1.288x due to target_size_ratio 0.000 on pools []
POOL_TOO_FEW_PGS 1 pools have too few placement groups
Pool hdd has 64 placement groups, should have 512
REQUEST_SLOW 65 slow requests are blocked > 32 sec
6 ops are blocked > 1048.58 sec
37 ops are blocked > 524.288 sec
22 ops are blocked > 262.144 sec
osd.11 has blocked requests > 262.144 sec
osds 8,9,15,19,32,37,41,61,66,172,173 have blocked requests > 524.288 sec
osd.21 has blocked requests > 1048.58 sec
MON_DISK_LOW mons ld5505,ld5506,ld5508 are low on available space
mon.ld5505 has 20% avail
mon.ld5506 has 22% avail
mon.ld5508 has 20% avail


But all devices are identical, whether affected or not.
 
mons ld5505,ld5506,ld5508 are low on available space
First of all, check the available space on the MONs; it seems the DB (partition) is running low. That may halt the cluster if it happens on all of them. As additional information is stored in the OSD map, the DB on the MONs will grow.
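A rough way to check the MON store size and trigger a compaction (a sketch, assuming the default data path; ld5505 is one of the MONs from the health output):

df -h /var/lib/ceph/mon
du -sh /var/lib/ceph/mon/ceph-ld5505/store.db
ceph tell mon.ld5505 compact

Once the cluster stays HEALTH_OK for a while, the MONs can trim old maps and the store usually shrinks again on its own.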

1 pg inactive, 1 pg peering;
This might come from the pg_num/pgp_num adjustment and the balancer taking care of it.

1 subtrees have overcommitted pool target_size_bytes; 1 subtrees have overcommitted pool target_size_ratio;
This may come from the auto-balancer being on:
https://www.spinics.net/lists/ceph-users/msg52606.html
https://docs.ceph.com/docs/master/r...-checks/#pool-target-size-bytes-overcommitted
https://docs.ceph.com/docs/master/rados/operations/placement-groups/#specifying-pool-target-size

osd.11 has blocked requests > 262.144 sec
osds 8,9,15,19,32,37,41,61,66,172,173 have blocked requests > 524.288 sec
osd.21 has blocked requests > 1048.58 sec
All these OSDs are surely part of the cephfs pool(s), and it may be better to disable the balancer for the moment. What does ceph balancer status show?
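To map the blocked OSDs to hosts/pools and to pause the balancer for now, something like this (a sketch; osd.11 and pg 58.7 are taken from the health output above):

ceph osd find 11        # host and CRUSH location of the OSD
ceph pg ls-by-osd 11    # PGs on that OSD; the number before the dot is the pool ID
ceph pg 58.7 query      # details on why the PG is stuck peering
ceph balancer off       # re-enable later with: ceph balancer on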
 
Hi,
please focus only on the "slow requests are blocked" issue in this thread.
The other issues are either a) under control (+35 GB free disk space on the monitor nodes) or b) being addressed on the ceph-users list.

Unfortunately ceph balancer status is not responding... this could be related to the blocked slow requests.
 
Here is the output of ceph balancer status (it works again now that the health check is green).
root@ld3955:~# ceph balancer status
{
    "active": true,
    "plans": [],
    "mode": "upmap"
}
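Since "plans" is empty, I assume the upmap balancer applies its optimizations automatically in the background. If I want to see what it would do before it acts, something like this should work (a sketch; "myplan" is just an example name):

ceph balancer eval              # score of the current data distribution
ceph balancer optimize myplan   # create a plan
ceph balancer show myplan       # inspect the proposed upmap changes
ceph balancer eval myplan       # score the distribution the plan would produce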
 