@black4: I upgraded from Hammer to Jewel about a month ago, then from Jewel to Luminous (on PVE 4.4) on Wednesday, and it worked fine for a few days. The issues started after I upgraded PVE 4.4 to 5.1. I must admit that when I was upgrading PVE to 5.1, the Ceph health status was WARN, but the only warning was "too many PGs per OSD (323 > max 200)".
ceph osd crush tunables optimal
ceph osd crush tunables hammer
  cluster:
    id:     089d3673-5607-404d-9351-2d4004043966
    health: HEALTH_ERR
            Reduced data availability: 127 pgs inactive
            Degraded data redundancy: 127 pgs unclean, 15 pgs degraded
            417 slow requests are blocked > 32 sec
            184 stuck requests are blocked > 4096 sec

  services:
    mon: 3 daemons, quorum 2,1,0
    mgr: tw-dwt-prx-05(active), standbys: tw-dwt-prx-03, tw-dwt-prx-07
    osd: 92 osds: 92 up, 92 in; 117 remapped pgs

  data:
    pools:   3 pools, 6144 pgs
    objects: 1412k objects, 5645 GB
    usage:   16969 GB used, 264 TB / 280 TB avail
    pgs:     2.067% pgs not active
             6017 active+clean
             112  activating+remapped
             10   activating+degraded
             5    activating+degraded+remapped
Explanation and solution:
I had a similar problem. I have Intel 10gb cards. I had all sorts of slow requests.
So I decided to upgrade my network to InfiniBand. I purchased some Mellanox cards and 10gb adapters. I installed the new cards and started using the 10gb adapters with my existing 10gb fiber switch, and all my slow reads went away. I've been running for two weeks without any issues. Before this I had slow requests multiple times a day!
So my suspicion is that the Intel 10gb driver has issues... For months I tried to tweak the driver. It's nice to see everything working as expected.
I have a Mellanox switch, and I'm planning on migrating to that switch for ceph traffic.
Yes, just using the Mellanox cards in Ethernet mode fixed the problem for me. I did later move to the Mellanox switches for faster connections. Since I made the switch I have seen no slow reads. Ceph works great now.
Thank you for the info.
I'm using ConnectX-3 cards. Note that MCX314A-BCBT cards are Ethernet-only. If you want to run InfiniBand you need MCX354A-FCBT.
# .----------------- minute (0 - 59)
# | .-------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
59 23 * * * root grep "slow requests are blocked" /var/log/ceph/ceph.log
# Note: we run logrotate at 00:00. If you use the Debian default rotation time, change the minute and hour accordingly.
grep "slow requests are blocked" /var/log/ceph/ceph.log
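If the plain grep output gets too long to read, something like the following might help to see at which hours the slow requests pile up. This is only a rough sketch: the function name slow_requests_per_hour is made up for this example, and it assumes every line in ceph.log starts with a "YYYY-MM-DD HH:MM:SS" timestamp, so check your own log format first.

```shell
#!/bin/sh
# Count "slow requests are blocked" log lines per hour.
# Hypothetical helper, not part of Ceph. Assumes each log line begins
# with "YYYY-MM-DD HH:MM:SS..." (date in field 1, time in field 2).
slow_requests_per_hour() {
    log_file="${1:-/var/log/ceph/ceph.log}"
    grep "slow requests are blocked" "$log_file" |
        # Keep the date and the hour part of the time only.
        awk '{ print $1, substr($2, 1, 2) ":00" }' |
        # Group identical date+hour pairs and count them.
        sort | uniq -c
}
```

Calling slow_requests_per_hour without an argument reads /var/log/ceph/ceph.log; pass another path to check a rotated log. Each output line is a count followed by the date and hour.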