CEPH problem after upgrade to 5.1 / slow requests + stuck request

As long as you have inactive (activating) PGs, your cluster will be useless.
My cluster is now fixed and we are testing IO operations on it. I will post the solution, but I need to translate my notes first ;)
I'll be back with a reply.
 
I had a similar problem. I have Intel 10Gb cards. I had all sorts of slow requests.

So I decided to upgrade my network to InfiniBand. I purchased some Mellanox cards and 10Gb adapters. I installed the new cards and started using the 10Gb adapters with my existing 10Gb fiber switch, and all my slow requests went away. I've been running for two weeks without any issues. Before this I had slow requests multiple times a day!

So my suspicion is that the Intel 10Gb driver has issues... For months I tried to tweak the driver. It's nice to see everything working as expected.

I have a Mellanox switch, and I'm planning on migrating Ceph traffic to it.
 
I don't know the exact solution, but all problematic PGs got back onto a normal recovery track when I restarted the second monitor. I also have too many PGs per OSD, and maybe that was the reason. I set mon_max_pg_per_osd high enough for my current setup, and after the second monitor restart everything got back on track. ceph -s still complains about too many PGs. Previously I had done full node restarts without any success. I had also set the optimal tunables, which caused a major rebalance and peering issues.
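For reference, this is roughly what raising the limit looks like on a Proxmox node; the value 400 and the mon/node ids are only examples (anything above your actual PGs-per-OSD count works), and the change has to reach the monitors and the manager before the warning goes away:
Code:
# /etc/pve/ceph.conf -- add under [global]
[global]
    mon_max_pg_per_osd = 400

# then restart the monitors and the manager, one node at a time
systemctl restart ceph-mon@<mon-id>.service
systemctl restart ceph-mgr@<node>.service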
 
@black4: I upgraded from Hammer to Jewel about a month ago, and from Jewel to Luminous (PVE 4.4) on Wednesday; it worked fine for a few days. The issues started after the upgrade from PVE 4.4 to 5.1. I must admit that when I was upgrading PVE to 5.1, the Ceph health status was already WARN, but only with "too many PGs per OSD (323 > max 200)".

I did pretty much the same as @black4, except I didn't have the "too many PGs" message.

My troubles started when I executed this:
Code:
ceph osd crush tunables optimal

After that, only ~610 PGs stayed active; the other ~5000 went to activating+remapped. I changed the tunables back to hammer:
Code:
ceph osd crush tunables hammer

And now I have this:

Code:
 cluster:
    id:     089d3673-5607-404d-9351-2d4004043966
    health: HEALTH_ERR
            Reduced data availability: 127 pgs inactive
            Degraded data redundancy: 127 pgs unclean, 15 pgs degraded
            417 slow requests are blocked > 32 sec
            184 stuck requests are blocked > 4096 sec

  services:
    mon: 3 daemons, quorum 2,1,0
    mgr: tw-dwt-prx-05(active), standbys: tw-dwt-prx-03, tw-dwt-prx-07
    osd: 92 osds: 92 up, 92 in; 117 remapped pgs

  data:
    pools:   3 pools, 6144 pgs
    objects: 1412k objects, 5645 GB
    usage:   16969 GB used, 264 TB / 280 TB avail
    pgs:     2.067% pgs not active
             6017 active+clean
             112  activating+remapped
             10   activating+degraded
             5    activating+degraded+remapped
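In case it helps anyone in the same state, the standard Luminous tools show which PGs are stuck and why (the PG id below is just a placeholder):
Code:
ceph osd crush show-tunables        # confirm which tunables profile is actually in effect
ceph health detail                  # lists the inactive PGs and the OSDs with blocked requests
ceph pg dump_stuck inactive         # the PGs stuck in activating/remapped
ceph pg <pgid> query                # per-PG peering state for one of the stuck PGs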
 
Just a small update: mon_max_pg_per_osd was picked up by the monitors, but ceph status still complained about it until I restarted the active mgr daemon.
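For anyone wondering what that involves, finding and restarting the active manager is just (the node name is an example):
Code:
ceph mgr dump | grep active_name            # shows which node runs the active mgr
systemctl restart ceph-mgr@<node>.service   # run on that node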
 
I eventually fixed it by creating new pools within the current best-practice recommendations, migrating the data, and deleting the old ones. It still involved more than 60 hours of data movement.
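For reference, the usual sizing rule is roughly 100 PGs per OSD: with 92 OSDs and 3 replicas that is 92 x 100 / 3 ≈ 3072 PGs in total across all pools, i.e. well below the old 6144. A minimal sketch of the recreate-and-migrate approach, with example pool names (mon_allow_pool_delete must be enabled before the final delete):
Code:
ceph osd pool create rbd-new 2048 2048 replicated
ceph osd pool application enable rbd-new rbd
# add rbd-new as RBD storage in Proxmox, then move each VM disk:
qm move_disk <vmid> <disk> rbd-new          # e.g. qm move_disk 100 scsi0 rbd-new
# once everything is migrated and verified:
ceph osd pool delete rbd-old rbd-old --yes-i-really-really-mean-it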
 
I had a similar problem. I have Intel 10Gb cards. I had all sorts of slow requests.

So I decided to upgrade my network to InfiniBand. I purchased some Mellanox cards and 10Gb adapters. I installed the new cards and started using the 10Gb adapters with my existing 10Gb fiber switch, and all my slow requests went away. I've been running for two weeks without any issues. Before this I had slow requests multiple times a day!

So my suspicion is that the Intel 10Gb driver has issues... For months I tried to tweak the driver. It's nice to see everything working as expected.

I have a Mellanox switch, and I'm planning on migrating Ceph traffic to it.

Hello there,
apologies for replying to an old post.

We have a similar issue, and are currently using Intel 10G switches.

I do have 2 Mellanox switches and 10G cards in the storage room.

My question: did using Mellanox fix the issues for you?
 
Yes, just using the Mellanox cards in Ethernet mode fixed the problem for me. I did move to the Mellanox switches for faster connections. Since I made the switch I have seen no slow requests. Ceph works great now.
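In case it is useful to anyone else: on the dual-protocol (VPI) ConnectX-3 models the port protocol can be forced to Ethernet with Mellanox's MFT tools; the device path below is whatever mst status reports on your box:
Code:
mst start
mst status                                            # shows the /dev/mst/... device for the card
mlxconfig -d /dev/mst/mt4099_pciconf0 set LINK_TYPE_P1=2 LINK_TYPE_P2=2   # 2 = Ethernet, 1 = InfiniBand
# reboot (or reload the mlx4 modules) for the new port type to take effect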
 
Yes, just using the Mellanox cards in Ethernet mode fixed the problem for me. I did move to the Mellanox switches for faster connections. Since I made the switch I have seen no slow requests. Ceph works great now.

Hello,
which model Mellanox cards are you using?

We've got to buy a few more and I'm trying to get a list of which ones are good to get.
 
I'm using ConnectX-3 cards. Note that MCX314A-BCBT are Ethernet cards. If you want to run InfiniBand you need MCX354A-FCBT.
 
I'm using ConnectX-3 cards. Note that MCX314A-BCBT are Ethernet cards. If you want to run InfiniBand you need MCX354A-FCBT.
Thank you for the info.

I found 8 InfiniBand ConnectX (first generation) cards in storage. We'll start with those. We don't have a huge amount of data getting transferred, so hopefully ConnectX-1 works OK; otherwise we'll upgrade.
 
I'm using ConnectX-3 cards. Note that MCX314A-BCBT are Ethernet cards. If you want to run InfiniBand you need MCX354A-FCBT.

Hello. We run the following from a cron job daily to check for Ceph issues.

In the crontab:
Code:
# .----------------- minute (0 - 59)
# | .-------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | |   .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | |   |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | |   |  |
59 23 *   *  * root    grep "slow requests are blocked"   /var/log/ceph/ceph.log
                               # Note: we rotate the logs at 00:00. If you use the Debian default logrotate schedule, adjust the minute and hour accordingly.

On average we get one instance of that happening per day.

Do you mind trying that on your cluster? Not necessarily the cron job, just this, if you could:
Code:
grep "slow requests are blocked"   /var/log/ceph/ceph.log

Before switching to IB, I'd like to see whether others with a good IB setup see the same issue.


Thanks for the help. You've done a lot already, so no need for more if you can't get to it.
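When the grep does turn something up, the blocked requests can also be inspected live on the OSD named in the log line, via its admin socket (osd.12 is just an example id; run this on the node hosting that OSD):
Code:
ceph daemon osd.12 dump_blocked_ops     # requests currently blocked, with how long and on what
ceph daemon osd.12 dump_historic_ops    # recently completed slow ops, with per-step timings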
 
We switched over to Mellanox cards yesterday. Time will tell if the issue is fixed.

Looks good so far.

Update (2/15/19): using the Mellanox cards totally solved our issues.
 
