Unstable Ceph behaviour causing VMs to hang

I wanted to add, though the actual question was a different one. And I may have strayed from it. :)
Yes, I know that, and I have tweaked the value to understand the behaviour. I just wanted to understand how we can make sure writes continue even when 2 of my nodes fail (without the 10-minute break).
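For reference, whether writes keep flowing with two hosts down comes down to the pool's size/min_size: as soon as a PG has fewer than min_size replicas available, its I/O blocks until recovery brings it back above that threshold. A minimal sketch for checking and, if acceptable, adjusting this (the pool name "vm" is only an assumption taken from the storage name in the VM config further down):

Code:
# Inspect the replication settings of the (assumed) pool "vm"
ceph osd pool get vm size
ceph osd pool get vm min_size
# With size=3/min_size=2, two failed hosts can leave PGs with a single
# replica, which blocks their I/O until recovery. Possible trade-offs:
#   ceph osd pool set vm size 4        # two host failures still leave >= min_size copies
#   ceph osd pool set vm min_size 1    # risky: allows writes with only a single copy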


Now, how do I remove the weights from the crushmap?
Well, in our tests we did it by manually editing the crushmap.
 
I followed these steps:

# Read the current CRUSH map
ceph osd getcrushmap -o map.bin
# Decompile it to text
crushtool -d map.bin -o map.txt
# Edit it (removed the choose_args section)
vi map.txt
# Compile it again
crushtool -c map.txt -o map.bin
# Inject the new map
ceph osd setcrushmap -i map.bin
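If the choose_args sections came from the balancer's crush-compat mode, they can usually also be removed without decompiling the map (a hedged alternative; check the weight-set list first):

Code:
# Alternative to hand-editing: list and remove the weight set(s)
ceph osd crush weight-set ls
ceph osd crush weight-set rm-compat      # removes the compat weight set
# ceph osd crush weight-set rm <pool>    # per-pool weight set, if one is listed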
 
The CRUSH map after applying looks like this (I took a new dump after applying it):

Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class ssd
device 10 osd.10 class ssd
device 11 osd.11 class ssd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class ssd
device 15 osd.15 class ssd
device 16 osd.16 class ssd
device 17 osd.17 class ssd
device 18 osd.18 class ssd
device 19 osd.19 class ssd
device 20 osd.20 class ssd
device 21 osd.21 class ssd
device 22 osd.22 class ssd
device 23 osd.23 class ssd
device 24 osd.24 class ssd
device 25 osd.25 class ssd
device 26 osd.26 class ssd
device 27 osd.27 class ssd
device 28 osd.28 class ssd
device 29 osd.29 class ssd
device 30 osd.30 class ssd
device 31 osd.31 class ssd
device 32 osd.32 class ssd
device 33 osd.33 class ssd
device 34 osd.34 class ssd
device 35 osd.35 class ssd
device 36 osd.36 class ssd
device 37 osd.37 class ssd
device 38 osd.38 class ssd
device 39 osd.39 class ssd
device 40 osd.40 class ssd
device 41 osd.41 class ssd
device 42 osd.42 class ssd
device 43 osd.43 class ssd
device 44 osd.44 class ssd
device 45 osd.45 class ssd
device 46 osd.46 class ssd
device 47 osd.47 class ssd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host inc1pve25 {
    id -3        # do not change unnecessarily
    id -4 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.0 weight 1.747
    item osd.1 weight 1.747
    item osd.2 weight 1.747
    item osd.3 weight 1.747
}
host inc1pve26 {
    id -5        # do not change unnecessarily
    id -6 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.4 weight 1.747
    item osd.5 weight 1.747
    item osd.6 weight 1.747
    item osd.7 weight 1.747
}
host inc1pve27 {
    id -7        # do not change unnecessarily
    id -8 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.8 weight 1.747
    item osd.9 weight 1.747
    item osd.10 weight 1.747
    item osd.11 weight 1.747
}
host inc1pve28 {
    id -9        # do not change unnecessarily
    id -10 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.12 weight 1.747
    item osd.13 weight 1.747
    item osd.14 weight 1.747
    item osd.15 weight 1.747
}
host inc1pve29 {
    id -11        # do not change unnecessarily
    id -12 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.16 weight 1.747
    item osd.17 weight 1.747
    item osd.18 weight 1.747
    item osd.19 weight 1.747
}
host inc1pve30 {
    id -13        # do not change unnecessarily
    id -14 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.20 weight 1.747
    item osd.21 weight 1.747
    item osd.22 weight 1.747
    item osd.23 weight 1.747
}
host inc1pve31 {
    id -15        # do not change unnecessarily
    id -16 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.24 weight 1.747
    item osd.25 weight 1.747
    item osd.26 weight 1.747
    item osd.27 weight 1.747
}
host inc1pve32 {
    id -17        # do not change unnecessarily
    id -18 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.28 weight 1.747
    item osd.29 weight 1.747
    item osd.30 weight 1.747
    item osd.31 weight 1.747
}
host inc1pve33 {
    id -19        # do not change unnecessarily
    id -20 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.32 weight 1.747
    item osd.33 weight 1.747
    item osd.34 weight 1.747
    item osd.35 weight 1.747
}
host inc1pve34 {
    id -21        # do not change unnecessarily
    id -22 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.36 weight 1.747
    item osd.37 weight 1.747
    item osd.38 weight 1.747
    item osd.39 weight 1.747
}
host inc1pve35 {
    id -23        # do not change unnecessarily
    id -24 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.40 weight 1.747
    item osd.41 weight 1.747
    item osd.42 weight 1.747
    item osd.43 weight 1.747
}
host inc1pve36 {
    id -25        # do not change unnecessarily
    id -26 class ssd        # do not change unnecessarily
    # weight 6.988
    alg straw2
    hash 0    # rjenkins1
    item osd.44 weight 1.747
    item osd.45 weight 1.747
    item osd.46 weight 1.747
    item osd.47 weight 1.747
}
root default {
    id -1        # do not change unnecessarily
    id -2 class ssd        # do not change unnecessarily
    # weight 83.832
    alg straw2
    hash 0    # rjenkins1
    item inc1pve25 weight 6.986
    item inc1pve26 weight 6.986
    item inc1pve27 weight 6.986
    item inc1pve28 weight 6.986
    item inc1pve29 weight 6.986
    item inc1pve30 weight 6.986
    item inc1pve31 weight 6.986
    item inc1pve32 weight 6.986
    item inc1pve33 weight 6.986
    item inc1pve34 weight 6.986
    item inc1pve35 weight 6.986
    item inc1pve36 weight 6.986
}

# rules
rule replicated_rule {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
 
The CRUSH map after applying looks like this (I took a new dump after applying it):
Seems to look good. I see that you removed the erasure-coded rule as well. Did it trigger the rebalance?
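One way to see whether the new map actually triggered data movement (a small sketch, nothing cluster-specific assumed):

Code:
# Remapped / backfilling PGs indicate a rebalance is in progress
ceph -s
ceph pg stat
# or keep watching it:
watch -n 5 "ceph pg stat"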
 
Code:
ID  CLASS WEIGHT   REWEIGHT SIZE    RAW USE DATA    OMAP    META     AVAIL   %USE VAR  PGS STATUS TYPE NAME         
 -1       83.83191        -  84 TiB  65 GiB  17 GiB 742 KiB   48 GiB  84 TiB 0.08 1.00   -        root default       
 -3        6.98599        - 7.0 TiB 5.4 GiB 1.4 GiB  56 KiB  4.0 GiB 7.0 TiB 0.08 0.99   -            host inc1pve25
  0   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 340 MiB  16 KiB 1024 MiB 1.7 TiB 0.07 0.98 126     up         osd.0     
  1   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 343 MiB  16 KiB 1024 MiB 1.7 TiB 0.07 0.98 125     up         osd.1     
  2   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 398 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 1.02 119     up         osd.2     
  3   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 368 MiB   8 KiB 1024 MiB 1.7 TiB 0.08 1.00 127     up         osd.3     
 -5        6.98599        - 7.0 TiB 5.5 GiB 1.5 GiB  56 KiB  4.0 GiB 7.0 TiB 0.08 1.00   -            host inc1pve26
  4   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 391 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 1.01 125     up         osd.4     
  5   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 398 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 1.02 130     up         osd.5     
  6   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 394 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 1.02 118     up         osd.6     
  7   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 309 MiB   8 KiB 1024 MiB 1.7 TiB 0.07 0.96 115     up         osd.7     
 -7        6.98599        - 7.0 TiB 5.4 GiB 1.4 GiB  68 KiB  4.0 GiB 7.0 TiB 0.08 0.99   -            host inc1pve27
  8   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 298 MiB  16 KiB 1024 MiB 1.7 TiB 0.07 0.95 130     up         osd.8     
  9   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 348 MiB  20 KiB 1024 MiB 1.7 TiB 0.07 0.98 120     up         osd.9     
 10   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 356 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 0.99 125     up         osd.10     
 11   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 424 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 1.04 127     up         osd.11     
 -9        6.98599        - 7.0 TiB 5.4 GiB 1.4 GiB  32 KiB  4.0 GiB 7.0 TiB 0.08 1.00   -            host inc1pve28
 12   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 424 MiB   8 KiB 1024 MiB 1.7 TiB 0.08 1.04 133     up         osd.12     
 13   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 359 MiB   8 KiB 1024 MiB 1.7 TiB 0.08 0.99 137     up         osd.13     
 14   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 348 MiB   8 KiB 1024 MiB 1.7 TiB 0.07 0.98 122     up         osd.14     
 15   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 346 MiB   8 KiB 1024 MiB 1.7 TiB 0.07 0.98 114     up         osd.15     
-11        6.98599        - 7.0 TiB 5.3 GiB 1.3 GiB  76 KiB  4.0 GiB 7.0 TiB 0.07 0.98   -            host inc1pve29
 16   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 332 MiB  28 KiB 1024 MiB 1.7 TiB 0.07 0.97 118     up         osd.16     
 17   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 384 MiB  24 KiB 1024 MiB 1.7 TiB 0.08 1.01 124     up         osd.17     
 18   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 354 MiB   8 KiB 1024 MiB 1.7 TiB 0.08 0.99 131     up         osd.18     
 19   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 311 MiB  16 KiB 1024 MiB 1.7 TiB 0.07 0.96 114     up         osd.19     
-13        6.98599        - 7.0 TiB 5.6 GiB 1.6 GiB  56 KiB  4.0 GiB 7.0 TiB 0.08 1.03   -            host inc1pve30
 20   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 431 MiB   8 KiB 1024 MiB 1.7 TiB 0.08 1.04 146     up         osd.20     
 21   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 382 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 1.01 127     up         osd.21     
 22   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 432 MiB   8 KiB 1024 MiB 1.7 TiB 0.08 1.04 132     up         osd.22     
 23   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 391 MiB  24 KiB 1024 MiB 1.7 TiB 0.08 1.01 138     up         osd.23     
-15        6.98599        - 7.0 TiB 5.6 GiB 1.6 GiB  60 KiB  4.0 GiB 7.0 TiB 0.08 1.02   -            host inc1pve31
 24   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 360 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 0.99 128     up         osd.24     
 25   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 394 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 1.02 129     up         osd.25     
 26   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 392 MiB  12 KiB 1024 MiB 1.7 TiB 0.08 1.01 129     up         osd.26     
 27   ssd  1.74699  1.00000 1.7 TiB 1.5 GiB 471 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 1.07 154     up         osd.27     
-17        6.98599        - 7.0 TiB 5.5 GiB 1.5 GiB  66 KiB  4.0 GiB 7.0 TiB 0.08 1.01   -            host inc1pve32
 28   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 390 MiB  26 KiB 1024 MiB 1.7 TiB 0.08 1.01 125     up         osd.28     
 29   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 380 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 1.01 135     up         osd.29     
 30   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 430 MiB   8 KiB 1024 MiB 1.7 TiB 0.08 1.04 111     up         osd.30     
 31   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 356 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 0.99 136     up         osd.31     
-19        6.98599        - 7.0 TiB 5.3 GiB 1.3 GiB  84 KiB  4.0 GiB 7.0 TiB 0.07 0.98   -            host inc1pve33
 32   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 345 MiB  16 KiB 1024 MiB 1.7 TiB 0.07 0.98 119     up         osd.32     
 33   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 277 MiB  28 KiB 1024 MiB 1.7 TiB 0.07 0.93 124     up         osd.33     
 34   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 386 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 1.01 135     up         osd.34     
 35   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 358 MiB  24 KiB 1024 MiB 1.7 TiB 0.08 0.99 138     up         osd.35     
-21        6.98599        - 7.0 TiB 5.5 GiB 1.5 GiB  64 KiB  4.0 GiB 7.0 TiB 0.08 1.01   -            host inc1pve34
 36   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 394 MiB  24 KiB 1024 MiB 1.7 TiB 0.08 1.02 129     up         osd.36     
 37   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 399 MiB  16 KiB 1024 MiB 1.7 TiB 0.08 1.02 136     up         osd.37     
 38   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 330 MiB  16 KiB 1024 MiB 1.7 TiB 0.07 0.97 125     up         osd.38     
 39   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 389 MiB   8 KiB 1024 MiB 1.7 TiB 0.08 1.01 132     up         osd.39     
-23        6.98599        - 7.0 TiB 5.4 GiB 1.4 GiB  56 KiB  4.0 GiB 7.0 TiB 0.08 0.99   -            host inc1pve35
 40   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 384 MiB   8 KiB 1024 MiB 1.7 TiB 0.08 1.01 129     up         osd.40     
 41   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 392 MiB  24 KiB 1024 MiB 1.7 TiB 0.08 1.02 131     up         osd.41     
 42   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 415 MiB   8 KiB 1024 MiB 1.7 TiB 0.08 1.03 138     up         osd.42     
 43   ssd  1.74699  1.00000 1.7 TiB 1.2 GiB 228 MiB  16 KiB 1024 MiB 1.7 TiB 0.07 0.90  97     up         osd.43     
-25        6.98599        - 7.0 TiB 5.4 GiB 1.4 GiB  68 KiB  4.0 GiB 7.0 TiB 0.08 1.00   -            host inc1pve36
 44   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 368 MiB  24 KiB 1024 MiB 1.7 TiB 0.08 1.00 132     up         osd.44     
 45   ssd  1.74699  1.00000 1.7 TiB 1.3 GiB 345 MiB   8 KiB 1024 MiB 1.7 TiB 0.07 0.98 123     up         osd.45     
 46   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 366 MiB  28 KiB 1024 MiB 1.7 TiB 0.08 1.00 129     up         osd.46     
 47   ssd  1.74699  1.00000 1.7 TiB 1.4 GiB 396 MiB   8 KiB 1024 MiB 1.7 TiB 0.08 1.02 157     up         osd.47     
                      TOTAL  84 TiB  65 GiB  17 GiB 751 KiB   48 GiB  84 TiB 0.08
 
And now, how does the recovery work? Still the same as before?
 
Yes, it is the same:
1 node down (i.e. 4 OSDs down) ==> around 10 seconds of no writes
2 nodes down (i.e. 8 OSDs down) ==> 10 minutes of no writes; also not able to log in to the VMs


ceph status

Code:
  cluster:
    id:     b020e833-3252-416a-b904-40bb4c97af5e
    health: HEALTH_WARN
            8 osds down
            2 hosts (8 osds) down
            Reduced data availability: 95 pgs inactive
            Degraded data redundancy: 10418/61662 objects degraded (16.895%), 941 pgs degraded, 941 pgs undersized
            2/12 mons down, quorum inc1pve25,inc1pve29,inc1pve32,inc1pve26,inc1pve33,inc1pve34,inc1pve28,inc1pve30,inc1pve31,inc1pve27

  services:
    mon: 12 daemons, quorum inc1pve25,inc1pve29,inc1pve32,inc1pve26,inc1pve33,inc1pve34,inc1pve28,inc1pve30,inc1pve31,inc1pve27 (age 2m), out of quorum: inc1pve35, inc1pve36
    mgr: inc1pve30(active, since 23h), standbys: inc1pve27, inc1pve33, inc1pve29, inc1pve34, inc1pve31, inc1pve28, inc1pve32, inc1pve26, inc1pve25
    osd: 48 osds: 40 up (since 2m), 48 in (since 7h)

  data:
    pools:   1 pools, 2048 pgs
    objects: 20.55k objects, 80 GiB
    usage:   276 GiB used, 84 TiB / 84 TiB avail
    pgs:     4.639% pgs not active
             10418/61662 objects degraded (16.895%)
             1107 active+clean
             846  active+undersized+degraded
             95   undersized+degraded+peered

  io:
    client: 1.3 KiB/s rd, 11 KiB/s wr, 0 op/s rd, 1 op/s wr
 
10 minutes later:
Code:
2020-07-10 14:37:41.961802 mon.inc1pve25 [INF] Marking osd.44 out (has been down for 606 seconds)
2020-07-10 14:37:41.961822 mon.inc1pve25 [INF] Marking osd.45 out (has been down for 606 seconds)
2020-07-10 14:37:41.961831 mon.inc1pve25 [INF] Marking osd.46 out (has been down for 606 seconds)
2020-07-10 14:37:41.962112 mon.inc1pve25 [WRN] Health check update: 5 osds down (OSD_DOWN)
2020-07-10 14:37:42.967194 mon.inc1pve25 [WRN] Health check update: Reduced data availability: 95 pgs inactive, 284 pgs peering (PG_AVAILABILITY)
2020-07-10 14:37:42.967230 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 7247/63528 objects degraded (11.408%), 657 pgs degraded, 657 pgs undersized (PG_DEGRADED)
2020-07-10 14:37:49.006568 mon.inc1pve25 [WRN] Health check update: Reduced data availability: 45 pgs inactive, 8 pgs peering (PG_AVAILABILITY)
2020-07-10 14:37:49.006607 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 161038/64212 objects degraded (250.791%), 925 pgs degraded, 611 pgs undersized (PG_DEGRADED)
2020-07-10 14:37:51.967919 mon.inc1pve25 [INF] Marking osd.47 out (has been down for 616 seconds)
2020-07-10 14:37:51.969576 mon.inc1pve25 [WRN] Health check update: 4 osds down (OSD_DOWN)
2020-07-10 14:37:51.969591 mon.inc1pve25 [WRN] Health check update: 1 host (4 osds) down (OSD_HOST_DOWN)
2020-07-10 14:37:55.034533 mon.inc1pve25 [WRN] Health check update: Reduced data availability: 34 pgs inactive (PG_AVAILABILITY)
2020-07-10 14:37:55.034560 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 135513/65280 objects degraded (207.587%), 760 pgs degraded, 457 pgs undersized (PG_DEGRADED)
2020-07-10 14:38:01.027396 mon.inc1pve25 [WRN] Health check update: Reduced data availability: 2 pgs inactive (PG_AVAILABILITY)
2020-07-10 14:38:01.027426 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 177979/67902 objects degraded (262.112%), 880 pgs degraded, 450 pgs undersized (PG_DEGRADED)
 
Code:
2020-07-10 14:38:06.979125 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 149781/71793 objects degraded (208.629%), 854 pgs degraded, 450 pgs undersized (PG_DEGRADED)
2020-07-10 14:38:11.983988 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 129631/75819 objects degraded (170.974%), 822 pgs degraded, 450 pgs undersized (PG_DEGRADED)
2020-07-10 14:38:16.989025 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 114372/79758 objects degraded (143.399%), 798 pgs degraded, 450 pgs undersized (PG_DEGRADED)
2020-07-10 14:38:21.992618 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 98028/83895 objects degraded (116.846%), 767 pgs degraded, 450 pgs undersized (PG_DEGRADED)
2020-07-10 14:38:26.996176 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 82823/87864 objects degraded (94.263%), 742 pgs degraded, 450 pgs undersized (PG_DEGRADED)
2020-07-10 14:38:31.999029 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 66359/91980 objects degraded (72.145%), 715 pgs degraded, 450 pgs undersized (PG_DEGRADED)
2020-07-10 14:38:37.001763 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 46712/95796 objects degraded (48.762%), 679 pgs degraded, 450 pgs undersized (PG_DEGRADED)
2020-07-10 14:38:42.004310 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 31925/99894 objects degraded (31.959%), 647 pgs degraded, 450 pgs undersized (PG_DEGRADED)
2020-07-10 14:38:42.004579 mon.inc1pve25 [INF] Marking osd.42 out (has been down for 609 seconds)
2020-07-10 14:38:42.004694 mon.inc1pve25 [INF] Marking osd.43 out (has been down for 609 seconds)
2020-07-10 14:38:42.005753 mon.inc1pve25 [WRN] Health check update: 2 osds down (OSD_DOWN)
2020-07-10 14:38:47.007066 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 23662/103791 objects degraded (22.798%), 416 pgs degraded, 340 pgs undersized (PG_DEGRADED)
2020-07-10 14:38:47.007212 mon.inc1pve25 [INF] Marking osd.40 out (has been down for 614 seconds)
2020-07-10 14:38:47.007236 mon.inc1pve25 [INF] Marking osd.41 out (has been down for 614 seconds)
2020-07-10 14:38:47.009566 mon.inc1pve25 [INF] Health check cleared: OSD_DOWN (was: 2 osds down)
2020-07-10 14:38:47.009590 mon.inc1pve25 [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (4 osds) down)
2020-07-10 14:38:52.011032 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 137458/106446 objects degraded (129.134%), 289 pgs degraded, 35 pgs undersized (PG_DEGRADED)
2020-07-10 14:38:57.017275 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 298164/108906 objects degraded (273.781%), 549 pgs degraded, 48 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:02.022641 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 278168/113028 objects degraded (246.105%), 528 pgs degraded, 47 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:07.027801 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 251581/117018 objects degraded (214.993%), 499 pgs degraded, 47 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:12.032889 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 225042/121071 objects degraded (185.876%), 475 pgs degraded, 46 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:17.037779 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 198412/124908 objects degraded (158.847%), 439 pgs degraded, 44 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:22.042974 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 176412/128871 objects degraded (136.890%), 409 pgs degraded, 42 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:27.047808 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 156587/132912 objects degraded (117.813%), 389 pgs degraded, 40 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:32.052497 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 134052/136629 objects degraded (98.114%), 361 pgs degraded, 40 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:37.056955 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 115024/140493 objects degraded (81.872%), 338 pgs degraded, 40 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:42.061436 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 95990/144192 objects degraded (66.571%), 311 pgs degraded, 40 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:47.065318 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 79483/148044 objects degraded (53.689%), 279 pgs degraded, 103 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:52.068893 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 62951/151632 objects degraded (41.516%), 239 pgs degraded, 243 pgs undersized (PG_DEGRADED)
2020-07-10 14:39:57.072576 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 48938/155418 objects degraded (31.488%), 192 pgs degraded, 199 pgs undersized (PG_DEGRADED)
2020-07-10 14:40:02.077478 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 38807/159288 objects degraded (24.363%), 152 pgs degraded, 157 pgs undersized (PG_DEGRADED)
2020-07-10 14:40:07.080123 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 24105/163149 objects degraded (14.775%), 119 pgs degraded, 124 pgs undersized (PG_DEGRADED)
2020-07-10 14:40:12.082377 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 12401/166839 objects degraded (7.433%), 90 pgs degraded, 92 pgs undersized (PG_DEGRADED)
2020-07-10 14:40:17.084382 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 2539/170652 objects degraded (1.488%), 64 pgs degraded, 67 pgs undersized (PG_DEGRADED)
2020-07-10 14:40:22.086310 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 540/174360 objects degraded (0.310%), 22 pgs degraded, 25 pgs undersized (PG_DEGRADED)
2020-07-10 14:40:27.088485 mon.inc1pve25 [WRN] Health check update: Degraded data redundancy: 44/178014 objects degraded (0.025%), 3 pgs degraded, 4 pgs undersized (PG_DEGRADED)
2020-07-10 14:40:31.416794 mon.inc1pve25 [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 37/181530 objects degraded (0.020%), 2 pgs degraded, 2 pgs undersized)
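
The "has been down for ~600 seconds" messages above match the default mon_osd_down_out_interval of 600 seconds: only once the down OSDs are marked out does recovery recreate the missing replicas and the blocked PGs go active again. A sketch for inspecting (and, if desired, shortening) that interval, assuming a release with the centralized config database (Nautilus or newer):

Code:
# Current value (the default is 600 seconds, i.e. the 10-minute stall)
ceph config get mon mon_osd_down_out_interval
# Mark failed OSDs out sooner so recovery starts earlier, e.g. after 2 minutes:
# ceph config set mon mon_osd_down_out_interval 120
# This shortens, but does not remove, the stall caused by PGs falling below min_size.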
 
Yes, it is the same:
1 node down (i.e. 4 OSDs down) ==> around 10 seconds of no writes
2 nodes down (i.e. 8 OSDs down) ==> 10 minutes of no writes; also not able to log in to the VMs
How are the VM disks configured (virtio, scsi, ide)?
 
How are the VM disks configured (virtio, scsi, ide)?

@inc1pve25:~# qm config 3600
agent: 1
bootdisk: scsi0
cores: 10
cpu: kvm64
ide2: none,media=cdrom
memory: 20480
name: server1
net0: virtio=EA:D2:42:2B:F4:43,bridge=vmbr0,firewall=1
net1: virtio=F6:28:36:D7:04:DA,bridge=vmbr3010,firewall=1
numa: 1
onboot: 1
ostype: l26
scsi0: vm:vm-3600-disk-0,cache=write
 
scsi0: vm:vm-3600-disk-0,cache=write
I suppose the cache is set to writeback. :) Try changing the controller to VirtIO SCSI. And add iothread (1 thread/disk), as well as the SSD and discard flags (less data allocated) to the disks. It will not get rid of the stalled IO, but it may be minimized. In the end the IO may stall for some time, but it should continue normally once the primary OSD changes for a degraded PG.
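A sketch of what that could look like for the VM above (VM ID 3600 and the storage/volume name are taken from the posted config; verify before applying):

Code:
# Use the VirtIO SCSI single controller so each disk can get its own iothread
qm set 3600 --scsihw virtio-scsi-single
# Re-apply the disk with writeback cache, iothread, SSD emulation and discard
qm set 3600 --scsi0 vm:vm-3600-disk-0,cache=writeback,iothread=1,ssd=1,discard=on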
 
I suppose the cache is set to writeback. :) Try changing the controller to VirtIO SCSI. And add iothread (1 thread/disk), as well as the SSD and discard flags (less data allocated) to the disks. It will not get rid of the stalled IO, but it may be minimized. In the end the IO may stall for some time, but it should continue normally once the primary OSD changes for a degraded PG.
The controller is already VirtIO SCSI; do you mean VirtIO SCSI single? And yes, the cache is writeback; I missed it while copying.
 
The controller is already VirtIO SCSI; do you mean VirtIO SCSI single? And yes, the cache is writeback; I missed it while copying.
OK, it is just not visible in that config. Then no IO should be lost, just the time needed to switch over. Maybe bigger librbd caches could also minimize the failover time. But beyond that, you will not get around it completely.
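For the librbd cache part, a sketch of how the client-side cache could be enlarged through the config database (sizes are only examples; librbd defaults to a 32 MiB cache, and VMs pick the new values up on their next start):

Code:
ceph config set client rbd_cache true
ceph config set client rbd_cache_size 134217728        # 128 MiB (default 32 MiB)
ceph config set client rbd_cache_max_dirty 100663296   # 96 MiB of dirty data max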
 
