Ceph Octopus - Monitor sometimes inconsistent

We are having an inconsistent experience where one of the monitors occasionally appears to misbehave. Ceph health shows a warning with slow operations:
Code:
[admin@kvm6b ~]# ceph -s
  cluster:
    id:     2a554db9-5d56-4d6a-a1e2-e4f98ef1052f
    health: HEALTH_WARN
            17 slow ops, oldest one blocked for 337272 sec, mon.kvm6a has slow ops

  services:
    mon: 3 daemons, quorum kvm6a,kvm6b,kvm6c (age 3d)
    mgr: kvm6a(active, since 4d), standbys: kvm6b, kvm6c
    mds: cephfs:1 {0=kvm6c=up:active} 2 up:standby
    osd: 24 osds: 24 up (since 3d), 24 in

  task status:
    scrub status:
        mds.kvm6c: idle

  data:
    pools:   6 pools, 225 pgs
    objects: 602.79k objects, 1.9 TiB
    usage:   4.9 TiB used, 30 TiB / 35 TiB avail
    pgs:     225 active+clean

  io:
    client:   15 KiB/s rd, 21 MiB/s wr, 33 op/s rd, 561 op/s wr
    cache:    14 MiB/s flush


Logging in to kvm6a and running:
Code:
systemctl restart ceph-mon@kvm6a


Thereafter everything is immediately healthy:
Code:
[admin@kvm6b ~]# ceph -s
  cluster:
    id:     2a554db9-5d56-4d6a-a1e2-e4f98ef1052f
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum kvm6a,kvm6b,kvm6c (age 12m)
    mgr: kvm6a(active, since 13m), standbys: kvm6b, kvm6c
    mds: cephfs:1 {0=kvm6c=up:active} 2 up:standby
    osd: 24 osds: 24 up (since 4d), 24 in

  task status:
    scrub status:
        mds.kvm6c: idle

  data:
    pools:   6 pools, 225 pgs
    objects: 605.09k objects, 1.9 TiB
    usage:   4.9 TiB used, 30 TiB / 35 TiB avail
    pgs:     225 active+clean

  io:
    client:   9.3 KiB/s rd, 6.7 MiB/s wr, 30 op/s rd, 75 op/s wr
    cache:    4.7 MiB/s flush, 0 op/s promote


All updates have been installed on all servers:
Code:
[admin@kvm6a ~]# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.65-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
pve-kernel-5.4.65-1-pve: 5.4.65-1
ceph: 15.2.6-pve1
ceph-fuse: 15.2.6-pve1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
openvswitch-switch: 2.12.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
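
A quick way to confirm that every Ceph daemon is actually running the upgraded release (a small sketch, run from any monitor node; all entries should report the installed 15.2.6 here):
Code:
# counts the running mon/mgr/osd/mds daemons per version
ceph versions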


PS: Any news on Ceph 15.2.7?
 
17 slow ops, oldest one blocked for 337272 sec, mon.kvm6a has slow ops
The ceph-mon log should show you the operation that is taking so long.
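
The monitor's admin socket can also dump the operations it currently has in flight; a small sketch, assuming the socket is reachable on kvm6a (daemon name taken from your output):
Code:
# run on kvm6a: list the operations the monitor is still holding
ceph daemon mon.kvm6a ops

# cluster-wide view of which daemon is reporting the slow ops
ceph health detail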

Depending on which version Ceph was upgraded from, the change to buffered IO could be taking effect.
https://forum.proxmox.com/threads/c...very-night-slowing-down-vms.71573/post-355081
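
If it is the bluefs_buffered_io default change discussed in that thread (an assumption on my part), the value can be checked and adjusted cluster-wide; a sketch:
Code:
# show what the OSDs are currently configured with
ceph config get osd bluefs_buffered_io

# set it explicitly (the appropriate value is discussed in the linked thread;
# the OSDs may need a restart for it to take effect)
ceph config set osd bluefs_buffered_io true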

PS: Any news on Ceph 15.2.7?
Are you running RGW? That's the only fix mentioned in the release notes.
https://docs.ceph.com/en/latest/releases/octopus/
 
Hi Alwin,

Thanks for the feedback. The issue appears to affect Octopus monitors, so I would expect more reports along these lines as more and more people upgrade from Nautilus. The outstanding operations were stuck for almost 4 days whilst everything was active+clean with virtually no load.

To answer your question though: yes, we make use of RGW to handle incremental differential backups to a separate cluster using Benji (https://benji-backup.me/).


Regards
David Herselman
 
Thanks for the feedback. The issue appears to affect Octopus monitors, so I would expect more reports along these lines as more and more people upgrade from Nautilus. The outstanding operations were stuck for almost 4 days whilst everything was active+clean with virtually no load.
Can you please be more specific about which operations were stuck? Slow operations can also happen on Nautilus or earlier versions.
 
Hi Alwin,

The slow ops were reported by only one of the 3 monitors after OSD 22 had been restarted. The warnings continued for 4+ days and only cleared once that monitor was restarted. During this time all placement groups were active+clean, as shown in the first message above.

My understanding of the logs from the problematic monitor (kvm6a) is that it indicates kvm6c has slow operations, whereas neither kvm6b nor kvm6c log anything themselves. See the entry at '2020-12-04T12:52:11.349+0200' in the monitor log from kvm6a:


kvm6a:
Code:
2020-12-04T12:52:04.329+0200 7f66797f9700  1 mon.kvm6a@0(leader).osd e57825 e57825: 24 total, 24 up, 24 in
2020-12-04T12:52:04.333+0200 7f66797f9700  0 log_channel(cluster) log [DBG] : osdmap e57825: 24 total, 24 up, 24 in
2020-12-04T12:52:05.313+0200 7f667d801700  0 log_channel(cluster) log [INF] : Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg peering)
2020-12-04T12:52:07.321+0200 7f667d801700  0 log_channel(cluster) log [WRN] : Health check update: Degraded data redundancy: 123239/1819005 objects degraded (6.775%), 17 pgs degraded (PG_DEGRADED)
2020-12-04T12:52:07.849+0200 7f667d801700  1 mon.kvm6a@0(leader).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:08.717+0200 7f667affc700  0 mon.kvm6a@0(leader) e6 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2020-12-04T12:52:08.717+0200 7f667affc700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.2:0/3647236440' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2020-12-04T12:52:08.765+0200 7f667affc700  0 mon.kvm6a@0(leader) e6 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2020-12-04T12:52:08.765+0200 7f667affc700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.2:0/2551954269' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2020-12-04T12:52:09.341+0200 7f667d801700  0 log_channel(cluster) log [WRN] : Health check update: 1 slow ops, oldest one blocked for 32 sec, mon.kvm6c has slow ops (SLOW_OPS)
2020-12-04T12:52:09.341+0200 7f667d801700  0 log_channel(cluster) log [INF] : Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 123239/1819005 objects degraded (6.775%), 17 pgs degraded)
2020-12-04T12:52:11.349+0200 7f667d801700  0 log_channel(cluster) log [INF] : Health check cleared: SLOW_OPS (was: 1 slow ops, oldest one blocked for 32 sec, mon.kvm6c has slow ops)
2020-12-04T12:52:12.857+0200 7f667d801700  1 mon.kvm6a@0(leader).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:17.861+0200 7f667d801700  1 mon.kvm6a@0(leader).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:17.861+0200 7f667d801700 -1 mon.kvm6a@0(leader) e6 get_health_metrics reporting 9 slow ops, oldest is osd_failure(failed timeout osd.22 [v2:10.254.1.3:6802/4170055,v1:10.254.1.3:6804/4170055] for 21sec e57816 v57816)
2020-12-04T12:52:19.033+0200 7f667affc700  0 mon.kvm6a@0(leader) e6 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2020-12-04T12:52:19.033+0200 7f667affc700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.2:0/3731941632' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2020-12-04T12:52:19.085+0200 7f667affc700  0 mon.kvm6a@0(leader) e6 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2020-12-04T12:52:19.085+0200 7f667affc700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.2:0/1519995793' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2020-12-04T12:52:20.901+0200 7f667d801700  0 log_channel(cluster) log [WRN] : Health check failed: 9 slow ops, oldest one blocked for 32 sec, mon.kvm6a has slow ops (SLOW_OPS)
2020-12-04T12:52:22.865+0200 7f667d801700  1 mon.kvm6a@0(leader).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:22.865+0200 7f667d801700 -1 mon.kvm6a@0(leader) e6 get_health_metrics reporting 17 slow ops, oldest is osd_failure(failed timeout osd.22 [v2:10.254.1.3:6802/4170055,v1:10.254.1.3:6804/4170055] for 21sec e57816 v57816)
2020-12-04T12:52:22.961+0200 7f667d801700  0 log_channel(cluster) log [WRN] : Health check update: Slow OSD heartbeats on back (longest 3967.927ms) (OSD_SLOW_PING_TIME_BACK)
2020-12-04T12:52:26.885+0200 7f667d801700  0 log_channel(cluster) log [WRN] : Health check update: Slow OSD heartbeats on front (longest 7017.555ms) (OSD_SLOW_PING_TIME_FRONT)
2020-12-04T12:52:27.865+0200 7f667d801700  0 log_channel(cluster) log [WRN] : Health check update: 17 slow ops, oldest one blocked for 37 sec, mon.kvm6a has slow ops (SLOW_OPS)
2020-12-04T12:52:27.865+0200 7f667d801700  1 mon.kvm6a@0(leader).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:27.869+0200 7f667d801700 -1 mon.kvm6a@0(leader) e6 get_health_metrics reporting 17 slow ops, oldest is osd_failure(failed timeout osd.22 [v2:10.254.1.3:6802/4170055,v1:10.254.1.3:6804/4170055] for 21sec e57816 v57816)
2020-12-04T12:52:28.329+0200 7f667affc700  0 mon.kvm6a@0(leader) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:28.329+0200 7f667affc700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.2:0/3076981816' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:28.369+0200 7f667affc700  0 mon.kvm6a@0(leader) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:28.369+0200 7f667affc700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.2:0/4089655273' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:32.869+0200 7f667d801700  0 log_channel(cluster) log [WRN] : Health check update: Slow OSD heartbeats on back (longest 7478.382ms) (OSD_SLOW_PING_TIME_BACK)
2020-12-04T12:52:32.869+0200 7f667d801700  0 log_channel(cluster) log [WRN] : Health check update: 17 slow ops, oldest one blocked for 42 sec, mon.kvm6a has slow ops (SLOW_OPS)
2020-12-04T12:52:32.869+0200 7f667d801700  1 mon.kvm6a@0(leader).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:32.873+0200 7f667d801700 -1 mon.kvm6a@0(leader) e6 get_health_metrics reporting 17 slow ops, oldest is osd_failure(failed timeout osd.22 [v2:10.254.1.3:6802/4170055,v1:10.254.1.3:6804/4170055] for 21sec e57816 v57816)
2020-12-04T12:52:37.873+0200 7f667d801700  0 log_channel(cluster) log [WRN] : Health check update: 17 slow ops, oldest one blocked for 47 sec, mon.kvm6a has slow ops (SLOW_OPS)
2020-12-04T12:52:37.873+0200 7f667d801700  1 mon.kvm6a@0(leader).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:37.877+0200 7f667d801700 -1 mon.kvm6a@0(leader) e6 get_health_metrics reporting 17 slow ops, oldest is osd_failure(failed timeout osd.22 [v2:10.254.1.3:6802/4170055,v1:10.254.1.3:6804/4170055] for 21sec e57816 v57816)
2020-12-04T12:52:38.621+0200 7f667affc700  0 mon.kvm6a@0(leader) e6 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2020-12-04T12:52:38.621+0200 7f667affc700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.2:0/3482483891' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2020-12-04T12:52:38.665+0200 7f667affc700  0 mon.kvm6a@0(leader) e6 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2020-12-04T12:52:38.665+0200 7f667affc700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.2:0/399616868' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2020-12-04T12:52:42.877+0200 7f667d801700  0 log_channel(cluster) log [WRN] : Health check update: 17 slow ops, oldest one blocked for 52 sec, mon.kvm6a has slow ops (SLOW_OPS)
2020-12-04T12:52:42.877+0200 7f667d801700  1 mon.kvm6a@0(leader).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:42.881+0200 7f667d801700 -1 mon.kvm6a@0(leader) e6 get_health_metrics reporting 17 slow ops, oldest is osd_failure(failed timeout osd.22 [v2:10.254.1.3:6802/4170055,v1:10.254.1.3:6804/4170055] for 21sec e57816 v57816)


kvm6b:
Code:
2020-12-04T12:52:03.295+0200 7fc1ccbbd700  1 mon.kvm6b@1(peon).osd e57824 e57824: 24 total, 24 up, 24 in
2020-12-04T12:52:03.299+0200 7fc1ccbbd700  0 mon.kvm6b@1(peon) e6 handle_command mon_command({"prefix": "osd metadata", "id": 32} v 0) v1
2020-12-04T12:52:03.299+0200 7fc1ccbbd700  0 log_channel(audit) log [DBG] : from='mgr.190540688 10.254.1.2:0/2009496' entity='mgr.kvm6a' cmd=[{"prefix": "osd metadata", "id": 32}]: dispatch
2020-12-04T12:52:04.359+0200 7fc1ccbbd700  1 mon.kvm6b@1(peon).osd e57825 e57825: 24 total, 24 up, 24 in
2020-12-04T12:52:04.943+0200 7fc1cf3c2700  1 mon.kvm6b@1(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:06.595+0200 7fc1ccbbd700  0 mon.kvm6b@1(peon) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:06.599+0200 7fc1ccbbd700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.3:0/1357251391' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:06.647+0200 7fc1ccbbd700  0 mon.kvm6b@1(peon) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:06.647+0200 7fc1ccbbd700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.3:0/621942878' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:09.947+0200 7fc1cf3c2700  1 mon.kvm6b@1(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:14.947+0200 7fc1cf3c2700  1 mon.kvm6b@1(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:16.847+0200 7fc1ccbbd700  0 mon.kvm6b@1(peon) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:16.851+0200 7fc1ccbbd700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.3:0/2385410752' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:16.895+0200 7fc1ccbbd700  0 mon.kvm6b@1(peon) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:16.895+0200 7fc1ccbbd700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.3:0/1603647083' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:19.947+0200 7fc1cf3c2700  1 mon.kvm6b@1(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:24.947+0200 7fc1cf3c2700  1 mon.kvm6b@1(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:26.079+0200 7fc1ccbbd700  0 mon.kvm6b@1(peon) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:26.079+0200 7fc1ccbbd700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.3:0/3537921673' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:26.119+0200 7fc1ccbbd700  0 mon.kvm6b@1(peon) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:26.119+0200 7fc1ccbbd700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.3:0/4155055808' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:29.947+0200 7fc1cf3c2700  1 mon.kvm6b@1(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:34.951+0200 7fc1cf3c2700  1 mon.kvm6b@1(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:36.299+0200 7fc1ccbbd700  0 mon.kvm6b@1(peon) e6 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
 
kvm6c:
Code:
2020-12-04T12:52:04.363+0200 7ff0f1dc4700  1 mon.kvm6c@2(peon).osd e57825 e57825: 24 total, 24 up, 24 in
2020-12-04T12:52:04.815+0200 7ff0f45c9700  1 mon.kvm6c@2(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:06.547+0200 7ff0f1dc4700  0 mon.kvm6c@2(peon) e6 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2020-12-04T12:52:06.547+0200 7ff0f1dc4700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.4:0/2772873966' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2020-12-04T12:52:06.603+0200 7ff0f1dc4700  0 mon.kvm6c@2(peon) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:06.603+0200 7ff0f1dc4700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.4:0/605218279' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:09.816+0200 7ff0f45c9700  1 mon.kvm6c@2(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:14.816+0200 7ff0f45c9700  1 mon.kvm6c@2(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:16.848+0200 7ff0f1dc4700  0 mon.kvm6c@2(peon) e6 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2020-12-04T12:52:16.848+0200 7ff0f1dc4700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.4:0/458464263' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2020-12-04T12:52:16.896+0200 7ff0f1dc4700  0 mon.kvm6c@2(peon) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:16.896+0200 7ff0f1dc4700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.4:0/3072053022' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:19.820+0200 7ff0f45c9700  1 mon.kvm6c@2(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:24.816+0200 7ff0f45c9700  1 mon.kvm6c@2(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:26.124+0200 7ff0f1dc4700  0 mon.kvm6c@2(peon) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:26.124+0200 7ff0f1dc4700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.4:0/2451772925' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:26.168+0200 7ff0f1dc4700  0 mon.kvm6c@2(peon) e6 handle_command mon_command({"prefix":"df","format":"json"} v 0) v1
2020-12-04T12:52:26.168+0200 7ff0f1dc4700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.4:0/1877622460' entity='client.admin' cmd=[{"prefix":"df","format":"json"}]: dispatch
2020-12-04T12:52:29.820+0200 7ff0f45c9700  1 mon.kvm6c@2(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:34.816+0200 7ff0f45c9700  1 mon.kvm6c@2(peon).osd e57825 _set_new_cache_sizes cache_size:1020054731 inc_alloc: 71303168 full_alloc: 553648128 kv_alloc: 390070272
2020-12-04T12:52:36.392+0200 7ff0f1dc4700  0 mon.kvm6c@2(peon) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:36.392+0200 7ff0f1dc4700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.4:0/4213175342' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
2020-12-04T12:52:36.440+0200 7ff0f1dc4700  0 mon.kvm6c@2(peon) e6 handle_command mon_command({"format":"json","prefix":"df"} v 0) v1
2020-12-04T12:52:36.440+0200 7ff0f1dc4700  0 log_channel(audit) log [DBG] : from='client.? 10.254.1.4:0/3635481511' entity='client.admin' cmd=[{"format":"json","prefix":"df"}]: dispatch
 
2020-12-04T12:52:09.341+0200 7f667d801700 0 log_channel(cluster) log [WRN] : Health check update: 1 slow ops, oldest one blocked for 32 sec, mon.kvm6c has slow ops (SLOW_OPS)
kvm6a seems to complain about the other MONs as well.

2020-12-04T12:52:17.861+0200 7f667d801700 -1 mon.kvm6a@0(leader) e6 get_health_metrics reporting 9 slow ops, oldest is osd_failure(failed timeout osd.22 [v2:10.254.1.3:6802/4170055,v1:10.254.1.3:6804/4170055] for 21sec e57816 v57816)
What did the OSD.22 get stuck on? Maybe the MON still had requests for that OSD in the queue that timed out (> 30 sec).
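
The OSD side can be inspected from its admin socket on the host carrying osd.22; a small sketch, assuming the socket is available there:
Code:
# run on the node hosting osd.22
ceph daemon osd.22 dump_ops_in_flight
ceph daemon osd.22 dump_historic_slow_ops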
 
