Ceph down if one node is down

Discussion in 'Proxmox VE: Installation and configuration' started by bizzarrone, Jan 24, 2019.

  1. bizzarrone

    bizzarrone Member

    Good evening,
    after some tests, I discovered that if 1 of my 4 nodes goes down, disk I/O gets stuck.
    VMs and CTs stay up, but none of their disks are available for I/O.

    I have 3 Ceph monitors.
    When I reboot the node, the Ceph logs show:

    Code:
    2019-01-24 10:28:08.240463 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 175770 : cluster [INF] osd.2 marked itself down
    2019-01-24 10:28:08.276487 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 175771 : cluster [WRN] Health check failed: 1 filesystem is degraded (FS_DEGRADED)
    2019-01-24 10:28:08.286445 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 175773 : cluster [INF] Standby daemon mds.bluehub-prox05 assigned to filesystem cephfs as rank 0
    2019-01-24 10:28:09.238925 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 175776 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
    2019-01-24 10:28:09.238987 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 175777 : cluster [WRN] Health check failed: 1 host (1 osds) down (OSD_HOST_DOWN)
    2019-01-24 10:28:10.427732 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 175784 : cluster [INF] daemon mds.bluehub-prox05 is now active in filesystem cephfs as rank 0
    2019-01-24 10:28:11.401683 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 175785 : cluster [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
    2019-01-24 10:28:11.402667 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 175786 : cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)
    2019-01-24 10:28:11.402698 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 175787 : cluster [WRN] Health check failed: Degraded data redundancy: 3660/1823339 objects degraded (0.201%), 14 pgs degraded (PG_DEGRADED)
    2019-01-24 10:28:24.577475 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8302 : cluster [INF] mon.bluehub-prox03 calling monitor election
    2019-01-24 10:28:24.595828 mon.bluehub-prox05 mon.2 10.9.9.5:6789/0 1879958 : cluster [INF] mon.bluehub-prox05 calling monitor election
    2019-01-24 10:28:34.598958 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8303 : cluster [INF] mon.bluehub-prox03 is new leader, mons bluehub-prox03,bluehub-prox05 in quorum (ranks 1,2)
    2019-01-24 10:28:34.625022 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8308 : cluster [WRN] Health check failed: 1/3 mons down, quorum bluehub-prox03,bluehub-prox05 (MON_DOWN)
    2019-01-24 10:28:34.642025 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8310 : cluster [WRN] overall HEALTH_WARN 1 osds down; 1 host (1 osds) down; 22/1823339 objects misplaced (0.001%); Reduced data availability: 1 pg peering; Degraded data redundancy: 67494/1823339 objects degraded (3.702%), 188 pgs degraded; 1/3 mons down, quorum bluehub-prox03,bluehub-prox05
    2019-01-24 10:28:34.676528 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8311 : cluster [WRN] Health check update: 22/1823513 objects misplaced (0.001%) (OBJECT_MISPLACED)
    2019-01-24 10:28:34.676588 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8312 : cluster [WRN] Health check update: Degraded data redundancy: 68479/1823513 objects degraded (3.755%), 199 pgs degraded (PG_DEGRADED)
    2019-01-24 10:28:34.676648 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8313 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg peering)
    2019-01-24 10:29:09.215151 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8318 : cluster [WRN] Health check update: 22/1823515 objects misplaced (0.001%) (OBJECT_MISPLACED)
    2019-01-24 10:29:09.215200 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8319 : cluster [WRN] Health check update: Degraded data redundancy: 68479/1823515 objects degraded (3.755%), 199 pgs degraded (PG_DEGRADED)
    2019-01-24 10:29:11.285465 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8320 : cluster [WRN] Health check failed: Reduced data availability: 76 pgs inactive (PG_AVAILABILITY)
    2019-01-24 10:29:15.441692 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8321 : cluster [WRN] Health check update: Degraded data redundancy: 68479/1823515 objects degraded (3.755%), 199 pgs degraded, 203 pgs undersized (PG_DEGRADED)
    2019-01-24 10:32:24.933885 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 1 : cluster [INF] mon.bluehub-prox02 calling monitor election
    2019-01-24 10:32:24.938282 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 2 : cluster [INF] mon.bluehub-prox02 calling monitor election
    2019-01-24 10:32:24.979618 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 3 : cluster [INF] mon.bluehub-prox02 is new leader, mons bluehub-prox02,bluehub-prox03,bluehub-prox05 in quorum (ranks 0,1,2)
    2019-01-24 10:32:25.002587 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 4 : cluster [WRN] mon.2 10.9.9.5:6789/0 clock skew 0.491436s > max 0.05s
    2019-01-24 10:32:25.002706 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 5 : cluster [WRN] mon.1 10.9.9.3:6789/0 clock skew 0.491068s > max 0.05s
    2019-01-24 10:32:25.009771 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 10 : cluster [WRN] Health check failed: clock skew detected on mon.bluehub-prox03, mon.bluehub-prox05 (MON_CLOCK_SKEW)
    2019-01-24 10:32:25.009805 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 11 : cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum bluehub-prox03,bluehub-prox05)
    2019-01-24 10:32:25.010647 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 12 : cluster [WRN] message from mon.2 was stamped 0.491892s in the future, clocks not synchronized
    2019-01-24 10:32:25.022385 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 13 : cluster [WRN] overall HEALTH_WARN 1 osds down; 1 host (1 osds) down; 22/1823515 objects misplaced (0.001%); Reduced data availability: 76 pgs inactive; Degraded data redundancy: 68479/1823515 objects degraded (3.755%), 199 pgs degraded, 203 pgs undersized; clock skew detected on mon.bluehub-prox03, mon.bluehub-prox05
    2019-01-24 10:32:25.428905 mon.bluehub-prox05 mon.2 10.9.9.5:6789/0 1880004 : cluster [INF] mon.bluehub-prox05 calling monitor election
    2019-01-24 10:32:25.429032 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 8348 : cluster [INF] mon.bluehub-prox03 calling monitor election
    2019-01-24 10:32:29.988287 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 16 : cluster [INF] Manager daemon bluehub-prox05 is unresponsive. No standby daemons available.
    2019-01-24 10:32:29.988376 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 17 : cluster [WRN] Health check failed: no active mgr (MGR_DOWN)
    Here is my network configuration:

    Code:
    cat /etc/network/interfaces
    auto lo
    iface lo inet loopback
    
    iface eno1 inet manual
    #Production
    
    auto vmbr0
    iface vmbr0 inet static
        address 10.169.136.75
        netmask 255.255.255.128
        gateway 10.169.136.1
        bridge_ports eno1
        bridge_stp off
        bridge_fd 0
    
    iface eno2 inet manual
    
    iface enp0s29f0u2 inet manual
    
    iface ens6f0 inet manual
    
    iface ens6f1 inet manual
    
    iface ens2f0 inet manual
    
    iface ens2f1 inet manual
    
    auto vlan1050
    iface vlan1050 inet static
            vlan_raw_device ens2f0
            address  10.9.9.1
            netmask  255.255.255.0
            network  10.9.9.0
    #Ceph
    
    auto vlan1048
    iface vlan1048 inet static
        vlan_raw_device ens2f0
            address  10.1.1.1
            netmask  255.255.255.0
        network  10.1.1.0
    #Cluster
    
    Here is the cluster status:

    Code:
    Quorum information
    ------------------
    Date:             Wed Jan 23 17:09:11 2019
    Quorum provider:  corosync_votequorum
    Nodes:            4
    Node ID:          0x00000001
    Ring ID:          1/4580
    Quorate:          Yes
    
    Votequorum information
    ----------------------
    Expected votes:   4
    Highest expected: 4
    Total votes:      4
    Quorum:           3 
    Flags:            Quorate
    
    Membership information
    ----------------------
        Nodeid      Votes Name
    0x00000001          1 10.1.1.1 (local)
    0x00000002          1 10.1.1.2
    0x00000003          1 10.1.1.3
    0x00000004          1 10.1.1.5
    
    My package versions:

    Code:
    proxmox-ve: 5.3-1 (running kernel: 4.15.18-10-pve)
    pve-manager: 5.3-8 (running version: 5.3-8/2929af8e)
    pve-kernel-4.15: 5.3-1
    pve-kernel-4.15.18-10-pve: 4.15.18-32
    pve-kernel-4.15.18-9-pve: 4.15.18-30
    ceph: 12.2.10-pve1
    corosync: 2.4.4-pve1
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.1-3
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-43
    libpve-guest-common-perl: 2.0-19
    libpve-http-server-perl: 2.0-11
    libpve-storage-perl: 5.0-36
    libqb0: 1.0.3-1~bpo9
    lvm2: 2.02.168-pve6
    lxc-pve: 3.1.0-2
    lxcfs: 3.0.2-2
    novnc-pve: 1.0.0-2
    proxmox-widget-toolkit: 1.0-22
    pve-cluster: 5.0-33
    pve-container: 2.0-33
    pve-docs: 5.3-1
    pve-edk2-firmware: 1.20181023-1
    pve-firewall: 3.0-17
    pve-firmware: 2.0-6
    pve-ha-manager: 2.0-6
    pve-i18n: 1.0-9
    pve-libspice-server1: 0.14.1-1
    pve-qemu-kvm: 2.12.1-1
    pve-xtermjs: 3.10.1-1
    qemu-server: 5.0-45
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.12-pve1~bpo1
    
    Is it possible that the problem is the clock skew?

    Code:
    cluster [WRN] mon.2 10.9.9.5:6789/0 clock skew 0.491436s > max 0.05s
    
     
  2. Alwin

    Alwin Proxmox Staff Member

    The general ceph.log doesn't show the cause; check your OSD logs for more detail.

    One possibility: all MONs need to provide the same, up-to-date maps to clients, OSDs and the MDS. Use one local (hardware) time server to sync the time from. This way you can make sure that all nodes in the cluster have the same time.
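    (As a quick sanity check, something like the following should show whether the nodes agree on time; timedatectl and the ceph time-sync-status command should be available on a stock PVE 5.x / Luminous setup, but treat this as a sketch rather than an exact recipe.)

    Code:
    # on each node: local clock and NTP sync state
    timedatectl status

    # ask the monitors how far their clocks are apart
    ceph time-sync-status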
     
  3. bizzarrone

    bizzarrone Member

    Thank you Alwin,
    I am using timesyncd instead of ntpd, together with an internal NTP server for the datacentre.
    Could switching to ntpd be a solution?
     
  4. Alwin

    Alwin Proxmox Staff Member

    If you don't mind me asking, how do you conclude that?

    A node reboot usually causes some clock skew, which should subside shortly after boot has finished (once time is synced).
     
  5. bizzarrone

    bizzarrone Member

    I just read that in some other threads.
    I will try another reboot and check the OSD logs.
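    (While the node is down I plan to watch something like this — standard Ceph commands, sketched from memory:)

    Code:
    # overall cluster state while the node is rebooting
    ceph -s
    ceph health detail

    # which OSDs/hosts are down
    ceph osd tree

    # follow the OSD logs on the surviving nodes
    tail -f /var/log/ceph/ceph-osd.*.log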
     
  6. Alwin

    Alwin Proxmox Staff Member

    It greatly depends on how the time is synced: while timesyncd uses one source to get its time, ntpd (if not configured otherwise) will use three time servers and calculate a median time from those three. The latter can lead to time drifts and cause all sorts of unwanted behavior in a cluster.

    The time source should be local to the network, so that any jitter or sudden time drifts can be compensated. Furthermore, the time server should have a constant clock cycle available; with virtual machines the clock cycle may change (sudden drifts), depending on how much time the VM gets on the physical CPU. In general it doesn't matter whether the time is correct, as long as every service in the cluster has the same time.
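    (A minimal sketch of that setup, assuming a local time server reachable at 10.9.9.254 — the address is only an example: point /etc/ntp.conf on every node at that single server and verify the association with ntpq.)

    Code:
    # /etc/ntp.conf (sketch, same on every node)
    server 10.9.9.254 iburst prefer
    driftfile /var/lib/ntp/ntp.drift

    # verify that the node is actually syncing from it
    ntpq -p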
     
  7. bizzarrone

    bizzarrone Member

    Good morning,
    today I performed a new test.

    Code:
    2019-02-06 06:26:18.797253 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33445 : cluster [WRN] Health check update: 35/1833259 objects misplaced (0.002%) (OBJECT_MISPLACED)
    [... repeated "Health check update: 35/18333xx objects misplaced (0.002%) (OBJECT_MISPLACED)" lines and hourly "overall HEALTH_WARN" summaries from 06:27 to 11:07 omitted ...]
    2019-02-06 11:09:31.552936 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33918 : cluster [WRN] Health check failed: noout flag(s) set (OSDMAP_FLAGS)
    2019-02-06 11:09:46.930486 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33922 : cluster [INF] osd.10 marked itself down
    2019-02-06 11:09:46.930590 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33923 : cluster [INF] osd.7 marked itself down
    2019-02-06 11:09:46.930742 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33924 : cluster [INF] osd.12 marked itself down
    2019-02-06 11:09:46.930826 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33925 : cluster [INF] osd.15 marked itself down
    2019-02-06 11:09:46.931027 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33926 : cluster [INF] osd.14 marked itself down
    2019-02-06 11:09:46.931110 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33927 : cluster [INF] osd.4 marked itself down
    2019-02-06 11:09:46.931245 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33928 : cluster [INF] osd.13 marked itself down
    2019-02-06 11:09:46.995264 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33929 : cluster [WRN] Health check failed: 7 osds down (OSD_DOWN)
    2019-02-06 11:09:46.995311 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33930 : cluster [WRN] Health check failed: 1 host (7 osds) down (OSD_HOST_DOWN)
    2019-02-06 11:09:47.473587 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33932 : cluster [WRN] Health check failed: 1 filesystem is degraded (FS_DEGRADED)
    2019-02-06 11:09:47.480729 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33934 : cluster [INF] Standby daemon mds.bluehub-prox01 assigned to filesystem cephfs as rank 0
    2019-02-06 11:09:49.601441 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33940 : cluster [WRN] Health check failed: Reduced data availability: 15 pgs peering (PG_AVAILABILITY)
    2019-02-06 11:09:49.601482 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33941 : cluster [WRN] Health check failed: Degraded data redundancy: 40846/1833533 objects degraded (2.228%), 110 pgs degraded (PG_DEGRADED)
    2019-02-06 11:09:50.061219 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33944 : cluster [INF] daemon mds.bluehub-prox01 is now active in filesystem cephfs as rank 0
    2019-02-06 11:09:51.035263 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33945 : cluster [INF] Health check cleared: FS_DEGRADED (was: 1 filesystem is degraded)
    2019-02-06 11:09:52.989286 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33948 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 15 pgs peering)
    2019-02-06 11:09:55.084944 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33949 : cluster [WRN] Health check update: Degraded data redundancy: 297679/1833533 objects degraded (16.235%), 746 pgs degraded (PG_DEGRADED)
    2019-02-06 11:10:50.312374 mon.bluehub-prox05 mon.2 10.9.9.5:6789/0 280937 : cluster [INF] mon.bluehub-prox05 calling monitor election
    2019-02-06 11:10:50.345058 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33962 : cluster [INF] mon.bluehub-prox02 calling monitor election
    2019-02-06 11:10:55.348050 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33963 : cluster [INF] mon.bluehub-prox02 is new leader, mons bluehub-prox02,bluehub-prox05 in quorum (ranks 0,2)
    2019-02-06 11:10:55.392042 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33968 : cluster [WRN] Health check failed: 1/3 mons down, quorum bluehub-prox02,bluehub-prox05 (MON_DOWN)
    2019-02-06 11:10:55.415578 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33970 : cluster [WRN] overall HEALTH_WARN noout flag(s) set; 7 osds down; 1 host (7 osds) down; 35/1833533 objects misplaced (0.002%); Degraded data redundancy: 297679/1833533 objects degraded (16.235%), 746 pgs degraded; 1/3 mons down, quorum bluehub-prox02,bluehub-prox05
    2019-02-06 11:10:56.387126 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33971 : cluster [WRN] Health check failed: Reduced data availability: 329 pgs inactive (PG_AVAILABILITY)
    2019-02-06 11:10:56.387177 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33972 : cluster [WRN] Health check update: Degraded data redundancy: 297679/1833533 objects degraded (16.235%), 746 pgs degraded, 760 pgs undersized (PG_DEGRADED)
    2019-02-06 11:14:49.845004 mon.bluehub-prox05 mon.2 10.9.9.5:6789/0 281004 : cluster [INF] mon.bluehub-prox05 calling monitor election
    2019-02-06 11:14:49.845775 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34023 : cluster [INF] mon.bluehub-prox02 calling monitor election
    2019-02-06 11:14:49.880371 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34024 : cluster [INF] mon.bluehub-prox02 is new leader, mons bluehub-prox02,bluehub-prox03,bluehub-prox05 in quorum (ranks 0,1,2)
    2019-02-06 11:14:49.893051 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34025 : cluster [WRN] mon.1 10.9.9.3:6789/0 clock skew 0.102049s > max 0.05s
    2019-02-06 11:14:49.901279 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34030 : cluster [WRN] Health check failed: clock skew detected on mon.bluehub-prox03 (MON_CLOCK_SKEW)
    2019-02-06 11:14:49.901311 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34031 : cluster [INF] Health check cleared: MON_DOWN (was: 1/3 mons down, quorum bluehub-prox02,bluehub-prox05)
    2019-02-06 11:14:49.902103 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34032 : cluster [WRN] message from mon.1 was stamped 0.112893s in the future, clocks not synchronized
    2019-02-06 11:14:49.924378 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34033 : cluster [WRN] overall HEALTH_WARN noout flag(s) set; 7 osds down; 1 host (7 osds) down; 35/1833533 objects misplaced (0.002%); Reduced data availability: 329 pgs inactive; Degraded data redundancy: 297679/1833533 objects degraded (16.235%), 746 pgs degraded, 760 pgs undersized; clock skew detected on mon.bluehub-prox03
    2019-02-06 11:14:49.958161 mon.bluehub-prox03 mon.1 10.9.9.3:6789/0 1 : cluster [INF] mon.bluehub-prox03 calling monitor election
    2019-02-06 11:15:10.023422 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34043 : cluster [WRN] Health check update: 5 osds down (OSD_DOWN)
    2019-02-06 11:15:10.023463 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34044 : cluster [INF] Health check cleared: OSD_HOST_DOWN (was: 1 host (7 osds) down)
    2019-02-06 11:15:10.071324 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34045 : cluster [INF] osd.14 10.9.9.3:6817/3334 boot
    2019-02-06 11:15:10.071391 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34046 : cluster [INF] osd.7 10.9.9.3:6805/2894 boot
    2019-02-06 11:15:11.069813 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34048 : cluster [WRN] Health check update: Reduced data availability: 329 pgs inactive, 37 pgs peering (PG_AVAILABILITY)
    2019-02-06 11:15:11.069881 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34049 : cluster [WRN] Health check update: Degraded data redundancy: 284005/1833533 objects degraded (15.489%), 711 pgs degraded, 723 pgs undersized (PG_DEGRADED)
    2019-02-06 11:15:14.081810 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34055 : cluster [INF] osd.12 10.9.9.3:6809/3013 boot
    2019-02-06 11:15:14.081880 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34056 : cluster [INF] osd.15 10.9.9.3:6813/3139 boot
    2019-02-06 11:15:15.085936 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34060 : cluster [WRN] Health check update: 2 osds down (OSD_DOWN)
    2019-02-06 11:15:15.101614 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34061 : cluster [INF] osd.4 10.9.9.3:6801/2754 boot
    2019-02-06 11:15:17.149725 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34064 : cluster [WRN] Health check update: Reduced data availability: 236 pgs inactive (PG_AVAILABILITY)
    2019-02-06 11:15:17.149782 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34065 : cluster [WRN] Health check update: Degraded data redundancy: 194512/1833533 objects degraded (10.609%), 453 pgs degraded, 456 pgs undersized (PG_DEGRADED)
    2019-02-06 11:15:18.206357 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34068 : cluster [INF] osd.10 10.9.9.3:6825/3925 boot
    2019-02-06 11:15:19.210586 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34072 : cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
    2019-02-06 11:15:19.237948 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34073 : cluster [INF] osd.13 10.9.9.3:6821/3524 boot
    2019-02-06 11:15:20.124716 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34075 : cluster [INF] Health check cleared: MON_CLOCK_SKEW (was: clock skew detected on mon.bluehub-prox03)
    2019-02-06 11:15:22.653532 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34077 : cluster [WRN] Health check update: Reduced data availability: 50 pgs inactive (PG_AVAILABILITY)
    2019-02-06 11:15:22.653573 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34078 : cluster [WRN] Health check update: Degraded data redundancy: 31162/1833533 objects degraded (1.700%), 66 pgs degraded, 68 pgs undersized (PG_DEGRADED)
    2019-02-06 11:15:24.730382 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34079 : cluster [WRN] Health check update: 35/1833535 objects misplaced (0.002%) (OBJECT_MISPLACED)
    2019-02-06 11:15:24.730425 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34080 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 31162/1833533 objects degraded (1.700%), 66 pgs degraded, 68 pgs undersized)
    2019-02-06 11:15:26.754555 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34081 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 6 pgs inactive)
    2019-02-06 11:15:30.126285 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34082 : cluster [WRN] Health check update: 35/1833537 objects misplaced (0.002%) (OBJECT_MISPLACED)
    2019-02-06 11:16:25.276829 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 34084 : cluster [WRN] Health check update: 35/1833539 objects misplaced (0.002%) (OBJECT_MISPLACED)

    Nothing important in the OSD logs:

    Code:
    root@xxxx-prox02:~# tail /var/log/ceph/ceph-osd.*.log -f
    ==> /var/log/ceph/ceph-osd.2.log <==
    2019-02-06 11:15:18.234277 7fdba49c9700  1 osd.2 pg_epoch: 5396 pg[1.1f9( v 5388'153738 (5387'152162,5388'153738] local-lis/les=5245/5246 n=882 ec=58/58 lis/c 5245/5245 les/c/f 5246/5246/3020 5396/5396/5396) [10,2] r=1 lpr=5396 pi=[5245,5396)/1 luod=0'0 crt=5388'153738 lcod 5388'153737 peered mbc={}] start_peering_interval up [2] -> [10,2], acting [2] -> [10,2], acting_primary 2 -> 10, up_primary 2 -> 10, role 0 -> 1, features acting 4611087853745930235 upacting 4611087853745930235
    2019-02-06 11:15:18.234366 7fdba49c9700  1 osd.2 pg_epoch: 5396 pg[1.1f9( v 5388'153738 (5387'152162,5388'153738] local-lis/les=5245/5246 n=882 ec=58/58 lis/c 5245/5245 les/c/f 5246/5246/3020 5396/5396/5396) [10,2] r=1 lpr=5396 pi=[5245,5396)/1 crt=5388'153738 lcod 5388'153737 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
    2019-02-06 11:15:18.234919 7fdba49c9700  1 osd.2 pg_epoch: 5396 pg[1.25f( v 5387'150896 (5387'149373,5387'150896] local-lis/les=5245/5246 n=914 ec=58/58 lis/c 5245/5245 les/c/f 5246/5246/3020 5396/5396/5396) [10,2] r=1 lpr=5396 pi=[5245,5396)/1 luod=0'0 crt=5387'150896 lcod 5387'150895 peered mbc={}] start_peering_interval up [2] -> [10,2], acting [2] -> [10,2], acting_primary 2 -> 10, up_primary 2 -> 10, role 0 -> 1, features acting 4611087853745930235 upacting 4611087853745930235
    2019-02-06 11:15:18.235101 7fdba49c9700  1 osd.2 pg_epoch: 5396 pg[1.25f( v 5387'150896 (5387'149373,5387'150896] local-lis/les=5245/5246 n=914 ec=58/58 lis/c 5245/5245 les/c/f 5246/5246/3020 5396/5396/5396) [10,2] r=1 lpr=5396 pi=[5245,5396)/1 crt=5387'150896 lcod 5387'150895 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
    2019-02-06 11:15:18.235379 7fdba51ca700  1 osd.2 pg_epoch: 5396 pg[5.62( v 5324'4299 (4450'2700,5324'4299] local-lis/les=5389/5390 n=13 ec=2493/2493 lis/c 5389/5258 les/c/f 5390/5259/3020 5396/5396/5396) [10,17,2] r=2 lpr=5396 pi=[5258,5396)/1 luod=0'0 crt=5324'4299 lcod 5324'4298 active mbc={}] start_peering_interval up [17,2] -> [10,17,2], acting [17,2] -> [10,17,2], acting_primary 17 -> 10, up_primary 17 -> 10, role 1 -> 2, features acting 4611087853745930235 upacting 4611087853745930235
    2019-02-06 11:15:18.235452 7fdba49c9700  1 osd.2 pg_epoch: 5396 pg[5.193( v 5344'4326 (4603'2800,5344'4326] local-lis/les=5389/5390 n=15 ec=4632/2493 lis/c 5389/5265 les/c/f 5390/5267/3020 5396/5396/5396) [10,19,2] r=2 lpr=5396 pi=[5265,5396)/1 luod=0'0 crt=5344'4326 lcod 5344'4325 active mbc={}] start_peering_interval up [19,2] -> [10,19,2], acting [19,2] -> [10,19,2], acting_primary 19 -> 10, up_primary 19 -> 10, role 1 -> 2, features acting 4611087853745930235 upacting 4611087853745930235
    2019-02-06 11:15:18.235459 7fdba51ca700  1 osd.2 pg_epoch: 5396 pg[5.62( v 5324'4299 (4450'2700,5324'4299] local-lis/les=5389/5390 n=13 ec=2493/2493 lis/c 5389/5258 les/c/f 5390/5259/3020 5396/5396/5396) [10,17,2] r=2 lpr=5396 pi=[5258,5396)/1 crt=5324'4299 lcod 5324'4298 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
    2019-02-06 11:15:18.235524 7fdba49c9700  1 osd.2 pg_epoch: 5396 pg[5.193( v 5344'4326 (4603'2800,5344'4326] local-lis/les=5389/5390 n=15 ec=4632/2493 lis/c 5389/5265 les/c/f 5390/5267/3020 5396/5396/5396) [10,19,2] r=2 lpr=5396 pi=[5265,5396)/1 crt=5344'4326 lcod 5344'4325 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
    2019-02-06 11:15:19.243898 7fdba51ca700  1 osd.2 pg_epoch: 5397 pg[5.1b6( v 5324'4202 (4427'2700,5324'4202] local-lis/les=5389/5390 n=14 ec=4632/2493 lis/c 5389/5265 les/c/f 5390/5266/3020 5397/5397/5265) [19,13,2] r=2 lpr=5397 pi=[5265,5397)/1 luod=0'0 crt=5324'4202 lcod 5324'4201 active mbc={}] start_peering_interval up [19,2] -> [19,13,2], acting [19,2] -> [19,13,2], acting_primary 19 -> 19, up_primary 19 -> 19, role 1 -> 2, features acting 4611087853745930235 upacting 4611087853745930235
    2019-02-06 11:15:19.244098 7fdba51ca700  1 osd.2 pg_epoch: 5397 pg[5.1b6( v 5324'4202 (4427'2700,5324'4202] local-lis/les=5389/5390 n=14 ec=4632/2493 lis/c 5389/5265 les/c/f 5390/5266/3020 5397/5397/5265) [19,13,2] r=2 lpr=5397 pi=[5265,5397)/1 crt=5324'4202 lcod 5324'4201 unknown NOTIFY mbc={}] state<Start>: transitioning to Stray
    
    ==> /var/log/ceph/ceph-osd.6.log <==
    
    ==> /var/log/ceph/ceph-osd.8.log <==
    
    
    But again, the disk I/O was blocked.
     
    #7 bizzarrone, Feb 6, 2019
    Last edited: Feb 6, 2019
  8. Alwin

    Alwin Proxmox Staff Member

    Clock skew was cleared. Was this your question?
     
  9. bizzarrone

    bizzarrone Member

    I masked systemd-timesyncd and installed ntpd.
    No skew is detected, but the issue is the same:
    no I/O, everything is blocked.
    There is no useful information in the OSD logs. I think I will roll back to Proxmox version 4, or switch from Ceph to another shared storage.
     
  10. Jarek

    Jarek Member

    I/O is blocked because of:
    Code:
    2019-02-06 11:10:56.387126 mon.bluehub-prox02 mon.0 10.9.9.2:6789/0 33971 : cluster [WRN] Health check failed: Reduced data availability: 329 pgs inactive (PG_AVAILABILITY)
    Please show your CRUSH map.
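    (It can be dumped in text form roughly like this — standard ceph/crushtool commands, the file names are only examples; ceph pg dump_stuck inactive would also list the PGs that are blocking I/O.)

    Code:
    # extract and decompile the CRUSH map (file names are examples)
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    cat crushmap.txt

    # list the PGs that are currently stuck inactive
    ceph pg dump_stuck inactive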
     
  11. bizzarrone

    bizzarrone Member

    Good morning Jarek,
    thank you for your advice.
    Here it is:

    Code:
    # begin crush map
    tunable choose_local_tries 0
    tunable choose_local_fallback_tries 0
    tunable choose_total_tries 50
    tunable chooseleaf_descend_once 1
    tunable chooseleaf_vary_r 1
    tunable chooseleaf_stable 1
    tunable straw_calc_version 1
    tunable allowed_bucket_algs 54
    
    # devices
    device 0 osd.0 class hdd
    device 1 osd.1 class hdd
    device 2 osd.2 class hdd
    device 3 osd.3 class hdd
    device 4 osd.4 class hdd
    device 5 osd.5 class hdd
    device 7 osd.7 class hdd
    device 9 osd.9 class hdd
    device 10 osd.10 class hdd
    device 11 osd.11 class hdd
    device 12 osd.12 class hdd
    device 13 osd.13 class hdd
    device 14 osd.14 class hdd
    device 15 osd.15 class hdd
    device 16 osd.16 class hdd
    device 17 osd.17 class hdd
    device 18 osd.18 class hdd
    device 19 osd.19 class hdd
    device 20 osd.20 class hdd
    device 21 osd.21 class hdd
    device 22 osd.22 class hdd
    device 23 osd.23 class hdd
    device 24 osd.24 class hdd
    device 25 osd.25 class hdd
    
    # types
    type 0 osd
    type 1 host
    type 2 chassis
    type 3 rack
    type 4 row
    type 5 pdu
    type 6 pod
    type 7 room
    type 8 datacenter
    type 9 region
    type 10 root
    
    # buckets
    host bluehub-prox01 {
       id -3       # do not change unnecessarily
       id -4 class hdd       # do not change unnecessarily
       # weight 4.905
       alg straw2
       hash 0   # rjenkins1
       item osd.0 weight 0.817
       item osd.1 weight 0.817
       item osd.3 weight 0.817
       item osd.5 weight 0.817
       item osd.9 weight 0.817
       item osd.11 weight 0.817
    }
    host bluehub-prox02 {
       id -5       # do not change unnecessarily
       id -6 class hdd       # do not change unnecessarily
       # weight 0.455
       alg straw2
       hash 0   # rjenkins1
       item osd.2 weight 0.455
    }
    host bluehub-prox03 {
       id -7       # do not change unnecessarily
       id -8 class hdd       # do not change unnecessarily
       # weight 1.909
       alg straw2
       hash 0   # rjenkins1
       item osd.4 weight 0.273
       item osd.7 weight 0.273
       item osd.10 weight 0.273
       item osd.12 weight 0.273
       item osd.13 weight 0.273
       item osd.14 weight 0.273
       item osd.15 weight 0.273
    }
    host bluehub-prox05 {
       id -9       # do not change unnecessarily
       id -10 class hdd       # do not change unnecessarily
       # weight 10.915
       alg straw2
       hash 0   # rjenkins1
       item osd.16 weight 1.091
       item osd.17 weight 1.091
       item osd.18 weight 1.091
       item osd.19 weight 1.091
       item osd.20 weight 1.091
       item osd.21 weight 1.091
       item osd.22 weight 1.091
       item osd.23 weight 1.091
       item osd.24 weight 1.091
       item osd.25 weight 1.091
    }
    root default {
       id -1       # do not change unnecessarily
       id -2 class hdd       # do not change unnecessarily
       # weight 18.183
       alg straw2
       hash 0   # rjenkins1
       item bluehub-prox01 weight 4.905
       item bluehub-prox02 weight 0.455
       item bluehub-prox03 weight 1.909
       item bluehub-prox05 weight 10.915
    }
    
    # rules
    rule replicated_rule {
       id 0
       type replicated
       min_size 1
       max_size 10
       step take default
       step chooseleaf firstn 0 type host
       step emit
    }
    
    # end crush map
    
    
    
     
  12. Jarek

    Jarek Member

    And the pool size/min_size?
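    (They can be read per pool with something like the following; "rbd" is only an example pool name.)

    Code:
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # or list all pools with their replication settings
    ceph osd dump | grep 'replicated size'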
     
  13. alexskysilk

    alexskysilk Active Member

    You realize that you have no more than 1.9 (TB, presumably) of data that can be written according to your rules... you have 4 nodes, but they don't have any space for PGs. Your disks are effectively wasted on much of your cluster.
     
  14. bizzarrone

    bizzarrone Member

    Thank you Alex,
    How could I fix the situation?
     


    #14 bizzarrone, Feb 8, 2019
    Last edited: Feb 8, 2019
  15. alexskysilk

    alexskysilk Active Member

    As a general rule, you want your OSD nodes to be as similar as possible, and you want them to end up with the same weight as each other. Since I don't know what drives you have, or how many per node, I can't give you more specific advice.

    Also, you really want to change your rules. Having a minimum of 1 leaves you with the potential to have PGs with no parity; this is dangerous and may result in data loss. Also, there is no reason to have a maximum of 10 in a replicated pool; it should be 3.
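    (On the pool side, the usual size 3 / min_size 2 could be set with something like this — "rbd" is only an example pool name:)

    Code:
    # three replicas, and require at least two before the pool accepts I/O
    ceph osd pool set rbd size 3
    ceph osd pool set rbd min_size 2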
     
  16. bizzarrone

    bizzarrone Member

    Good morning,
    I added a new, powerful node full of disks and removed the old one that had only a few OSDs.
    Nothing changed:
    as soon as I stop 1 OSD, the Ceph pool freezes.

    The log:
    Code:
    
    2019-02-20 09:00:00.000189 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82642 : cluster [INF] overall HEALTH_OK
    2019-02-20 09:12:05.253331 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82682 : cluster [INF] osd.15 marked itself down
    2019-02-20 09:12:05.304373 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82683 : cluster [WRN] Health check failed: 1 osds down (OSD_DOWN)
    2019-02-20 09:12:08.535372 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82687 : cluster [WRN] Health check failed: Reduced data availability: 1 pg peering (PG_AVAILABILITY)
    2019-02-20 09:12:08.535408 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82688 : cluster [WRN] Health check failed: Degraded data redundancy: 9077/1587282 objects degraded (0.572%), 17 pgs degraded (PG_DEGRADED)
    2019-02-20 09:12:12.102308 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82690 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 1 pg peering)
    2019-02-20 09:12:14.525307 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82691 : cluster [WRN] Health check update: Degraded data redundancy: 22741/1587282 objects degraded (1.433%), 46 pgs degraded (PG_DEGRADED)
    2019-02-20 09:12:48.420280 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82693 : cluster [WRN] Health check update: Degraded data redundancy: 22741/1587284 objects degraded (1.433%), 46 pgs degraded (PG_DEGRADED)
    2019-02-20 09:13:06.848503 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82695 : cluster [WRN] Health check failed: Reduced data availability: 46 pgs inactive (PG_AVAILABILITY)
    2019-02-20 09:13:06.848553 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82696 : cluster [WRN] Health check update: Degraded data redundancy: 22741/1587284 objects degraded (1.433%), 46 pgs degraded, 44 pgs undersized (PG_DEGRADED)
    2019-02-20 09:13:14.565445 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82697 : cluster [WRN] Health check update: Degraded data redundancy: 22741/1587284 objects degraded (1.433%), 46 pgs degraded, 46 pgs undersized (PG_DEGRADED)
    2019-02-20 09:19:07.306127 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82731 : cluster [INF] Health check cleared: OSD_DOWN (was: 1 osds down)
    2019-02-20 09:19:07.394314 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82732 : cluster [INF] osd.15 10.9.9.3:6800/1487797 boot
    2019-02-20 09:19:09.445225 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82736 : cluster [WRN] Health check update: Reduced data availability: 46 pgs inactive, 5 pgs peering (PG_AVAILABILITY)
    2019-02-20 09:19:09.445269 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82737 : cluster [WRN] Health check update: Degraded data redundancy: 19958/1587284 objects degraded (1.257%), 41 pgs degraded, 41 pgs undersized (PG_DEGRADED)
    2019-02-20 09:19:13.692237 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82738 : cluster [INF] Health check cleared: PG_DEGRADED (was: Degraded data redundancy: 15420/1587284 objects degraded (0.971%), 32 pgs degraded, 32 pgs undersized)
    2019-02-20 09:19:14.657433 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82739 : cluster [WRN] Health check update: Reduced data availability: 3 pgs inactive, 3 pgs peering (PG_AVAILABILITY)
    2019-02-20 09:19:15.772723 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82740 : cluster [INF] Health check cleared: PG_AVAILABILITY (was: Reduced data availability: 3 pgs inactive, 3 pgs peering)
    2019-02-20 09:19:15.772785 mon.bluehub-prox01 mon.0 10.9.9.1:6789/0 82741 : cluster [INF] Cluster is now healthy
    
    
    I have a 5-node cluster.

    ceph.conf and crush map:

    Code:
    
     [global]
          auth client required = cephx
          auth cluster required = cephx
          auth service required = cephx
          cluster network = 10.9.9.0/24
          fsid = 0e7f096e-de29-41c9-b862-1a9c2ec15978
          keyring = /etc/pve/priv/$cluster.$name.keyring
          mon allow pool delete = true
          osd journal size = 5120
          osd pool default min size = 2
          osd pool default size = 3
          public network = 10.9.9.0/24

     [mds]
          keyring = /var/lib/ceph/mds/ceph-$id/keyring

     [osd]
          keyring = /var/lib/ceph/osd/ceph-$id/keyring

     [mon.bluehub-prox05]
          host = bluehub-prox05
          mon addr = 10.9.9.5:6789

     [mon.bluehub-prox04]
          host = bluehub-prox04
          mon addr = 10.9.9.4:6789

     [mon.bluehub-prox01]
          host = bluehub-prox01
          mon addr = 10.9.9.1:6789
    
    ============================================================
     # begin crush map
     tunable choose_local_tries 0
     tunable choose_local_fallback_tries 0
     tunable choose_total_tries 50
     tunable chooseleaf_descend_once 1
     tunable chooseleaf_vary_r 1
     tunable chooseleaf_stable 1
     tunable straw_calc_version 1
     tunable allowed_bucket_algs 54

     # devices
     device 0 osd.0 class hdd
     device 1 osd.1 class hdd
     device 2 osd.2 class hdd
     device 3 osd.3 class hdd
     device 4 osd.4 class hdd
     device 5 osd.5 class hdd
     device 6 osd.6 class hdd
     device 7 osd.7 class hdd
     device 8 osd.8 class hdd
     device 9 osd.9 class hdd
     device 10 osd.10 class hdd
     device 11 osd.11 class hdd
     device 12 osd.12 class hdd
     device 13 osd.13 class hdd
     device 14 osd.14 class hdd
     device 15 osd.15 class hdd
     device 16 osd.16 class hdd
     device 17 osd.17 class hdd
     device 18 osd.18 class hdd
     device 19 osd.19 class hdd
     device 20 osd.20 class hdd
     device 21 osd.21 class hdd
     device 22 osd.22 class hdd
     device 23 osd.23 class hdd
     device 24 osd.24 class hdd
     device 25 osd.25 class hdd
     device 26 osd.26 class hdd
     device 27 osd.27 class hdd
     device 28 osd.28 class hdd

     # types
     type 0 osd
     type 1 host
     type 2 chassis
     type 3 rack
     type 4 row
     type 5 pdu
     type 6 pod
     type 7 room
     type 8 datacenter
     type 9 region
     type 10 root

     # buckets
     host bluehub-prox01 {
        id -3       # do not change unnecessarily
        id -4 class hdd       # do not change unnecessarily
        # weight 4.905
        alg straw2
        hash 0   # rjenkins1
        item osd.0 weight 0.817
        item osd.1 weight 0.817
        item osd.3 weight 0.817
        item osd.5 weight 0.817
        item osd.9 weight 0.817
        item osd.11 weight 0.817
     }
     host bluehub-prox02 {
        id -5       # do not change unnecessarily
        id -6 class hdd       # do not change unnecessarily
        # weight 0.000
        alg straw2
        hash 0   # rjenkins1
     }
     host bluehub-prox03 {
        id -7       # do not change unnecessarily
        id -8 class hdd       # do not change unnecessarily
        # weight 1.909
        alg straw2
        hash 0   # rjenkins1
        item osd.4 weight 0.273
        item osd.7 weight 0.273
        item osd.10 weight 0.273
        item osd.12 weight 0.273
        item osd.13 weight 0.273
        item osd.14 weight 0.273
        item osd.15 weight 0.273
     }
     host bluehub-prox05 {
        id -9       # do not change unnecessarily
        id -10 class hdd       # do not change unnecessarily
        # weight 10.915
        alg straw2
        hash 0   # rjenkins1
        item osd.16 weight 1.091
        item osd.17 weight 1.091
        item osd.18 weight 1.091
        item osd.19 weight 1.091
        item osd.20 weight 1.091
        item osd.21 weight 1.091
        item osd.22 weight 1.091
        item osd.23 weight 1.091
        item osd.24 weight 1.091
        item osd.25 weight 1.091
     }
     host bluehub-prox04 {
        id -11       # do not change unnecessarily
        id -12 class hdd       # do not change unnecessarily
        # weight 4.905
        alg straw2
        hash 0   # rjenkins1
        item osd.2 weight 0.817
        item osd.6 weight 0.817
        item osd.8 weight 0.817
        item osd.26 weight 0.817
        item osd.27 weight 0.817
        item osd.28 weight 0.817
     }
     root default {
        id -1       # do not change unnecessarily
        id -2 class hdd       # do not change unnecessarily
        # weight 22.634
        alg straw2
        hash 0   # rjenkins1
        item bluehub-prox01 weight 4.905
        item bluehub-prox02 weight 0.000
        item bluehub-prox03 weight 1.909
        item bluehub-prox05 weight 10.915
        item bluehub-prox04 weight 4.905
     }

     # rules
     rule replicated_rule {
        id 0
        type replicated
        min_size 1
        max_size 10
        step take default
        step chooseleaf firstn 0 type host
        step emit
     }

     # end crush map
    
    
    ============================================================
    
    pveceph pool ls
    
    Name                       size   min_size     pg_num     %-used                 used
    data                          2          2        512      29.27        2778343453069
    data2                         2          2        512       5.31         376847421567
    data3                         2          2        512       2.46         169212454177
    
    ============================================================
    Quorum information
    ------------------
    Date:             Wed Feb 20 09:24:58 2019
    Quorum provider:  corosync_votequorum
    Nodes:            5
    Node ID:          0x00000001
    Ring ID:          1/4656
    Quorate:          Yes
    
    Votequorum information
    ----------------------
    Expected votes:   5
    Highest expected: 5
    Total votes:      5
    Quorum:           3 
    Flags:            Quorate
    
    Membership information
    ----------------------
        Nodeid      Votes Name
    0x00000001          1 10.1.1.1 (local)
    0x00000002          1 10.1.1.2
    0x00000003          1 10.1.1.3
    0x00000005          1 10.1.1.4
    0x00000004          1 10.1.1.5
    
    ============================================================
    
    pveversion  -v
    proxmox-ve: 5.3-1 (running kernel: 4.15.18-10-pve)
    pve-manager: 5.3-9 (running version: 5.3-9/ba817b29)
    pve-kernel-4.15: 5.3-2
    pve-kernel-4.15.18-11-pve: 4.15.18-33
    pve-kernel-4.15.18-10-pve: 4.15.18-32
    pve-kernel-4.15.18-9-pve: 4.15.18-30
    ceph: 12.2.11-pve1
    corosync: 2.4.4-pve1
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.1-3
    libpve-apiclient-perl: 2.0-5
    libpve-common-perl: 5.0-46
    libpve-guest-common-perl: 2.0-20
    libpve-http-server-perl: 2.0-11
    libpve-storage-perl: 5.0-38
    libqb0: 1.0.3-1~bpo9
    lvm2: 2.02.168-pve6
    lxc-pve: 3.1.0-3
    lxcfs: 3.0.3-pve1
    novnc-pve: 1.0.0-2
    proxmox-widget-toolkit: 1.0-22
    pve-cluster: 5.0-33
    pve-container: 2.0-34
    pve-docs: 5.3-2
    pve-edk2-firmware: 1.20181023-1
    pve-firewall: 3.0-17
    pve-firmware: 2.0-6
    pve-ha-manager: 2.0-6
    pve-i18n: 1.0-9
    pve-libspice-server1: 0.14.1-2
    pve-qemu-kvm: 2.12.1-1
    pve-xtermjs: 3.10.1-1
    qemu-server: 5.0-46
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.12-pve1~bpo1
    
    
    
     
  17. Jarek

    Jarek Member

    Joined:
    Dec 16, 2016
    Messages:
    61
    Likes Received:
    8
    The problem is that you have size = min_size (2/2), so as soon as one OSD goes down the affected PGs have fewer copies than min_size and the pool blocks I/O.
    Change size to 3 (this will cause mass data movement, so be advised).
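    For example, for the pools shown above:
    Code:
    ceph osd pool set data size 3
    ceph osd pool set data2 size 3
    ceph osd pool set data3 size 3
    # leave min_size at 2: a single OSD failure then no longer blocks I/O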
     
    bizzarrone likes this.
  18. bizzarrone

    bizzarrone Member

    Joined:
    Nov 27, 2014
    Messages:
    42
    Likes Received:
    1
    Thank you Jarek for your time and your reply. I really need to read a manual on Ceph.
    I changed the setting: I think the rebalancing will be done in about 2 hours, then I will try a new test.
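    (While it rebalances I am watching the progress with, for example:)
    Code:
    # overall status, including degraded object counts and recovery rate
    ceph -s
    # live cluster log
    ceph -w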
    I was desperate; I had even thought about rolling back to version 4.
    Thanks again.
    Luca
     
  19. bizzarrone

    bizzarrone Member

    Joined:
    Nov 27, 2014
    Messages:
    42
    Likes Received:
    1
    That definitively solved the problem. Really, thank you Jarek!
     