Ceph out of quorum - ping is OK but monitor is not

Discussion in 'Proxmox VE: Networking and Firewall' started by Harald Treis, Jun 13, 2018.

  1. Harald Treis

    Harald Treis New Member

    Joined:
    Jun 13, 2018
    Messages:
    5
    Likes Received:
    0
    Hi,
    I have 3 Proxmox servers with redundant network interfaces. All servers are connected to 2 different switches, to be prepared in case a switch (or just a link) fails. Bonding is configured on both sides (server and switch) with LACP. (No OSDs are defined at the moment.)

    If one link fails (e.g. I cut the connection to the switch), it takes a couple of seconds and the server is reachable via ping again. But the Ceph cluster never returns to quorum.

    Why does the operating system fail over successfully (tested with ping), while Ceph never gets healthy again?

    My Configuration:
    ceph.conf
    Code:
    [global]
        auth client required = cephx
        auth cluster required = cephx
        auth service required = cephx
        cluster network = 192.168.17.0/24
        fsid = 5070e036-8f6c-4795-a34d-9035472a628d
        keyring = /etc/pve/priv/$cluster.$name.keyring
        mon allow pool delete = true
        osd journal size = 5120
        osd pool default min size = 2
        osd pool default size = 3
        public network = 192.168.17.0/24
    
    [osd]
        keyring = /var/lib/ceph/osd/ceph-$id/keyring
    
    [mon.ariel1]
        host = ariel1
        mon addr = 192.168.17.31:6789
    
    [mon.ariel4]
        host = ariel4
        mon addr = 192.168.17.34:6789
    
    [mon.ariel2]
        host = ariel2
        mon addr = 192.168.17.32:6789
    
    /etc/network/interfaces (of ariel1; all IPs of ariel2 end with .32, of ariel4 with .34)
    eth0, eth2 and eth4 are connected to switch-1
    eth1, eth3 and eth5 are connected to switch-2
    Code:
    auto lo
    iface lo inet loopback
    iface eth0 inet manual
    iface eth1 inet manual
    iface eth2 inet manual
    iface eth3 inet manual
    iface eth4 inet manual
    iface eth5 inet manual
    
    auto bond0
    iface bond0 inet manual
        slaves eth0 eth1
        bond_miimon 100
        bond_mode 802.3ad
            bond_xmit_hash_policy layer3+4
    #frontside
    
    auto bond1
    iface bond1 inet static
        address  192.168.16.31
        netmask  255.255.255.0
        slaves eth2 eth3
        bond_miimon 100
        bond_mode 802.3ad
            bond_xmit_hash_policy layer3+4
        pre-up (ifconfig eth2 mtu 8996 && ifconfig eth3 mtu 8996)
        mtu 8996
    #corosync
    
    auto bond2
    iface bond2 inet static
            address  192.168.17.31
            netmask  255.255.255.0
        slaves eth4 eth5
        bond_miimon 100
            bond_mode 802.3ad
            bond_xmit_hash_policy layer3+4
        pre-up (ifconfig eth4 mtu 8996 && ifconfig eth5 mtu 8996)
        mtu 8996
    #ceph
    
    auto vmbr0
    iface vmbr0 inet static
        address  192.168.19.31
        netmask  255.255.255.0
        gateway  192.168.19.1
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0
    
    Pings to all IPs in network 192.168.17.0/24 (.31, .32, .34) from all servers succeed.
    ceph status
    Code:
      cluster:
        id:     5070e036-8f6c-4795-a34d-9035472a628d
        health: HEALTH_OK
    
      services:
        mon: 3 daemons, quorum ariel1,ariel2,ariel4
        mgr: ariel2(active), standbys: ariel4
        osd: 0 osds: 0 up, 0 in
    
    Now I pull out eth4 from ariel4. After a couple of seconds, ping works without any errors again.
    But the Ceph cluster fails:
    Code:
    root@ariel1:~# ceph status
      cluster:
        id:     5070e036-8f6c-4795-a34d-9035472a628d
        health: HEALTH_WARN
                1/3 mons down, quorum ariel1,ariel2
    
      services:
        mon: 3 daemons, quorum ariel1,ariel2, out of quorum: ariel4
        mgr: ariel2(active), standbys: ariel4
        osd: 0 osds: 0 up, 0 in 
    
    Is any configuration missing or is this a bug?
    Please help.

    Kind regards,
    Harry
     
  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,512
    Likes Received:
    131
    Are your switches configured with MLAG? Otherwise LACP doesn't really work; better try an active-backup bond.
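    For reference, an active-backup bond needs no special switch support. As a sketch only (interface names and the address are taken from the configs quoted in this thread; adapt per node), the Ceph bond stanza in /etc/network/interfaces would become:

```
auto bond2
iface bond2 inet static
    address  192.168.17.31
    netmask  255.255.255.0
    slaves eth4 eth5
    bond_miimon 100
    bond_mode active-backup
    mtu 8996
#ceph
```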
     
  3. dietmar

    dietmar Proxmox Staff Member
    Staff Member

    Joined:
    Apr 28, 2005
    Messages:
    16,123
    Likes Received:
    264
    You still have quorum, so what exactly is the problem?
     
  4. Harald Treis

    ceph status says that ariel4's monitor is down - but the server is reachable via ping
    Code:
    mon: 3 daemons, quorum ariel1,ariel2, out of quorum: ariel4
    
    log from ariel1
    Code:
    2018-06-13 10:51:43.078991 mon.ariel1 mon.0 192.168.17.31:6789/0 151 : cluster [WRN] Health check failed: 1/3 mons down, quorum ariel1,ariel2 (MON_DOWN)
    2018-06-13 10:51:43.238338 mon.ariel1 mon.0 192.168.17.31:6789/0 152 : cluster [WRN] overall HEALTH_WARN 1/3 mons down, quorum ariel1,ariel2
    2018-06-13 11:00:00.000120 mon.ariel1 mon.0 192.168.17.31:6789/0 162 : cluster [WRN] overall HEALTH_WARN 1/3 mons down, quorum ariel1,ariel2
    2018-06-13 12:00:00.000116 mon.ariel1 mon.0 192.168.17.31:6789/0 186 : cluster [WRN] overall HEALTH_WARN 1/3 mons down, quorum ariel1,ariel2
    2018-06-13 13:00:00.000107 mon.ariel1 mon.0 192.168.17.31:6789/0 211 : cluster [WRN] overall HEALTH_WARN 1/3 mons down, quorum ariel1,ariel2
    
    The monitor service on ariel4 is still running.
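    Beyond the log, quorum membership can be inspected with `ceph quorum_status -f json` on any node. A minimal Python sketch of extracting the out-of-quorum monitors; note the JSON below is a shortened, hand-written sample in the shape of that output (names taken from this thread), not captured from a real cluster:

```python
import json

# Shortened, hypothetical sample of `ceph quorum_status -f json`;
# a real cluster returns additional fields (election_epoch, features, ...).
sample = '''
{
  "quorum_names": ["ariel1", "ariel2"],
  "quorum_leader_name": "ariel1",
  "monmap": {
    "mons": [
      {"name": "ariel1", "addr": "192.168.17.31:6789/0"},
      {"name": "ariel2", "addr": "192.168.17.32:6789/0"},
      {"name": "ariel4", "addr": "192.168.17.34:6789/0"}
    ]
  }
}
'''

status = json.loads(sample)
all_mons = {m["name"] for m in status["monmap"]["mons"]}
in_quorum = set(status["quorum_names"])
out_of_quorum = sorted(all_mons - in_quorum)
print(out_of_quorum)  # → ['ariel4']
```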
     
  5. Alwin

    One out of three MONs has no quorum, but your cluster still does.

    Are your switches configured with MLAG? Otherwise LACP doesn't really work; better try an active-backup bond.
     
  6. Harald Treis

    Thank you, Alwin.
    It looks like our Netgear XS728T does not have this feature.
     
  7. Harald Treis

    Even the choice "active-backup" does not work.

    I tested with a small VM and disabled one port on the first Netgear switch (as before):

    Code:
    root@ariel4:/var/log/ceph# ceph status
      cluster:
        id:     5070e036-8f6c-4795-a34d-9035472a628d
        health: HEALTH_WARN
                1 osds down
                1 host (1 osds) down
                Degraded data redundancy: 13983/37132 objects degraded (37.658%), 96 pgs degraded
                1/3 mons down, quorum ariel2,ariel4
    
      services:
        mon: 3 daemons, quorum ariel2,ariel4, out of quorum: ariel1
        mgr: ariel4(active), standbys: ariel2, ariel1
        osd: 3 osds: 2 up, 3 in
    
      data:
        pools:   1 pools, 128 pgs
        objects: 18566 objects, 73458 MB
        usage:   143 GB used, 5443 GB / 5587 GB avail
        pgs:     75.000% pgs not active
                 13983/37132 objects degraded (37.658%)
                 96 undersized+degraded+peered
                 32 active+clean

    Why is ceph not able to switch over to the backup port?

    Code:
    cat /etc/network/interfaces
    auto lo
    iface lo inet loopback
    iface eth0 inet manual
    iface eth1 inet manual
    iface eth2 inet manual
    iface eth3 inet manual
    iface eth4 inet manual
    iface eth5 inet manual
    auto bond0
    iface bond0 inet manual
       slaves eth0 eth1
       bond_miimon 100
       bond_mode active-backup
    #frontside
    
    auto bond1
    iface bond1 inet static
       address  192.168.16.34
       netmask  255.255.255.0
       slaves eth2 eth3
       bond_miimon 100
       bond_mode active-backup
       pre-up (ifconfig eth2 mtu 8996 && ifconfig eth3 mtu 8996)
       mtu 8996
    #corosync
    
    auto bond2
    iface bond2 inet static
       address  192.168.17.34
       netmask  255.255.255.0
       slaves eth4 eth5
       bond_miimon 100
       bond_mode active-backup
       pre-up (ifconfig eth4 mtu 8996 && ifconfig eth5 mtu 8996)
       mtu 8996
    #ceph
    
    auto vmbr0
    iface vmbr0 inet static
       address  192.168.19.34
       netmask  255.255.255.0
       gateway  192.168.19.1
       bridge_ports bond0
       bridge_stp off
       bridge_fd 0
    
    Thanks for your help.
     
  8. Alwin

    Did you trunk the two switches together? If one node switches its primary interface on the bond, it still needs to reach the other working members - but they are connected to the other switch. With active-backup, the NIC only listens on one port of the bond.
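    To verify which slave an active-backup bond is actually using, the kernel exposes it in /proc/net/bonding/&lt;bond&gt;. A small Python sketch of reading the "Currently Active Slave" field; the sample text below is an illustrative excerpt, and on a live system you would read the real file instead:

```python
# Illustrative excerpt of /proc/net/bonding/bond2; on a live system use:
#   sample = open("/proc/net/bonding/bond2").read()
sample = """\
Ethernet Channel Bonding Driver: v3.7.1
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth4
MII Status: up
"""

active = None
for line in sample.splitlines():
    if line.startswith("Currently Active Slave:"):
        # Everything after the first colon is the slave interface name.
        active = line.split(":", 1)[1].strip()
print(active)  # → eth4
```

    Running this before and after pulling a cable shows whether the kernel actually failed over to the backup slave.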
     
  9. Harald Treis

    Hey Alwin,

    there is something I do not understand (testing bond2: net 192.168.17.0/24):

    I have 3 nodes: ariel1, ariel2, ariel4
    All nodes have the same interface configuration; only the last digit of the IP differs:
    eth0, eth2, eth4 are connected to switch-1; eth1, eth3, eth5 are connected to switch-2.
    The switches do not support MLAG.
    All 6 links are up.
    ceph status says HEALTH_OK.
    Pings to all servers succeed.

    When I cut a link, e.g. eth4 on ariel1, the operating system is able to reconnect,
    but Ceph is not. Why?

    ping ariel1 -> ariel2/4:
    Code:
    arp -n | grep 192.168.17
    192.168.17.32            ether   a0:36:9f:f7:8f:64   C                     bond2
    192.168.17.34            ether   a0:36:9f:27:ba:2c   C                     bond2
    
    PING 192.168.17.32 (192.168.17.32) 56(84) bytes of data.
    64 bytes from 192.168.17.32: icmp_seq=1 ttl=64 time=0.074 ms
    
    PING 192.168.17.34 (192.168.17.34) 56(84) bytes of data.
    64 bytes from 192.168.17.34: icmp_seq=1 ttl=64 time=0.127 ms
    
    ping ariel2 -> ariel1/4
    Code:
    arp -n | grep 192.168.17
    192.168.17.34            ether   a0:36:9f:27:ba:2c   C                     bond2
    192.168.17.31            ether   a0:36:9f:27:ba:18   C                     bond2
    
    PING 192.168.17.31 (192.168.17.31) 56(84) bytes of data.
    64 bytes from 192.168.17.31: icmp_seq=1 ttl=64 time=0.135 ms
    
    PING 192.168.17.34 (192.168.17.34) 56(84) bytes of data.
    64 bytes from 192.168.17.34: icmp_seq=1 ttl=64 time=0.116 ms
    
    ping ariel4 -> ariel1/2
    Code:
    arp -n | grep 192.168.17
    192.168.17.31            ether   a0:36:9f:27:ba:18   C                     bond2
    192.168.17.32            ether   a0:36:9f:f7:8f:64   C                     bond2
    
    PING 192.168.17.31 (192.168.17.31) 56(84) bytes of data.
    64 bytes from 192.168.17.31: icmp_seq=1 ttl=64 time=0.082 ms
    
    PING 192.168.17.32 (192.168.17.32) 56(84) bytes of data.
    64 bytes from 192.168.17.32: icmp_seq=1 ttl=64 time=0.133 ms
    
    ceph status
    Code:
      cluster:
        id:     5070e036-8f6c-4795-a34d-9035472a628d
        health: HEALTH_OK
      services:
        mon: 3 daemons, quorum ariel1,ariel2,ariel4
        mgr: ariel4(active), standbys: ariel2, ariel1
        osd: 3 osds: 3 up, 3 in
      data:
        pools:   1 pools, 128 pgs
        objects: 18537 objects, 73211 MB
        usage:   142 GB used, 5444 GB / 5587 GB avail
        pgs:     128 active+clean
    
    cat /etc/network/interfaces
    Code:
    auto lo
    iface lo inet loopback
    iface eth0 inet manual
    iface eth1 inet manual
    iface eth2 inet manual
    iface eth3 inet manual
    iface eth4 inet manual
    iface eth5 inet manual
    
    auto bond0
    iface bond0 inet manual
       slaves eth0 eth1
       bond_miimon 100
       bond_mode active-backup
    #frontside
    
    auto bond1
    iface bond1 inet static
       address  192.168.16.31
       netmask  255.255.255.0
       slaves eth2 eth3
       bond_miimon 100
       bond_mode active-backup
       pre-up (ifconfig eth2 mtu 8996 && ifconfig eth3 mtu 8996)
       mtu 8996
    #corosync
    
    auto bond2
    iface bond2 inet static
       address  192.168.17.31
       netmask  255.255.255.0
       slaves eth4 eth5
       bond_miimon 100
       bond_mode active-backup
       pre-up (ifconfig eth4 mtu 8996 && ifconfig eth5 mtu 8996)
       mtu 8996
    #ceph
    
    auto vmbr0
    iface vmbr0 inet static
       address  192.168.19.31
       netmask  255.255.255.0
       gateway  192.168.19.1
       bridge_ports bond0
       bridge_stp off
       bridge_fd 0
    
    Now I pull ariel1's cable (eth4) out of switch-1;
    no traffic is possible on this interface any more:


    ping ariel1 -> ariel2/4:
    Code:
    arp -n | grep 192.168.17
    192.168.17.32            ether   a0:36:9f:f7:8f:64   C                     bond2
    192.168.17.34            ether   a0:36:9f:27:ba:2c   C                     bond2
    
    PING 192.168.17.32 (192.168.17.32) 56(84) bytes of data.
    64 bytes from 192.168.17.32: icmp_seq=1 ttl=64 time=0.147 ms
    
    PING 192.168.17.34 (192.168.17.34) 56(84) bytes of data.
    64 bytes from 192.168.17.34: icmp_seq=1 ttl=64 time=0.143 ms
    
    ping ariel2 -> ariel1/4
    Code:
    arp -n | grep 192.168.17
    192.168.17.34            ether   a0:36:9f:27:ba:2c   C                     bond2
    192.168.17.31            ether   a0:36:9f:27:ba:18   C                     bond2
    
    PING 192.168.17.31 (192.168.17.31) 56(84) bytes of data.
    64 bytes from 192.168.17.31: icmp_seq=1 ttl=64 time=0.135 ms
    
    PING 192.168.17.34 (192.168.17.34) 56(84) bytes of data.
    64 bytes from 192.168.17.34: icmp_seq=1 ttl=64 time=0.098 ms
    
    
    ping ariel4 -> ariel1/2
    Code:
    arp -n | grep 192.168.17
    192.168.17.31            ether   a0:36:9f:27:ba:18   C                     bond2
    192.168.17.32            ether   a0:36:9f:f7:8f:64   C                     bond2
    
    PING 192.168.17.31 (192.168.17.31) 56(84) bytes of data.
    64 bytes from 192.168.17.31: icmp_seq=1 ttl=64 time=0.104 ms
    
    PING 192.168.17.32 (192.168.17.32) 56(84) bytes of data.
    64 bytes from 192.168.17.32: icmp_seq=1 ttl=64 time=0.106 ms
    
    ceph status
    Code:
      cluster:
        id:     5070e036-8f6c-4795-a34d-9035472a628d
        health: HEALTH_WARN
                1 osds down
                1 host (1 osds) down
                Reduced data availability: 96 pgs inactive
                Degraded data redundancy: 13967/37074 objects degraded (37.673%), 96 pgs degraded, 96 pgs undersized
                1/3 mons down, quorum ariel2,ariel4
      services:
        mon: 3 daemons, quorum ariel2,ariel4, out of quorum: ariel1
        mgr: ariel4(active), standbys: ariel2, ariel1
        osd: 3 osds: 2 up, 3 in
      data:
        pools:   1 pools, 128 pgs
        objects: 18537 objects, 73211 MB
        usage:   142 GB used, 5444 GB / 5587 GB avail
        pgs:     75.000% pgs not active
                 13967/37074 objects degraded (37.673%)
                 96 undersized+degraded+peered
                 32 active+clean
    
    After a while Proxmox reduces the available space from 5587 GB to 3724 GB and does a recovery;
    one monitor and one OSD are missing...

    ceph status
    Code:
      cluster:
        id:     5070e036-8f6c-4795-a34d-9035472a628d
        health: HEALTH_WARN
                1/3 mons down, quorum ariel2,ariel4
      services:
        mon: 3 daemons, quorum ariel2,ariel4, out of quorum: ariel1
        mgr: ariel4(active), standbys: ariel2, ariel1
        osd: 3 osds: 2 up, 2 in
      data:
        pools:   1 pools, 128 pgs
        objects: 18541 objects, 73226 MB
        usage:   141 GB used, 3583 GB / 3724 GB avail
        pgs:     128 active+clean
    
    When I reconnect eth4, ariel1 regains quorum and ceph status is HEALTH_OK, with the original space
    Code:
      cluster:
        id:     5070e036-8f6c-4795-a34d-9035472a628d
        health: HEALTH_OK
      services:
        mon: 3 daemons, quorum ariel1,ariel2,ariel4
        mgr: ariel4(active), standbys: ariel2, ariel1
        osd: 3 osds: 3 up, 3 in
      data:
        pools:   1 pools, 128 pgs
        objects: 18541 objects, 73226 MB
        usage:   142 GB used, 5444 GB / 5587 GB avail
        pgs:     128 active+clean
    
     
  10. Alwin

    Check your MTU size on all interfaces and the switches.
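    A quick way to test the MTU end to end is ping with the don't-fragment flag and a payload sized to the MTU: payload = MTU - 20 (IPv4 header) - 8 (ICMP header). A small Python helper for the arithmetic; the target IP in the comment is taken from this thread:

```python
# Largest ICMP echo payload that fits in one frame without fragmentation:
# MTU minus the IPv4 header (20 bytes) and the ICMP header (8 bytes).
def max_ping_payload(mtu: int) -> int:
    return mtu - 20 - 8

payload = max_ping_payload(8996)
print(payload)  # → 8968; use e.g.: ping -M do -s 8968 192.168.17.32
```

    If that ping fails while a plain ping works, some hop (NIC or switch port) is not passing jumbo frames at the configured size.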
     