Ceph in Mesh network, fault tolerance

Discussion in 'Proxmox VE: Networking and Firewall' started by Stefano Giunchi, Aug 7, 2019.

  1. Stefano Giunchi (Proxmox Subscriber)

    I'm following the Full Mesh guide, method 2 (routing, not broadcast), and everything works.
    I want to add fault tolerance, to handle cable/NIC port failures.

    At first, I thought of using bonding: I have 3 nodes with 4 x 10Gb ports each, and I connected each pair of nodes with two bonded cables. It works, but I see 10%-50% packet loss if one of the two connections fails. Also, I found a Red Hat document stating that bonding without a switch is an unsupported setup, highly dependent on the NIC hardware.

    The other option I'm thinking of is routing: reach each node through the "other" node if the primary connection fails. I mean:

    NODEA: 10.10.2.10
    NODEB: 10.10.2.11
    NODEC: 10.10.2.12



    Code:
    root@NODEA:~# ip a
    [...]
    6: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether ac:1f:6b:ba:a4:7a brd ff:ff:ff:ff:ff:ff
        inet 10.10.2.10/24 brd 10.10.2.255 scope global eno1
           valid_lft forever preferred_lft forever
        inet6 fe80::ae1f:6bff:feba:a47a/64 scope link
           valid_lft forever preferred_lft forever
    [...]
    8: enp24s0f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
        link/ether ac:1f:6b:ba:a4:78 brd ff:ff:ff:ff:ff:ff
        inet 10.10.2.10/24 brd 10.10.2.255 scope global enp24s0f0
           valid_lft forever preferred_lft forever
        inet6 fe80::ae1f:6bff:feba:a478/64 scope link
           valid_lft forever preferred_lft forever
    [...]
    
    Code:
    root@NODEA:~# ip route add 10.10.2.11/32 nexthop  dev enp24s0f0  weight 10 nexthop dev eno1 weight 1
    root@NODEA:~# ip r
    [...]
    10.10.2.11
            nexthop dev enp24s0f0 weight 10
            nexthop dev eno1 weight 1
    10.10.2.12 dev eno1 scope link
    
    
    I enable IP forwarding on NODEC:
    Code:
    root@NODEC:~# echo 1 > /proc/sys/net/ipv4/ip_forward
    
    I ping NODEB from NODEA
    Code:
    root@NODEA:~# ping 10.10.2.11
    PING 10.10.2.11 (10.10.2.11) 56(84) bytes of data.
    64 bytes from 10.10.2.11: icmp_seq=1 ttl=64 time=0.156 ms
    
    I ping NODEB from NODEC
    Code:
    root@NODEC:~# ping 10.10.2.11
    PING 10.10.2.11 (10.10.2.11) 56(84) bytes of data.
    64 bytes from 10.10.2.11: icmp_seq=1 ttl=64 time=0.208 ms
    
    Bring down the primary NODEA-NODEB connection:
    Code:
    root@NODEA:~# ip link set enp24s0f0 down
    
    Now I would like NODEA to reach NODEB through NODEC, but it doesn't work:
    Code:
    root@NODEA:~# ping 10.10.2.11
    PING 10.10.2.11 (10.10.2.11) 56(84) bytes of data.
    ^C
    --- 10.10.2.11 ping statistics ---
    1 packets transmitted, 0 received, 100% packet loss, time 0ms
    
    All advice welcome.
     
  2. Alwin (Proxmox Staff Member)

    Yes, but it depends on the bond mode too.

    You need to allow ip forwarding, otherwise the packets will just be dropped.
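    For reference, a minimal sketch of making that setting persistent across reboots (assuming a standard Debian/Proxmox sysctl setup; the drop-in file name is only an example):
    Code:
    # enable forwarding immediately
    echo 1 > /proc/sys/net/ipv4/ip_forward
    # keep it enabled across reboots via a sysctl drop-in
    echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-ceph-mesh.conf
    sysctl -p /etc/sysctl.d/99-ceph-mesh.conf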
     
  3. Stefano Giunchi (Proxmox Subscriber)

    Hi Alwin, thanks for your answer.

    I tried balance-rr, balance-alb and active-backup. But I don't want to insist on this, as the routing method seems more elegant to me, and I would use only two NIC ports per server.

    I had already enabled it on NODEC (the "middle" one), and I tried enabling it on NODEA (the "pinging" one) too, but it still doesn't work.

    I tried to capture ICMP traffic on NODEC (tcpdump -i eno2 -n icmp), but it receives nothing. It seems the pings don't go out on the lower-weight interface of NODEA once the higher-weight interface is down.
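
    To double-check whether the pings even leave NODEA on the backup link, a small capture sketch (same tcpdump invocation as above, interface names as in this setup):
    Code:
    # on NODEA: watch the lower-weight interface while pinging NODEB
    tcpdump -i eno1 -n icmp
    # on NODEC: watch for forwarded pings (eno2 here, as above)
    tcpdump -i eno2 -n icmp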
     
  4. Alwin (Proxmox Staff Member)

    What does the complete routing table look like (ip route)?
     
  5. Stefano Giunchi (Proxmox Subscriber)

    This is the routing table of NODEA:
    Code:
    root@NODEA:/etc/network# ip r
    default via 10.10.1.254 dev vmbr0 onlink
    10.10.1.0/24 dev vmbr0 proto kernel scope link src 10.10.1.10
    10.10.2.0/24 dev eno1 proto kernel scope link src 10.10.2.10
    10.10.2.0/24 dev enp24s0f0 proto kernel scope link src 10.10.2.10
    10.10.2.11
            nexthop dev enp24s0f0 weight 10
            nexthop dev eno1 weight 1
    10.10.2.12 dev eno1 scope link
    10.10.3.0/24 dev bond1 proto kernel scope link src 10.10.3.10 linkdown
    
    And this is the interfaces file:
    Code:
    auto lo
    iface lo inet loopback
    
    auto enp24s0f0
    iface enp24s0f0 inet static
            address  10.10.2.10
            netmask  24
    #       up ip route add 10.10.2.11/32 dev enp24s0f0
            up  ip route add 10.10.2.11/32 nexthop  dev enp24s0f0  weight 10 nexthop dev eno1 weight 1
            down ip route del 10.10.2.11
    #10GB Ceph Sync NodeB
    
    auto eno1
    iface eno1 inet static
            address  10.10.2.10
            netmask  24
            up ip route add 10.10.2.12/32 dev eno1
            down ip route del 10.10.2.12
    #10GB Ceph Sync NodeC
    
    iface enp101s0f0 inet manual
    #Backup
    
    iface enp101s0f1 inet manual
    #Backup
    
    iface enp101s0f2 inet manual
    #Public
    
    iface enp101s0f3 inet manual
    #Public
    
    auto bond1
    iface bond1 inet static
            address 10.10.3.10
            netmask 24
            bond-slaves enp101s0f0 enp101s0f1
            bond-miimon 100
            bond-mode active-backup
    #Backup / Migration / Corosync
    
    auto bond0
    iface bond0 inet manual
            bond-slaves enp101s0f2 enp101s0f3
            bond-miimon 100
            bond-mode active-backup
    #Public
    
    auto vmbr0
    iface vmbr0 inet static
            address  10.10.1.10
            netmask  255.255.255.0
            gateway  10.10.1.254
            bridge-ports bond0
            bridge-stp off
            bridge-fd 0
    
     
  6. Alwin (Proxmox Staff Member)

    I suppose these entries will take priority. Set the netmask to /32 for 10.10.2.10 and add the 'up ip route' commands to each interface specifically.

    For example (off the top of my head):
    Code:
    auto enp24s0f0
    iface enp24s0f0 inet static
            address  10.10.2.10
            netmask  32       
            up ip route add 10.10.2.11/32 dev enp24s0f0
            up  ip route add 10.10.2.12/32 nexthop  dev enp24s0f0 weight 10
            down ip route del 10.10.2.11/32 dev enp24s0f0
            down ip route del 10.10.2.12/32 dev enp24s0f0
    #10GB Ceph Sync NodeB
    
    auto eno1
    iface eno1 inet static
            address  10.10.2.10
            netmask  32
            up ip route add 10.10.2.12/32 dev eno1
            up  ip route add 10.10.2.11/32 nexthop  dev eno1 weight 10
            down ip route del 10.10.2.11/32 dev eno1
            down ip route del 10.10.2.12/32 dev eno1
    #10GB Ceph Sync NodeC
     
  7. Stefano Giunchi (Proxmox Subscriber)

    It doesn't work:
    Code:
    root@NODEA:/etc/network# ifup enp24s0f0
    root@NODEA:/etc/network# ifup eno1
    RTNETLINK answers: File exists
    ifup: failed to bring up eno1
    
    It seems I can't add a second route for the same destination IP, even with a different dev.
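
    A possible workaround for the "RTNETLINK answers: File exists" error, just as a sketch: "ip route replace" installs or overwrites the whole multipath route in one atomic command, so it succeeds whether or not a route for that destination already exists:
    Code:
    # replace (or create) the multipath route to NODEB in a single step
    ip route replace 10.10.2.11/32 nexthop dev enp24s0f0 weight 10 nexthop dev eno1 weight 1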
     
  8. Stefano Giunchi (Proxmox Subscriber)

    I'm almost there.

    Once all links are up, I can issue these commands on NODEA and NODEB:
    Code:
    root@NODEA:~#ip route add 10.10.2.11 nexthop dev enp24s0f0  weight 2 nexthop via 10.10.2.12
    root@NODEB:~#ip route add 10.10.2.10 nexthop dev enp24s0f0  weight 2 nexthop via 10.10.2.12
    
    And everything works.
    The problem now is putting it in the interfaces file.
    If I put the full command in the up stanza of both interfaces, I get errors at boot when the first interface comes up, because the "other" interface is not up yet. Also, if one of the two interfaces is not working, the route never gets created this way:
    Code:
    auto enp24s0f0
    iface enp24s0f0 inet static
            address  10.10.2.11
            netmask  24
            up ip route add 10.10.2.10/32 nexthop dev enp24s0f0 weight 2 nexthop via 10.10.2.12
            up ip route add 10.10.2.12/32 nexthop dev eno1 weight 2 nexthop via 10.10.2.10
    #        down ip route del 10.10.2.10
    #10GB Ceph Sync NodeB
    
    auto eno1
    iface eno1 inet static
            address  10.10.2.11
            netmask  24
            up ip route add 10.10.2.10/32 nexthop dev enp24s0f0 weight 2 nexthop via 10.10.2.12
            up ip route add 10.10.2.12/32 nexthop dev eno1 weight 2 nexthop via 10.10.2.10
    #        down ip route del 10.10.2.12
    #10GB Ceph Sync NodeC
    
    If I try to split the command in two, which would be the cleanest option, the second one fails, probably because a route for that destination already exists:
    Code:
    root@NODEB:~# ip route add 10.10.2.10 nexthop dev enp24s0f0 weight 2
    root@NODEB:~# ip route add 10.10.2.10 nexthop via 10.10.2.12
    RTNETLINK answers: File exists
    
    In the end, if nothing else works, I could create a script in if-up.d along these lines:
    Code:
    #!/bin/sh
    # sketch for NODEB: install the mesh routes depending on which links are up
    up() { [ "$(cat /sys/class/net/$1/operstate 2>/dev/null)" = "up" ]; }
    if up eno1 && up enp24s0f0; then
            ip route replace 10.10.2.10/32 nexthop dev enp24s0f0 weight 2 nexthop via 10.10.2.12
            ip route replace 10.10.2.12/32 nexthop dev eno1 weight 2 nexthop via 10.10.2.10
    elif up eno1; then
            ip route replace 10.10.2.10/32 via 10.10.2.12
            ip route replace 10.10.2.12/32 dev eno1
    elif up enp24s0f0; then
            ip route replace 10.10.2.12/32 via 10.10.2.10
            ip route replace 10.10.2.10/32 dev enp24s0f0
    fi
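    # note (sketch): ifupdown runs every executable in /etc/network/if-up.d/ once
    # for each interface that comes up, so this script would run twice at boot;
    # using "ip route replace" instead of "add" keeps the second run from failing
    # with "File exists".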
    
    I don't like it very much, and I'm trying to find a solution good enough to be included in your Ceph mesh documentation.

    Thanks
    Stefano
     
  9. Alwin (Proxmox Staff Member)

    The netmask on the interfaces is still /24; that creates a route for the whole network on each of the two interfaces. That could have an impact.
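
    To see which of the overlapping routes the kernel actually selects for a given destination, a quick check (debugging sketch):
    Code:
    # ask the kernel which route/interface it would use to reach NODEB
    ip route get 10.10.2.11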
     
  10. Stefano Giunchi (Proxmox Subscriber)

    After creating a script in if-up.d and if-post-down.d, and banging my head on various exceptions, I settled on balance-rr bonding, even if it doesn't work very well when bringing down an interface with ifdown. I'll see how it works when physically disconnecting a cable.
    This is my final configuration:
    Code:
    auto bond2
    iface bond2 inet static
            address  10.10.2.10
            netmask  24
            bond-slaves enp24s0f0 enp24s0f1
            bond-miimon 100
            bond-mode balance-rr
            up ip route add 10.10.2.11/32 dev bond2
            down ip route del 10.10.2.11
    #Ceph Sync NodeB
    auto bond3
    iface bond3 inet static
            address  10.10.2.10
            netmask  24
            bond-slaves eno1 eno2
            bond-miimon 100
            bond-mode balance-rr
            up ip route add 10.10.2.12/32 dev bond3
            down ip route del 10.10.2.12
    #Ceph Sync NodeC
    
    Thanks
    Stefano
     
  11. Stefano Giunchi (Proxmox Subscriber)

    Just an update: the balance-rr bond works perfectly if I physically disconnect a cable.
    I had problems when bringing an interface down with ifdown (I lost about 50% of pings), but if I disconnect the cable, all traffic is correctly carried over the still-connected interface of the bond.
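
    For reference, the state of the bond and its members can be checked at runtime through the standard Linux bonding interface (bond name as in the config above):
    Code:
    # shows bond mode, MII status and per-slave link state
    cat /proc/net/bonding/bond2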
     