[SOLVED] Proxmox Ceph Redundant Network Setup help required

Discussion in 'Proxmox VE: Networking and Firewall' started by Mario Minati, Jun 12, 2018.

  1. Mario Minati

    Mario Minati New Member

    Joined:
    Jun 11, 2018
    Messages:
    7
    Likes Received:
    0
    Hello @all,

    we are new to Proxmox. Currently we use Univention Corporate Server to virtualise 15 machines on 3 physical servers. We are lacking shared storage and HA. Therefore we would like to set up a Proxmox cluster with 5 physical machines: 3 identically configured machines for Ceph and 2 machines for virtualisation.

    We read a lot of posts in the forum, the wiki and the docs, and think we have enough background to start setting things up. We would like to ask whether our network setup is suitable for the goals of high availability and redundancy.

    Attached you find a diagram of our network setup:
    - 2 separate 1 GbE networks for corosync ring 0 and ring 1 with separate switches, one of which we also use for management (external access to the Proxmox web interfaces and lights-out management)
    - 2 separate 10 GbE networks as Ceph public network, with separate switches and bonding
    - 2 separate 10 GbE networks as Ceph cluster network, with separate switches and bonding
    - 1 separate 1 GbE network to access the virtual machines from the outside (DMZ / intranet)

    Questions:
    - Is this suitable for redundancy?
    - Is this suitable for good performance?
    - Is the selected bond_mode (balance-rr) OK for a configuration with separate switches, and does it also achieve good performance?

    Thanks for your suggestions!

    Best greets,

    Mario Minati
     

    Attached Files:

  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,647
    Likes Received:
    141
    +1 or :thumbsup: ; depending on the needs of your backups (backup/ISO/templates), you may need more than 1 GbE.

    Looks good on the redundancy level, but check the latency; Ceph is very sensitive to latency, and the lower the better.

    Is 1 GbE sufficient, even for peak traffic? And this isn't redundant, is it?

    First, +1 for the nice network diagram. For redundancy, see my comments above. The balance-rr mode will send TCP packets out of order as traffic increases; this triggers retransmits and stalls your Ceph network. Better to use active-backup or LACP. For connecting two switches you may need MLAG. I guess an "easier" setup might be to use a 2x 10 GbE bond on both switches, separate Ceph's public and cluster network through VLANs, and use an active-backup setup that utilizes each bond member individually, up to a switch failure. On failure, both networks would be put on one bond member. This still keeps redundancy, but you don't need MLAG or any other method to get inter-switch LACP.
     
  3. Mario Minati

    Mario Minati New Member

    Joined:
    Jun 11, 2018
    Messages:
    7
    Likes Received:
    0
    Hello Alwin,

    thanks for your advice. We improved our network setup according to your suggestions (new network diagram is attached):

    - The management net is now separated from the corosync ring 0 net; we use additional 10 GbE network ports, so we can replace the existing 1 GbE network switch if we run into bandwidth problems on that network.

    - To add redundancy and improve peak bandwidth to the outside world (1 GbE DMZ network) we use an additional 10 GbE network port in a bonding configuration with a 1 GbE port. Is bond_mode balance-rr suitable for that connection, or should it also be an active-backup bond?

    - The bond mode for the Ceph private and public networks has been changed to active-backup, but we would still like to use separate switches, which should provide us with the desired redundancy, right? I personally don't like using VLANs that much, as they add one more layer of complexity where we can make mistakes.

    After setting up the Ceph private and public networks we will check latency with the test commands given in the docs... We expect low latency.
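    For the latency check, this is a minimal sketch of what we plan to run (127.0.0.1 is only a stand-in target here; on the real cluster it would be another node's Ceph network address, e.g. one of the bond1 IPs):

```shell
# Burst of small pings; -q prints only the summary line (rtt min/avg/max/mdev).
# Replace 127.0.0.1 with the peer node's Ceph network address on the real setup.
ping -c 20 -i 0.2 -q 127.0.0.1
```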

    If you would like our kind of network diagram for the documentation, we can provide you with the LibreOffice Draw file. :)


    Best greets,

    Mario Minati
     

    Attached Files:

  4. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,647
    Likes Received:
    141
    Good, corosync separation will save the bacon. ;)

    As written in the post above, since the packets might arrive out of order, the network card has the extra job of putting all packets back in sequence. This may or may not work, depending on your application. If you have a 10 GbE connection, then an active-backup bond with the 10 GbE port as primary would give you not only more bandwidth but also lower latency.

    I guess my description was a little bit confusing. If you can afford extra interfaces on both machines, then you don't need my idea. But if you want redundancy with no extra hardware, then the idea is as follows.

    | eth0.100 & eth1.100 => bond0 => primary (eth0) on switch1 (cluster)
    | eth0.101 & eth1.101 => bond1 => primary (eth1) on switch2 (public)

    I hope this illustrates what I meant. On failure, both public & cluster reside on the same member of the bond. In normal operation both run separated.
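    As a rough /etc/network/interfaces sketch (illustration only: eth0/eth1, the VLAN IDs 100/101 and the addresses are placeholders, and depending on the ifenslave version the primary option may be spelled bond-primary instead of bond_primary):

```
auto bond0
iface bond0 inet static
    address 10.246.21.11
    netmask 255.255.255.0
    slaves eth0.100 eth1.100
    bond_mode active-backup
    bond_miimon 100
    bond_primary eth0.100
    #ceph cluster network, normally on switch1

auto bond1
iface bond1 inet static
    address 10.246.31.11
    netmask 255.255.255.0
    slaves eth0.101 eth1.101
    bond_mode active-backup
    bond_miimon 100
    bond_primary eth1.101
    #ceph public network, normally on switch2
```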

    You may compare your results with our benchmarks and comparisons from other users in the thread.
    https://forum.proxmox.com/threads/proxmox-ve-ceph-benchmark-2018-02.41761/

    Thanks for the offer, but I must decline. Definitely a good reference for the discussion though.
     
  5. Mario Minati

    Mario Minati New Member

    Joined:
    Jun 11, 2018
    Messages:
    7
    Likes Received:
    0
    Hello @Alwin,
    hello @all,

    we have set up the hardware according to our plans (see attached network scheme) and have trouble testing the bonded interfaces.
    If we run a ping from one Proxmox node to another over a bonded interface and pull the cable out of one of the two NICs, the ping doesn't recover.
    Even if we reattach the cable, the ping doesn't resume.
    After reattaching the cable to NIC1 we have to disconnect NIC2 before the ping command receives answers again.

    We observe the same behaviour when watching the quorum state of the Ceph network in the Proxmox web interface and pulling one of the two bonded NICs of the Ceph internal network.

    What can be wrong with our configuration? Below is the network configuration of the three nodes.

    #
    # pub-ceph-node-01
    #

    auto lo
    iface lo inet loopback

    auto enp66s0f1
    iface enp66s0f1 inet static
        address 10.247.11.11
        netmask 255.255.0.0
        gateway 10.247.1.1
        #man.pub.intranet

    auto eno1
    iface eno1 inet static
        address 10.246.11.11
        netmask 255.255.255.0
        #sync1.pub.intranet

    auto eno2
    iface eno2 inet static
        address 10.246.12.11
        netmask 255.255.255.0
        #sync2.pub.intranet

    iface enp8s0f0 inet manual
        #san1.pub.intranet

    iface enp8s0f1 inet manual
        #ceph1.pub.intranet

    iface enp65s0f0 inet manual
        #san2.pub.intranet

    iface enp65s0f1 inet manual
        #ceph2.pub.intranet

    iface enp66s0f0 inet manual

    auto bond0
    iface bond0 inet static
        address 10.246.21.11
        netmask 255.255.255.0
        slaves enp65s0f0 enp8s0f0
        bond_miimon 100
        bond_mode active-backup
        #san.pub.intranet

    auto bond1
    iface bond1 inet static
        address 10.246.31.11
        netmask 255.255.255.0
        slaves enp65s0f1 enp8s0f1
        bond_miimon 100
        bond_mode active-backup
        #ceph.pub.intranet



    #
    # pub-ceph-node-02
    #

    auto lo
    iface lo inet loopback

    auto enp66s0f1
    iface enp66s0f1 inet static
        address 10.247.11.12
        netmask 255.255.0.0
        gateway 10.247.1.1
        #man.pub.intranet

    auto eno1
    iface eno1 inet static
        address 10.246.11.12
        netmask 255.255.255.0
        #sync1.pub.intranet

    auto eno2
    iface eno2 inet static
        address 10.246.12.12
        netmask 255.255.255.0
        #sync2.pub.intranet

    iface enp65s0f0 inet manual
        #san2.pub.intranet

    iface enp65s0f1 inet manual
        #ceph2.pub.intranet

    iface enp66s0f0 inet manual

    iface enp8s0f0 inet manual
        #san1.pub.intranet

    iface enp8s0f1 inet manual
        #ceph1.pub.intranet

    auto bond0
    iface bond0 inet static
        address 10.246.21.12
        netmask 255.255.255.0
        slaves enp65s0f0 enp8s0f0
        bond_miimon 100
        bond_mode active-backup
        #san.pub.intranet

    auto bond1
    iface bond1 inet static
        address 10.246.31.12
        netmask 255.255.255.0
        slaves enp65s0f1 enp8s0f1
        bond_miimon 100
        bond_mode active-backup
        #ceph.pub.intranet



    #
    # pub-ceph-node-03
    #

    auto lo
    iface lo inet loopback

    auto enp66s0f1
    iface enp66s0f1 inet static
        address 10.247.11.13
        netmask 255.255.0.0
        gateway 10.247.1.1
        #man.pub.intranet

    auto eno1
    iface eno1 inet static
        address 10.246.11.13
        netmask 255.255.255.0
        #sync1.pub.intranet

    auto eno2
    iface eno2 inet static
        address 10.246.12.13
        netmask 255.255.255.0
        #sync2.pub.intranet

    iface enp8s0f0 inet manual
        #san1.pub.intranet

    iface enp8s0f1 inet manual
        #ceph1.pub.intranet

    iface enp65s0f0 inet manual
        #san2.pub.intranet

    iface enp65s0f1 inet manual
        #ceph2.pub.intranet

    iface enp66s0f0 inet manual

    auto bond0
    iface bond0 inet static
        address 10.246.21.13
        netmask 255.255.255.0
        slaves enp65s0f0 enp8s0f0
        bond_miimon 100
        bond_mode active-backup
        #san.pub.intranet

    auto bond1
    iface bond1 inet static
        address 10.246.31.13
        netmask 255.255.255.0
        slaves enp65s0f1 enp8s0f1
        bond_miimon 100
        bond_mode active-backup
        #ceph.pub.intranet


    Any tips on which direction we should investigate are very welcome.

    Best greets,
    Mario
     
  6. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,647
    Likes Received:
    141
    You can see the state of the bond at '/proc/net/bonding/bondX'. Secondly, you may want to set the primary slave interface, so the bond switches back when the interface comes back online.
    https://wiki.linuxfoundation.org/networking/bonding

    Last but not least, check your networking with each interface, to have all the scenarios covered. Routing issues?
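    Applied to your configs, that would look roughly like this (the bond_primary spelling follows the underscore style of your stanzas; some ifenslave versions expect bond-primary instead):

```
auto bond1
iface bond1 inet static
    address 10.246.31.11
    netmask 255.255.255.0
    slaves enp65s0f1 enp8s0f1
    bond_miimon 100
    bond_mode active-backup
    bond_primary enp65s0f1
    #ceph.pub.intranet

# Live state of the bond (active slave, MII status of each member):
#   cat /proc/net/bonding/bond1
```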
     
  7. Mario Minati

    Mario Minati New Member

    Joined:
    Jun 11, 2018
    Messages:
    7
    Likes Received:
    0
    Hello Alwin,

    thanks for the quick reply.

    I know about them and can set the primary interface manually in /etc/network/interfaces; it can't be set via the GUI, right?

    But I still have trouble understanding active-backup bonding, even after reading the bonding docs more than once:

    If I pull, e.g., node01's Ceph cluster or Ceph public network connection off the switch, the ping command stops and doesn't recover, even after waiting a while. My understanding of active-backup bonding mode is that the connection recovers automatically; isn't that so?

    As we are using separate switches for the two networks (physical separation), do all nodes on the network have to switch to the other network? Otherwise the node that is disconnected (on one of its two bonding ports) will not receive packets on the net from which it was separated, right?

    Is the active-backup bonding configuration maybe not correct in our case? Do we have to switch to broadcast?

    Sorry for the many questions; my knowledge of network bonding is not very deep... I hope you can enlighten me and guide us to a really redundant setup.


    Best greets,

    Mario Minati
     
  8. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,647
    Likes Received:
    141
    Not through the GUI.

    In active-backup, the link that fails will be replaced by one of the other slaves in the bond. But this happens only on the bond with the failed link; all other network traffic stays untouched. This means your network needs to be able to route traffic to each interface on the other nodes. With two switches, those two need to be connected through a trunk.

    https://www.kernel.org/doc/Documentation/networking/bonding.txt

    You don't want to flood your network.

    Scenarios with an active-backup bond:
    • One switch dies: all connected bonds switch to the remaining connected slave (traffic happens completely on the other switch)
    • A link of a NIC port in the bond fails: the bond switches to its remaining connected slave (traffic runs through both switches)
    • A NIC dies: if the bond spans two different NICs it can switch, otherwise the node is dark.
     
  9. Mario Minati

    Mario Minati New Member

    Joined:
    Jun 11, 2018
    Messages:
    7
    Likes Received:
    0
    Hello Alwin,

    the trunk cable between the two switches was the missing link. After connecting the switches the redundancy worked fine.

    Thank you very much for your help!


    Best greets,

    Mario Minati
     