[SOLVED] bond_mode active-backup issue

RobFantini

Hello,

We are currently using bonds set up like this:
Code:
auto bond0
iface bond0 inet manual
        slaves enp8s0f0 enp8s0f1
        bond_miimon 100
        bond_mode 802.3ad
        bond_xmit_hash_policy layer2+3

Those work, but I read that this type of bond should not be used for the cluster network:
https://pve.proxmox.com/pve-docs/pve-network-plain.html
"If you intend to run your cluster network on the bonding interfaces, then you have to use active-passive mode on the bonding interfaces, other modes are unsupported."

When I change over to active-backup [used PVE > Network and restarted the node], the network does not work: no ping out, etc.
Code:
auto bond0
iface bond0 inet manual
        slaves enp8s0f0 enp8s0f1
        bond_miimon 100
        bond_mode active-backup

We are using a managed switch; the two wires connect to LAG (lagged) ports. I've tried the six different hash policies.
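For what it's worth, the kernel reports what the bond is actually doing in /proc/net/bonding/bond0 (standard Linux bonding, nothing Proxmox-specific), so a quick check like this shows which mode and active slave are really in effect; note that the xmit hash policy only applies to 802.3ad and balance-xor modes, so it has no effect under active-backup:
Code:
cat /proc/net/bonding/bond0   # reports the bonding mode, MII status and the currently active slave
ip -br link                   # quick overview of link state on all interfaces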

Does anyone have a clue about what to try next?
 
Hi RobFantini,

the general recommendation (the documentation should be more precise here) is to use a dual-ring approach for the corosync network, not bonds.
If you have to use a bond, you should use active-backup, but the two switches then have to be connected to each other; a sketch of such a stanza follows below.
LACP would also work, but you need MLAG to cross-connect the switches, and IGMP also has to work correctly.
The downside of this approach is that latency can rise, and that can be a problem.
All other bond modes have the problem of out-of-order packets.
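A minimal active-backup stanza along those lines might look like the following (interface names reused from the first post; bond-primary is optional and just an assumption about which NIC should carry traffic by default). With active-backup the switch ports are plain ports, not an LACP/LAG group, so each slave can go to a different switch:
Code:
auto bond0
iface bond0 inet manual
        bond-slaves enp8s0f0 enp8s0f1
        bond-miimon 100
        bond-mode active-backup
        bond-primary enp8s0f0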
 
Hello Wolfgang,
Thank you for the response. We had tried using a bond before and ran into the latency issues; I thought it was a hardware problem.

Please help me see if I got this right:

So the dual-ring network would be used just for cluster communication. As part of adding a node to the cluster, the network hardware, wiring, and configuration would get done first.

Storage and VMs would use other networks.


** Could you give an example of an /etc/network/interfaces file for a dual-ring setup?

PS:
There is always more to learn! Every time I think I know more than 50%, the other 50% starts to show up.

Have you seen a dual-ring wiring setup on a server rack? We will need to figure that out.
 
Here are the configs.
Code:
auto lo
iface lo inet loopback

iface ens18 inet manual
# VM network

auto ens19
iface ens19 inet static
    address  10.10.19.60
    netmask  255.255.255.0
# ring0

auto ens20
iface ens20 inet static
    address  10.10.20.60
    netmask  255.255.255.0
#ring1

iface ens21 inet manual

iface ens22 inet manual

auto bond0
iface bond0 inet static
    address  10.10.10.60
    netmask  255.255.255.0
    bond-slaves ens21 ens22
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
# storage net

auto vmbr0
iface vmbr0 inet static
    address  192.168.18.60
    netmask  255.255.240.0
    gateway  192.168.16.1
    bridge-ports ens18
    bridge-stp off
    bridge-fd 0

Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: pve0
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.10.20.60
    ring1_addr: 10.10.19.60
  }
  node {
    name: pve1
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.10.20.61
    ring1_addr: 10.10.19.61
  }
  node {
    name: pve2
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.10.20.62
    ring1_addr: 10.10.19.62 
  }
  node {
    name: pve3
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 10.10.20.63
    ring1_addr: 10.10.19.63 
 }

}
quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: Test
  config_version: 5
  interface {
    bindnetaddr: 10.10.20.60
    ringnumber: 0
  }
  interface {
    bindnetaddr: 10.10.19.60
    ringnumber: 1
  }
  rrp_mode: passive
  ip_version: ipv4
  secauth: on
  version: 2
}
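Once the nodes are joined, the standard corosync and Proxmox tools (not specific to this particular config) show whether both rings are actually up:
Code:
corosync-cfgtool -s   # ring status for the local node, one block per ring
pvecm status          # quorum and membership as Proxmox sees it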
 
You would add a new node like this:
Code:
pvecm add <cluster_member_ip>  --ring0_addr <local ip ring0> --ring1_addr <local ip ring1>
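For example, joining pve3 to the cluster via the existing member at 192.168.18.60 would look roughly like this (addresses taken from the configs above, purely illustrative):
Code:
pvecm add 192.168.18.60 --ring0_addr 10.10.20.63 --ring1_addr 10.10.19.63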
 
It will be a couple of weeks before we do the changeover to a dual-ring network.

I plan on trying to convert our existing corosync.conf file to use the new ring addresses. The other option would be to rebuild the production cluster. We have 7 nodes, so I could move the VMs, Ceph OSDs, and MONs to 4 of them, then try converting 3 nodes to use dual ring. Our CPU and memory usage is very low, so 3-4 nodes could easily run everything.

Is that a path you would attempt?
 
I would change the corosync.conf, because it is faster; it depends on your setup, but normally it can be done online.
The only thing is to turn off HA, if you use HA, before you change the corosync network.
And be sure that the watchdog is not activated. A rough sketch of that sequence is below.
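Assuming the usual Proxmox HA services, the sequence could look like this (run the service commands on every node; the file itself only needs to be edited once, since /etc/pve/corosync.conf is cluster-wide):
Code:
# stop the HA stack so no node fences itself while the cluster network changes
systemctl stop pve-ha-lrm
systemctl stop pve-ha-crm

# edit the cluster-wide config: new ring addresses, and bump config_version by one
nano /etc/pve/corosync.conf

# once the cluster is healthy again, re-enable HA
systemctl start pve-ha-crm
systemctl start pve-ha-lrm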
 
