How to add a 2nd Corosync Link

Hi,

Currently that's only possible by editing the configuration: https://pve.proxmox.com/pve-docs/chapter-pvecm.html#_adding_redundant_links_to_an_existing_cluster
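To make the manual edit a bit more concrete, here is a minimal Python sketch of the text change involved: add a ring1_addr next to each node's ring0_addr and increment config_version. The sample config is a hypothetical, trimmed-down fragment (the starting config_version of 3 and the overall layout are assumptions); only the addresses and node IDs are the ones that appear later in this thread. The helper name add_link is made up for illustration. The linked documentation remains the reference for the full procedure, including any interface entry in the totem section, which this sketch does not touch.

```python
import re

# Hypothetical, trimmed-down corosync.conf used only for illustration.
# A real file has more sections and options.
SAMPLE_CONF = """\
nodelist {
  node {
    name: cluster5-node01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 172.16.30.212
  }
  node {
    name: cluster5-node02
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 172.16.30.213
  }
  node {
    name: cluster5-node03
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 172.16.30.214
  }
}
totem {
  cluster_name: cluster5
  config_version: 3
}
"""

# New link address for each node, keyed by its existing ring0 address.
NEW_ADDRS = {
    "172.16.30.212": "10.10.51.1",
    "172.16.30.213": "10.10.51.2",
    "172.16.30.214": "10.10.51.3",
}


def add_link(conf, link_id, addr_by_ring0):
    """Return conf with ring<link_id>_addr added per node and config_version + 1."""
    key = f"ring{link_id}_addr"
    if key in conf:
        raise ValueError(f"{key} is already present")
    lines = []
    for line in conf.splitlines():
        lines.append(line)
        m = re.match(r"(\s*)ring0_addr:\s*(\S+)\s*$", line)
        if m:
            indent, ring0 = m.groups()
            # A KeyError here means a node has no new address assigned.
            lines.append(f"{indent}{key}: {addr_by_ring0[ring0]}")
    text = "\n".join(lines) + "\n"
    # Every change to the file needs a higher config_version.
    text, count = re.subn(
        r"(config_version:\s*)(\d+)",
        lambda m: f"{m.group(1)}{int(m.group(2)) + 1}",
        text,
    )
    if count != 1:
        raise ValueError("expected exactly one config_version entry")
    return text


new_conf = add_link(SAMPLE_CONF, 1, NEW_ADDRS)
print(new_conf)
```

This only transforms text. It is meant to be run against a copy of the file, with the result reviewed by hand before it goes anywhere near the live cluster configuration.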

There's work under way to allow adding, editing and removing links through the GUI.

I just did that, and it seemed to work. The 2nd link is being displayed in the GUI now, too, as is the new configuration ID/version.
journalctl -b -u corosync looked good.

However, to test it I ran "ifconfig vmbr0 0.0.0.0" on the interface that ring0 is on.
After a few moments the node lost quorum and got fenced?

Shouldn't it still be alive via ring1/link1?
Shouldn't I see the other node IPs, like 10.10.51.1, 10.10.51.2, 10.10.51.3?


root@cluster5-node02:~# pvecm status
Cluster information
-------------------
Name: cluster5
Config Version: 4
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Mon Feb 17 16:06:26 2020
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000003
Ring ID: 1.38
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 172.16.30.212
0x00000002 1 172.16.30.214
0x00000003 1 172.16.30.213 (local)
 
However, to test it I ran "ifconfig vmbr0 0.0.0.0" on the interface that ring0 is on.
After a few moments the node lost quorum and got fenced?

ifup/ifdown is not really a good test for corosync, even if it should work with corosync 3. That specific failure, though, looks like a bug that only shows up with kernels newer than 5.0. We also found it a few weeks ago, and it should be gone with the next corosync/kronosnet upgrade.

You can check the link state with corosync-cfgtool -sb
 
Normally it should still stay alive, though. Can you check the link state to see whether all links show up?


ssh cluster5-node01 "corosync-cfgtool -sb"
Printing link status.
Local node ID 1
LINK ID 0
addr = 172.16.30.212
status = 333


ssh cluster5-node02 "corosync-cfgtool -sb"
Printing link status.
Local node ID 3
LINK ID 0
addr = 172.16.30.213
status = 333
LINK ID 1
addr = 10.10.51.2
status = 13n


ssh cluster5-node03 "corosync-cfgtool -sb"
Printing link status.
Local node ID 2
LINK ID 0
addr = 172.16.30.214
status = 333
LINK ID 1
addr = 10.10.51.3
status = 1n3


How should I test it properly, then?
 
Also check what the log on the fenced node (or another node) says regarding corosync. It should mention both links once they are configured.


Feb 17 15:56:28 cluster5-node01 corosync[3077]: [KNET ] link: host: 3 link: 0 is down
Feb 17 15:56:28 cluster5-node01 corosync[3077]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 17 15:56:28 cluster5-node01 corosync[3077]: [KNET ] host: host: 3 has no active links
Feb 17 15:56:28 cluster5-node01 corosync[3077]: [TOTEM ] Token has not been received in 1237 ms
Feb 17 15:56:29 cluster5-node01 corosync[3077]: [TOTEM ] A processor failed, forming new configuration.
Feb 17 15:56:31 cluster5-node01 corosync[3077]: [TOTEM ] A new membership (1.34) was formed. Members left: 3
Feb 17 15:56:31 cluster5-node01 corosync[3077]: [TOTEM ] Failed to receive the leave message. failed: 3
Feb 17 15:56:31 cluster5-node01 corosync[3077]: [CPG ] downlist left_list: 1 received
Feb 17 15:56:31 cluster5-node01 corosync[3077]: [CPG ] downlist left_list: 1 received
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [dcdb] notice: members: 1/3054, 2/3133
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [dcdb] notice: starting data syncronisation
Feb 17 15:56:31 cluster5-node01 corosync[3077]: [QUORUM] Members[2]: 1 2
Feb 17 15:56:31 cluster5-node01 corosync[3077]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [dcdb] notice: cpg_send_message retried 1 times
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [status] notice: members: 1/3054, 2/3133
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [status] notice: starting data syncronisation
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [dcdb] notice: received sync request (epoch 1/3054/00000004)
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [status] notice: received sync request (epoch 1/3054/00000004)
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [dcdb] notice: received all states
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [dcdb] notice: leader is 1/3054
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [dcdb] notice: synced members: 1/3054, 2/3133
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [dcdb] notice: start sending inode updates
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [dcdb] notice: sent all (0) updates
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [dcdb] notice: all data is up to date
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [dcdb] notice: dfsm_deliver_queue: queue length 5
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [status] notice: received all states
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [status] notice: all data is up to date
Feb 17 15:56:31 cluster5-node01 pmxcfs[3054]: [status] notice: dfsm_deliver_queue: queue length 87


=> It's only talking about link: 0, right?
 
Feb 17 15:56:28 cluster5-node01 corosync[3077]: [KNET ] host: host: 3 has no active links
=> It's only talking about link: 0, right?

Yes, it seems kronosnet (the transport tech corosync uses) had not "seen" the new link yet.
But the corosync-cfgtool output from your other posts says it does now, so you could re-check the current logs.
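For re-checking the logs, a small Python sketch along these lines could list which link numbers the [KNET ] messages mention per host. It only looks for the "host: N link: M" pattern seen in the log lines in this thread; the function name links_seen is made up, and treating the knet host number as the corosync node ID is an assumption.

```python
import re
from collections import defaultdict

# Matches e.g. "[KNET ] link: host: 3 link: 0 is down" and
# "[KNET ] pmtud: PMTUD link change for host: 2 link: 1 from 469 to 1397".
KNET_HOST_LINK = re.compile(r"\[KNET\s*\].*?host:\s+(\d+)\s+link:\s+(\d+)")


def links_seen(log_text):
    """Map each host number to the sorted link numbers KNET log lines mention."""
    seen = defaultdict(set)
    for line in log_text.splitlines():
        m = KNET_HOST_LINK.search(line)
        if m:
            seen[int(m.group(1))].add(int(m.group(2)))
    return {host: sorted(links) for host, links in seen.items()}


# Log lines copied from the first excerpt in this thread.
EARLY_LOG = """\
Feb 17 15:56:28 cluster5-node01 corosync[3077]: [KNET ] link: host: 3 link: 0 is down
Feb 17 15:56:28 cluster5-node01 corosync[3077]: [KNET ] host: host: 3 (passive) best link: 0 (pri: 1)
Feb 17 15:56:28 cluster5-node01 corosync[3077]: [KNET ] host: host: 3 has no active links
"""

print(links_seen(EARLY_LOG))
```

If a host only ever shows up with link 0 in the output, kronosnet on that node most likely has not picked up the additional link. The log text could come from saving the output of journalctl -b -u corosync to a file.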
 
I have now restarted corosync on all nodes. Now I am "only" left with "link enabled:0" for nodeid 1 on link 1:


root@cluster5-node01:~# /usr/sbin/corosync-cfgtool -s
Printing link status.
Local node ID 1
LINK ID 0
addr = 172.16.30.212
status:
nodeid 1: link enabled:1 link connected:1
nodeid 2: link enabled:1 link connected:1
nodeid 3: link enabled:1 link connected:1
LINK ID 1
addr = 10.10.51.1
status:
nodeid 1: link enabled:0 link connected:1
nodeid 2: link enabled:1 link connected:1
nodeid 3: link enabled:1 link connected:1


On node01 I increased the config version to trigger a reload. It seems to be aware of link1, but won't enable it?

Feb 17 16:53:59 cluster5-node01 corosync[192434]: [MAIN ] Completed service synchronization, ready to provide service.
Feb 17 16:53:59 cluster5-node01 corosync[192434]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
Feb 17 16:53:59 cluster5-node01 corosync[192434]: [KNET ] pmtud: PMTUD link change for host: 2 link: 1 from 469 to 1397
Feb 17 16:53:59 cluster5-node01 corosync[192434]: [KNET ] pmtud: PMTUD link change for host: 3 link: 0 from 469 to 1397
Feb 17 16:53:59 cluster5-node01 corosync[192434]: [KNET ] pmtud: PMTUD link change for host: 3 link: 1 from 469 to 1397
Feb 17 16:53:59 cluster5-node01 corosync[192434]: [KNET ] pmtud: Global data MTU changed to: 1397
Feb 17 16:59:51 cluster5-node01 corosync[192434]: [CFG ] Config reload requested by node 1
Feb 17 16:59:51 cluster5-node01 corosync[192434]: [TOTEM ] Configuring link 0
Feb 17 16:59:51 cluster5-node01 corosync[192434]: [TOTEM ] Configured link number 0: local addr: 172.16.30.212, port=5405
Feb 17 16:59:51 cluster5-node01 corosync[192434]: [TOTEM ] Configuring link 1
Feb 17 16:59:51 cluster5-node01 corosync[192434]: [TOTEM ] Configured link number 1: local addr: 10.10.51.1, port=5406

Any idea how to enable that link?
 
No need to, as this is just a display issue: a node will always use only one link as loopback to itself. The displayed output will hopefully improve with the next corosync version, to avoid this confusion.
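If you want to check the corosync-cfgtool -s output without tripping over that display quirk, a rough Python sketch could skip the local node's own entries and report only peers whose link is not enabled or not connected. The parsing is based purely on the output format pasted in this thread (other corosync versions may print it differently), and the function name link_problems is made up.

```python
import re


def link_problems(cfgtool_output):
    """Return (link_id, node_id, reason) for every peer link that looks unhealthy."""
    local_id = None
    link_id = None
    problems = []
    for raw in cfgtool_output.splitlines():
        line = raw.strip()
        if m := re.match(r"Local node ID (\d+)", line):
            local_id = int(m.group(1))
        elif m := re.match(r"LINK ID (\d+)", line):
            link_id = int(m.group(1))
        elif m := re.match(
            r"nodeid\s+(\d+):\s+link enabled:(\d)\s+link connected:(\d)", line
        ):
            node, enabled, connected = (int(g) for g in m.groups())
            if node == local_id:
                # Only one link serves as loopback to the local node, so
                # "enabled:0" for the local node ID is not an error.
                continue
            if not enabled:
                problems.append((link_id, node, "not enabled"))
            elif not connected:
                problems.append((link_id, node, "not connected"))
    return problems


# Output copied from node01 in this thread.
SAMPLE = """\
Printing link status.
Local node ID 1
LINK ID 0
addr = 172.16.30.212
status:
nodeid 1: link enabled:1 link connected:1
nodeid 2: link enabled:1 link connected:1
nodeid 3: link enabled:1 link connected:1
LINK ID 1
addr = 10.10.51.1
status:
nodeid 1: link enabled:0 link connected:1
nodeid 2: link enabled:1 link connected:1
nodeid 3: link enabled:1 link connected:1
"""

print(link_problems(SAMPLE))  # prints []
```

An empty list means every peer is enabled and connected on every link, which matches the output above.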
 
Feb 17 16:59:51 cluster5-node01 corosync[192434]: [TOTEM ] Configuring link 0
Feb 17 16:59:51 cluster5-node01 corosync[192434]: [TOTEM ] Configured link number 0: local addr: 172.16.30.212, port=5405
Feb 17 16:59:51 cluster5-node01 corosync[192434]: [TOTEM ] Configuring link 1
Feb 17 16:59:51 cluster5-node01 corosync[192434]: [TOTEM ] Configured link number 1: local addr: 10.10.51.1, port=5406

They are already up and configured now :) I'd still avoid ifdown testing until you have the following versions running:
Code:
corosync: 3.0.3-pve1
libknet1: 1.14-pve1
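A quick way to compare installed versions against those minimums could look like the Python sketch below. It is deliberately simplified: it compares only the numeric upstream part and ignores the "-pveN" revision and any epoch, so it is not a full Debian version comparison (dpkg --compare-versions does that properly). The "installed" values in the example are hypothetical.

```python
# Minimum versions named in this thread.
MINIMUM = {"corosync": "3.0.3-pve1", "libknet1": "1.14-pve1"}


def upstream_tuple(version):
    """Turn '3.0.3-pve1' into (3, 0, 3); the package revision is ignored."""
    upstream = version.split("-", 1)[0]
    return tuple(int(part) for part in upstream.split("."))


def at_least(installed, minimum):
    """True if the installed upstream version is not older than the minimum."""
    return upstream_tuple(installed) >= upstream_tuple(minimum)


# Hypothetical installed versions, for illustration only.
installed = {"corosync": "3.0.2-pve4", "libknet1": "1.14-pve1"}

for package, minimum in MINIMUM.items():
    verdict = "ok" if at_least(installed[package], minimum) else "too old"
    print(f"{package}: {installed[package]} ({verdict})")
```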
 
Okay, thank you.

I even added a 3rd ring now:

root@cluster5-node02:~# /usr/sbin/corosync-cfgtool -s
Printing link status.
Local node ID 3
LINK ID 0
addr = 172.16.30.213
status:
nodeid 1: link enabled:1 link connected:1
nodeid 2: link enabled:1 link connected:1
nodeid 3: link enabled:1 link connected:1
LINK ID 1
addr = 10.10.51.2
status:
nodeid 1: link enabled:1 link connected:1
nodeid 2: link enabled:1 link connected:1
nodeid 3: link enabled:0 link connected:1
LINK ID 2
addr = 10.10.50.2
status:
nodeid 1: link enabled:1 link connected:1
nodeid 2: link enabled:1 link connected:1
nodeid 3: link enabled:0 link connected:1
 
