Proxmox 6 Cluster in WAN and low latency

Mike Tkatchouk

Hey. Earlier I ran version 5 with a cluster of 4 nodes, one of which sat on another network with 30-80 ms latency. To make that work I configured corosync to use udpu, and everything ran fine. When upgrading to version 6, it was recommended to update corosync and remove the udpu settings, which I did.

Now the cluster periodically falls apart, quorum is lost, and the logs fill with errors. Perhaps this is due to network degradation.

Has anyone managed to run a cluster across a distributed WAN with unstable link quality, or is this idea now utopian?
 
if you have a single link, you can try setting (see 'man corosync.conf', edit the config in /etc/pve/corosync.conf, don't forget to bump the config_version)
  • knet_ping_timeout to 5000
  • knet_pong_count to 1
  • knet_ping_interval to 200ms
the default calculated values are not very good for single-link clusters with unreliable networks.
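
For reference, with those values in place the totem section of /etc/pve/corosync.conf would look roughly like the sketch below; cluster_name and the three knet values come from this thread, while config_version, version, transport and the link number are assumptions and must match what is already in your existing config:

totem {
  cluster_name: pve
  # must be higher than the current value so the change is propagated
  config_version: 29
  version: 2
  transport: knet
  interface {
    linknumber: 0
    # ping interval and timeout are in milliseconds
    knet_ping_interval: 200
    knet_ping_timeout: 5000
    knet_pong_count: 1
  }
}

Once the edited file with the bumped config_version is saved under /etc/pve/, pmxcfs distributes it to /etc/corosync/corosync.conf on every node, so the local files do not need to be edited by hand.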
 
that being said, 30-80ms is a lot!
 
Hi. I changed /etc/corosync/corosync.conf and restarted the daemon on every node:

totem {
  cluster_name: pve
  config_version: 28
  interface {
    ringnumber: 0
    knet_ping_timeout: 5000
    knet_pong_count: 1
    knet_ping_interval: 200
  }
}
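
(The exact restart commands are not quoted in the post; on a PVE 6 node restarting the cluster stack typically looks something like the lines below, with pvecm status and corosync-cfgtool -s afterwards showing whether quorum and the knet links came back.)

# restart the cluster stack on this node
systemctl restart corosync
systemctl restart pve-cluster
# check quorum/membership and knet link state afterwards
pvecm status
corosync-cfgtool -s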

And received these messages:

root@pve-01:~# tail -f /var/log/daemon.log
Dec 24 11:22:41 pve-01 pmxcfs[30679]: [status] notice: cpg_send_message retry 10
Dec 24 11:22:41 pve-01 pmxcfs[30679]: [dcdb] notice: cpg_send_message retry 10
Dec 24 11:22:42 pve-01 pmxcfs[30679]: [status] notice: cpg_send_message retry 20
Dec 24 11:22:42 pve-01 pmxcfs[30679]: [dcdb] notice: cpg_send_message retry 20
Dec 24 11:22:43 pve-01 pmxcfs[30679]: [status] notice: cpg_send_message retry 30
Dec 24 11:22:43 pve-01 pmxcfs[30679]: [dcdb] notice: cpg_send_message retry 30
Dec 24 11:22:44 pve-01 pmxcfs[30679]: [status] notice: cpg_send_message retry 40
Dec 24 11:22:44 pve-01 pmxcfs[30679]: [dcdb] notice: cpg_send_message retry 40
Dec 24 11:22:45 pve-01 pmxcfs[30679]: [status] notice: cpg_send_message retry 50
Dec 24 11:22:45 pve-01 pmxcfs[30679]: [dcdb] notice: cpg_send_message retry 50

root@pve-03:~$ tail -f /var/log/daemon.log
Dec 24 11:22:18 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 460
Dec 24 11:22:19 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 470
Dec 24 11:22:20 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 480
Dec 24 11:22:21 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 490
Dec 24 11:22:22 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 500
Dec 24 11:22:23 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 510
Dec 24 11:22:24 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 520
Dec 24 11:22:25 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 530
Dec 24 11:22:26 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 540
Dec 24 11:22:27 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 550
Dec 24 11:22:28 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 560

Any ideas?
 
please provide the full logs ("journalctl -u pve-cluster -u corosync") starting with a restart of pve-cluster and corosync on all nodes.
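
For example, something along these lines on each node would capture what is being asked for; the --since value and the output path are only illustrations, use whatever covers the restart:

# restart the stack, then dump the logs of both units covering the restart
systemctl restart corosync pve-cluster
journalctl -u pve-cluster -u corosync --since today > /root/cluster-$(hostname).log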
 
