Proxmox 6 Cluster in WAN and low latency

Mike Tkatchouk

Hey. Earlier I ran version 5 with a cluster of 4 nodes, one of which sat on another network with 30-80 ms latency. To make that work I configured corosync to use udpu, and everything ran fine. When upgrading to version 6, it was recommended to update corosync and remove the udpu settings, which I did.

Now the cluster periodically falls apart, quorum is lost, and the logs fill with errors. Perhaps this is due to network degradation.

Has anyone managed to run a cluster across a distributed WAN with unstable link quality, or is this idea now utopian?
 
if you have a single link, you can try setting (see 'man corosync.conf', edit the config in /etc/pve/corosync.conf, don't forget to bump the config_version)
  • knet_ping_timeout to 5000
  • knet_pong_count to 1
  • knet_ping_interval to 200ms
the default calculated values are not very good for single-link clusters with unreliable networks.
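
For reference, with those values in place the totem section of /etc/pve/corosync.conf would look roughly like the sketch below; cluster_name and the three knet values come from this thread, while config_version, version, transport and the link number are assumptions and must match what is already in your existing config:

totem {
  cluster_name: pve
  # must be higher than the current value so the change is propagated
  config_version: 29
  version: 2
  transport: knet
  interface {
    linknumber: 0
    # ping interval and timeout are in milliseconds
    knet_ping_interval: 200
    knet_ping_timeout: 5000
    knet_pong_count: 1
  }
}

Once the edited file with the bumped config_version is saved under /etc/pve/, pmxcfs distributes it to /etc/corosync/corosync.conf on every node, so the local files do not need to be edited by hand.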
 
that being said, 30-80ms is a lot!
 
Hi. I changed /etc/corosync/corosync.conf and restarted the daemon on every node:

totem {
  cluster_name: pve
  config_version: 28
  interface {
    ringnumber: 0
    knet_ping_timeout: 5000
    knet_pong_count: 1
    knet_ping_interval: 200
  }
}
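
(The exact restart commands are not quoted in the post; on a PVE 6 node restarting the cluster stack typically looks something like the lines below, with pvecm status and corosync-cfgtool -s afterwards showing whether quorum and the knet links came back.)

# restart the cluster stack on this node
systemctl restart corosync
systemctl restart pve-cluster
# check quorum/membership and knet link state afterwards
pvecm status
corosync-cfgtool -s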

And received these messages:

root@pve-01:~# tail -f /var/log/daemon.log
Dec 24 11:22:41 pve-01 pmxcfs[30679]: [status] notice: cpg_send_message retry 10
Dec 24 11:22:41 pve-01 pmxcfs[30679]: [dcdb] notice: cpg_send_message retry 10
Dec 24 11:22:42 pve-01 pmxcfs[30679]: [status] notice: cpg_send_message retry 20
Dec 24 11:22:42 pve-01 pmxcfs[30679]: [dcdb] notice: cpg_send_message retry 20
Dec 24 11:22:43 pve-01 pmxcfs[30679]: [status] notice: cpg_send_message retry 30
Dec 24 11:22:43 pve-01 pmxcfs[30679]: [dcdb] notice: cpg_send_message retry 30
Dec 24 11:22:44 pve-01 pmxcfs[30679]: [status] notice: cpg_send_message retry 40
Dec 24 11:22:44 pve-01 pmxcfs[30679]: [dcdb] notice: cpg_send_message retry 40
Dec 24 11:22:45 pve-01 pmxcfs[30679]: [status] notice: cpg_send_message retry 50
Dec 24 11:22:45 pve-01 pmxcfs[30679]: [dcdb] notice: cpg_send_message retry 50

root@pve-03:~$ tail -f /var/log/daemon.log
Dec 24 11:22:18 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 460
Dec 24 11:22:19 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 470
Dec 24 11:22:20 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 480
Dec 24 11:22:21 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 490
Dec 24 11:22:22 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 500
Dec 24 11:22:23 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 510
Dec 24 11:22:24 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 520
Dec 24 11:22:25 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 530
Dec 24 11:22:26 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 540
Dec 24 11:22:27 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 550
Dec 24 11:22:28 pve-03 pmxcfs[22672]: [dcdb] notice: cpg_join retry 560

Any ideas?
 
please provide the full logs ("journalctl -u pve-cluster -u corosync") starting with a restart of pve-cluster and corosync on all nodes.
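
For example, something along these lines on each node would capture what is being asked for; the --since value and the output path are only illustrations, use whatever covers the restart:

# restart the stack, then dump the logs of both units covering the restart
systemctl restart corosync pve-cluster
journalctl -u pve-cluster -u corosync --since today > /root/cluster-$(hostname).log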
 
