Corosync not working properly over GRE/IPsec

Hi there! I currently have two servers set up in a local cluster, working fine. To offload these servers, I'm going to set up two more servers at different locations, which have pings of 3 and 6 ms to my primary servers over IPsec.

core03: 4.2-2/725d76f0
core01 + 02: 4.1-13/cfb599fb (running dist-upgrade right now)
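To compare package versions across the nodes once the upgrade is done, something like this is enough:

Code:
root@core01:~# pveversion -v | head -n 3
root@core03:~# pveversion -v | head -n 3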

The tunneling is handled by my virtual OpenBSD 5.9 firewalls. Topology:

core01 10.0.0.250 -> fw01 -> internet <- fw02 <- core03 10.0.1.254
core02 10.0.0.251

Allow anything on fw01 + fw02:

Code:
set skip on gre0
set skip on enc0
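For reference, the filtering side is basically just those two skip lines plus a pass-all policy. If I ever want to tighten that, I imagine an explicit rule set for the corosync traffic would look something like this (sketch only; vio0 is just a placeholder for the external interface):

Code:
# sketch only - vio0 is a placeholder for the external interface on fw01/fw02
set skip on { gre0 enc0 }               # no filtering on the tunnel interfaces
pass quick on vio0 proto { esp gre }    # let the IPsec/GRE encapsulation through
pass quick on vio0 proto udp from any to any port { 500 4500 }   # IKE / NAT-T
pass quick proto udp from 10.0.0.0/24 to 10.0.1.0/24 port { 5404 5405 }  # corosync
pass quick proto igmp allow-opts        # IGMP needs allow-opts in pf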

I should also mention that I had the same results using GRE only.

I can SSH from core01 to core03 and vice versa. /etc/hosts is also properly set up.

So far, this is the only error from core03 when it is added to the cluster. I've tried with and without multicast:

Code:
Jul 10 14:47:22 [1887] core03 corosync crit  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Jul 10 14:47:22 [1887] core03 corosync error  [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Jul 10 14:47:22 [1887] core03 corosync error  [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.
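If I read that error correctly, the config corosync sees on core03 has neither a nodelist nor quorum.expected_votes, so something like this should show whether the cluster config ever reached /etc/corosync/corosync.conf on that node:

Code:
root@core03:~# ls -l /etc/corosync/corosync.conf /etc/pve/corosync.conf
root@core03:~# diff /etc/corosync/corosync.conf /etc/pve/corosync.conf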

Corosync.log on core01 when adding core03:

Code:
With multicast:
Jul 10 14:35:59 [12461] core01 corosync notice  [CFG  ] Config reload requested by node 1

Without multicast:
Jul 10 14:42:08 [6810] core01 corosync notice  [CFG  ] Config reload requested by node 1
Jul 10 14:42:08 [6810] core01 corosync notice  [TOTEM ] adding new UDPU member {10.0.1.254}
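To see whether core01 ever actually counts core03 as a member after that reload, the membership and vote status should be visible with something like:

Code:
root@core01:~# pvecm status
root@core01:~# corosync-quorumtool -s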

/etc/pve/corosync.conf on core01:

Code:
logging {
  debug: off
  to_syslog: yes
}

logging {
  logfile: /var/log/corosync/corosync.log
  timestamp: on
  to_logfile: yes
  to_syslog: yes
}

nodelist {
  node {
    name: core02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.0.0.251
  }

  node {
    name: core03
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.0.1.254
  }

  node {
    name: core01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.250
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: chrjsnse
  config_version: 43
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.0.0.250
    ringnumber: 0
  }
}
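The config above has no transport line; for the non-multicast attempts the totem section would need roughly this shape (transport: udpu added and config_version bumped, everything else as above; a sketch, not a verbatim copy of what I used):

Code:
totem {
  cluster_name: chrjsnse
  config_version: 44
  ip_version: ipv4
  secauth: on
  transport: udpu
  version: 2
  interface {
    bindnetaddr: 10.0.0.250
    ringnumber: 0
  }
}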

When adding core03 with multicast, tcpdump -ni enc0 on the primary firewall never reports any activity on ports 5404/5405, only SSH traffic.
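Multicast through the tunnel can also be tested independently of corosync; running omping on both sides at the same time should show whether multicast actually crosses the GRE/IPsec link (count and interval here are just example values):

Code:
root@core01:~# omping -c 600 -i 1 -q 10.0.0.250 10.0.1.254
root@core03:~# omping -c 600 -i 1 -q 10.0.0.250 10.0.1.254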

But without multicast (transport: udpu), tcpdump constantly reports the traffic to port 5405 as unreachable:

Code:
root@fw01:~ # tcpdump -ni enc0
14:42:08.556786 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:08.863241 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 136 (DF) (gre encap)
14:42:08.864562 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:08.942444 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 88 (DF) (gre encap)
14:42:08.944211 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:08.956259 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 88 (DF) (gre encap)
14:42:08.961779 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:08.974964 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 88 (DF) (gre encap)
14:42:08.976421 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:09.176072 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 136 (DF) (gre encap)
14:42:09.177390 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:09.484490 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 136 (DF) (gre encap)
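Those port-unreachable replies from 10.0.1.254 suggest nothing is listening on UDP 5405 on core03, which would fit corosync exiting right after start. Something like this on core03 should confirm it:

Code:
root@core03:~# ss -ulpn | grep -E '5404|5405'
root@core03:~# systemctl status corosync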

So I tried copying corosync.conf from core01 to core03:

Code:
root@core03:~# systemctl stop pve-cluster
root@core03:~# systemctl stop corosync

root@core01:~# scp /etc/pve/corosync.conf root@core03:/etc/corosync/corosync.conf

root@core03:~# systemctl start corosync
root@core03:~# systemctl start pve-cluster
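After that restart it should be possible to see whether corosync on core03 accepts the copied config and whether it sees the other nodes, for example with:

Code:
root@core03:~# journalctl -u corosync -n 50 --no-pager
root@core03:~# pvecm status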

But I still get the same error, with and without multicast.

Feel free to tell me if there is any more information you need. Thanks!
 
Hi,

Corosync has a hard-coded latency limit of 2 ms.
So if you have an IPsec tunnel with 3-6 ms of latency, this is normal behavior.
Clusters over WAN are not recommended.
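To check the actual tunnel latency against that limit, a plain ping summary over a larger sample is enough, e.g.:

Code:
root@core01:~# ping -c 100 -q 10.0.1.254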
 
