Corosync not working properly over GRE/IPsec

Christoffer Jönsson · Jul 10, 2016

Hi there! I currently have two servers set up in a local cluster that works fine. To offload these servers, I'm going to set up two more servers at different locations, which have a ping of 3 ms and 6 ms to my primary servers over IPsec.

core03: 4.2-2/725d76f0
core01 + core02: 4.1-13/cfb599fb (running dist-upgrade right now)

The tunneling is handled by my virtual OpenBSD 5.9 firewalls. Topology:

core01 10.0.0.250 -> fw01 -> internet <- fw02 <- core03 10.0.1.254
core02 10.0.0.251

Everything is allowed on fw01 and fw02:

set skip on gre0
set skip on enc0

I also forgot to mention that I got the same results using plain GRE (without IPsec).

I can SSH from core01 to core03 and vice versa, and /etc/hosts is set up properly.

So far, this is the only error core03 logs when it is added to the cluster. I've tried both with and without multicast:

Code:

Jul 10 14:47:22 [1887] core03 corosync crit  [QUORUM] Quorum provider: corosync_votequorum failed to initialize.
Jul 10 14:47:22 [1887] core03 corosync error  [SERV  ] Service engine 'corosync_quorum' failed to load for reason 'configuration error: nodelist or quorum.expected_votes must be configured!'
Jul 10 14:47:22 [1887] core03 corosync error  [MAIN  ] Corosync Cluster Engine exiting with status 20 at service.c:356.
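The votequorum error says that the corosync.conf core03 actually loaded had neither a nodelist section nor a quorum.expected_votes value. A rough sketch for checking a config file for either one before restarting corosync; has_quorum_config is a hypothetical helper, and its grep patterns are a line-based heuristic, not a real corosync parser:

```shell
#!/bin/sh
# has_quorum_config FILE
# Hypothetical helper: succeeds if FILE appears to contain a "nodelist {"
# section opener or an "expected_votes:" key, the two settings the
# votequorum error asks for. Line-based grep heuristic, not a parser.
has_quorum_config() {
  if grep -Eq '^[[:space:]]*nodelist[[:space:]]*\{' "$1"; then
    return 0
  fi
  grep -Eq '^[[:space:]]*expected_votes[[:space:]]*:' "$1"
}

# Usage on core03:
# has_quorum_config /etc/corosync/corosync.conf || echo "no nodelist or expected_votes"
```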

corosync.log on core01 when adding core03:

Code:

With multicast:
Jul 10 14:35:59 [12461] core01 corosync notice  [CFG  ] Config reload requested by node 1

Without multicast:
Jul 10 14:42:08 [6810] core01 corosync notice  [CFG  ] Config reload requested by node 1
Jul 10 14:42:08 [6810] core01 corosync notice  [TOTEM ] adding new UDPU member {10.0.1.254}

/etc/pve/corosync.conf on core01:

Code:

logging {
  debug: off
  to_syslog: yes
}

logging {
  logfile: /var/log/corosync/corosync.log
  timestamp: on
  to_logfile: yes
  to_syslog: yes
}

nodelist {
  node {
    name: core02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.0.0.251
  }

  node {
    name: core03
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 10.0.1.254
  }

  node {
    name: core01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.250
  }

}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: chrjsnse
  config_version: 43
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.0.0.250
    ringnumber: 0
  }

}

When adding core03 with multicast, tcpdump -ni enc0 on the primary firewall never shows any traffic on ports 5404/5405, only SSH traffic.

Without multicast (transport: udpu), however, tcpdump constantly shows the packets to UDP port 5405 on core03 being answered with ICMP port unreachable:

Code:

root@fw01:~ # tcpdump -ni enc0
14:42:08.556786 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:08.863241 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 136 (DF) (gre encap)
14:42:08.864562 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:08.942444 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 88 (DF) (gre encap)
14:42:08.944211 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:08.956259 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 88 (DF) (gre encap)
14:42:08.961779 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:08.974964 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 88 (DF) (gre encap)
14:42:08.976421 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:09.176072 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 136 (DF) (gre encap)
14:42:09.177390 (authentic,confidential): SPI 0xc5b79ea8: 10.0.1.254 > 10.0.0.250: icmp: 10.0.1.254 udp port 5405 unreachable [tos 0xc0] (gre encap)
14:42:09.484490 (authentic,confidential): SPI 0x15fa1f17: 10.0.0.250.48752 > 10.0.1.254.5405: udp 136 (DF) (gre encap)
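In the udpu capture, every datagram sent to 10.0.1.254 port 5405 is answered by an ICMP port unreachable from 10.0.1.254 itself. That suggests the GRE/IPsec path does carry the packets but nothing on core03 is listening on UDP 5405, which would fit corosync having exited there with status 20. A small sketch for tallying the two kinds of lines in a saved capture; summarize_capture is a hypothetical name, and the patterns assume the OpenBSD tcpdump text format shown above:

```shell
#!/bin/sh
# summarize_capture [FILE...]
# Hypothetical helper: counts datagrams sent to UDP port 5405 and the
# ICMP "udp port 5405 unreachable" replies in saved tcpdump text output.
# Reads standard input when no FILE is given.
summarize_capture() {
  awk '
    /\.5405: udp/               { sent++ }
    /udp port 5405 unreachable/ { unreachable++ }
    END { printf "sent=%d unreachable=%d\n", sent, unreachable }
  ' "$@"
}

# Usage on fw01 (capture to a file, then summarize):
# tcpdump -ni enc0 -c 200 > enc0.txt
# summarize_capture enc0.txt
```

If the two counts stay roughly equal, the remote host is rejecting the traffic rather than the tunnel dropping it.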

So I tried copying corosync.conf from core01 to core03:

Code:

root@core03:~# systemctl stop pve-cluster
root@core03:~# systemctl stop corosync

root@core01:~# scp /etc/pve/corosync.conf root@core03:/etc/corosync/corosync.conf

root@core03:~# systemctl start corosync
root@core03:~# systemctl start pve-cluster

But I still get the same error, with and without multicast.
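After copying the file by hand, one quick sanity check is that the copy on core03 carries the same config_version as the one on core01 (43 in the file above). A sketch, where config_version is a hypothetical helper that reads the first config_version key with awk; it is a line-based heuristic, not a corosync parser:

```shell
#!/bin/sh
# config_version [FILE...]
# Hypothetical helper: prints the value of the first "config_version:" key
# it finds. Reads standard input when no FILE is given.
config_version() {
  awk -F: '/^[ \t]*config_version[ \t]*:/ {
    gsub(/[ \t]/, "", $2)   # strip spaces/tabs around the value
    print $2
    exit
  }' "$@"
}

# Usage from core01 (hostnames as in this thread):
# local_v=$(config_version /etc/pve/corosync.conf)
# remote_v=$(ssh root@core03 cat /etc/corosync/corosync.conf | config_version)
# [ "$local_v" = "$remote_v" ] || echo "config_version differs: $local_v vs $remote_v"
```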

Let me know if you need any more information. Thanks!

wolfgang · Jul 19, 2016

Hi,

Corosync has a hard-coded limit of 2 ms latency.
So with an IPsec tunnel at 3-6 ms, this is normal behavior.
Clusters over a WAN are not recommended.
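To compare a link against the 2 ms figure quoted above, one option is to take the average from ping's summary line. A minimal sketch, assuming the Linux iputils summary format ("rtt min/avg/max/mdev = ..."); rtt_within_budget is a hypothetical helper, and other ping implementations may print a different line:

```shell
#!/bin/sh
# rtt_within_budget SUMMARY BUDGET_MS
# Hypothetical helper. SUMMARY is the last line of Linux iputils ping output,
# e.g. "rtt min/avg/max/mdev = 1.1/1.3/1.6/0.2 ms". Splitting on runs of
# "/", " " and "=" puts the average RTT in field 7.
# Exit status: 0 if avg < BUDGET_MS, 1 if not, 2 if SUMMARY is not such a line
# (for example when every packet was lost and ping printed no rtt line).
rtt_within_budget() {
  printf '%s\n' "$1" | awk -F'[/ =]+' -v budget="$2" '
    !/min\/avg\/max/ { exit 2 }
    { exit !($7 + 0 < budget + 0) }
  '
}

# Usage against core03's tunnel address from this thread:
# summary=$(ping -c 20 -q 10.0.1.254 | tail -n 1)
# rtt_within_budget "$summary" 2 || echo "average RTT is not below 2 ms (or no replies)"
```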
