PVE6 and Mikrotik EoIP

Drummond Korras

Please can anyone assist me.

I am trying to connect node3 to my cluster. I have a Mikrotik EoIP tunnel with all the VLANs set up etc. - unicast and multicast are 100% successful. However, when I try to join the node to the cluster:

waiting for quorum .. OK

and then it just waits. I see some traffic on the interfaces, but it just sits there and does nothing.
If I check /etc/pve/.members, it shows node3 with online:1 but doesn't show the IP.

What could be the issue?
 
What does pvecm status show? What pveversion -v are you running?

I have a Mikrotik EoIP tunnel with all the vlans setup etc - unicast and multicast 100% successful, however
Do you have any test results to share?
 
Code:
node1:
192.168.2.4 :   unicast, xmt/rcv/%loss = 9538/9526/0%, min/avg/max/std-dev = 1.431/2.052/172.618/4.608
192.168.2.4 : multicast, xmt/rcv/%loss = 9538/9526/0%, min/avg/max/std-dev = 1.482/2.116/172.656/4.607


node3:
192.168.2.2 :   unicast, xmt/rcv/%loss = 10000/9961/0%, min/avg/max/std-dev = 1.420/1.938/172.769/2.487
192.168.2.2 : multicast, xmt/rcv/%loss = 10000/9961/0%, min/avg/max/std-dev = 1.453/1.988/172.819/2.487
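
These figures look like omping output; for reference, a minimal sketch of such a test along the lines of the Proxmox multicast notes (host names are placeholders):

Bash:
# run simultaneously on every node; -q prints only the summary
omping -c 10000 -i 0.001 -F -q node1 node2 node3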

Code:
pvecm status

shows all 3 nodes, but after a while it reports "Activity blocked"


pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
pve-manager: 6.0-9 (running version: 6.0-9/508dcee0)
pve-kernel-5.0: 6.0-9
pve-kernel-helper: 6.0-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 6.5-1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-7
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve1
on both nodes
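
As a side note, while a join hangs like this, the corosync side can be watched live on the joining node - a minimal sketch:

Bash:
# follow the cluster-related logs during the join attempt
journalctl -f -u corosync -u pve-cluster
# and check the quorum state from another terminal
pvecm status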
 
192.168.2.4 : unicast, xmt/rcv/%loss = 9538/9526/0%, min/avg/max/std-dev = 1.431/2.052/172.618/4.608
There is a big variation (std-dev) in the latency; this will certainly disrupt corosync traffic.

pvecm status shows all 3 nodes, but after a while it reports "Activity blocked"
The whole output would have been nice, but it doesn't matter given the above.

proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
With Proxmox VE 6, corosync 3 is used and sends unicast packets.
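
For context, a minimal sketch of the relevant totem section in /etc/pve/corosync.conf (values are illustrative, not taken from this cluster):

Code:
totem {
  cluster_name: mycluster
  config_version: 4
  interface {
    linknumber: 0
  }
  ip_version: ipv4
  secauth: on
  # knet (kronosnet) is the default transport in corosync 3 and sends unicast UDP
  transport: knet
  version: 2
}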
 
There is a big variation (std-dev) in the latency; this will certainly disrupt corosync traffic.
I will investigate this


pvecm status shows all 3 nodes, but after a while it reports "Activity blocked"
I will try again tonight and publish results

With Proxmox VE 6, corosync 3 is used and sends unicast packets.
unicast seems to be working

I have checked the logs and it appears that node3 was trying to access local-zfs like the other two, but I have set that up differently. Could that have caused the "hang"? I have now restricted the storage to only node1 and node2, and I will try again tonight to connect node3.

I am also getting this
pve-ha-lrm[1246]: unable to write lrm status file - /etc/pve/nodes/node3/.lrm.tmp no such file or directory
 
I have checked the logs and it appears that node3 was trying to access local-zfs like the other two, but I have set that up differently. Could that have caused the "hang"? I have now restricted the storage to only node1 and node2, and I will try again tonight to connect node3.
No. The latency is just not good for that scenario.

pve-ha-lrm[1246]: unable to write lrm status file - /etc/pve/nodes/node3/.lrm.tmp no such file or directory
It seems that the folder wasn't created during the join.
 
No. The latency is just not good for that scenario.
I don't want it to join the storage - I just need this machine to act as a voting node for HA between node1 and node2.


It seems that the folder wasn't created during the join.
I will try again tonight and post results

Thank you so, so much for your assistance - and such an amazing piece of software!
 
Hi @Drummond

In my experience, EoIP is not very kind to latency and CPU. I do not know your EoIP implementation details (are both ends Linux, or only one end?).
In any case, you can try other L2 tunnels, or maybe VPLS (an MPLS tunnel). If you describe your EoIP details, maybe I can be more precise.

Good luck / Bafta
 
Hi there.

We have a CCR in the data centre to which node1 and node2 are connected directly with SFP+.

We then have an RB3011 in our satellite office, an L2TP VPN connection between them, and then the EoIP on top of the L2TP.
I set up L2TP because the satellite office gets its connectivity from the building, behind a NATed router with a single public IP on the edge router (owned by the building).

I have then created a VLAN on the CCR, bridging the VLAN on the EoIP and the SFP+ VLANs.

On the satellite RB, the VLAN sits on the EoIP and is bridged to the LAN.
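
As a rough sketch of that layering on the satellite RB (interface names, addresses, tunnel ID and VLAN ID are all hypothetical):

Code:
# EoIP rides on the L2TP tunnel's addressing
/interface eoip
add name=eoip-dc local-address=10.255.0.2 remote-address=10.255.0.1 tunnel-id=10
# VLAN on top of the EoIP interface
/interface vlan
add name=vlan100-eoip interface=eoip-dc vlan-id=100
# bridge the EoIP VLAN to the local LAN
/interface bridge port
add bridge=bridge-lan interface=vlan100-eoip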

Data centre:

[screenshot: data centre Mikrotik interface configuration]

Satellite:
[screenshot: satellite Mikrotik interface configuration]


This setup worked perfectly on PVE5; we had all 3 nodes connected and working.

In the interim I have set up the QDevice on node3.
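
For reference, the QDevice setup on PVE 6 is roughly as follows (assuming node3 is the external vote holder at 192.168.2.4, which is inferred from the omping output above):

Bash:
# on the machine that only provides a vote (node3)
apt install corosync-qnetd
# on each cluster node
apt install corosync-qdevice
# then, from one cluster node
pvecm qdevice setup 192.168.2.4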


On both sides we have a 100 Mbps fibre connection.
This is the ping result from node3 to node1:

64 bytes from 192.168.2.2: icmp_seq=1 ttl=64 time=1.80 ms
64 bytes from 192.168.2.2: icmp_seq=2 ttl=64 time=1.84 ms
64 bytes from 192.168.2.2: icmp_seq=3 ttl=64 time=1.80 ms
64 bytes from 192.168.2.2: icmp_seq=4 ttl=64 time=1.85 ms
64 bytes from 192.168.2.2: icmp_seq=5 ttl=64 time=1.76 ms
64 bytes from 192.168.2.2: icmp_seq=6 ttl=64 time=2.11 ms
64 bytes from 192.168.2.2: icmp_seq=7 ttl=64 time=1.91 ms
64 bytes from 192.168.2.2: icmp_seq=8 ttl=64 time=1.88 ms
64 bytes from 192.168.2.2: icmp_seq=9 ttl=64 time=6.53 ms
64 bytes from 192.168.2.2: icmp_seq=10 ttl=64 time=1.81 ms
64 bytes from 192.168.2.2: icmp_seq=11 ttl=64 time=1.83 ms
64 bytes from 192.168.2.2: icmp_seq=12 ttl=64 time=2.02 ms
64 bytes from 192.168.2.2: icmp_seq=13 ttl=64 time=1.87 ms
64 bytes from 192.168.2.2: icmp_seq=14 ttl=64 time=2.08 ms
64 bytes from 192.168.2.2: icmp_seq=15 ttl=64 time=1.69 ms
64 bytes from 192.168.2.2: icmp_seq=16 ttl=64 time=1.91 ms
64 bytes from 192.168.2.2: icmp_seq=17 ttl=64 time=1.99 ms
64 bytes from 192.168.2.2: icmp_seq=18 ttl=64 time=1.95 ms
64 bytes from 192.168.2.2: icmp_seq=19 ttl=64 time=1.85 ms
64 bytes from 192.168.2.2: icmp_seq=20 ttl=64 time=1.88 ms
64 bytes from 192.168.2.2: icmp_seq=21 ttl=64 time=1.98 ms
 
Hi @Drummond Korras

Nice to have some CCRs/RB3011 around ;)

But your setup is not so nice. I guess you do not want to change it!
Here is what you can do; maybe it will help:
- corosync v2.4.x uses UDP ports 5404 and 5405
- I do not know if corosync v3 uses the same ports, so you may need to adjust this:

Bash:
IPT="/sbin/iptables"
# flush the mangle table, then mark corosync traffic (UDP 5404-5405) with DSCP 48
$IPT -t mangle -F
$IPT -t mangle -N dscp-mark48
$IPT -t mangle -A dscp-mark48 -j DSCP --set-dscp 48
$IPT -t mangle -A PREROUTING -p udp -m udp --dport 5404:5405 -j dscp-mark48
$IPT -t mangle -A INPUT -p udp -m udp --dport 5404:5405 -j dscp-mark48
$IPT -t mangle -A OUTPUT -p udp -m udp --dport 5404:5405 -j dscp-mark48

Use this on your PMX servers so corosync traffic is prioritized. You can also do the same on all the Mikrotiks:

Code:
/ip firewall mangle
add action=change-mss chain=forward new-mss=clamp-to-pmtu passthrough=no protocol=tcp tcp-flags=syn tcp-mss=1460-65535
add action=change-dscp chain=prerouting new-dscp=48 passthrough=no port=5404,5405 protocol=udp

Adjust tcp-mss=1460-65535 to match your MTU/L2MTU.

Also, you could use a higher priority for your EoIP tunnel.

And I would use an IPIP tunnel instead of EoIP, or perhaps proxy-arp inside the L2TP without any tunnel at all.
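
An IPIP sketch for comparison (addresses are hypothetical; note that IPIP is layer 3 only, so traffic would be routed rather than bridged, which corosync 3's unicast is fine with):

Code:
/interface ipip
add name=ipip-dc local-address=10.255.0.2 remote-address=10.255.0.1
/ip address
add address=172.16.0.2/30 interface=ipip-dc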


Good luck / Bafta !
 
The main problem with your setup is that you have a bridge with several sub-interfaces with different MTUs (ethernet/SFP+, EoIP, VLAN):

ETHERNET FRAME [ L2TP ( EoIP ) ]

I would keep only the L2TP, which I understand is necessary (to encrypt your data), and on each end of the VPN I would use proxy-arp.

And since you have Mikrotik, why do you use VLANs instead of MPLS tunnels/VPLS (better latency, very good for VoIP, and so on)?
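
A minimal proxy-arp sketch on Mikrotik: ARP mode is a per-interface setting, so something like this on the LAN-facing interface at each end (interface names are hypothetical):

Code:
# data-centre router
/interface ethernet set sfp1 arp=proxy-arp
# satellite router
/interface ethernet set ether2 arp=proxy-arp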
 
I would keep only the L2TP, which I understand is necessary (to encrypt your data), and on each end of the VPN I would use proxy-arp.
proxy-arp on the L2TP interface?

Do I attach the VLANs to the L2TP interface and then bridge it?
 
