PVE6 and Mikrotik EoIP

Drummond Korras
Can anyone please assist me?

I am trying to connect node3 to my cluster. I have a Mikrotik EoIP tunnel with all the VLANs set up, and both unicast and multicast tests are 100% successful. However, when I try to join the node to the cluster:

waiting for quorum .. OK

and then it just waits. I see some traffic on the interfaces, but it just sits and does nothing.
If I check /etc/pve/.members, it shows node3 with online:1 but doesn't show the IP.
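For reference, the member status can be checked directly; a fully joined node entry should include an "ip" field (the entry below is illustrative, not my actual output):

Code:
# Dump the pmxcfs member status
cat /etc/pve/.members
# An entry for a fully joined node should look roughly like:
#   "node3": { "id": 3, "online": 1, "ip": "192.168.2.6" }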

What could be the issue?
 
What does pvecm status show? What pveversion -v are you running?

I have a Mikrotik EoIP tunnel with all the vlans setup etc - unicast and multicast 100% successful, however
Do you have any test results to share?
 
Code:
node1:
192.168.2.4 :   unicast, xmt/rcv/%loss = 9538/9526/0%, min/avg/max/std-dev = 1.431/2.052/172.618/4.608
192.168.2.4 : multicast, xmt/rcv/%loss = 9538/9526/0%, min/avg/max/std-dev = 1.482/2.116/172.656/4.607


node3:
192.168.2.2 :   unicast, xmt/rcv/%loss = 10000/9961/0%, min/avg/max/std-dev = 1.420/1.938/172.769/2.487
192.168.2.2 : multicast, xmt/rcv/%loss = 10000/9961/0%, min/avg/max/std-dev = 1.453/1.988/172.819/2.487
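(These are the numbers omping prints; presumably a run along these lines produced them, started on all nodes at roughly the same time - hostnames are placeholders:)

Code:
# 10000 probes at 1 ms intervals, unicast and multicast, quiet summary output
omping -c 10000 -i 0.001 -F -q node1 node2 node3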

Code:
pvecm status

shows all 3 nodes, but after a while it reports "Activity blocked"
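In case it helps anyone reading along, quorum state can also be inspected with the standard corosync tools (a sketch; the Flags line of the quorum output shows whether activity is blocked):

Code:
# Quorum and vote details
corosync-quorumtool -s
# Recent corosync and cluster-filesystem logs
journalctl -u corosync -u pve-cluster --since "10 minutes ago"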


pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
pve-manager: 6.0-9 (running version: 6.0-9/508dcee0)
pve-kernel-5.0: 6.0-9
pve-kernel-helper: 6.0-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 6.5-1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-7
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve1
on both nodes
 
192.168.2.4 : unicast, xmt/rcv/%loss = 9538/9526/0%, min/avg/max/std-dev = 1.431/2.052/172.618/4.608
There is a big variation (std-dev) in the latency; this will certainly disrupt corosync traffic.

pvecm status shows all 3 nodes, but after a while it reports "Activity blocked"
The whole output would have been nice, but it doesn't matter for the above.

proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
With Proxmox VE 6, corosync 3 is used and sends unicast packets.
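A quick way to confirm this on a node (a sketch; knet is the corosync 3 default transport and carries unicast UDP, port 5405 by default):

Code:
# Show the configured transport
grep -A1 transport /etc/corosync/corosync.conf
# Per-node knet link status
corosync-cfgtool -s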
 
There is a big variation (std-dev) in the latency; this will certainly disrupt corosync traffic.
I will investigate this.


pvecm status shows all 3 nodes, but after a while it reports "Activity blocked"
I will try again tonight and publish results.

With Proxmox VE 6, corosync 3 is used and sends unicast packets.
Unicast seems to be working.

I have checked the logs, and it appears that node3 was trying to access local-zfs like the other two, but I have set that up differently. Could that have caused the "hang"? I have restricted the storage to only node1 and node2, and I will try again tonight to connect node3.
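(For anyone with the same issue: a storage definition can be restricted to specific nodes in one command; a sketch using the names from this thread:)

Code:
# Limit the local-zfs storage to the two nodes that actually provide it
pvesm set local-zfs --nodes node1,node2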

I am also getting this:

Code:
pve-ha-lrm[1246]: unable to write lrm status file - /etc/pve/nodes/node3/.lrm.tmp no such file or directory
 
I have checked the logs, and it appears that node3 was trying to access local-zfs like the other two, but I have set that up differently. Could that have caused the "hang"? I have restricted the storage to only node1 and node2, and I will try again tonight to connect node3.
No. The latency is just not good for that scenario.

pve-ha-lrm[1246]: unable to write lrm status file - /etc/pve/nodes/node3/.lrm.tmp no such file or directory
It seems that the folder wasn't created during the join.
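If it still isn't there after a retry, it can be created by hand, assuming the cluster is quorate (/etc/pve rejects writes without quorum); a sketch:

Code:
# Create the missing per-node directory in the cluster filesystem;
# pve-ha-lrm writes its status file below it
mkdir -p /etc/pve/nodes/node3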
 
No. The latency is just not good for that scenario.
I don't want it to join the storage - I just need this machine to act as a voting node for HA between node1 and node2.


It seems that the folder wasn't created during the join.
I will try again tonight and post results.

Thank you so much for your assistance - and such an amazing piece of software!
 
Hi @Drummond

From my own experience, EoIP is not very kind to latency and CPU. I do not know your EoIP implementation details (are both ends Linux, or Linux on one end only?).
In any case you could try other L2 tunnels, or maybe VPLS (an MPLS tunnel). If you describe your EoIP details, maybe I can be more precise.

Good luck / Bafta
 
Hi there.

We have a CCR in the data centre, to which node1 and node2 are connected directly via SFP+.

We then have an RB3011 in our satellite office, with an L2TP VPN connection between them and the EoIP tunnel on top of the L2TP.
I set up L2TP because the satellite office gets its connectivity from the building, behind a NATed router with a single public IP on the edge router (owned by the building).

I then created a VLAN on the CCR, bridging the VLAN on the EoIP tunnel with the SFP+ VLANs.

On the satellite RB, the VLAN sits on the EoIP tunnel and is bridged to the LAN.

Data centre:

[screenshot: Mikrotik interface configuration at the data centre]

Satellite:

[screenshot: Mikrotik interface configuration at the satellite office]


This setup worked perfectly on PVE5; we had all 3 nodes connected and working.

In the interim I have set up a QDevice on node3.
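(For reference, the usual QDevice procedure; a sketch - the qnetd host IP is a placeholder and the host must be reachable from all cluster nodes:)

Code:
# On the external vote-giving host:
apt install corosync-qnetd
# On every cluster node:
apt install corosync-qdevice
# Then, from one cluster node:
pvecm qdevice setup <QDEVICE-IP>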


On both sides we have a 100 Mbps fibre connection.
This is the ping result from node3 to node1:

Code:
64 bytes from 192.168.2.2: icmp_seq=1 ttl=64 time=1.80 ms
64 bytes from 192.168.2.2: icmp_seq=2 ttl=64 time=1.84 ms
64 bytes from 192.168.2.2: icmp_seq=3 ttl=64 time=1.80 ms
64 bytes from 192.168.2.2: icmp_seq=4 ttl=64 time=1.85 ms
64 bytes from 192.168.2.2: icmp_seq=5 ttl=64 time=1.76 ms
64 bytes from 192.168.2.2: icmp_seq=6 ttl=64 time=2.11 ms
64 bytes from 192.168.2.2: icmp_seq=7 ttl=64 time=1.91 ms
64 bytes from 192.168.2.2: icmp_seq=8 ttl=64 time=1.88 ms
64 bytes from 192.168.2.2: icmp_seq=9 ttl=64 time=6.53 ms
64 bytes from 192.168.2.2: icmp_seq=10 ttl=64 time=1.81 ms
64 bytes from 192.168.2.2: icmp_seq=11 ttl=64 time=1.83 ms
64 bytes from 192.168.2.2: icmp_seq=12 ttl=64 time=2.02 ms
64 bytes from 192.168.2.2: icmp_seq=13 ttl=64 time=1.87 ms
64 bytes from 192.168.2.2: icmp_seq=14 ttl=64 time=2.08 ms
64 bytes from 192.168.2.2: icmp_seq=15 ttl=64 time=1.69 ms
64 bytes from 192.168.2.2: icmp_seq=16 ttl=64 time=1.91 ms
64 bytes from 192.168.2.2: icmp_seq=17 ttl=64 time=1.99 ms
64 bytes from 192.168.2.2: icmp_seq=18 ttl=64 time=1.95 ms
64 bytes from 192.168.2.2: icmp_seq=19 ttl=64 time=1.85 ms
64 bytes from 192.168.2.2: icmp_seq=20 ttl=64 time=1.88 ms
64 bytes from 192.168.2.2: icmp_seq=21 ttl=64 time=1.98 ms
 
Hi @Drummond Korras

Nice to have some CCRs/RB3011 around ;)

But your setup is not so nice. I guess you do not want to change it!
What you can do, maybe it will help:
- corosync 2.4.x uses UDP ports 5404 and 5405
- I do not know if corosync 3 uses the same ports, so you may need to adjust this:

Bash:
IPT="/sbin/iptables"
# Flush the mangle table, then mark corosync traffic (UDP 5404-5405)
# with DSCP 48 so it can be prioritized along the path
$IPT -t mangle -F
$IPT -t mangle -N dscp-mark48 2>/dev/null || true
$IPT -t mangle -A dscp-mark48 -j DSCP --set-dscp 48
$IPT -t mangle -A PREROUTING -p udp -m udp --dport 5404:5405 -j dscp-mark48
$IPT -t mangle -A INPUT -p udp -m udp --dport 5404:5405 -j dscp-mark48
$IPT -t mangle -A OUTPUT -p udp -m udp --dport 5404:5405 -j dscp-mark48

Use this on your PMX servers so corosync traffic will be prioritized. You can also do the same on all the Mikrotiks:

Code:
/ip firewall mangle
add action=change-mss chain=forward new-mss=clamp-to-pmtu passthrough=no protocol=tcp tcp-flags=syn tcp-mss=1460-65535
add action=change-dscp chain=prerouting new-dscp=48 passthrough=no port=5404,5405 protocol=udp

Adjust tcp-mss=1460-65535 to match your MTU/L2MTU.
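To verify the Linux-side marking actually matches traffic, check the mangle counters (standard iptables; the counters should grow while corosync is talking):

Code:
# List mangle rules with packet/byte counters
iptables -t mangle -L -v -n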

Also, you could use a higher priority for your EoIP tunnel.

And I would use an IPIP tunnel instead of EoIP, or perhaps proxy-arp without any tunnel inside the L2TP.


Good luck / Bafta !
 
The main problem with your setup is that you have a bridge with several sub-interfaces with different MTUs (Ethernet/SFP+, EoIP, VLAN):

ETHERNET FRAME [ L2TP ( EoIP ) ]
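One way to find the usable end-to-end MTU through the nested tunnels (assuming Linux ping; 1472 bytes = 1500 minus 28 bytes of IP+ICMP headers - lower the size until the probe passes):

Code:
# Don't-fragment probe across the tunnel path
ping -M do -s 1472 192.168.2.2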

I would keep only the L2TP, which as I understand is necessary (to encrypt your data), and on each end of the VPN I would use proxy-arp.

And since you have Mikrotik, why do you use VLANs instead of MPLS tunnels/VPLS (better latency, very good for VoIP, and so on)?
 


I would keep only the L2TP, which as I understand is necessary (to encrypt your data), and on each end of the VPN I would use proxy-arp.
proxy-arp on the L2TP interface?

Do I attach the VLANs to the L2TP interface and then bridge it?