PVE6 and Mikrotik EoIP

Drummond Korras

Please can anyone assist me?

I am trying to join node3 to my cluster. I have a Mikrotik EoIP tunnel with all the VLANs set up, and both unicast and multicast tests are 100% successful. However, when I try to join the node to the cluster I get:

waiting for quorum .. OK

and then it just waits. I see some traffic on the interfaces, but it just sits there and does nothing.
If I check /etc/pve/.members, it shows node3 with online:1 but doesn't show its IP.
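Roughly what I see in it (reconstructed from memory, so names and addresses are illustrative) — node3's entry has online:1 but no ip key:

Code:
{
"nodename": "node1",
"version": 5,
"cluster": { "name": "mycluster", "version": 3, "nodes": 3, "quorate": 1 },
"nodelist": {
  "node1": { "id": 1, "online": 1, "ip": "192.168.2.2"},
  "node2": { "id": 2, "online": 1, "ip": "192.168.2.4"},
  "node3": { "id": 3, "online": 1}
  }
}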

What could be the issue?
 
What does pvecm status show? What pveversion -v are you running?

I have a Mikrotik EoIP tunnel with all the VLANs set up, and both unicast and multicast tests are 100% successful.
Do you have any test results to share?
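If not: the reference test is omping as described in the docs, something along the lines of the following (hostnames assumed), run on all nodes at the same time:

Code:
omping -c 10000 -i 0.001 -F -q node1 node2 node3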
 
Code:
node1:
192.168.2.4 :   unicast, xmt/rcv/%loss = 9538/9526/0%, min/avg/max/std-dev = 1.431/2.052/172.618/4.608
192.168.2.4 : multicast, xmt/rcv/%loss = 9538/9526/0%, min/avg/max/std-dev = 1.482/2.116/172.656/4.607


node3:
192.168.2.2 :   unicast, xmt/rcv/%loss = 10000/9961/0%, min/avg/max/std-dev = 1.420/1.938/172.769/2.487
192.168.2.2 : multicast, xmt/rcv/%loss = 10000/9961/0%, min/avg/max/std-dev = 1.453/1.988/172.819/2.487

Code:
pvecm status

shows all 3 nodes, but after a while the activity is blocked


pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
pve-manager: 6.0-9 (running version: 6.0-9/508dcee0)
pve-kernel-5.0: 6.0-9
pve-kernel-helper: 6.0-9
pve-kernel-5.0.21-3-pve: 5.0.21-7
pve-kernel-5.0.15-1-pve: 5.0.15-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 6.5-1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-8
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-7
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.1-3
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.2-pve1
on both nodes
 
192.168.2.4 : unicast, xmt/rcv/%loss = 9538/9526/0%, min/avg/max/std-dev = 1.431/2.052/172.618/4.608
There is a big variation (std-dev) in the latency; this will certainly disrupt corosync traffic.

pvecm status shows all 3 nodes, but after a while the activity is blocked
The whole output would have been helpful, but it doesn't matter for the point above.

proxmox-ve: 6.0-2 (running kernel: 5.0.21-3-pve)
With Proxmox VE 6, corosync 3 is used and sends unicast packets.
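For reference, the totem section of /etc/pve/corosync.conf on PVE 6 looks roughly like this (cluster name illustrative); knet is the default transport in corosync 3, so it is usually not spelled out explicitly:

Code:
totem {
  cluster_name: mycluster
  config_version: 3
  interface {
    linknumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}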
 
There is a big variation (std-dev) in the latency; this will certainly disrupt corosync traffic.
I will investigate this


pvecm status shows all 3 nodes, but after a while the activity is blocked
I will try again tonight and publish results

With Proxmox VE 6, corosync 3 is used and sends unicast packets.
unicast seems to be working

I have checked the logs, and it appears that node3 was trying to access local-zfs like the other two nodes, but I have set that up differently. Could that have caused the "hang"? I have now restricted that storage to node1 and node2 only, and I will try again tonight to join node3.
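For the record, the restriction is just the nodes option in /etc/pve/storage.cfg, roughly like this (pool name assumed):

Code:
zfspool: local-zfs
        pool rpool/data
        content images,rootdir
        nodes node1,node2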

I am also getting this:
pve-ha-lrm[1246]: unable to write lrm status file - /etc/pve/nodes/node3/.lrm.tmp no such file or directory
 
I have checked the logs, and it appears that node3 was trying to access local-zfs like the other two nodes, but I have set that up differently. Could that have caused the "hang"? I have now restricted that storage to node1 and node2 only, and I will try again tonight to join node3.
No. The latency is just not good for that scenario.

pve-ha-lrm[1246]: unable to write lrm status file - /etc/pve/nodes/node3/.lrm.tmp no such file or directory
It seems that the folder wasn't created during the join.
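A quick way to check and work around it, assuming the cluster is quorate (pmxcfs is read-only otherwise):

Code:
# list the per-node directories in the cluster filesystem
ls -l /etc/pve/nodes/
# if node3 is missing, the directory can be created by hand
mkdir /etc/pve/nodes/node3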
 
No. The latency is just not good for that scenario.
I don't want it to join the storage; I just need this machine to act as a voting node for HA between node1 and node2.


It seems that the folder wasn't created during the join.
I will try again tonight and post results

Thank you so, so much for your assistance, and for such an amazing piece of software!
 
Hi @Drummond

In my experience, EoIP is not very kind to latency and CPU. I do not know your EoIP implementation details (are both ends Linux, or only one end?).
In any case, you can try other L2 tunnels, or maybe VPLS (an MPLS tunnel). If you describe your EoIP details, maybe I can be more precise.

Good luck / Bafta
 
Hi there.

We have a CCR in the data centre; node1 and node2 are connected to it directly via SFP+.

We then have an RB3011 in our satellite office, an L2TP VPN connection between them, and then the EoIP on top of the L2TP.
I set up L2TP because the satellite office gets its connectivity from the building, behind a NATted router with a single public IP on the edge router (owned by the building).

I have then created a VLAN on the CCR, bridging the VLAN on the EoIP with the SFP+ VLANs.

On the satellite RB, there is a VLAN on the EoIP, bridged to the LAN.
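In RouterOS terms, the CCR side is roughly this (interface names and VLAN ID made up):

Code:
/interface vlan add interface=eoip-satellite name=vlan100-eoip vlan-id=100
/interface vlan add interface=sfp-sfpplus1 name=vlan100-sfp vlan-id=100
/interface bridge add name=br-vlan100
/interface bridge port add bridge=br-vlan100 interface=vlan100-eoip
/interface bridge port add bridge=br-vlan100 interface=vlan100-sfp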

Data centre:

[screenshot]

Satellite:
[screenshot]


This setup worked perfectly on PVE5; we had all 3 nodes connected and working.

In the interim I have set up a QDevice on node3.
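For anyone following along, the QDevice setup was roughly this (the QDevice IP here is assumed):

Code:
# on node3 (the external vote holder)
apt install corosync-qnetd
# on the existing cluster nodes
apt install corosync-qdevice
# on one cluster node, register the QDevice
pvecm qdevice setup 192.168.2.6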


On both sides we have a 100 Mbps fibre connection.
This is the ping result from node3 to node1:

64 bytes from 192.168.2.2: icmp_seq=1 ttl=64 time=1.80 ms
64 bytes from 192.168.2.2: icmp_seq=2 ttl=64 time=1.84 ms
64 bytes from 192.168.2.2: icmp_seq=3 ttl=64 time=1.80 ms
64 bytes from 192.168.2.2: icmp_seq=4 ttl=64 time=1.85 ms
64 bytes from 192.168.2.2: icmp_seq=5 ttl=64 time=1.76 ms
64 bytes from 192.168.2.2: icmp_seq=6 ttl=64 time=2.11 ms
64 bytes from 192.168.2.2: icmp_seq=7 ttl=64 time=1.91 ms
64 bytes from 192.168.2.2: icmp_seq=8 ttl=64 time=1.88 ms
64 bytes from 192.168.2.2: icmp_seq=9 ttl=64 time=6.53 ms
64 bytes from 192.168.2.2: icmp_seq=10 ttl=64 time=1.81 ms
64 bytes from 192.168.2.2: icmp_seq=11 ttl=64 time=1.83 ms
64 bytes from 192.168.2.2: icmp_seq=12 ttl=64 time=2.02 ms
64 bytes from 192.168.2.2: icmp_seq=13 ttl=64 time=1.87 ms
64 bytes from 192.168.2.2: icmp_seq=14 ttl=64 time=2.08 ms
64 bytes from 192.168.2.2: icmp_seq=15 ttl=64 time=1.69 ms
64 bytes from 192.168.2.2: icmp_seq=16 ttl=64 time=1.91 ms
64 bytes from 192.168.2.2: icmp_seq=17 ttl=64 time=1.99 ms
64 bytes from 192.168.2.2: icmp_seq=18 ttl=64 time=1.95 ms
64 bytes from 192.168.2.2: icmp_seq=19 ttl=64 time=1.85 ms
64 bytes from 192.168.2.2: icmp_seq=20 ttl=64 time=1.88 ms
64 bytes from 192.168.2.2: icmp_seq=21 ttl=64 time=1.98 ms
 
Hi @Drummond Korras

Nice to have some CCRs/RB3011 around ;)

But your setup is not so nice. I guess you do not want to change it!
What you can do that may help:
- corosync 2.4.x uses UDP ports 5404 and 5405
- I do not know if corosync 3 uses the same ports, so adjust the ports below if needed:

Bash:
IPT="/sbin/iptables"
# flush any existing mangle rules
$IPT -t mangle -F
# create a chain that tags packets with DSCP 48 (CS6, network control)
$IPT -t mangle -N dscp-mark48
$IPT -t mangle -A dscp-mark48 -j DSCP --set-dscp 48
# mark corosync traffic (UDP 5404-5405) in both directions
$IPT -t mangle -A PREROUTING -p udp -m udp --dport 5404:5405 -j dscp-mark48
$IPT -t mangle -A INPUT -p udp -m udp --dport 5404:5405 -j dscp-mark48
$IPT -t mangle -A OUTPUT -p udp -m udp --dport 5404:5405 -j dscp-mark48

Run this on your PMX servers so corosync traffic is prioritised. You can also do the same on all the Mikrotiks:

Code:
/ip firewall mangle
add action=change-mss chain=forward new-mss=clamp-to-pmtu passthrough=no protocol=tcp tcp-flags=syn tcp-mss=1460-65535
add action=change-dscp chain=prerouting new-dscp=48 passthrough=no port=5404,5405 protocol=udp

Adjust tcp-mss=1460-65535 to match your MTU/L2MTU.
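As a rough example of that adjustment (overhead figures approximate), with a 1500-byte outer MTU:

Code:
# outer MTU                            1500
# - L2TP over UDP/IP overhead        ~  40
# - EoIP (outer IP + GRE + Ethernet) ~  42
# => inner path MTU                  ~ 1418
# TCP MSS = inner MTU - 40 (IP + TCP headers) ~ 1378
/ip firewall mangle add action=change-mss chain=forward new-mss=1378 passthrough=no protocol=tcp tcp-flags=syn tcp-mss=1379-65535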

You could also give your EoIP tunnel a higher priority.

And I would use an IPIP tunnel instead of EoIP, or perhaps proxy-arp over the L2TP without any extra tunnel.
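The proxy-arp variant is just an ARP mode on the LAN-facing interface at each end, for example (interface name made up):

Code:
/interface ethernet set ether2-lan arp=proxy-arp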


Good luck / Bafta !
 
The main problem with your setup is that you have a bridge with several sub-interfaces with different MTUs (ethernet/SFP+, EoIP, VLAN):

ETHERNET FRAME [ L2TP ( EoIP ) ]

I would keep only the L2TP, which as I understand it is necessary (to encrypt your data), and use proxy-arp on each end of the VPN.

And since you have Mikrotik, why do you use VLANs instead of MPLS tunnels/VPLS (better latency, very good for VoIP, and so on)?
 

I would keep only the L2TP, which as I understand it is necessary (to encrypt your data), and use proxy-arp on each end of the VPN.
proxy-arp on the L2TP interface?

Do I attach the VLANs to the L2TP interface and then bridge it?
 
