(Repost from /r/homelab on Reddit)
Running Proxmox 4.4-13 on Debian 8.8
So first let me explain my network topology.
WAN -> pfSense router (dedicated box) -> Proxmox server -> router/load balancer VM (called lb-lan) -> VM1, VM2, etc.
The purpose of the GRE tunnel is so I can have a server somewhere nearby (low latency) serve as the WAN interface for the VMs (this box is referred to as lb-wan; I've tested a GRE tunnel straight from my pfSense box to lb-wan, and it runs at full speed).
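For context, the tunnel itself is just a plain iproute2 GRE tunnel. A minimal sketch of the lb-lan side looks roughly like this (the outer addresses here are made-up placeholders, not my real ones; the 192.168.168.x inner addresses match the iperf output further down):
Code:
# GRE tunnel to lb-wan (203.0.113.10 = placeholder for lb-wan's public IP,
# 198.51.100.20 = placeholder for lb-lan's local, NATed endpoint)
ip tunnel add gre1 mode gre local 198.51.100.20 remote 203.0.113.10 ttl 255
ip addr add 192.168.168.2/30 dev gre1
ip link set gre1 up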
I remember that when I first set up Proxmox on this server earlier this year (I've been meaning to get back into servers and tech for a while, just been having some life problems), some rules created by Proxmox's firewall scripts initially blocked the GRE tunnel from working. This thread contained the solution, so I edited the Proxmox firewall scripts to not automatically add the conntrack rule (probably a bad idea in hindsight) and went on my merry way.
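For anyone who hits the same wall: if I remember right, the culprit was the generic invalid-state drop that the firewall scripts insert, which eats GRE because conntrack can't track protocol 47 without its helper module. In hindsight something like this would probably have been a cleaner workaround than patching the scripts (sketch from memory, chain names may differ by version):
Code:
# let conntrack actually track GRE so it stops being classified INVALID
modprobe nf_conntrack_proto_gre
echo nf_conntrack_proto_gre >> /etc/modules
# or just accept GRE before the invalid-state drop gets a chance
iptables -I FORWARD 1 -p gre -j ACCEPT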
I have my routing VM (lb-lan) terminate the GRE tunnel and handle routing for my other VMs. It has two network interfaces, one on my LAN and another for a private VM network. I have firewall rules set up on the LAN interface so that a) it's only allowed to connect to the GRE tunnel host, DNS, and a few other services, and b) it's not allowed to connect to anything else on my LAN. I've been meaning to move it to its own isolated VLAN, but so far this has been working well. I have the private VM network bridged to a VLAN so I can access it from other devices on my network.
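The rules on the LAN-facing interface are nothing fancy, roughly this shape (interface name, addresses, and subnet are placeholders, not my real ones):
Code:
# lb-lan, LAN-facing interface: GRE endpoint, DNS and established traffic only
iptables -A OUTPUT -o eth0 -d 203.0.113.10 -j ACCEPT                             # lb-wan (placeholder)
iptables -A OUTPUT -o eth0 -p udp --dport 53 -j ACCEPT                           # DNS
iptables -A OUTPUT -o eth0 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -o eth0 -d 192.168.1.0/24 -j DROP                             # rest of my LAN (placeholder subnet)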
On my pfSense router I have the proper NAT rules in place to forward the GRE packets to lb-lan.
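(In the pfSense GUI that's just a port forward with Protocol set to GRE; under the hood it boils down to a pf rdr rule along these lines, with placeholder interface and address:)
Code:
# GRE is IP protocol 47, no ports involved
rdr pass on em0 proto gre from any to (em0) -> 192.168.1.10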
However, about a week or so ago I upgraded something on the hypervisor through APT (I'm going through the package logs now), and that upgrade broke my GRE tunnel. I'm not currently hosting anything major over it, so it took me a while to notice (fail). Around the same time, but before I noticed the problem, I was upgraded from a 50/10 VDSL connection to a 150/150 FTTH connection, so I initially blamed my provider for throttling (even though they have a history of not throttling except in rural areas with congested ADSL). The pfSense test I mentioned above ruled out the provider.
Earlier today I tested VM->VM traffic through just the GRE tunnel and was only getting 2 Gbps with iperf3, while "naked" VM->VM iperf3 traffic was around 70 Gbps (for reference, the hypervisor is a recent build with an E3-1230v5 and 32GB of DDR4 RAM). Seems a little slow. I've gotten better results from this exact server running Proxmox only weeks ago, and I've seen 40-60 Gbps testing with VMware Fusion on my 2015 MacBook Pro and multiple Debian VMs, so I'm not going crazy.
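For reference, these were plain iperf3 runs, something like this (all addresses here are placeholders):
Code:
iperf3 -s                 # on the receiving VM
iperf3 -c 10.10.10.2      # "naked" test over the VM bridge
iperf3 -c 192.168.168.1   # same test through the GRE tunnel's inner address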
One thing to note is that no IPsec was used in any of the tests above. What's interesting, though, is that between lb-lan and lb-wan, with IPsec enabled (strongSwan), I get full speed (all 150 Mbps), while without IPsec I get only a few Mbps and severe packet loss (retries).
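The IPsec side is just strongSwan in transport mode wrapping the GRE traffic. A trimmed-down sketch of the relevant ipsec.conf conn, with placeholder endpoint IPs and assuming PSK auth:
Code:
conn gre-lb-wan
        type=transport
        authby=secret
        left=198.51.100.20      # lb-lan end (placeholder)
        right=203.0.113.10      # lb-wan end (placeholder)
        leftprotoport=gre
        rightprotoport=gre
        auto=start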
Although I'm going to be using IPsec between lb-lan and lb-wan anyway to protect the GRE tunnel against sniffing, I want to figure out why this happens in the first place.
Here's an iperf3 test I did, disabling IPsec and then re-enabling it, between lb-lan and lb-wan:
Code:
root@lb-lan:~# ipsec stop && iperf3 -c 192.168.168.1 && ipsec start && sleep 10 && iperf3 -c 192.168.168.1
Stopping strongSwan IPsec...
Connecting to host 192.168.168.1, port 5201
[ 4] local 192.168.168.2 port 39432 connected to 192.168.168.1 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 153 KBytes 1.25 Mbits/sec 37 2.78 KBytes
[ 4] 1.00-2.00 sec 231 KBytes 1.89 Mbits/sec 48 4.17 KBytes
[ 4] 2.00-3.00 sec 245 KBytes 2.01 Mbits/sec 50 2.78 KBytes
[ 4] 3.00-4.00 sec 231 KBytes 1.89 Mbits/sec 48 4.17 KBytes
[ 4] 4.00-5.00 sec 55.6 KBytes 456 Kbits/sec 18 2.78 KBytes
[ 4] 5.00-6.00 sec 25.0 KBytes 205 Kbits/sec 16 2.78 KBytes
[ 4] 6.00-7.00 sec 83.4 KBytes 684 Kbits/sec 23 4.17 KBytes
[ 4] 7.00-8.00 sec 234 KBytes 1.91 Mbits/sec 51 4.17 KBytes
[ 4] 8.00-9.00 sec 170 KBytes 1.39 Mbits/sec 38 4.17 KBytes
[ 4] 9.00-10.00 sec 80.7 KBytes 661 Kbits/sec 24 2.78 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 1.47 MBytes 1.23 Mbits/sec 353 sender
[ 4] 0.00-10.00 sec 1.43 MBytes 1.20 Mbits/sec receiver
iperf Done.
Starting strongSwan 5.2.1 IPsec [starter]...
Connecting to host 192.168.168.1, port 5201
[ 4] local 192.168.168.2 port 39435 connected to 192.168.168.1 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 20.3 MBytes 170 Mbits/sec 64 276 KBytes
[ 4] 1.00-2.00 sec 19.0 MBytes 159 Mbits/sec 1 216 KBytes
[ 4] 2.00-3.00 sec 18.9 MBytes 158 Mbits/sec 0 236 KBytes
[ 4] 3.00-4.00 sec 18.3 MBytes 154 Mbits/sec 0 257 KBytes
[ 4] 4.00-5.00 sec 18.9 MBytes 159 Mbits/sec 0 277 KBytes
[ 4] 5.00-6.00 sec 19.2 MBytes 161 Mbits/sec 2 227 KBytes
[ 4] 6.00-7.00 sec 18.9 MBytes 158 Mbits/sec 0 256 KBytes
[ 4] 7.00-8.00 sec 18.3 MBytes 154 Mbits/sec 0 272 KBytes
[ 4] 8.00-9.00 sec 18.9 MBytes 159 Mbits/sec 0 280 KBytes
[ 4] 9.00-10.00 sec 19.0 MBytes 159 Mbits/sec 1 217 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-10.00 sec 190 MBytes 159 Mbits/sec 68 sender
[ 4] 0.00-10.00 sec 188 MBytes 158 Mbits/sec receiver
iperf Done.
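While I keep digging, this is roughly what I've been comparing on both ends between the with- and without-IPsec runs (interface/tunnel names are examples):
Code:
ethtool -k ens18 | grep -E 'segmentation|checksum'   # offload state on the VM NIC
lsmod | grep gre                                     # which GRE/conntrack modules are loaded
conntrack -L 2>/dev/null | grep -c gre               # whether GRE flows are being tracked
ip -s link show gre1                                 # tunnel packet/error counters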
EDIT: I found the packages I upgraded on May 24th, which was before I got the new internet connection but lines up with my timeline. I had upgraded all packages on the hypervisor that day to patch the Samba vulnerability, since I run Samba on this box as a sort of NAS on steroids.
https://pastebin.com/mE9GELjq
Anyone else seen similar?
Thanks