Strange connectivity issues with Hetzner vSwitch Cloud Network

TheJKM

Hello,

I have been using Proxmox since version 4.4 in my homelab, and recently set up my first Proxmox instance on a Hetzner root server. The machine has a single public IPv4 address and the usual /64 IPv6 subnet. While most of the setup is working, there is one issue where I could use some more ideas.

For IPv6, the host and every VM has an address from the subnet. This is the most straightforward configuration and works flawlessly.
For IPv4, the public address is assigned to the host. For the VMs, I use a routed setup with masquerading on the host. This also works flawlessly.

Additionally, the machine and the VMs have to be accessible from Hetzner cloud servers within a private network. To make this work, I connected a vSwitch to the cloud network and to the server. Hetzner uses VLAN tag 4000 for vSwitch packets. I created an additional interface for this VLAN, and an additional VM bridge. The bridge has an IP address from the vSwitch's private subnet, and all VMs connected to this bridge also get an address from that subnet. For the VMs, this works as it should: the VMs can access the cloud servers and vice versa.
On the host, I experience a strange issue. Right after a reboot, the connection works in both directions. After some minutes, it stops working: the host becomes inaccessible from all cloud servers. In the other direction, the cloud servers are at first also inaccessible from the host. However, after a few packets, the connection comes back alive and works for a few minutes in both directions - until the issue starts again.
When I "fix" the connection with a ping, it takes 3-4 lost packets until the connection works.
When I "fix" it using a traceroute to a cloud server, it takes 14 to 16 hops until the vSwitch answers.
The issue only affects the host - the VMs always work, independently of the current state on the host.
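
If it helps, I can capture ARP and ICMP on the vSwitch VLAN interface while the connection is broken - roughly like this (interface name as in my config below):

Code:
# show link-level (MAC) headers, no name resolution, only ARP and ICMP on the vSwitch VLAN
tcpdump -eni enp27s0.4000 arp or icmp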

What could be the issue? Honestly, I'm running out of ideas, and I couldn't find a similar case on the web. I'll append my /etc/network/interfaces configuration with the IPs masked.

One note: I really don't want more than one public IPv4 address. There will be quite a few VMs, and I only need the public IPv4 for API providers who think it's fine not to offer IPv6 for their APIs. All incoming traffic goes through a load balancer and the private network. Given how scarce IPv4 addresses are, I don't want to waste them just for outbound IPv4 internet connectivity.

In case you need any more information, feel free to ask. Looking forward to any ideas.

TheJKM

Code:
source /etc/network/interfaces.d/*

auto lo
iface lo inet loopback

iface lo inet6 loopback

auto enp27s0
iface enp27s0 inet static
    address <PUBLIC_IP>/32
    gateway <PUBLIC_IP_GATEWAY>
    pointopoint <PUBLIC_IP_GATEWAY>
    up route add -net <PUBLIC_IP_NET_BASE> netmask 255.255.255.192 gw <PUBLIC_IP_GATEWAY> dev enp27s0

iface enp27s0 inet6 static
    address <PUBLIC_IP_V6>/128
    gateway fe80::1

auto enp27s0.4000
iface enp27s0.4000 inet manual
    mtu 1400

auto vmbr0
iface vmbr0 inet static
    address 192.168.0.1/24
    bridge-ports none
    bridge-stp off
    bridge-fd 0
    post-up   iptables -t nat -A POSTROUTING -s '192.168.0.0/24' -o enp27s0 -j MASQUERADE
    post-down iptables -t nat -D POSTROUTING -s '192.168.0.0/24' -o enp27s0 -j MASQUERADE

iface vmbr0 inet6 manual
    address <ANOTHER_PUBLIC_IP_V6>/64
    up ip -6 route add <PUBLIC_IP_V6_PREFIX>/64 dev vmbr0

auto vmbr4000
iface vmbr4000 inet static
    address 10.0.30.10/24
    bridge-ports enp27s0.4000
    bridge-stp off
    bridge-fd 0
    mtu 1400
    up ip route add 10.0.0.0/16 via 10.0.30.1 dev vmbr4000
    down ip route del 10.0.0.0/16 via 10.0.30.1 dev vmbr4000
 
Make sure you set the MTU of 1400 on both the VLAN interface and the VM bridge. Also set the NICs of the VMs inside this network to an MTU of 1 (this makes them inherit the MTU from the bridge).
If you use OPNsense as a router, you might also want to configure the MTU on its interfaces.
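
If you prefer the CLI over the GUI for the VM NICs, the MTU can also be set per VM with qm - a sketch, assuming VM ID 100 and the vmbr4000 bridge from the config above (re-use the VM's existing MAC, otherwise a new one is generated):

Code:
# set the NIC MTU to 1 so the VM inherits the bridge MTU (1400 here)
qm set 100 --net0 virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr4000,mtu=1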

WARNING: The VMs might go down until you reboot them!

I have seen ping work while cross-node data transfers failed - the classic 'randomness' of an MTU issue.
 
I have a similar strange issue that I haven't been able to resolve for months. I'm sending data with Telegraf (server in the Hetzner Cloud) to an InfluxDB (virtual server on a Hetzner Robot machine with Proxmox VE) and get lots of transfer errors. If I redirect the traffic to the public IP of the InfluxDB with a port forward, it works like it should. Therefore I assume there is an MTU issue somewhere, but so far I have not found the cause. Ping and other traffic, like standard HTTP traffic of websites, works just fine.
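
One thing that might help narrow it down is probing the path MTU directly from the cloud server, for example with tracepath (the target here is the private address of the InfluxDB VM, see the configs below):

Code:
# tracepath lowers its probe size when it learns a smaller path MTU and reports it as "pmtu"
tracepath -n 172.30.1.30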

My setup is as follows:

* Debian 12 Linux HAProxy load balancer in the Hetzner Cloud
* Proxmox VE server (8.3.4) on Hetzner Robot, with two public IP addresses
* Debian 12 Linux VM on the Proxmox VE server


# Hetzner Cloud network
[screenshot of the Hetzner Cloud network configuration]

# Hetzner cloud server interface config (from the automated setup, fetched from DHCP)
Code:
4: ens11: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
    link/ether 86:00:00:f0:c7:93 brd ff:ff:ff:ff:ff:ff
    altname enp0s11
    inet 172.30.0.2/32 brd 172.30.0.2 scope global dynamic ens11
       valid_lft 71127sec preferred_lft 71127sec
    inet6 fe80::8400:ff:fef0:c793/64 scope link
       valid_lft forever preferred_lft forever

# Hetzner Robot vSwitch

[screenshot of the Hetzner Robot vSwitch configuration]


# Hetzner Robot Proxmox VE interface config

The Proxmox VE host itself does not need to be reachable from the Hetzner Cloud, therefore there is no route to the cloud network here.
Code:
auto lo
iface lo inet loopback

iface enp41s0 inet manual
#Public

iface enp41s0.4001 inet manual
        mtu 1400
#LAN

iface enp41s0.4005 inet manual
        mtu 1400
#LAN_Proxmox

iface enp41s0.4090 inet manual
        mtu 1400
#LAN_Test

auto vmbr0
iface vmbr0 inet static
        address 94.130.xxx.xxx/26
        gateway 94.130.xxx.xxx
        bridge-ports enp41s0
        bridge-stp off
        bridge-fd 0
#Public

auto vmbr1
iface vmbr1 inet static
        address 172.30.1.240/24
        bridge-ports enp41s0.4001
        bridge-stp off
        bridge-fd 0
        mtu 1400
#LAN

auto vmbr5
iface vmbr5 inet manual
        bridge-ports enp41s0.4005
        bridge-stp off
        bridge-fd 0
        mtu 1400
#LAN_Proxmox

auto vmbr90
iface vmbr90 inet manual
        bridge-ports enp41s0.4090
        bridge-stp off
        bridge-fd 0
        mtu 1400
#LAN_Test

source /etc/network/interfaces.d/*


# Hetzner Robot Proxmox VE - VM network config

[screenshot of the VM's network configuration]
Code:
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:50:56:aa:aa:aa brd ff:ff:ff:ff:ff:ff
    altname enp0s18
    inet 172.30.1.30/24 brd 172.30.1.255 scope global ens18
       valid_lft forever preferred_lft forever
    inet6 2a01:4f8:xxxx:xxxx:1::30/80 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::xxxx:xxxx:xxxx:xxxx/64 scope link
       valid_lft forever preferred_lft forever


# Test from Hetzner Cloud server to Hetzner Robot Proxmox VE - VM

Code:
root@hetzner-cloud-server-1:~# ping -M do -s 1376 172.30.1.30
PING 172.30.1.30 (172.30.1.30) 1376(1404) bytes of data.
1384 bytes from 172.30.1.30: icmp_seq=1 ttl=63 time=3.54 ms
1384 bytes from 172.30.1.30: icmp_seq=2 ttl=63 time=3.05 ms
1384 bytes from 172.30.1.30: icmp_seq=3 ttl=63 time=2.92 ms
1384 bytes from 172.30.1.30: icmp_seq=4 ttl=63 time=3.00 ms
^C
--- 172.30.1.30 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3004ms
rtt min/avg/max/mdev = 2.918/3.124/3.538/0.243 ms


root@hetzner-cloud-server-1:~# ping -M do -s 1377 172.30.1.30
PING 172.30.1.30 (172.30.1.30) 1377(1405) bytes of data.
^C
--- 172.30.1.30 ping statistics ---
7 packets transmitted, 0 received, 100% packet loss, time 6123ms


root@hetzner-cloud-server-1:~# mtr -s 1000 -r -c 200 172.30.1.30
Start: 2025-02-20T10:12:01+0100
HOST: 172.30.0.2                  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 172.30.0.1                 0.0%   200    3.9   2.7   1.6  17.1   1.2
  2.|-- 172.30.0.1                 0.0%   200    0.7   0.8   0.4  13.2   1.2
  3.|-- 172.30.1.30                0.0%   200    3.0   3.1   2.9   6.0   0.2


# Test from Hetzner Cloud server to Hetzner Cloud server

Code:
root@hetzner-cloud-server-1:~# ping -M do -s 1422 172.30.0.10
PING 172.30.0.10 (172.30.0.10) 1422(1450) bytes of data.
1430 bytes from 172.30.0.10: icmp_seq=1 ttl=63 time=0.352 ms
1430 bytes from 172.30.0.10: icmp_seq=2 ttl=63 time=0.454 ms
1430 bytes from 172.30.0.10: icmp_seq=3 ttl=63 time=0.394 ms
1430 bytes from 172.30.0.10: icmp_seq=4 ttl=63 time=0.462 ms
^C
--- 172.30.0.10 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3056ms
rtt min/avg/max/mdev = 0.352/0.415/0.462/0.045 ms


root@hetzner-cloud-server-1:~# ping -M do -s 1423 172.30.0.10
PING 172.30.0.10 (172.30.0.10) 1423(1451) bytes of data.
ping: local error: message too long, mtu=1450
ping: local error: message too long, mtu=1450
ping: local error: message too long, mtu=1450
ping: local error: message too long, mtu=1450
^C
--- 172.30.0.10 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 3059ms


# Test from Hetzner Robot Proxmox VE - VM to Hetzner Cloud server

Code:
root@hetzner-robot-promox-ve-vm:~# ping -M do -s 1372 172.30.0.2
PING 172.30.0.2 (172.30.0.2) 1372(1400) bytes of data.
1380 bytes from 172.30.0.2: icmp_seq=1 ttl=62 time=3.76 ms
1380 bytes from 172.30.0.2: icmp_seq=2 ttl=62 time=3.02 ms
1380 bytes from 172.30.0.2: icmp_seq=3 ttl=62 time=2.93 ms
1380 bytes from 172.30.0.2: icmp_seq=4 ttl=62 time=3.01 ms
^C
--- 172.30.0.2 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3005ms
rtt min/avg/max/mdev = 2.926/3.177/3.757/0.336 ms


root@hetzner-robot-promox-ve-vm:~# ping -M do -s 1373 172.30.0.2
PING 172.30.0.2 (172.30.0.2) 1373(1401) bytes of data.
ping: local error: message too long, mtu=1400
ping: local error: message too long, mtu=1400
ping: local error: message too long, mtu=1400
^C
--- 172.30.0.2 ping statistics ---
3 packets transmitted, 0 received, +3 errors, 100% packet loss, time 2029ms


root@hetzner-robot-promox-ve-vm:~# mtr -s 1000 -r -c 200 172.30.0.2
Start: 2025-02-20T10:12:14+0100
HOST: 172.30.1.30                 Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 172.30.1.1                 0.0%   200    0.6   1.0   0.4  35.3   3.0
  2.|-- 172.30.0.2                 0.0%   200    3.0   3.0   2.8   3.8   0.1
 
@TheJKM Did you ever find a solution? I have the same problem.

My setup: a dedicated server with PVE and a VM on it, connected via vSwitch to a private network in the Hetzner Cloud, and a VM in the Hetzner Cloud.
  • After a reboot of the PVE host, all three servers can ping each other.
  • After maybe 10 minutes (without sending a ping), pings from the cloud VM to the PVE host no longer work; pings between the cloud VM and the PVE VM still work in both directions.
  • When I send a ping from the PVE host to the cloud VM, it takes maybe 10 seconds before the first ping goes through. From that point on, pings from the cloud VM to the PVE host work again as well.
Network config from the PVE host (public IP addresses and gateway changed):
Code:
auto lo
iface lo inet loopback

iface lo inet6 loopback

auto enp6s0
iface enp6s0 inet manual

auto enp6s0.4001
iface enp6s0.4001 inet manual
    mtu 1400

auto vmbr0
iface vmbr0 inet static
    address 100.200.32.226/26
    gateway 100.200.32.193
    pointopoint 100.200.32.193
    bridge-ports enp6s0
    bridge-stp off
    bridge-fd 0

iface vmbr0 inet6 static
    address 2001:0db8:85a3:0000::2/64
    gateway fe80::1

auto vmbr1
iface vmbr1 inet static
    address 10.100.1.2/24
    bridge-ports enp6s0.4001
    bridge-stp off
    bridge-fd 0
    mtu 1400
    up ip route add 10.100.0.0/16 via 10.100.1.1 dev vmbr1
    down ip route del 10.100.0.0/16 via 10.100.1.1 dev vmbr1

Network config of the cloud VM (public IP address changed):
Code:
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet dhcp

iface eth0 inet6 static
    address 2001:0000:130F:0000::1/64
    dns-nameservers 2a01:4ff:ff00::add:2 2a01:4ff:ff00::add:1
    gateway fe80::1

auto enp7s0
iface enp7s0 inet static
    address 10.100.0.3
    netmask 255.255.255.255
    mtu 1400
    pointopoint 10.100.0.1
    post-up ip route add 10.100.0.0/16 via 10.100.0.1 dev enp7s0

Network config of the PVE VM (vmbr1):
Code:
auto lo
iface lo inet loopback

auto ens18
iface ens18 inet static
    address 10.100.1.103/32
    gateway 10.100.1.1
    mtu 1400

Any help would be appreciated.
 
We sometimes had the issue that the mtu 1400 setting did not apply consistently (not PVE-specific - we also saw this on vanilla Debian).
Re-check whether the MTU is actually set in the running config:
Code:
ip a
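
For a single interface, the current MTU can also be read directly from sysfs (the interface name here is just an example taken from the configs in this thread):

Code:
# prints only the MTU of the given interface
cat /sys/class/net/enp6s0.4001/mtu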

We have added another up-hook to make sure:
Code:
up ip link set {{ nic.device }} mtu 1400 || true

It can also be useful to test pings with a specific size (as already mentioned by others):
Code:
ping <IP> -s 1372
(28 bytes of IP + ICMP headers + 1372 bytes of payload = 1400)
 
In my case I resolved it yesterday by also manually setting the MTU of the VMs in the Hetzner Cloud to 1400 (instead of the 1450 assigned by DHCP), by adding this to the /etc/dhcp/dhclient.conf file:

Code:
interface "ens10" {
    supersede interface-mtu 1400;
}

ens10 is the interface pointing to the internal network. Since then, everything has been working fine with my Proxmox VE and Proxmox VE VM config above.
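
Note that dhclient only picks up the supersede on the next lease renewal, so to apply it immediately without a reboot something like this should work (same interface name assumed):

Code:
# set the MTU right away; dhclient.conf keeps it at 1400 on future renewals
ip link set dev ens10 mtu 1400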
 
PVE host:
Code:
# ip a
5: vmbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc noqueue state UP group default qlen 1000
    link/ether 9c:6b:00:85:0c:39 brd ff:ff:ff:ff:ff:ff
    inet 10.100.1.2/24 scope global vmbr1
       valid_lft forever preferred_lft forever
    inet6 fe80::9e6b:ff:fe85:c39/64 scope link
       valid_lft forever preferred_lft forever

# ip r
default via 100.200.32.193 dev vmbr0 proto kernel onlink 
10.100.0.0/16 via 10.100.1.1 dev vmbr1 
10.100.1.0/24 dev vmbr1 proto kernel scope link src 10.100.1.2

Cloud VM:
Code:
# ip a
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc fq_codel state UP group default qlen 1000
    link/ether 86:00:00:db:08:8a brd ff:ff:ff:ff:ff:ff
    inet 10.100.0.3 peer 10.100.0.1/32 brd 10.100.0.3 scope global enp7s0
       valid_lft forever preferred_lft forever
    inet6 fe80::8400:ff:fedb:88a/64 scope link 
       valid_lft forever preferred_lft forever

$ ip r
default via 172.31.1.1 dev eth0 
10.100.0.0/16 via 10.100.0.1 dev enp7s0 
10.100.0.1 dev enp7s0 proto kernel scope link src 10.100.0.3 
172.31.1.1 dev eth0 scope link

That looks the same whether the pings are working or not. But there is no MTU set on the route - could that be a problem?

Pinging with "-s 1372" doesn't change anything; it works or fails the same way.

It's not only ping, though: when ping doesn't work, the PVE host is also not reachable on its private IP via SSH or HTTPS.
 
I did the same today (setting the MTU of the cloud VM to 1400 instead of the DHCP-assigned 1450), but it didn't change anything for me. I just did it differently, by removing the auto-configuration (https://docs.hetzner.com/cloud/networks/server-configuration/) and configuring the interface statically - that's the enp7s0 config shown above.
 
Not yet, I still have to double check mrmanuel's configuration, which is very similar to mine.

For now I've added a cronjob that sends one ping every minute from the PVE host to the cloud VM. That's pretty ugly, but it keeps the connection "alive" most of the time.
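
For reference, the crontab entry on the PVE host looks roughly like this (10.100.0.3 being the cloud VM's private address from the configs above):

Code:
# one ping per minute towards the cloud VM, output discarded
* * * * * ping -c 1 -W 2 10.100.0.3 >/dev/null 2>&1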