PVE host goes temporarily offline but VMs don't

thaf · Jul 6, 2023

I'm seeing an odd problem on a test PVE cluster. All three PVE servers show about 50% packet loss and drop off the network for minutes at a time, yet every VM running on them stays reachable the whole time with 0% packet loss.
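One way to put numbers on the host-vs-VM comparison above is to record one probe per interval for each target (for example, one ping per second) and summarize the overall loss and the longest unbroken outage. The following is a minimal sketch under that assumption; `summarize` and `LossSummary` are hypothetical helpers, and the probe lists are made-up sample data, not measurements from this cluster.

```python
from dataclasses import dataclass


@dataclass
class LossSummary:
    sent: int            # number of probes sent
    lost: int            # probes that got no reply
    loss_pct: float      # lost / sent as a percentage
    longest_outage: int  # longest run of consecutive lost probes


def summarize(replies: list[bool]) -> LossSummary:
    """Summarize probe results, where True means a reply was received."""
    lost = 0
    run = 0
    longest = 0
    for ok in replies:
        if ok:
            run = 0  # a reply ends the current outage
        else:
            lost += 1
            run += 1
            longest = max(longest, run)
    sent = len(replies)
    pct = 100.0 * lost / sent if sent else 0.0
    return LossSummary(sent, lost, pct, longest)


if __name__ == "__main__":
    # Hypothetical sample: the host is silent for 2 of 4 minutes, the VM never is.
    host = [True] * 60 + [False] * 120 + [True] * 60
    vm = [True] * 240
    print(summarize(host))
    print(summarize(vm))
```

Tracking the longest outage separately from the loss percentage distinguishes "drops off for minutes at a time" from evenly scattered loss, which can look identical as a single percentage.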

Here's the network config from one of them:

Code:

auto lo
iface lo inet loopback

iface enp5s0f0 inet manual

iface eno1 inet manual

iface eno2 inet manual

iface eno3 inet manual

iface eno4 inet manual

iface enp5s0f1 inet manual

auto vmbr0
iface vmbr0 inet static
    address 10.194.128.211/21
    gateway 10.194.135.254
    bridge-ports enp5s0f0
    bridge-stp off
    bridge-fd 0

auto vlan50
iface vlan50 inet static
    address 10.202.74.241/24
    vlan-raw-device enp5s0f1

Neither the server logs nor the switch logs contain anything resembling an error. A ten-minute tcpdump showed nothing out of the ordinary, apart from stretches with no traffic to or from the bridge IP. A nearly identical setup, differing only in having no tagged VLAN on the corosync interface, behaves exactly as I would expect this one to. The NICs are dual-port Intel X520s.

I've tried both PVE 7.4 and 8.0, with no change.

Does anyone have a suggestion for where to look next when debugging this?

thaf · Feb 12, 2024

Somewhat disturbingly, the problem went away without any changes on my part. All it took was waiting about half a year...
