Proxmox 7.2-11 - Apparent Bridge Outbound Routing Issues

StormFury

Active Member
Jul 17, 2018
Good afternoon everyone.

I have a couple of HPE C7000 blade servers that are all configured identically: same HDDs, same FlexLOMs, same mezzanine cards, same memory, same processors, and the same network setup:

vmbr0 - eno1 (CIDR 10.XXX.XXX.41-48 OR .21-28) (connected to switch vlan 110 to router to public inet)
vmbr1 - eno2 (connected to switch vlan 110 to router to public inet)
vmbr2 - ens1f0 (connected to switch vlan 120 to public inet)
vmbr3 - ens1f1 (connected to switch vlan 120 to public inet)

## I have attached the /etc/network/interfaces file for c7k002-n01 and c7k002-n03 as these are the two servers below ##
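
Since the nodes are supposed to be identical, one quick sanity check is to diff the two attached interfaces files. The following is only a minimal sketch: `diff_configs` is a hypothetical helper name, and it assumes the attachments have been saved locally under their attachment names. Lines such as the host address are expected to differ; anything else that shows up (a bridge port, a VLAN setting) would be worth a closer look.

```python
import difflib


def diff_configs(a_text, b_text, a_name="node-a", b_name="node-b"):
    """Return unified-diff lines between two config file contents.

    An empty list means the two texts are line-for-line identical.
    """
    return list(
        difflib.unified_diff(
            a_text.splitlines(),
            b_text.splitlines(),
            fromfile=a_name,
            tofile=b_name,
            lineterm="",
        )
    )


def diff_files(a_path, b_path):
    """Read two files from disk and diff them (paths are assumptions)."""
    with open(a_path) as a, open(b_path) as b:
        return diff_configs(a.read(), b.read(), a_path, b_path)


# Example usage, assuming the attachments were saved in the current directory:
# for line in diff_files("c7k002-n01.txt", "c7k002-n03.txt"):
#     print(line)
```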

I can access them all via direct IP (10.XXX.XXX.41-.48 on c7k002 and 10.XXX.XXX.21-.28 on c7k001)
They can all ping each other.
They are on (2) different clusters (c7k001 and c7k002)

Here is my problem:

The VM (specifications listed below) can be running on c7k002-n01 and it can ping the host (10.XXX.XXX.41), it can ping other hosts (10.XXX.XXX.21-28, 42-48), it can ping the gateway (10.XXX.0.1), and it can ping ANYWHERE on the internet. It can also be pinged by other hosts and VMs in the network.

When I migrate this machine (or restore it with unique MAC and IP addressing) to a few of my nodes (c7k002-n03, n05, n06, and a couple of others), it can ping the host (when I migrate it to n03, I can ping 10.XXX.XXX.43, the host IP); HOWEVER, it cannot ping beyond the host. So it cannot ping other hosts, other VMs on the network, the gateway, or the internet. That said, IT CAN be pinged by the host, other hosts, and other VMs on the network. When I disable the ens18 adapter, add a public IP to ens19, and restart the network (on n03, n05, n06, et cetera), I do pull the public IP address and I can get to the internet. I have even installed a fresh instance of Oracle Linux 8.6 and get the same issues.

When it is on one of the nodes where it cannot ping beyond the bridge, I get the following response to ping:
Destination Host Unreachable.

When it is on one of the nodes where it cannot ping beyond the bridge, I get the following response to a traceroute to the gateway:
traceroute to 10.XXX.0.1 (10.XXX.0.1), 30 hops max, 60 byte packets
1 testInstall (10.XXX.101.101) 3088.268 ms !H 3088.118 ms !H 3088.95 ms !H
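
A note on reading that output: the `!H` comes from the VM's own address after roughly three seconds, which is what you normally see when the destination is on-link and address resolution (ARP) for it gets no answer. As a minimal sketch of the on-link part, the snippet below checks that the gateway sits inside the VM's own /16. The redacted octet `XXX` is replaced by `0` purely as a placeholder, on the assumption that it is the same value in both addresses, and `is_on_link` is a hypothetical helper name.

```python
import ipaddress


def is_on_link(interface_cidr, target):
    """Return True if target lies inside the interface's own subnet."""
    iface = ipaddress.ip_interface(interface_cidr)
    return ipaddress.ip_address(target) in iface.network


# Placeholder addresses: "XXX" from the post is replaced with 0 here.
vm_address = "10.0.101.101/16"
gateway = "10.0.0.1"

# True means the VM talks to the gateway directly at layer 2 (no routing hop).
print(is_on_link(vm_address, gateway))
```

If the gateway is on-link, no router is involved in that first hop, so the place to look is the layer 2 path between the VM's tap interface, the bridge, and the physical port on the affected node.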


To recap: it can be on n01 with ens18 configured with an internal IP of 10.XXX.101.101 and ping all hosts, VMs, the gateway, and the internet. I migrate it to n03 and it can ONLY ping the host, but it can be pinged by other hosts and VMs. I can then migrate it to n04 and it works properly. I can then migrate it to n05 or n06 and it stops working.

My thoughts are that if it were a bad network card I would not be able to access the system at all through the vmbr0 ip address.

Thank you in advance for any assistance that you might be able to provide.

Storm

## VM SYSTEM DETAILS FOLLOWING ##

Here is the virtual machine I am dealing with, though this also happens with Windows 10, Debian, and Ubuntu images:
OS: Oracle Linux Server 8.6
Memory: 8.00 GB
Processors: 8 (4 sockets, 2 cores) (default kvm64 (Which doesn't work with Oracle Linux 9.0 or 9.1))
BIOS: OVMF (UEFI)
Display: Default
Machine: Default (i440fx)
SCSI Controller: VirtIO SCSI
Hard Disk (sata0): local-lvm:vm-XXXX-disk-1,format=raw,size=50G
Network Device (net0): e1000=MAC,bridge=vmbr0
Network Device (net1): e1000=MAC,bridge=vmbr2
EFI Disk: local-lvm:vm-XXXX-disk-0,format=raw,size=128k

## I have attached the configuration for /etc/sysconfig/network-scripts/ifcfg-ens18 to this thread ##

Results of "# ip a"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens18: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether <MAC REDACTED> brd ff:ff:ff:ff:ff:ff
    inet 10.XXX.101.101/16 brd 10.XXX.255.255 scope global noprefixroute ens18
       valid_lft forever preferred_lft forever
4: ens19: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether <MAC REDACTED> brd ff:ff:ff:ff:ff:ff
 

Attachments

  • c7k002-n01.txt (1.2 KB)
  • c7k002-n03.txt (1.2 KB)
  • vm_ifcfg-ens18.txt (360 bytes)

I am going to go ahead and update this:

It might be an issue with the FlexLOM adapters. I am going to replace them soon and will post an update once I have.