Weird one for you: Host/LXCs cannot communicate with SR-IOV-connected VM on same host, no problem external to host

pipe2null
New Member
Feb 26, 2023
I am trying to get several net appliances working on the same 4-core mini x86_64 box running PVE 8.0.3, so I need to offload all feasible network overhead to hardware and use LXCs instead of VMs wherever possible, but I am having a problem with network communication between the host/LXCs and VMs.

I have pfSense in a VM using PCIe passthrough of an SR-IOV VF (trunk). This pfSense instance has no problem serving up IPs to EXTERNAL-TO-HOST devices on any of the trunked VLANs, no problem with web admin, etc. But the host and any local LXCs cannot get DHCP IPs from it, and even with static IPs set, NO communication from the host to the pfSense VM via the VM's VF works. The host and local LXCs have no problem using EXTERNAL-TO-HOST DHCP servers, though.
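(For reference, the VF is attached to the VM the standard way via hostpci; the VMID and PCI address below are placeholders for my actual values:)

qm set 100 -hostpci0 0000:03:00.2   # placeholder VMID and VF PCI address; pfSense gets the VF via PCIe passthrough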

Quick rundown of system:
- BIOS and ConnectX card support SR-IOV; I have SR-IOV itself working just fine.
- All VMs get their own dedicated VFs. LXCs have to use the host bridge, since passing a physical NIC (or host-owned VF) to an LXC is not currently supported.
- Both 10G SFP ports plus another 2.5G port are trunks going to different switches, using a Linux host bridge to connect the trunked VLANs:
auto vmbr0000
iface vmbr0000 inet manual
        bridge-ports enp3s0 enp5s0 enp5s0d1
        bridge-stp off
        bridge-fd 0
Note: I have played with making vmbr0000 "vlan aware", but I end up getting errors from the ConnectX configuration. As configured above, VLANs are routed between external switches over the trunks via the vmbr0000 bridge no problem. Host static IPs using VLANs on that bridge also communicate with EXTERNAL-TO-HOST devices no problem, e.g. for web management, and host and/or LXC DHCP IPs obtained from EXTERNAL-TO-HOST DHCP servers work no problem.
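For reference, the VLAN-aware variant I experimented with looked roughly like this (a sketch only; the bridge-vids range is a placeholder for whatever VLANs are actually trunked):

auto vmbr0000
iface vmbr0000 inet manual
        bridge-ports enp3s0 enp5s0 enp5s0d1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094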

BUT, within the same host:
- The host and any LXCs running on the host cannot communicate with a VM via the VM's SR-IOV VF.
- VMs can communicate with EXTERNAL-TO-HOST devices no problem.
- Different VMs with VFs can communicate with other VMs via their VFs no problem.
- VMs using a tap on a host bridge will NOT communicate with VMs that only use SR-IOV, even though all are on the same VLAN and all share the same ConnectX card.
The problem appears to be related to the host's network stack and/or its SR-IOV use. Basically, the host bridge appears to be eating any traffic between the host/LXCs and any SR-IOV-connected VMs sharing the same ConnectX card, while VM-to-VM via the shared ConnectX card works as expected.
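In case it helps anyone reproduce this, the relevant state can be inspected with standard tooling (iproute2/tcpdump; interface names as in my config above):

ip -d link show enp3s0          # list the VFs on the PF with their MAC/VLAN/spoofchk/trust state
bridge fdb show br vmbr0000     # MACs the bridge has learned, and which port each sits behind
tcpdump -ni enp3s0 icmp         # check whether pings from the host ever make it out the PF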

I am not an expert on this, but it seems that communication at the lowest level of the network stack (the SR-IOV hardware, presumably the card's embedded switch) has no issue moving traffic between different VF instances, while the Linux bridge craps out for any communication with other VFs on the same card, even though it pushes traffic out the physical port no problem. I have tried MANY different configs and can't get it to work. YES, obviously I could add a bunch of additional bridges and taps and whatnot, or just use VMs only, but I need to minimize any/all overhead anywhere I can; I only have 4 CPU cores with no hyperthreading.
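For completeness, these are the per-VF knobs on the PF side that I know of (standard ip-link VF options; whether any of them is relevant here is exactly what I'm unsure about):

ip link set enp3s0 vf 0 spoofchk off   # disable MAC/VLAN anti-spoof checking for VF 0
ip link set enp3s0 vf 0 trust on       # let VF 0 change its MAC address / enable promiscuous mode
ip link set enp3s0 vf 0 vlan 0         # clear any forced VLAN tag on VF 0 (0 = none)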


The point:
Is there a way to get the host bridge to work properly and forward traffic to the VMs via their VFs? I am trying to find the "correct" fix in the most performant way, not hacky workarounds that I already know how to do. I only have 4 CPU cores to share between all the VMs and LXCs, so... yeah. And of course, I may have screwed up the config somewhere, so hopefully there is a simple fix for this?

Thanks!
 
