Connectivity issues - PVE w/SR-IOV and OpenVSwitch

FrogOnABike

Jul 10, 2024
I have been trying to work through this one for some time now but am slowly running out of ideas.

I can probably provide more details next week, as it's last thing on a Friday evening and my brain is almost cooked, but to summarise:

We have an 8-node Proxmox cluster and have been having some odd connectivity issues with the one node that hosts our pfSense VM.
The pfSense VM uses PCI passthrough to get one of the on-board 1G connections (WAN) and a couple of SR-IOV VFs from an Intel X710 fibre card. The VFs are then added to a LAGG in pfSense.

The PFs on the X710 are used in an OVS bond, which is connected to an OVS bridge to provide host comms to the rest of the cluster. This pair of ports is configured as an LACP port-channel on the switch.

In general, the internet connection running through the pfSense works great, except when connecting from the host machine itself. From the host I get lots of DNS issues, timeouts and slow connections, and pings generally lose about 50% of packets.

This host is still on PVE 7.4.something, because it is rather iffy to update while it hosts the internet connection for our DC machines. The pfSense VM on it is running 2.7.0.

To add some resilience, and possibly to troubleshoot this weird issue, I'd started to set up a newer pfSense 2.7.2 VM on the other, roughly identical machine in our cluster to provide an HA pair of pfSenses (a colleague, since made redundant, started this YEARS ago but never really finished it). However, I have been facing similar issues there.
With this one I changed the connection to the cluster a bit to follow the suggestions in the Proxmox Open vSwitch docs: instead of assigning an IP to the bridge, I used an IntPort for the local host.
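
For anyone not familiar with that layout, here is a minimal sketch of what it looks like in /etc/network/interfaces. This is not copied from the real host: the interface names (enp1s0f0, enp1s0f1, bond0, vmbr0, mgmt0), the address and gateway, and the bond options are all placeholders, and the LACP options are only an assumption based on the switch side being a port-channel.

```
# Illustrative only: names, addresses and options are placeholders.

# OVS bond over the two X710 PFs (LACP assumed to match the switch port-channel)
auto bond0
iface bond0 inet manual
    ovs_bonds enp1s0f0 enp1s0f1
    ovs_type OVSBond
    ovs_bridge vmbr0
    ovs_options bond_mode=balance-tcp lacp=active

# The bridge itself carries no IP address
auto vmbr0
iface vmbr0 inet manual
    ovs_type OVSBridge
    ovs_ports bond0 mgmt0

# The host's own IP lives on an internal port attached to the bridge
auto mgmt0
iface mgmt0 inet static
    address 192.0.2.10/24
    gateway 192.0.2.1
    ovs_type OVSIntPort
    ovs_bridge vmbr0
```

The only difference from the older host is where the host IP sits: there it is on the bridge itself, here it is on the IntPort.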

The pfSense on there has PCI passthrough set up for:
2x 1Gbit Eth
2x 40Gbps VFs from the Intel X710, created via SR-IOV

With the 2 SR-IOV VFs set up as a round-robin LAGG (the same setup as on the current pfSense), I get zero connectivity to New-pfSense from either the host or a VM on the local host (I have a basic Ubuntu LiveCD VM on there for diags and config).

The host and local VM can connect just fine to Old-pfSense on the other host.

That other host and the other VMs in the cluster can ping/connect to both New-pfSense and Old-pfSense.

So the issue appears to be limited to each host's connectivity to its own local pfSense, and I'm a bit lost as to what's causing it between SR-IOV and OpenVSwitch. I thought I'd check in here to see if anyone has tried this setup and had similar issues?
 
