Unable to forward Onboard NIC to OPNSense

ToasterPC

New Member
May 12, 2023
Hello there!

I started using Proxmox VE 7.4 (with the opt-in 6.2 kernel) last week on a Lenovo ThinkCentre M900 Tiny I got off Amazon Renewed, mainly to run it as an OPNSense router and as a host for Home Assistant OS, plus a few Docker containers.

So far, I've been able to install both OPNSense and HASS OS on the device, and HASS seems to run perfectly. Sadly, I cannot say the same for OPNSense: I've been dealing with constant reboots and, as of now, no Internet connectivity.
The issue seems to pop up when trying to forward my Intel NIC, an I219-LM with the following specs per lspci -vvv:

00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (2) I219-LM (rev 31)
	DeviceName: Onboard LAN
	Subsystem: Lenovo Ethernet Connection (2) I219-LM
	Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Interrupt: pin A routed to IRQ 16
	IOMMU group: 6
	Region 0: Memory at df000000 (32-bit, non-prefetchable) [disabled] [size=128K]
	Capabilities: [c8] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=1 PME-
	Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [e0] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR+
		AFStatus: TP+
	Kernel driver in use: vfio-pci
	Kernel modules: e1000e

Also under the 0000:00:1f device are the onboard SMBus controller and the HD Audio controller, though even without any modifications the NIC itself sits in its own IOMMU group, separate from the other functions.
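In case it's useful, this is roughly how I verified the grouping on the host (a standard sysfs walk, nothing fancy; the group numbers will obviously differ on other hardware):

# list every device per IOMMU group
for g in /sys/kernel/iommu_groups/*; do
    echo "IOMMU group ${g##*/}:"
    for d in "$g"/devices/*; do
        lspci -nns "${d##*/}"
    done
done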


So far, when I pass through only the NIC itself, things initially look fine: after the host's boot sequence and roughly 10 minutes of the VM booting, PPPoE dialing completes without issue and I get both an IP address and Internet connectivity. But somewhere between 2 and 15 minutes later, depending on the VM's mood, the host's syslog shows

vfio-pci 0000:00:1f.6: timed out waiting for pending transaction; performing AF function level reset anyway

and the VM gets into a reboot loop with no way of restoring Internet connectivity and/or stability to the system.
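For context, the passthrough itself is just the stock Proxmox hostpci mechanism, set up roughly like this (from memory, so take the exact form with a grain of salt):

# pass only the NIC function (not the rest of 00:1f) to VM 100
qm set 100 -hostpci0 0000:00:1f.6
# which shows up in /etc/pve/qemu-server/100.conf as:
#   hostpci0: 0000:00:1f.6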

If I instead try to pass through the entire 0000:00:1f device, I get the same behavior, but this time with the following errors in the host's syslog:

May 12 20:13:25 pve QEMU[1237]: kvm: vfio: Cannot reset device 0000:00:1f.4, no available reset mechanism.
May 12 20:13:25 pve QEMU[1237]: kvm: vfio: Cannot reset device 0000:00:1f.3, no available reset mechanism.
May 12 20:13:25 pve QEMU[1237]: kvm: vfio: Cannot reset device 0000:00:1f.2, no available reset mechanism.
May 12 20:13:25 pve QEMU[1237]: kvm: vfio: Cannot reset device 0000:00:1f.0, no available reset mechanism.

I've already tried blacklisting the NIC's driver so it doesn't load on the host, upgrading to and booting with the opt-in 6.2 kernel, and even disabling all of the CPU's C-states as well as PCIe power-saving measures like ASPM, but for some reason I'm still completely unable to get the NIC to behave under these circumstances.
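For completeness, the blacklist/vfio binding I tried follows the usual Proxmox PCI passthrough setup, roughly like this (reconstructed from memory; the 8086:15b7 ID is what lspci -nn reports for this I219-LM here, so double-check it on other units):

# /etc/default/grub (followed by update-grub): enable the IOMMU
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"

# /etc/modprobe.d/vfio.conf: bind the NIC to vfio-pci before e1000e claims it
options vfio-pci ids=8086:15b7
softdep e1000e pre: vfio-pci

# /etc/modprobe.d/blacklist-e1000e.conf: or keep the driver from loading at all
blacklist e1000e

# afterwards: update-initramfs -u -k all && reboot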

I'd really like to get the onboard NIC working this way: if the problem turns out to lie with OPNSense itself, I could probably move over to OpenWrt and enjoy its CAKE SQM instead of OPNSense's. Either way, I'm pretty sure I'll get better results if the entire NIC is controlled by the VM instead of going through a bridge, which in any case doesn't currently work with OPNSense when it tries to dial the PPPoE connection to my ISP.

If none of this works, my last resort would be another NIC I acquired as a precaution (an Intel I225 on an adapter in the device's NVMe slot). I'd really like to avoid that, though: there isn't enough clearance between the SATA port and the NVMe slot to have both the drive installed inside and the adapter's extension cables running out, so I'd have to drill out the casing to leave the Ethernet port and the 2.5" bay connected externally, and I'm pretty sure that would void my warranty (as meaningless as that would be).

Is there anything I can do to remedy this issue?

Thanks for the help!
 
Hello there!

Just bumping the thread since no one has answered yet. Hopefully someone has an idea and can lend me a hand.
 
Can you post the VM config and the dmesg from the host?

AFAIR, it sometimes helps to disable hardware offloading in OPNsense for some NICs: https://docs.opnsense.org/manual/interfaces_settings.html
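If you want a quick test from the OPNsense shell instead of the GUI, something along these lines should work (em0 is just an example name, use whatever OPNsense assigned to the WAN NIC; the GUI settings get re-applied on the next reconfigure, so this is only for testing):

# disable the usual offloads on the interface (FreeBSD ifconfig syntax)
ifconfig em0 -rxcsum -txcsum -tso -lro
# verify the resulting flags
ifconfig em0 | grep options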

One question: why do you want to pass through the NIC in the first place? You could also try creating a separate bridge just for that interface (without configuring an IP on the host) and use that bridge for OPNsense with a virtual NIC (e.g. virtio).
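A dedicated bridge without an IP on the host would look roughly like this in /etc/network/interfaces (the interface name is just an example, yours will differ):

auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0
        # intentionally no address on the host; the VM does PPPoE itself

Then attach a virtio NIC to that bridge in the VM config (net1: virtio=...,bridge=vmbr1) and let OPNsense dial PPPoE over it.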
 
Sure, no problem.

The config is the following:

cat /etc/pve/qemu-server/100.conf
agent: 1
balloon: 0
boot: order=virtio0
cores: 4
cpu: host
cpuunits: 200
machine: q35
memory: 4096
meta: creation-qemu=7.2.0,ctime=1684289661
name: OPNSense
net0: virtio=F2:23:A3:F5:12:1E,bridge=vmbr0
net1: virtio=AE:03:E2:08:89:C3,bridge=vmbr1
numa: 0
onboot: 1
ostype: other
scsihw: virtio-scsi-pci
smbios1: uuid=8c83a5dd-b211-40a5-9ab7-7072872ca924
sockets: 1
startup: order=1,up=0,down=0
virtio0: local-lvm:vm-100-disk-0,iothread=1,size=64G
vmgenid: 48d48fa1-22f6-4112-94bd-6a39688a02fb

And the dmesg is as follows:

On Pastebin, since it did not fit here

As for why I was trying to pass the NIC through: I had several issues getting any response to the PADI from my ISP's PPPoE service (FTTH Alcatel router in bridge mode, no VLAN). It would take about 15 minutes to get an acknowledgement of any sort, and usually the VM would soft-reboot before getting there.

Per some advice I got, I reinstalled Proxmox from scratch and saw the same behavior with pfSense and OpenWrt, though without the VM crashing on me this time. I then took a nap, and when I woke up the link had come online, so I redid everything from scratch with OPNSense; this time it took about 10 minutes to negotiate the link. I'm still not sure what was causing the issue, since swapping NICs and using different cables gave the same result, but at least I've got Internet now.

Perhaps I should mark the issue as solved then, though I'm not certain whether the NIC taking upwards of 10 minutes to negotiate something a Raspberry Pi did instantly should be cause for concern.


In any case, thanks for the help!
 
The config is the following:
So you don't use passthrough at the moment? (At least it's not in the config.)

Did you try to disable the offloading as I mentioned?

If in doubt, I'd use tcpdump on the host to see what gets sent/received to/from the VM and compare it with what you'd expect.
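For PPPoE specifically you could filter on the discovery and session ethertypes on the bridge, e.g. (vmbr1 assuming that's the WAN-facing bridge):

# watch PPPoE discovery (PADI/PADO/PADR/PADS) and session frames
tcpdump -eni vmbr1 pppoed or pppoes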

As to why the PPPoE connection takes so long to start: honestly, I don't have much experience with that, so I can't really help there.
 
Not at the moment, no; passthrough isn't in the config right now. I'm honestly just happy I finally have Internet, and after further reading it seems passthrough wouldn't have made a difference in speed or latency for my use case anyway, so no harm, no foul.

Regarding offloading, I forgot to mention: for some reason it was turned off by default on my OPNSense installation. So I guess it wasn't the cause of the issue.

Regarding the connection timeouts, don't worry. If someone else runs into the same issues, I hope this thread can at least point them in the right direction, so from that perspective the trouble was worth it in the end.
 
