NIC passthrough to different VMs causes host crash?

fgg1991

New Member
Mar 25, 2022
I have a 4-port Realtek 8125B NIC, and I was trying to pass through 3 of the ports to OpenWrt and 1 to Synology.

In the /etc/default/grub file, I use this command line:

GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on pcie_acs_override=downstream,multifunction"
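(For anyone following along: editing /etc/default/grub alone changes nothing; the config has to be regenerated and the host rebooted. A sketch of the usual sequence on a GRUB-booted Proxmox host:)

```shell
# Regenerate the GRUB config and reboot (GRUB-based installs;
# hosts booted via systemd-boot use 'proxmox-boot-tool refresh' instead):
update-grub
reboot
# After the reboot, confirm the parameters actually reached the kernel:
cat /proc/cmdline
```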



I read posts on the PVE forum, and most of the relevant topics point to IOMMU problems, but I checked my PVE and there is no IOMMU problem: all 4 ports are in different IOMMU groups.



[screenshot: NIC]

[screenshot: IOMMU group]

[screenshot: IOMMU group]

And these are my OpenWrt VM hardware settings:

[screenshot: OpenWrt VM hardware settings]
And this is the Synology VM setting. It only works with the e1000e bridged network.

[screenshot: Synology VM hardware settings]


Anyone know the reason?
 
Your IOMMU groups are not reliable because you break up the actual groups by using pcie_acs_override. Maybe the NIC is not actually in its own group, and your system just cannot handle it being passed through while the other devices (of the real group) are needed by the Proxmox host?
Maybe you can show the actual groups, without using pcie_acs_override, by running: for d in /sys/kernel/iommu_groups/*/devices/*; do n=${d#*/iommu_groups/*}; n=${n%%/*}; printf 'IOMMU group %s ' "$n"; lspci -nns "${d##*/}"; done
If you have an AMD Ryzen motherboard, then you don't have that many PCIe lanes from the CPU, and most PCIe slots are in one big chipset group (except on X570). Maybe you can provide a link, or at least tell us exactly which motherboard you are using? Then I might be able to tell you which PCIe slots can be used for passthrough.

PS: You don't need amd_iommu=on because it is on by default.
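(Side note: the group number in that one-liner is extracted with plain shell parameter expansion. A simplified, self-contained sketch of the same trick, using a sample path instead of a real sysfs walk:)

```shell
# Sample sysfs path, as produced by the glob in the one-liner above:
d='/sys/kernel/iommu_groups/13/devices/0000:03:00.0'

n=${d#*/iommu_groups/}   # strip everything up to and including 'iommu_groups/'
n=${n%%/*}               # strip the first '/' and everything after it
echo "IOMMU group $n"    # the group number: 13
echo "device ${d##*/}"   # the PCI address: 0000:03:00.0
```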
 
Thanks for the reply!

I deleted the pcie_acs_override option from grub and rebooted the PVE host, then ran your command; here is the output:
[screenshot: command output]


Would it help to evaluate the problem?
 
The four NICs (03:00.0, 04:00.0, 05:00.0 and 06:00.0) are in separate groups (additional PCI bridges are usually not an issue), so that appears not to be the problem. Only passthrough of 0c:00.0 (the on-board NIC) would give problems, like the Proxmox host losing the SATA controller and some USB controllers, which tends to crash the system.
Is there anything at the end of journalctl -b -1, after a crash and reboot, that could give a hint about what caused the crash? Or any messages on the host console (connected to a physical display) just before the crash?

PS: Did you really need to type the command manually and take a screenshot? Is there no SSH connection to the Proxmox host?
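(A minimal sketch of pulling the previous boot's last log lines; these are standard journalctl flags, and -b -1 selects the boot before the current one:)

```shell
# Show the last 50 lines of the previous boot's journal, without a pager:
journalctl -b -1 -n 50 --no-pager
```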
 
lol, actually these pics are all from the SSH client, not a phone camera.

I connected the PVE host to a monitor, and nothing was shown on the display during the crash & reboot; there was also no log in kern.log during this period.

Totally no idea...
 
I'm sorry, but then I'm also clueless. Usually when Proxmox freezes/crashes on passthrough, it is because a device in an IOMMU group together with the network and SATA controllers is being passed through, so Proxmox can no longer write logs and cannot be reached via the network. That results in no logs and no information, just like in your case.

A quick search on this forum shows people using 8125Bs with Proxmox, and that the card requires a newer kernel than OpenWrt currently uses (I think). I did not find any passthrough issues reported, and usually if the card itself does not work properly with passthrough, it becomes unusable inside the VM rather than crashing the host.

Maybe try passing all four ports to the same VM? Maybe the hardware gives a strange error on the PCIe bus that causes the system to crash?
A work-around might be to make virtual bridges for each port and connect them via VirtIO network devices to the VM?
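(For reference, the bridge-per-port workaround would look roughly like this in /etc/network/interfaces; this is a sketch, and enp3s0 / vmbr1 are placeholder names, so use the ones from `ip link` on your host:)

```
auto vmbr1
iface vmbr1 inet manual
        bridge-ports enp3s0
        bridge-stp off
        bridge-fd 0
```

A VirtIO NIC attached to vmbr1 in the VM's Hardware tab then gets that physical port's traffic without any passthrough.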
 
I tried two solutions:

pass 1 port to Synology, the rest via VirtIO to OpenWrt: worked
pass 3 ports to OpenWrt, 1 with e1000e to Synology: worked, but Synology was slow

For now I chose the first one; both the Synology passthrough port and the desktop connect to OpenWrt's VirtIO ports, using OpenWrt as a switch.

The drawback is that with iperf3, the connection between desktop and OpenWrt is OK, about 2.3 Gb/s in both directions,

but the connection between Synology and OpenWrt is asymmetric, 1.6 Gb/s and 2.3 Gb/s; no idea why only one direction can reach the target speed.

The connection between Synology and desktop is even worse, only 0.8 Gb/s and 1.2 Gb/s... although OK for a 3.5" 5400 rpm HDD, it's not satisfying.
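(If it helps narrow this down: iperf3 can measure both directions from the same endpoint, which makes asymmetric results easier to compare. -R is the standard reverse-mode flag; SYNOLOGY_IP is a placeholder for your NAS address:)

```shell
iperf3 -c SYNOLOGY_IP        # desktop -> Synology
iperf3 -c SYNOLOGY_IP -R     # Synology -> desktop (server sends)
```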
 
