PCI passthrough to pfSense hangs after network error

yobyot

Member
Jan 19, 2023
I'm hoping someone might have a clue to help me diagnose an issue that's a bit hard to explain.

I'm running Proxmox VE 8.0.4 on a Protectli Vault with Intel I225-V Ethernet adapters. pfSense CE 2.7 runs in a VM, with one of the I225-Vs passed through to it as pfSense's WAN uplink port. pfSense's other interfaces are Linux bridges and are not relevant to this question.

As a Proxmox newbie, I configured a backup job that uses the "stop" option (for maximum consistency) followed by a post-backup script that used rclone to upload the VM backup file to a cloud storage provider.

This means that the pfSense router -- and therefore the uplink -- was down at the exact moment the post-backup script attempted to use the uplink to store the backup offsite. I could've solved the problem with a sleep command to allow pfSense to restart but, for this question, assume it was attempting to restart at the same time Proxmox invoked the post-backup script.

It's a dumb configuration -- and I've since shifted to "snapshot" for the backup option, which leaves pfSense running so the post-backup script can reach the cloud through it.
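
For anyone who wants to keep "stop" mode, a fixed sleep could be replaced by polling until the uplink actually answers. This is only a minimal sketch: the names `tcp_probe` and `wait_for_uplink`, the probe target 1.1.1.1:443, and the timeout values are all assumptions for illustration, not anything taken from the original script.

```python
import socket
import time


def tcp_probe(host="1.1.1.1", port=443, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    The default host and port are placeholders; use any endpoint that is
    only reachable through the pfSense uplink.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_uplink(probe=tcp_probe, max_wait=300.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll probe() until it returns True or max_wait seconds have elapsed.

    Returns True as soon as the uplink answers, False if the deadline passes.
    """
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The post-backup script would call `wait_for_uplink()` first and start the rclone upload only if it returns True. That avoids guessing how long pfSense takes to come back up.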

But what's really strange is that to restart pfSense after this misconfiguration, I have to reboot the physical appliance. If I just attempt to restart pfSense via the Proxmox GUI, I get an error in the pfSense console indicating that it cannot access the passed-through PCI device. As a result, pfSense just stops.

My question after all that is, why would a failed attempt by a post-backup script that uses a router whose uplink is a PCI device cause Proxmox itself to not be able to reset the device when a VM accesses it? If it is really passed-through, shouldn't a reboot of the VM's OS reset the device? IOW, why would a failed upload attempt (the router is down) cause a fault in the real hardware Proxmox has assigned directly to a VM?


Thanks for any suggestions.
 
Sometimes PCI(e) devices do not reset properly and can only be used once per Proxmox host reboot. Sometimes, during shutdown, a driver leaves a PCI(e) device in a state that a driver (sometimes the very same one) cannot recover it from. The driver does not expect to see the device again without a physical reboot. Passthrough is not a scenario that most manufacturers design or test for.
Work-arounds include different reset methods, with or without a ROM file, with or without ROM-Bar, or sometimes something special in the driver. Sometimes starting a different VM with a different OS or driver and passing through the same device works. Passthrough is hit or miss, and what you describe is not uncommon.
I don't know a work-around for your specific network device or for the pfSense driver, sorry.
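
As a starting point for checking which reset methods the host kernel thinks a device supports, something like the following could be run on the Proxmox host. It is a rough sketch under assumptions: the function name `pci_reset_info` and the example address `0000:02:00.0` are hypothetical, and the `reset_method` sysfs attribute is only present on reasonably recent Linux kernels (the code treats it as optional).

```python
from pathlib import Path


def pci_reset_info(address, sysfs_root="/sys/bus/pci/devices"):
    """Report what sysfs exposes about resetting one PCI device.

    address is the full PCI address, e.g. "0000:02:00.0" (hypothetical here;
    look up the real one with lspci -D).
    """
    dev = Path(sysfs_root) / address
    method_file = dev / "reset_method"
    # reset_method lists the supported methods separated by spaces,
    # for example "flr bus"; it may be missing on older kernels.
    methods = method_file.read_text().split() if method_file.exists() else []
    return {
        "present": dev.is_dir(),                  # device directory exists
        "has_reset": (dev / "reset").exists(),    # kernel offers a reset hook
        "methods": methods,
    }
```

Calling `pci_reset_info("0000:02:00.0")` returns a small dictionary. An empty `methods` list or a missing `reset` file would suggest the kernel has no way to reset the function on its own, which fits the "only works once per host boot" behaviour described above, though it does not prove that is the cause here.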
 
Thanks!

I'm OG (first virtualized system: VM/370). I guess I am also a purist in the sense that a hypervisor should always present an "idealized" environment to the guest on the latter's startup. IOW, hardware is always in a known state.

Obviously, PCI passthrough kinda breaks this objective. But now that I know Proxmox may not be able to reset passed-through hardware, I'll avoid oddball scripts like the one I tried at first.