Proxmox crashes/best practice for linux bridges

Losmanikos

Oct 24, 2024
Hi guys,

A few months back I bought a mini PC with an N305 CPU and 6x 2.5GbE ports.
The idea was to replace my existing router with Proxmox as the hypervisor and two VMs - pfSense and a Media VM (Debian with Plex, NFS from USB HDDs etc.)

I started with three Linux bridges - one for WAN (tied to my WAN port), a second tied to the port leading to my PC (so I could access both VMs and PVE management) and a third for the Media VM (not bound to any physical port).

It worked fine - except it kept crashing every once in a while (sometimes after a matter of days, sometimes after only hours). The whole mini PC just rebooted.
According to the logs it was business as usual and then a sudden reboot. I tried troubleshooting via all kinds of logs (mostly with the help of GPT, since I'm not that savvy with Linux).
Nothing helped. So I tried removing all the potentially risky factors: I disabled iGPU passthrough and USB drive passthrough and removed the Media VM, so only pfSense was up and running. It didn't help.

My BIOS was up to date and so were PVE and the VMs. So I thought it had to be a HW issue. I ran multiple memory tests, SSD checks and stress tests for CPU, iGPU and IO - nothing, all good.
pfSense had 2 cores, 4GB RAM and 50GB storage, which was already overkill (out of 8 cores, 16GB and 1TB total).
So that left pretty much only a PSU issue. Before an RMA, I decided to install pfSense on bare metal without PVE, so no virtual bridges. I restored the backup, just replaced the bridges with actual ports, and what do you know? It works like a charm. No crash so far (already more than a month of uptime).

Stability is nice, however I didn't buy this mini PC to be just a "dumb" router.

I plan to re-install Proxmox, but I'm not sure how to handle the virtual bridges, as they were most likely the cause of the crashes.
  1. If I pass through WAN port to pfSense, will I be able to perform updates in Proxmox management?
  2. If I pass through my PC port to pfSense, how will I access PVE management?
It's quite confusing, since the official documentation recommends using bridges: https://docs.netgate.com/pfsense/en/latest/recipes/virtualize-proxmox-ve.html

Does anyone have experience with this?

Appreciate any advice
 
Hello. The problem could be a kernel crash related to the bridges. But as seen many times on this forum with small form factor units, it more likely comes from the processor. Can you try to:
- re-enable the bridges and reproduce the problem
- put a fan on your mini PC in order to cool it well
- tell us if the problem is still the same?
 
I saw the same behaviour on a similar system with a Samsung 980 M.2 SSD (500GB). The SSD had faulty firmware that reported 82°C to the controller, and the SSD then caused a crash. Of course it was never actually 82°C, only about 50°C. After a firmware update everything was fine.
 
Oh yeah, I forgot to mention that - I have a fan installed and the CPU never went above 53°C (that was back when both VMs were running with decent utilization).
I'm not really sure whether there's anything wrong with the SSD firmware; I'm using a WD Blue SN5000 1TB.

But the bottom line is that there's nothing wrong with my setup and it should work, right?
 
Ok, so it happened again.
Fresh Proxmox installation + updates, and a modified GUI so I could see the CPU and NVMe thermals.
Again there's only one VM, with pfSense.

There's almost no utilization. Thermals are more or less the same the whole day and it crashed anyway.

pfSense shows nothing unusual (the identical configuration, with physical ports instead of bridges, worked like a charm on bare metal).
As for Proxmox, I've been through:
journalctl -b -1
dmesg | less
dmesg | grep -i vfio

Everything looked ok to me. Are there any other places I could check to find the reason behind all of this?
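
One further thing that could be tried, as a rough sketch rather than a definitive procedure: filter the kernel log of the previous boot for common hardware-error keywords. The helper name `crash_hints` and the keyword list are made up for illustration, and this assumes the journal is persistent so that the previous boot is still available. A hard reset often leaves nothing in the log at all, so an empty result does not rule out a hardware or firmware problem.

```shell
# Hypothetical helper: read kernel log text on stdin and print only the
# lines that mention common crash or hardware-error keywords.
# The keyword list is an assumption, not an exhaustive set.
crash_hints() {
    # "|| true" keeps the exit status at 0 when nothing matches
    grep -iE 'machine check|mce:|hardware error|kernel panic|oops|watchdog' || true
}
```

To use it, list the recorded boots with `journalctl --list-boots`, then pipe the kernel messages of the previous boot through the filter with `journalctl -k -b -1 | crash_hints`.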
 
What is the output of "dmesg | fgrep -i microcode"? Or, to put it another way: Did you install the package intel-microcode on your Proxmox host?

I also found these devices to be picky with RAM.
 
root@pve:~# dmesg | fgrep -i microcode
[ 0.092878] Register File Data Sampling: Vulnerable: No microcode
[ 0.734426] microcode: Current revision: 0x0000000e

As I said, it's a fresh install with updates.

As for RAM, my DIMM is on the supported list, memtest and the stress tests did not fail, and it was fine on bare metal.
 
I believe that version - which comes from your BIOS - is hopelessly outdated. I would try "apt install intel-microcode iucode-tool". After that, you should re-examine the microcode version. It should be updated to 0x0000001c for the N100. I am not 100% sure, because I use the CPU bare-metal with OPNsense, and under FreeBSD the newest version is called 0x1c. Also, I only assume that the 8-core N305 has the same microcode revision as the 4-core N100.

FWIW, if the microcode had been updated by the Debian package, you should see something to the effect of:

[ 1.306760] microcode: Current revision: 0x0000001c
[ 1.306866] microcode: Updated early from: 0x0000000e
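
As a small sketch of how that check could be scripted: the function below reads dmesg text on stdin and reports whether an early microcode update was applied. The name `microcode_status` is made up for illustration, and it assumes the kernel uses the "Current revision" / "Updated early from" message format shown in this thread (older kernels word these lines differently). The "Updated early from" line is taken to report the revision the CPU had before the update.

```shell
# Hypothetical helper: summarize the microcode lines found in dmesg output.
# Assumes the "Current revision" / "Updated early from" message format.
microcode_status() {
    log=$(cat)
    # Revision the CPU is running now
    current=$(printf '%s\n' "$log" |
        sed -n 's/.*microcode: Current revision: \(0x[0-9a-fA-F]*\).*/\1/p' | head -n 1)
    # Revision before the early update; empty if no update was applied
    previous=$(printf '%s\n' "$log" |
        sed -n 's/.*microcode: Updated early from: \(0x[0-9a-fA-F]*\).*/\1/p' | head -n 1)
    if [ -n "$previous" ]; then
        echo "updated early: $previous -> $current"
    elif [ -n "$current" ]; then
        echo "not updated by the OS, running $current"
    else
        echo "no microcode lines found"
    fi
}
```

It can be fed directly with `dmesg | microcode_status`, before and after installing the microcode package and rebooting.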

I had another machine with an Intel 12th gen CPU that only became stable after microcode updates were applied. With the microcode shipped in the BIOS, it froze after 1-2 days.
 