Changing Hardware - leads to configuration and access disaster

Nov 7, 2024
Hi, I'm kind of a beginner to Proxmox, but I have some basic knowledge about hypervisors and such.

Maybe I'm stupid, but I also read the following thread, marked as solved, which gives me a very bad "feeling" regarding Proxmox's behavior on system faults, stability, and hardware changes.
https://forum.proxmox.com/threads/proxmox-cannot-access-lan-after-hardware-change-help.86442/

It's about how changing hardware leads to a non-accessible system.
I can confirm this behavior on my side; it has happened to me multiple times.

I don't know if I'm doing anything wrong, or maybe I'm missing some information, but given the severity of this problem, I'm shocked that there is hardly any additional information about this behavior.
What I experienced when changing hardware on Proxmox can't be legitimate or intentional; if it is, then I'm shocked and disillusioned about the design of Proxmox.
I know this problem (renumbered PCIe devices) may be inherited from Debian, but even that can't be the answer or solution for a hypervisor!

The main problem:
How can it be possible that, for example, adding or removing a piece of hardware, like a GPU or NIC, leads to an instant disaster, such as no longer being able to access the server via the network because the network interfaces are renumbered or something like that?
So changing hardware, for whatever good or bad/forced reason, destroys an essential Proxmox configuration!
I can't believe this is intentional for such a delicate, central piece of infrastructure as a hypervisor.
So it's intentional that after a hardware change I can't even access my server via the web GUI anymore, for example to reconfigure the new hardware?
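
To illustrate what I mean by "renumbered": as far as I understand it, the path-based interface names are derived from the PCI address, so if a new card shifts the bus number, the name changes with it. Below is a minimal Python sketch of that idea. The function name `pci_path_name` and the PCI addresses are made up for illustration, and the real naming rules in systemd/udev have more cases than this.

```python
def pci_path_name(pci_address: str, multifunction: bool = False) -> str:
    """Approximate a path-based name such as 'enp3s0' from '0000:03:00.0'.

    Simplified assumption: en + [P<domain>] + p<bus> + s<slot> + [f<function>],
    with all numbers written in decimal.
    """
    domain_bus_slot, function = pci_address.rsplit(".", 1)
    domain, bus, slot = (int(part, 16) for part in domain_bus_slot.split(":"))
    name = "en"
    if domain:  # the domain is only part of the name when it is non-zero
        name += f"P{domain}"
    name += f"p{bus}s{slot}"
    fn = int(function, 16)
    if multifunction or fn:  # function suffix only for multi-function devices
        name += f"f{fn}"
    return name


# Hypothetical example: the NIC sat on bus 03, and an added GPU pushes it to bus 04.
before = pci_path_name("0000:03:00.0")
after = pci_path_name("0000:04:00.0")
print(before, "->", after)
```

If the network configuration still refers to the old name, the bridge has no physical port anymore, which would explain why the management interface disappears.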


Another example:
What if I change some hardware in a headless system? It wouldn't be possible to reconfigure it anymore, because there would be no console.


Am I missing something? Or is this really "sold as" an intentional solution/design?
I'm also shocked that I didn't find more information about this really problematic behavior.

Solution?:
If some PCI information changes after a hardware change, why is there not at least an automatic remapping/reconfiguration for essential hardware like the main NIC/management network?
 
Hi @bbgeek17,
Thanks for your information; that led me in the right direction.
Sadly, the documentation isn't very clear that "hardware changes" are also a cause of this still problematic and intricate behaviour.

It says: "... This way, you can avoid naming changes due to kernel updates, driver updates, or newer versions of the naming scheme."
"Hardware change" is not mentioned clearly there.
Now I know that hardware changes are implicitly covered alongside "updates or newer versions of the naming scheme".
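
For anyone else landing here: as far as I understand the documentation, the pinning is done with a systemd `.link` file that matches the NIC by its MAC address and assigns a fixed name. Here is a small Python sketch that only renders the text of such a file. The helper `render_link_file`, the MAC address, and the name `enlan0` are my own placeholders, not something taken from the docs.

```python
import re

MAC_RE = re.compile(r"^[0-9a-f]{2}(:[0-9a-f]{2}){5}$")
# Linux interface names are limited to 15 characters; keep the pattern simple.
NAME_RE = re.compile(r"^[a-z][a-z0-9]{0,14}$")


def render_link_file(mac: str, name: str) -> str:
    """Return the text of a systemd .link file that pins `name` to `mac`."""
    mac = mac.strip().lower()
    if not MAC_RE.match(mac):
        raise ValueError(f"not a MAC address: {mac!r}")
    if not NAME_RE.match(name):
        raise ValueError(f"not a usable interface name: {name!r}")
    return (
        "[Match]\n"
        f"MACAddress={mac}\n"
        "Type=ether\n"
        "\n"
        "[Link]\n"
        f"Name={name}\n"
    )


# Placeholder values; use the MAC of the real NIC.
print(render_link_file("AA:BB:CC:DD:EE:FF", "enlan0"))
```

My understanding is that the result goes into a file like `/etc/systemd/network/10-enlan0.link`, that the new name then has to be used in `/etc/network/interfaces`, and that the name should start with `en` or `eth` so Proxmox still treats it as a physical NIC. Please check the current documentation before relying on this.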

So I guess I have to adapt, but I still can't understand why Debian/Proxmox doesn't pin every piece of hardware to some default name when it is first seen, so that you only change or reorder it manually if you want to or are forced to because of a failure.
What if the management NIC fails and you pinned it by matching its MAC address? Everything would change again as soon as you had to replace the NIC. It just makes no sense. Do you have to define a probably never used spare/fallback for this?

Solution?:
If everything is pinned by default when first seen, the system can automatically see the difference after a failure and make a guess for the new NIC if the previous management NIC is missing.
For every other NIC it makes sense to pin it by default too. I guess the current behavior could be a real security issue as well.
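
The "see the difference" part could look roughly like the following Python sketch. It is only my own illustration, not anything Proxmox ships: it compares the physical interface names referenced in a Debian-style `/etc/network/interfaces` with the names that actually exist and reports the ones that went missing. The function names and the sample configuration are hypothetical.

```python
import re

# Names this sketch treats as virtual (loopback, bridges, bonds); an assumption.
VIRTUAL_RE = re.compile(r"^(lo|vmbr\d+|bond\d+)$")
PORT_KEYS = ("bridge-ports", "bridge_ports", "bond-slaves", "bond_slaves")


def referenced_interfaces(interfaces_text: str) -> set[str]:
    """Collect interface names used in 'iface' stanzas and as bridge/bond ports."""
    names = set()
    for raw in interfaces_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = line.split()
        if words[0] == "iface" and len(words) > 1:
            names.add(words[1])
        elif words[0] in PORT_KEYS:
            names.update(w for w in words[1:] if w != "none")
    return names


def missing_interfaces(interfaces_text: str, present: set[str]) -> list[str]:
    """Return configured physical names that are not present on the system."""
    missing = []
    for name in referenced_interfaces(interfaces_text):
        base = name.split(".")[0]  # drop a VLAN suffix such as 'enp3s0.10'
        if not VIRTUAL_RE.match(base) and base not in present:
            missing.append(name)
    return sorted(missing)


SAMPLE = """\
auto lo
iface lo inet loopback

iface enp3s0 inet manual

auto vmbr0
iface vmbr0 inet static
    address 192.168.1.10/24
    bridge-ports enp3s0
"""

# On a real host, `present` could come from os.listdir("/sys/class/net").
print(missing_interfaces(SAMPLE, {"lo", "enp4s0", "vmbr0"}))
```

The sketch only reports the mismatch. Guessing which of the new NICs should take over the management role is the risky part, and I left that out on purpose.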

However, thanks for the fast answer.
Or maybe someone has additional information that I'm missing?
 
"Or maybe someone has additional information that I'm missing?"
Just decades of experience ... yes, this happens sometimes, yet it's not the norm. Positional PCIe network names have been around for many years and work(ed) very reliably. In practice, at least in the enterprise realm where PVE's target market is located, you seldom add or remove hardware from a server besides RAM and storage. YMMV

The new kernel problem, as outlined in the documentation, is a very recent one, and distributions are working on ironing out the quirks, especially for the main culprit i40.
 
