Boot Failure After Upgrade from 8 to 9

StuMacstu

New Member
Aug 7, 2025
Good afternoon,

I have six nodes in a cluster and started working through the upgrade from version 8 to 9. The first three nodes went fine, but now I'm having issues with the fourth.

The installation appeared to go fine, but after the reboot the machine did not come back. The web UI refused connections, although the box still responded to ping. The next morning I found that on boot it reaches the start job for networking.service and stalls there; it sat for 7 hours until I manually rebooted it, and then the same thing happened again.

Messages on the console show vmbr0: eno1 coming up at 1000 Mbps full duplex, then entering the blocking state, but it never comes back up.

eno1 is connected to a Cisco 3750 switch, configured as an access port in the correct VLAN, with speed 1000, duplex full, and spanning-tree portfast; this has worked fine until the Proxmox upgrade. Booting with eno1 disconnected reached the same point and still failed: the on-screen messages said the port moved to the blocking state, then disabled, and the boot stalled in the same place. I then disconnected all network cables, removed any SFPs, and tried again, with the same result. At this point I have no CLI prompt, just the stalled start job counting up.
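For completeness, the switchport config is essentially this (the interface number and VLAN shown are examples, the rest matches what I described above):

```
interface GigabitEthernet1/0/4
 switchport mode access
 switchport access vlan 10
 speed 1000
 duplex full
 spanning-tree portfast
```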

I read that the upgrade can change interface names. I managed to modify the kernel boot options to reach a CLI using:

root=zfs=rpool/ROOT/pve-1 boot=zfs init=/bin/bash console=tty1

From here, my /etc/network/interfaces file is still intact, and running [ip a] shows all of my interfaces with the same names they used before the upgrade. All interfaces show state DOWN, and if I try [ifup eno1] it says that another instance of this program is already running.
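For reference, the relevant part of my interfaces file follows the standard Proxmox shape below (the address and gateway here are placeholders, not my real ones):

```
auto lo
iface lo inet loopback

iface eno1 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10/24
        gateway 192.0.2.1
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
```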

This is where my knowledge rapidly grinds to a halt and I start relying on Google. Running [ps aux | grep ifup] shows only my grep process, so I can't find or kill whatever is holding the ifup lock.
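Unless someone advises otherwise, my next step from that shell was going to be widening the search beyond the literal string "ifup" and looking for a stale lock, which I've read ifupdown2 keeps under /run/network (that path is my assumption from searching, not something I've verified):

```
# match any networking-related process, not just 'ifup'
pgrep -af 'ifup|ifdown|ifreload|networking'

# look for leftover lock/state files from an interrupted run
ls -la /run/network/
```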

Gemini suggested booting a rescue disk to get a CLI and disable networking.service, but when I create a bootable USB for version 9 and choose rescue mode, it can't find rpool and sends me back to the installation options menu instead of dropping me to a CLI.
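From what I've read, if the Proxmox rescue mode can't find rpool, a plain Debian live USB (with ZFS support) can be used to import the pool and chroot in by hand; the steps would be roughly this (untested by me, dataset name taken from my boot option above):

```
# import the pool under an alternate root; the root dataset
# should mount automatically under /mnt
zpool import -f -R /mnt rpool

# bind the virtual filesystems and chroot in
for d in dev proc sys; do mount --rbind /$d /mnt/$d; done
chroot /mnt /bin/bash

# inside the chroot, stop networking from blocking the next boot
systemctl mask networking.service
```

Is that a sane approach, or is there a reason the rescue mode itself can't see the pool that I should fix first?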

I don't want to go blindly down an AI rabbit hole and make things worse, so what would anyone here suggest as next steps to recover the node?

The boot menu still has the option to boot the previous kernel, if that helps with any fault-finding steps.
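One other idea I'm weighing, since I can already get a shell with init=/bin/bash: mask networking.service by hand so the boot can complete, then fix the config from a normal login. My understanding is that a symlink to /dev/null is exactly what [systemctl mask] creates, but please correct me if this is a bad idea:

```
# the root filesystem is read-only in this shell; make it writable first
mount -o remount,rw /

# mask networking.service so the boot no longer waits on it
ln -s /dev/null /etc/systemd/system/networking.service
sync
```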