Proxmox freezing during boot with new GPU

maddyyy0110

Jul 27, 2024
Hello, I'm having some issues setting up a new Arc A380 GPU in my Proxmox server.

Hardware:
PC: ThinkCentre M710s 10M8
CPU: Intel(R) i5-7400
Motherboard: IB250MH 00XK134
Old GPU: GT 730
New GPU: Arc A380

Proxmox info:
Proxmox version: 8.2.4
Proxmox kernel: Linux 6.8.8-3-pve (2024-07-16T16:16Z)
BIOS: legacy

Problem Description
* System turns on fine, everything spins up
* Can see hardware show up no issues

20240727_141506(1).jpg


* This is as far as I get. It stays frozen on this splash screen; everything runs, but I can't interact with it or reach it from the web portal

20240727_141526(1).jpg

kern.log section:

2024-07-27T14:09:57.720007+10:00 pve kernel: [ 3037.663289] vmbr0: port 4(veth103i0) entered disabled state
2024-07-27T14:09:57.740025+10:00 pve kernel: [ 3037.682710] vmbr0: port 5(veth104i0) entered disabled state
2024-07-27T14:09:57.786009+10:00 pve kernel: [ 3037.729039] vmbr0: port 3(veth102i0) entered disabled state
2024-07-27T14:09:59.069107+10:00 pve kernel: [ 3039.011803] vmbr0: port 5(veth104i0) entered disabled state
2024-07-27T14:09:59.069117+10:00 pve kernel: [ 3039.011876] veth104i0 (unregistering): left allmulticast mode
2024-07-27T14:09:59.069117+10:00 pve kernel: [ 3039.011879] veth104i0 (unregistering): left promiscuous mode
2024-07-27T14:09:59.069118+10:00 pve kernel: [ 3039.011882] vmbr0: port 5(veth104i0) entered disabled state
2024-07-27T14:09:59.350872+10:00 pve kernel: [ 3039.293296] audit: type=1400 audit(1722053399.348:54): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-104_</var/lib/lxc>" pid=14750 comm="apparmor_parser"
2024-07-27T14:10:00.001004+10:00 pve kernel: [ 3039.943993] EXT4-fs (dm-8): unmounting filesystem 91ab7530-eeab-42a1-bcb4-dee6faf50ef7.
2024-07-27T14:10:14.513029+10:00 pve kernel: [ 3054.456166] tap101i0: left allmulticast mode
2024-07-27T14:10:14.513038+10:00 pve kernel: [ 3054.456184] fwbr101i0: port 2(tap101i0) entered disabled state
2024-07-27T14:10:14.524059+10:00 pve kernel: [ 3054.467306] fwbr101i0: port 1(fwln101i0) entered disabled state
2024-07-27T14:10:14.524067+10:00 pve kernel: [ 3054.467355] vmbr0: port 2(fwpr101p0) entered disabled state
2024-07-27T14:10:14.525115+10:00 pve kernel: [ 3054.467484] fwln101i0 (unregistering): left allmulticast mode
2024-07-27T14:10:14.525131+10:00 pve kernel: [ 3054.467486] fwln101i0 (unregistering): left promiscuous mode
2024-07-27T14:10:14.525132+10:00 pve kernel: [ 3054.467488] fwbr101i0: port 1(fwln101i0) entered disabled state
2024-07-27T14:10:14.536025+10:00 pve kernel: [ 3054.478640] fwpr101p0 (unregistering): left allmulticast mode
2024-07-27T14:10:14.536030+10:00 pve kernel: [ 3054.478642] fwpr101p0 (unregistering): left promiscuous mode
2024-07-27T14:10:14.536030+10:00 pve kernel: [ 3054.478644] vmbr0: port 2(fwpr101p0) entered disabled state
2024-07-27T14:10:44.087047+10:00 pve kernel: [ 3084.029715] vmbr0: port 4(veth103i0) entered disabled state
2024-07-27T14:10:44.087084+10:00 pve kernel: [ 3084.029782] veth103i0 (unregistering): left allmulticast mode
2024-07-27T14:10:44.087085+10:00 pve kernel: [ 3084.029785] veth103i0 (unregistering): left promiscuous mode
2024-07-27T14:10:44.087085+10:00 pve kernel: [ 3084.029787] vmbr0: port 4(veth103i0) entered disabled state
2024-07-27T14:10:44.092037+10:00 pve kernel: [ 3084.034695] vmbr0: port 3(veth102i0) entered disabled state
2024-07-27T14:10:44.092043+10:00 pve kernel: [ 3084.034753] veth102i0 (unregistering): left allmulticast mode
2024-07-27T14:10:44.092043+10:00 pve kernel: [ 3084.034756] veth102i0 (unregistering): left promiscuous mode
2024-07-27T14:10:44.092044+10:00 pve kernel: [ 3084.034757] vmbr0: port 3(veth102i0) entered disabled state
2024-07-27T14:10:44.353047+10:00 pve kernel: [ 3084.295890] audit: type=1400 audit(1722053444.351:55): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-103_</var/lib/lxc>" pid=14985 comm="apparmor_parser"
2024-07-27T14:10:44.363031+10:00 pve kernel: [ 3084.305951] audit: type=1400 audit(1722053444.361:56): apparmor="STATUS" operation="profile_remove" profile="/usr/bin/lxc-start" name="lxc-102_</var/lib/lxc>" pid=14987 comm="apparmor_parser"
2024-07-27T14:10:44.979045+10:00 pve kernel: [ 3084.921805] EXT4-fs (dm-6): unmounting filesystem 09f8cb07-9b85-42d2-a57f-625c6f6a87b4.
2024-07-27T14:10:45.011202+10:00 pve kernel: [ 3084.954229] EXT4-fs (dm-7): unmounting filesystem 2230cc19-3ba0-4128-88ff-f434d7668abe.




Troubleshooting steps:
* Forced Proxmox to boot with the iGPU from the BIOS (works fine with the old GPU in)
I'm unsure where to go from here, as I see no error messages and am quite new to Proxmox as a whole :/
 
This is my guess why it cannot be reached via the network:
The network devices have a name based on the PCI ID. Changing (or adding, removing, or enabling/disabling on-board) PCI(e) devices can change the PCI ID of other PCI(e) devices. They "shift up or down by one or more", and this breaks the network configuration because the name of the device changes. Find out the new names of the network devices (usually the first number went up by 1) and adjust /etc/network/interfaces accordingly.
This network device name change can also happen due to a kernel update. You can also name the network devices yourself to prevent this, but I don't have a link to a guide for you just now.
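One common way to name a device yourself is a systemd .link file that matches the NIC by its MAC address. Below is a minimal sketch of what such a file could contain. The MAC address and the name enlan0 are made up for illustration (neither comes from this thread), and make_link_file is a hypothetical helper that only prints the file contents.

```shell
#!/bin/sh
# Print the contents of a systemd .link file that pins an interface name
# to a MAC address. Both arguments are placeholders: take the real MAC
# from `ip -br link` and pick your own name.
make_link_file() {
    mac="$1"
    name="$2"
    # Type=ether is meant to restrict the match to the Ethernet device itself,
    # so a bridge that inherits the same MAC is not renamed as well.
    printf '[Match]\nMACAddress=%s\nType=ether\n\n[Link]\nName=%s\n' "$mac" "$name"
}

# Example with a made-up MAC address and a hypothetical interface name:
make_link_file "aa:bb:cc:dd:ee:ff" "enlan0"
```

The output would typically be saved as something like /etc/systemd/network/10-enlan0.link (the path is an assumption based on the usual systemd layout), and /etc/network/interfaces would then need to use the new name. The initramfs may also need refreshing (for example with update-initramfs -u -k all) before a reboot picks the name up.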
 
I came across this fix when doing some research into this problem earlier. Does it still pertain to my situation if proxmox doesn't finish the boot process? I thought for other people their machine does boot but they just can't reach it from the web GUI?
 
I can't tell whether your Proxmox finished the boot process or not, since there is no network and the GPU does not show anything. Maybe I'm correct about the network problem and you just don't have any screen output. How do you know it does not boot (assuming that you can depend on the screen output)?
 
True, I was just assuming it didn't finish its boot process since it freezes on the Lenovo splash screen.

this is my current interfaces file:


GNU nano 7.2 interfaces
auto lo
iface lo inet loopback

iface enp3s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.0.207/24
        gateway 192.168.0.1
        bridge-ports enp3s0
        bridge-stp off
        bridge-fd 0

I modified it to the following:

GNU nano 7.2 interfaces
auto lo
iface lo inet loopback

iface enp4s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.0.207/24
        gateway 192.168.0.1
        bridge-ports enp4s0
        bridge-stp off
        bridge-fd 0

Which unfortunately did not work.

Is there a way to tell what the network device is being renamed to? Or should I just use trial and error, going +/- 1 until I get a result?
 
Usually you log in to the host console, but that requires a screen. Maybe check journalctl (search for "rename") during a "good boot" that follows a "failed boot".
Or set the network names to something that does not change: https://pve.proxmox.com/wiki/Network_Configuration#_naming_conventions
Does your system shut down gracefully when you press the power button (don't press it for long!)? You can use that to test whether Proxmox starts properly: if it does, it also shuts down gracefully, even without output on the screen. Make sure to give it some time.
Maybe also see whether booting your system from an Ubuntu 24.04 installer USB gives you a working graphical desktop.
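Instead of guessing names, one option during a "good boot" is to compare the bridge-ports entries in the config with the interfaces the kernel actually has. This is only a sketch: bridge_ports and check_bridge_ports are hypothetical helper names, and it assumes the usual /etc/network/interfaces layout shown in this thread.

```shell
#!/bin/sh
# List every interface named on a "bridge-ports" line of an ifupdown config.
bridge_ports() {
    awk '$1 == "bridge-ports" { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# Report which bridge ports exist on the running system.
# /sys/class/net contains one entry per network interface known to the kernel.
check_bridge_ports() {
    for port in $(bridge_ports "$1"); do
        # "none" is a valid bridge-ports value meaning "no physical port".
        [ "$port" = "none" ] && continue
        if [ -e "/sys/class/net/$port" ]; then
            echo "ok: $port"
        else
            echo "missing: $port"
        fi
    done
}

# Only runs where the config file exists (for example on the Proxmox host).
if [ -r /etc/network/interfaces ]; then
    check_bridge_ports /etc/network/interfaces
fi
```

A "missing" line would suggest the name in the config no longer matches any device. This only describes the boot it is run in, so it cannot show what the name was during a boot that never came up.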
 
Running:
journalctl | grep "rename"
Gives me the following output (when filtered for correct times):

Jul 27 13:19:23 pve kernel: r8169 0000:03:00.0 enp3s0: renamed from eth0
Jul 27 13:19:41 pve kernel: eth0: renamed from vethVh1FIS
Jul 27 13:19:43 pve kernel: eth0: renamed from vethlinWTB
Jul 27 13:19:45 pve kernel: eth0: renamed from veth9kSbC1
Jul 27 13:49:09 pve kernel: eth0: renamed from vetheF4CQx
Jul 27 14:22:02 pve kernel: r8169 0000:03:00.0 enp3s0: renamed from eth0
Jul 27 14:22:20 pve kernel: eth0: renamed from vethTwZEXE
Jul 27 14:22:22 pve kernel: eth0: renamed from vethBxu9je
Jul 27 14:22:24 pve kernel: eth0: renamed from vethl7g8Ca
Jul 27 14:46:57 pve kernel: eth0: renamed from veth0p3QXB
Jul 27 17:37:31 pve kernel: r8169 0000:03:00.0 enp3s0: renamed from eth0
Jul 27 17:41:03 pve kernel: r8169 0000:03:00.0 enp3s0: renamed from eth0
Jul 27 17:41:21 pve kernel: eth0: renamed from vethbVFtTN
Jul 27 17:41:23 pve kernel: eth0: renamed from vethdZS24C
Jul 27 17:41:25 pve kernel: eth0: renamed from vethBBnQfN
Jul 27 17:56:01 pve kernel: r8169 0000:03:00.0 enp3s0: renamed from eth0
Jul 27 17:56:19 pve kernel: eth0: renamed from vethobVan8
Jul 27 17:56:21 pve kernel: eth0: renamed from vethgFFW9j
Jul 27 17:56:23 pve kernel: eth0: renamed from vethr6icHj

Am I reading this correctly that `Jul 27 17:56:01 pve kernel: r8169 0000:03:00.0 enp3s0: renamed from eth0` refers to renaming 'enp3s0' to 'r8169' ?

I can confirm that the system does shut down "gracefully" when I tap the power button after giving it sufficient time to boot.
 
No, it means the name of your network device is enp3s0 (and the driver is r8169). The other renames are from guests.
From your logs, I doubt that the network not working is caused by a name change (the device is enp3s0 in every boot). Maybe check the same log for error messages that might give a clue (about the network or the GPU)?
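To separate the host NIC from the guest renames, a small filter can keep only the lines that carry a driver name and a PCI address before the new interface name. This is a sketch that assumes the log format shown in this thread; physical_renames is a hypothetical helper name.

```shell
#!/bin/sh
# Keep only "renamed from" lines of physical NICs. Those lines have the form
#   ... kernel: <driver> <pci-address> <new-name>: renamed from <old-name>
# while guest (veth) renames have no driver or PCI address.
physical_renames() {
    awk '/renamed from/ && $(NF-4) ~ /^[0-9a-f]+:[0-9a-f]+:[0-9a-f]+\.[0-9a-f]+$/ {
        sub(/:$/, "", $(NF-3))   # drop the trailing colon from the new name
        print "driver=" $(NF-5), "pci=" $(NF-4), "name=" $(NF-3), "was=" $NF
    }'
}

# Two sample lines copied from the journalctl output in this thread:
physical_renames <<'EOF'
Jul 27 13:19:23 pve kernel: r8169 0000:03:00.0 enp3s0: renamed from eth0
Jul 27 13:19:41 pve kernel: eth0: renamed from vethVh1FIS
EOF
# prints: driver=r8169 pci=0000:03:00.0 name=enp3s0 was=eth0
```

On the host it could be fed from the journal, for example `journalctl | physical_renames`, or `journalctl -b -1 | physical_renames` to look at the previous boot only.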
 
Nothing is appearing in the journalctl logs. Powered on the system with new GPU at 20:20, powered off at 20:25. Powered on system with old GPU at 20:31. However, I see a jump from 20:02, when it was previously on, to 20:31 with the old GPU.

Jul 27 20:02:49 pve systemd-journald[385]: Journal stopped
Jul 27 20:31:12 pve kernel: Linux version 6.8.8-3-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.8.8-3 (2024-07-16T16:16Z) ()
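One way to double-check that no boot was recorded around 20:20 is to list the timestamp of every kernel start line in the journal. The filter below is a sketch that assumes the default journalctl output format shown above; boot_starts is a hypothetical helper name.

```shell
#!/bin/sh
# Print the timestamp of every line where a kernel announces its version,
# which happens once at the start of each boot that reached the journal.
boot_starts() {
    awk '/kernel: Linux version/ { print $1, $2, $3 }'
}

# Sample input based on the two lines quoted in this thread (second line shortened):
boot_starts <<'EOF'
Jul 27 20:02:49 pve systemd-journald[385]: Journal stopped
Jul 27 20:31:12 pve kernel: Linux version 6.8.8-3-pve (build@proxmox)
EOF
# prints: Jul 27 20:31:12
```

On the host this could be run as `journalctl | boot_starts`. `journalctl --list-boots` should give a similar overview of recorded boots without any filtering. A missing entry for the 20:20 power-on would be consistent with the system never reaching the point where the kernel starts logging.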

Tested the Ubuntu installer trick which worked with old GPU but not with new GPU :(
 
That sounds like it is actually stuck very early in the boot process, or maybe even before that.
And the shutdown you tried would probably be instantaneous, instead of taking some time with the drive LED flashing a lot (like it does when Proxmox does start).
Maybe the GPU is broken? Or is there some kind of incompatibility with your motherboard or BIOS settings? Or maybe it is a power supply issue (the A380 draws roughly 3x the power of the GT 730). Did you connect the PCIe 6-pin cable? Can you test the card in another machine?
 
The GPU works in my main PC. I have the Sparkle ELF Arc A380, which does require external power from the PSU. Power could still be a factor, though. The motherboard takes power through a proprietary 10-pin input rather than the usual 24-pin one (I use a 24-pin to 10-pin adapter). So maybe this isn't enough for the GPU?

Compatibility could also be an issue. I know Intel says you need a 10th gen CPU or later, but I have heard of people getting it to work on older generations like mine (and similarly on older motherboards, by installing a workaround for Resizable BAR).
 
