Help! My Proxmox server is not booting anymore ;-(

geekdot

New Member
Jan 9, 2026
Hi Gang,

I hope you can help me out with this one. My Proxmox server (8.4.16, kernel 6.8.12-17) started to lose its network connection for 3 of its 4 VMs three days ago. While that one VM (XPEnology) kept running fine, all the others, including the Proxmox GUI, stopped responding. I had not touched the system for weeks, so it "just happened".

Without SSH or another easy way in, I powered the server down, attached a monitor and keyboard, and started it again. One fsck later, Linux (Debian bookworm) came up fine, but all PVE* services were failing and there was no network. A manual dhclient fixed the latter, so luckily I can conveniently copy-paste in here.
(Bootup video available here)
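
(For completeness, the dhclient stopgap was roughly the following; the interface name is taken from my config below, the exact invocation is from memory:)

Code:
# request a DHCP lease directly on the physical NIC,
# since the vmbr0 bridge never came up
dhclient enp2s0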

/etc/hostname is "proxmox". For starters, here are some configs:

Code:
root@proxmox:/etc/pve# pveversion -v
proxmox-ve: 8.4.0 (running kernel: 6.8.12-17-pve)
pve-manager: 8.4.16 (running version: 8.4.16/368e3c45c15b895c)
proxmox-kernel-helper: 8.1.4
proxmox-kernel-6.8: 6.8.12-17
proxmox-kernel-6.8.12-17-pve-signed: 6.8.12-17
proxmox-kernel-6.8.12-16-pve-signed: 6.8.12-16
proxmox-kernel-6.8.12-9-pve-signed: 6.8.12-9
proxmox-kernel-6.8.12-8-pve-signed: 6.8.12-8
proxmox-kernel-6.8.12-7-pve-signed: 6.8.12-7
proxmox-kernel-6.8.12-6-pve-signed: 6.8.12-6
proxmox-kernel-6.8.12-5-pve-signed: 6.8.12-5
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
proxmox-kernel-6.5.13-6-pve-signed: 6.5.13-6
proxmox-kernel-6.5: 6.5.13-6
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph-fuse: 17.2.7-pve2
corosync: 3.1.9-pve1
criu: 3.17.1-2+deb12u2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.30-pve2
libproxmox-acme-perl: 1.6.0
libproxmox-backup-qemu0: 1.5.2
libproxmox-rs-perl: 0.3.5
libpve-access-control: 8.2.2
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.1.2
libpve-cluster-perl: 8.1.2
libpve-common-perl: 8.3.6
libpve-guest-common-perl: 5.2.2
libpve-http-server-perl: 5.2.2
libpve-network-perl: 0.11.3
libpve-rs-perl: 0.9.4
libpve-storage-perl: 8.3.7
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-2
lxcfs: 6.0.0-pve2
novnc-pve: 1.6.0-2
proxmox-backup-client: 3.4.7-1
proxmox-backup-file-restore: 3.4.7-1
proxmox-backup-restore-image: 0.7.0
proxmox-firewall: 0.7.1
proxmox-kernel-helper: 8.1.4
proxmox-mail-forward: 0.3.3
proxmox-mini-journalreader: 1.5
proxmox-offline-mirror-helper: 0.6.8
proxmox-widget-toolkit: 4.3.13
pve-cluster: 8.1.2
pve-container: 5.3.3
pve-docs: 8.4.1
pve-edk2-firmware: 4.2025.02-4~bpo12+1
pve-esxi-import-tools: 0.7.4
pve-firewall: 5.1.2
pve-firmware: 3.16-3
pve-ha-manager: 4.0.7
pve-i18n: 3.4.5
pve-qemu-kvm: 9.2.0-7
pve-xtermjs: 5.5.0-2
qemu-server: 8.4.5
smartmontools: 7.3-pve1
spiceterm: 3.3.1
swtpm: 0.8.0+pve1
vncterm: 1.8.1
zfsutils-linux: 2.2.8-pve1

Code:
root@proxmox:~# cat /etc/hosts
127.0.0.1 localhost.localdomain localhost
192.168.1.250 proxmox.local proxmox

# The following lines are desirable for IPv6 capable hosts

::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts

Code:
root@proxmox:~# cat /etc/network/interfaces

auto lo
iface lo inet loopback

iface enp2s0 inet manual

iface enp1s0 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.250/24
        gateway 192.168.1.253
        bridge-ports enp2s0
        bridge-stp off
        bridge-fd 0

source /etc/network/interfaces.d/*

vgdisplay, lvdisplay, and ls /dev/pve all show the expected volumes.

Wondering what's up with that network (the output is the same before and after running dhclient)...

Code:
root@proxmox:~# systemctl status networking
● networking.service - Network initialization
     Loaded: loaded (/lib/systemd/system/networking.service; enabled; preset: enabled)
     Active: active (exited) since Sat 2026-01-10 12:14:17 CET; 50min ago
       Docs: man:interfaces(5)
             man:ifup(8)
             man:ifdown(8)
    Process: 645 ExecStart=/usr/share/ifupdown2/sbin/start-networking start (code=exited, status=0/SUCCESS)
   Main PID: 645 (code=exited, status=0/SUCCESS)
        CPU: 195ms

Jan 10 12:14:17 proxmox networking[726]:   File "/usr/lib/python3.11/uuid.py", line 59, in <module>
Jan 10 12:14:17 proxmox networking[726]:     import platform
Jan 10 12:14:17 proxmox networking[726]:   File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
Jan 10 12:14:17 proxmox networking[726]:   File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
Jan 10 12:14:17 proxmox networking[726]:   File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
Jan 10 12:14:17 proxmox networking[726]:   File "<frozen importlib._bootstrap_external>", line 936, in exec_module
Jan 10 12:14:17 proxmox networking[726]:   File "<frozen importlib._bootstrap_external>", line 1069, in get_code
Jan 10 12:14:17 proxmox networking[726]:   File "<frozen importlib._bootstrap_external>", line 729, in _compile_bytecode
Jan 10 12:14:17 proxmox networking[726]: ValueError: bad marshal data (unknown type code)
Jan 10 12:14:17 proxmox systemd[1]: Finished networking.service - Network initialization.

Now looking at the PVE services, e.g. with journalctl -ru pveproxy.service: pveproxy is constantly segfaulting, and so are all the others :-(

Code:
Jan 10 12:14:22 proxmox systemd[1]: Failed to start pveproxy.service - PVE API Proxy Server.
Jan 10 12:14:22 proxmox systemd[1]: pveproxy.service: Failed with result 'signal'.
Jan 10 12:14:22 proxmox systemd[1]: pveproxy.service: Control process exited, code=killed, status=11/SEGV
Jan 10 12:14:21 proxmox pvecm[978]: Unable to load access control list: Connection refused
Jan 10 12:14:21 proxmox pvecm[978]: ipcc_send_rec[3] failed: Connection refused
Jan 10 12:14:21 proxmox pvecm[978]: ipcc_send_rec[2] failed: Connection refused
Jan 10 12:14:21 proxmox pvecm[978]: ipcc_send_rec[1] failed: Connection refused
Jan 10 12:14:21 proxmox systemd[1]: Starting pveproxy.service - PVE API Proxy Server...
Jan 10 12:14:20 proxmox systemd[1]: pveproxy.service: Consumed 1.263s CPU time.
Jan 10 12:14:20 proxmox systemd[1]: Stopped pveproxy.service - PVE API Proxy Server.
Jan 10 12:14:20 proxmox systemd[1]: pveproxy.service: Scheduled restart job, restart counter is at 1.
Jan 10 12:14:19 proxmox systemd[1]: pveproxy.service: Consumed 1.263s CPU time.
Jan 10 12:14:19 proxmox systemd[1]: Failed to start pveproxy.service - PVE API Proxy Server.
Jan 10 12:14:19 proxmox systemd[1]: pveproxy.service: Failed with result 'signal'.
Jan 10 12:14:19 proxmox systemd[1]: pveproxy.service: Control process exited, code=killed, status=11/SEGV
Jan 10 12:14:19 proxmox pvecm[970]: Unable to load access control list: Connection refused
Jan 10 12:14:19 proxmox pvecm[970]: ipcc_send_rec[3] failed: Connection refused
Jan 10 12:14:19 proxmox pvecm[970]: ipcc_send_rec[2] failed: Connection refused
Jan 10 12:14:19 proxmox pvecm[970]: ipcc_send_rec[1] failed: Connection refused

Most PVE tool invocations (e.g. pvecm updatecerts --force) also fail with those ipcc_send_rec errors:

Code:
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused
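
From what I've read, ipcc_send_rec is the IPC call into pmxcfs, which is provided by the pve-cluster service, so "Connection refused" suggests that service never came up. A quick way to check (my suggestion, not something I ran back then):

Code:
# pmxcfs is started by pve-cluster; if it is down, /etc/pve stays empty
systemctl status pve-cluster
ls -la /etc/pve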

What else did I do:
  • apt update and apt dist-upgrade - worked OK
  • memtest86 ran through fine
I hope I provided enough initial info and that you have an idea what's wrong with my system. I'm not a Proxmox expert, so I'm pretty much lost at this point, having read a thousand posts in several forums since yesterday... to no avail.
So thanks a ton in advance for your help! <3
 
Note: I only quickly reviewed your post.

Your pve gui is located, per your network config, at 192.168.1.250 with your gateway at 192.168.1.253. First step is to verify that these are correct.

One thing that happened to me when PVE had network problems: I found that the interface ID had changed after an update, and the bridge was calling for a bond that no longer had valid interfaces. However, I would assume that would prevent any WAN connection, such as apt update.

ip link show <- this will show you if your interfaces enp2s0 and vmbr0 are both up. If it shows down you can try to bring them up manually. If they show a different interface ID then well, there's your likely problem and you need to reconfigure /etc/network/interfaces.

ping -c 2 192.168.1.253 <- this will show you if you can reach your gateway. You should also try to ping beyond the gateway to verify a route exists (e.g. another PVE node, 1.1.1.1, etc.).
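
Put together, the checks boil down to something like this (addresses taken from your config):

Code:
ip link show                # are enp2s0 and vmbr0 present and UP?
ip addr show vmbr0          # does the bridge hold 192.168.1.250/24?
ping -c 2 192.168.1.253     # can you reach the gateway?
ping -c 2 1.1.1.1           # is there a route beyond the gateway?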
 
Note: I only quickly reviewed your post.
Thanks for having a look and helping! <3
Your pve gui is located, per your network config, at 192.168.1.250 with your gateway at 192.168.1.253. First step is to verify that these are correct.
That's absolutely correct. The gateway IP has "historical reasons" ;)
After a manual dhclient call I can get into the LAN and WAN. Pinging LAN clients and to the outside world works.
I mean, apt works, so I'd say that's fine (sort of)...
ip link show <- this will show you if your interfaces enp2s0 and vmbr0 are both up. If it shows down you can try to bring them up manually. If they show a different interface ID then well, there's your likely problem and you need to reconfigure /etc/network/interfaces.
vmbr0 is not listed:
Code:
root@proxmox:~# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp1s0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN mode DEFAULT group default qlen 1000
    link/ether 6c:bf:b5:03:4b:75 brd ff:ff:ff:ff:ff:ff
3: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 6c:bf:b5:03:4b:76 brd ff:ff:ff:ff:ff:ff
The interface ID (enp2s0) is the same one used as the bridge port in /etc/network/interfaces.
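
As a stopgap, I suppose the bridge could be brought up by hand with iproute2, bypassing ifupdown2 entirely (untested on my box; addresses taken from my config):

Code:
# recreate vmbr0 manually with the values from /etc/network/interfaces
ip link add name vmbr0 type bridge
ip link set enp2s0 master vmbr0
ip link set vmbr0 up
ip addr add 192.168.1.250/24 dev vmbr0
ip route add default via 192.168.1.253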

Could this ModuleNotFoundError: No module named 'ifupdown2' be the reason why the network (and everything depending on it) is not coming up?

Code:
root@proxmox:~# ifup vmbr0
Traceback (most recent call last):
  File "/usr/sbin/ifup", line 30, in <module>
    from ifupdown2.lib.log import LogManager, root_logger
ModuleNotFoundError: No module named 'ifupdown2'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/sbin/ifup", line 33, in <module>
    from lib.log import LogManager, root_logger
  File "/usr/share/ifupdown2/lib/log.py", line 32, in <module>
    from systemd.journal import JournalHandler
  File "/usr/lib/python3/dist-packages/systemd/journal.py", line 25, in <module>
    import uuid as _uuid
  File "/usr/lib/python3.11/uuid.py", line 59, in <module>
    import platform
  File "<frozen importlib._bootstrap>", line 1178, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1149, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 936, in exec_module
  File "<frozen importlib._bootstrap_external>", line 1069, in get_code
  File "<frozen importlib._bootstrap_external>", line 729, in _compile_bytecode
ValueError: bad marshal data (unknown type code)

...but it's installed:

Code:
root@proxmox:~# apt install ifupdown2
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ifupdown2 is already the newest version (3.2.0-1+pmx11).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.

...and a forced re-install does not change this.
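
After more digging: the ValueError: bad marshal data at the bottom of the traceback points at a corrupted byte-compiled (.pyc) file in the standard library rather than a missing package, which would explain why reinstalling ifupdown2 changes nothing. It should be reproducible without ifupdown2 at all (my assumption):

Code:
# the traceback dies importing the stdlib 'platform' module via uuid,
# so importing uuid directly should trip the same corrupted .pyc
python3 -c 'import uuid'
# cross-check installed files against package md5sums (-s: only report errors)
debsums -s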
 
OK, let's close this thread.
After clearing the Python bytecode cache I got ifup and friends working again, but it seems that was just the tip of the "bit rot" iceberg :(
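
For anyone landing here with the same bad marshal data error: "clearing the cache" meant deleting the stale byte-compiled files so Python regenerates them from source, roughly like this (paths assumed for bookworm's Python 3.11; I'm reconstructing this from memory):

Code:
# wipe the byte-compiled caches; Python rebuilds them on the next import
find /usr/lib/python3.11 /usr/lib/python3/dist-packages /usr/share/ifupdown2 \
    -name '__pycache__' -type d -exec rm -rf {} +
# then reinstall the package carrying the ifup/ifdown entry points
apt install --reinstall ifupdown2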
I guess my NVMe is dying:

Even though I've used just 4% of its capacity, 8% of the spare blocks are already consumed (Available Spare: 92%). And there is Media and Data Integrity Errors: 1, the smoking gun showing that this NVMe is going the way of the dodo...
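
(The health data below came from smartmontools; the device path is from memory:)

Code:
smartctl -a /dev/nvme0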

Code:
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        34 Celsius
Available Spare:                    92%
Available Spare Threshold:          1%
Percentage Used:                    4%
Data Units Read:                    81,521,253 [41.7 TB]
Data Units Written:                 5,476,342 [2.80 TB]
Host Read Commands:                 660,041,741
Host Write Commands:                252,225,041
Controller Busy Time:               2,163
Power Cycles:                       30
Power On Hours:                     15,317
Unsafe Shutdowns:                   8
Media and Data Integrity Errors:    1
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               34 Celsius
Temperature Sensor 2:               39 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
No Errors Logged
 
I guess my NVMe is dying:
Media and Data Integrity Errors: 1

I wouldn't kill off a two-year-old NVMe for just one error... if the count starts climbing I would definitely flag it, but one error in two years might just have been a voltage drop/spike... and that is kind of suggested by the unsafe shutdown count of 8...

But glad that you got your Proxmox node back up and that bridge working...
 
I wouldn't kill off a two-year-old NVMe for just one error... if the count starts climbing I would definitely flag it, but one error in two years might just have been a voltage drop/spike...
Well, I think there are two arguments against keeping it:

1) a 300+ line output from debsums -s, including libc6, libglib, and some Perl and Python modules :oops: (see the sketch below)
2) The NVMe's brand is "FIKWOT"... replace one vowel at will ;) (...I was young and needed the money)
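
For the record, here is a common recipe to reinstall every package that debsums flags (untested on this box; debsums -c lists only the changed files):

Code:
# map each changed file to its owning package, then reinstall the set
debsums -c \
  | xargs -r dpkg -S \
  | cut -d: -f1 | sort -u \
  | xargs -r apt install --reinstall -y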

Anyhow, a new NVMe (major brand, 5-year warranty) has been ordered. Thanks a metric ton for helping, @Pouch6867 <3