PC freezes after migrating from Proxmox 8 to 9 with kernel 6.14

Lalsacien

Dear all,

Since I migrated from Proxmox 8 to 9 with kernel 6.14, my NUC freezes after one or two minutes. When I reboot with kernel 6.8, everything works fine.
Perhaps my hardware is too old for the 6.14 kernel: I have a Beelink NUC from 2023 with an N5105 processor. Nothing starts at boot: no VM, no LXC, and no NFS mount either.

I'm not a kernel specialist, so could someone help me? I really don't know where to look :(

Perhaps I need to go back to the latest 8.x version.

I have attached the boot logs boot_6.14.txt and boot_6.8.txt. Perhaps they will help.

Many thanks for your help
Best Regards
Lalsacien
 

Hello,
For the moment, I have booted Proxmox with kernel 6.8 and it works. I see there are many Proxmox updates available. I will install them, maybe this afternoon, and see whether it works better.
 
No better with the latest patches. I'm now on Proxmox 9.0.11 with the latest 6.14 kernel and it still crashes after a few minutes (without any VM started).
 
You could try the opt-in Linux 6.17 kernel for Proxmox VE 9, available in the test and no-subscription repositories:
How to install:

  1. Ensure that either the pve-no-subscription or pvetest repository is set up correctly.
    You can do so via CLI text-editor or using the web UI under Node -> Repositories.
  2. Open a shell as root, e.g. through SSH or using the integrated shell on the web UI.
  3. apt update
  4. apt install proxmox-kernel-6.17
  5. reboot
More details are in the thread I linked before.
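After the reboot it may be worth confirming that the host really came up on the new kernel, since the bootloader can still default to an older entry. A minimal sketch, assuming a Bash shell; `kernel_series` is a hypothetical helper name (not a Proxmox tool), and the `proxmox-boot-tool` lines are left commented out so the snippet runs on any machine:

```shell
#!/usr/bin/env bash
# kernel_series: reduce a kernel release string such as "6.17.2-1-pve"
# to its major.minor series ("6.17"). Hypothetical helper, illustration only.
kernel_series() {
    local release="$1"
    # Keep the first two dot-separated fields.
    printf '%s\n' "$release" | cut -d. -f1,2
}

# Report which kernel (and series) is running right now.
echo "Running kernel: $(uname -r) (series $(kernel_series "$(uname -r)"))"

# On a Proxmox VE host, the installed kernels and the current boot
# selection can be listed with:
#   proxmox-boot-tool kernel list
# A known-good kernel can be kept as the default while testing, using the
# exact version string printed by the list command:
#   proxmox-boot-tool kernel pin <version-from-list>
```

Pinning the working 6.8 kernel this way would keep the box usable between test boots of 6.14 or 6.17.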
 
It seems to be much better. No freezes so far. In 10 minutes I will start a VM.

I spoke too soon... the system froze... :(
 
Hmm, please provide the log:
Bash:
journalctl --since "3h ago" > /tmp/$(hostname)-syslog.txt
Adjust the time range so that it covers everything from booting kernel 6.17 until the freeze. Maybe we can find something.
 
Code:
Nov 14 15:30:06 pve systemd[1]: nut-driver@apc.service: Control process exited, code=exited, status=1/FAILURE
Nov 14 15:30:06 pve systemd[1]: nut-driver@apc.service: Failed with result 'exit-code'.
Nov 14 15:30:06 pve systemd[1]: Failed to start nut-driver@apc.service - Network UPS Tools - device driver for NUT device 'apc'.
Nov 14 15:30:11 pve nut-monitor[1333]: Poll UPS [apc@localhost] failed - Driver not connected
Nov 14 15:30:13 pve systemd-logind[858]: The system will power off now!
Nov 14 15:30:13 pve systemd-logind[858]: System is powering down.

The rest of the log after this looks like a clean shutdown caused by your UPS software. How did you install it (from the Debian repositories or some other way)? If it wasn't from the Debian repositories, you may need to uninstall or disable it and then rebuild the driver against the new kernel.

ETA: Or maybe the UPS is just not connected any more.
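
A quick way to tell a commanded shutdown from a hard freeze is to look for logind's power-off messages and anything from NUT in the saved log. A rough sketch, assuming the log was saved to a text file with the journalctl command above; `shutdown_clues` is a made-up helper name, and the patterns are taken from the excerpt in this thread, so they may need adjusting for other setups:

```shell
#!/usr/bin/env bash
# shutdown_clues: print lines from a saved journal dump that point to an
# orderly, software-initiated power-off (systemd-logind) or to NUT activity.
# Hypothetical helper for illustration only.
shutdown_clues() {
    local logfile="$1"
    grep -E 'systemd-logind\[[0-9]+\]: (The system will power off now!|System is powering down\.)|nut-(monitor|driver|server)' "$logfile"
}

# Example call (the file name is an assumption based on the command above):
#   shutdown_clues "/tmp/$(hostname)-syslog.txt"
```

If this prints power-off lines right at the end of the log, the host was told to shut down. If it prints nothing and the log simply stops, a real hang is more likely.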
 
Seems like your Proxmox host is shutting down because NUT (Network UPS Tools) cannot access your UPS due to USB permission problems:
Code:
Nov 14 15:30:06 pve nut-monitor[1333]: Poll UPS [apc@localhost] failed - Driver not connected
Nov 14 15:30:06 pve nut-driver@apc[1549]: Network UPS Tools - Generic HID driver 0.52 (2.8.1)
Nov 14 15:30:06 pve nut-driver@apc[1549]: USB communication driver (libusb 1.0) 0.46
Nov 14 15:30:06 pve nut-driver@apc[1549]: libusb1: Could not open any HID devices: insufficient permissions on everything
Nov 14 15:30:06 pve nut-driver@apc[1549]: No matching HID UPS found
Nov 14 15:30:06 pve nut-driver@apc[1549]: upsnotify: notify about state 4 with libsystemd: was requested, but not running as a service unit now, will not spam more about it
Nov 14 15:30:06 pve nut-driver@apc[1549]: upsnotify: failed to notify about state 4: no notification tech defined, will not spam more about it
Nov 14 15:30:06 pve nut-driver@apc[1485]: Driver failed to start (exit status=1)
Nov 14 15:30:06 pve nut-driver@apc[1485]: Network UPS Tools - UPS driver controller 2.8.1
Nov 14 15:30:06 pve systemd[1]: nut-driver@apc.service: Control process exited, code=exited, status=1/FAILURE
Nov 14 15:30:06 pve systemd[1]: nut-driver@apc.service: Failed with result 'exit-code'.
Nov 14 15:30:06 pve systemd[1]: Failed to start nut-driver@apc.service - Network UPS Tools - device driver for NUT device 'apc'.
Nov 14 15:30:11 pve nut-monitor[1333]: Poll UPS [apc@localhost] failed - Driver not connected
Nov 14 15:30:13 pve systemd-logind[858]: The system will power off now!
Nov 14 15:30:13 pve systemd-logind[858]: System is powering down.
This could be a driver-related issue with the newer kernel on PVE 9.
Is it an option to stop NUT for testing? If yes, you could try disabling the NUT-related services and see whether the server stays up after that.
I don’t know much about how NUT is set up, but I assume you could disable it like the following (You know this better than I do, so please be careful. I don’t want to break your setup):
systemctl disable nut-monitor.service nut-server.service nut-driver@apc.service


P.S. @BobhWasatch beat me to it. I should have refreshed the browser to see his comment ^^
 
OK, I will disable NUT. The question is: why don’t I get the shutdown with kernel 6.8?
Possibilities that come to mind:
  • You installed NUT from somewhere besides the Debian repository and the driver needs to be rebuilt.
  • You installed NUT from the Debian repositories but something changed in the config file between Debian 12 and 13 (you would have got a warning about that during the upgrade, and there would be a new version with a .dpkg-dist or similar file extension left next to the old one).
  • Device names or permissions changed between kernels.
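
The first two possibilities can be checked from a shell. A sketch, assuming NUT's configuration lives in /etc/nut (the usual Debian location); `conffile_leftovers` is a hypothetical helper, and the dpkg/apt lines are commented out so the snippet also runs on machines without them:

```shell
#!/usr/bin/env bash
# conffile_leftovers: list the files dpkg leaves next to a config file when
# the packaged and local versions differ after an upgrade
# (*.dpkg-dist, *.dpkg-new, *.dpkg-old). Hypothetical helper, illustration only.
conffile_leftovers() {
    local dir="$1"
    find "$dir" -type f \( -name '*.dpkg-dist' -o -name '*.dpkg-new' -o -name '*.dpkg-old' \) 2>/dev/null | sort
}

# The directory is an assumption (Debian's default location for NUT config).
conffile_leftovers /etc/nut

# Whether NUT came from a Debian package at all can be checked with:
#   dpkg -l | grep -i nut
#   apt-cache policy nut-client nut-server
```

No output from the helper would suggest the upgrade did not leave a competing config version behind, which makes the third possibility more interesting.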
 
Code:
Nov 14 17:31:59 pve kernel: Generic FE-GE Realtek PHY r8169-0-200:00: attached PHY driver (mii_bus:phy_addr=r8169-0-200:00, irq=MAC)
Nov 14 17:31:59 pve kernel: r8169 0000:02:00.0 enp2s0: Link is Down
Nov 14 17:31:59 pve kernel: vmbr0: port 1(enp2s0) entered blocking state
Nov 14 17:31:59 pve kernel: vmbr0: port 1(enp2s0) entered forwarding state

This seems fine to me. Has it really crashed, or is only the network not working? Maybe the network device names changed and "enp2s0" isn't correct any more, or maybe there's something going on with the port. What is the output of "ip a"?
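
One way to check the interface-name theory is to compare the bridge ports named in /etc/network/interfaces with the interfaces the kernel actually exposes. A sketch, assuming the usual layout with `bridge-ports` lines in that file; `bridge_ports` is a made-up helper name:

```shell
#!/usr/bin/env bash
# bridge_ports: print every interface named on a "bridge-ports" (or legacy
# "bridge_ports") line of an interfaces file, one per line.
# Hypothetical helper for illustration only.
bridge_ports() {
    local file="$1"
    awk '$1 == "bridge-ports" || $1 == "bridge_ports" {
             for (i = 2; i <= NF; i++) if ($i != "none") print $i
         }' "$file"
}

# Report ports that the config names but the running kernel does not expose.
conf=/etc/network/interfaces
if [ -r "$conf" ]; then
    for port in $(bridge_ports "$conf"); do
        if [ -e "/sys/class/net/$port" ]; then
            echo "$port: present"
        else
            echo "$port: MISSING (name may have changed with the new kernel)"
        fi
    done
fi
```

This only works while the host is still responsive, so it would have to be run in the first minutes after boot.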
 
@BobhWasatch to be more precise: the NUC boots without problems on kernel 6.17 (same with 6.14), and after a few minutes (3-5) it freezes. A connected USB keyboard gets no response, and the console screen shows only the Proxmox login prompt. The web UI shows a message saying the server is no longer there.
There is no kernel panic screen or anything similar.
 