Proxmox 8 seems unstable

Hi,

I have a PVE server that has been running since PVE 6. Since the upgrade to 8 the server seems to be unstable. It had several freezes and, most recently, problems with the filesystem. In each case the server ran normally again after a restart. At the moment the server has problems with network stability. In its current state it is unusable for the tech teaching at my school. I have read some threads reporting that other servers have similar issues with PVE 8.
The server is an HP DL380p G8 with 256 GB RAM, 2x 10-core Intel Xeon E5-2680 v2, and an Intel I350 network chip.
Does anyone have an idea how to solve the problem?

kind regards
Micheal
 
Hi,
In the meantime I have installed two Proxmox 7 nodes. The Proxmox 8 node still exists. On 10.3. I had a crash again. The crash occurred while all VMs and LXCs were being backed up. In the syslog from 10.3. there is an unexpected entry at the end.
 

Attachments

  • syslog before crash.txt
    682.7 KB · Views: 4
  • Syslog after crash.txt
    200.4 KB · Views: 3
Now it was possible to finish the backup successfully. The system is running at the moment. I have to wait for the pupils to create load.
 
Thank you for the update! Feel free to re-post the syslog when the issue happens again.
 
Since 13.03.2024 no hang and no crash. It seems that the microcode update was successful. There are also very few lost connections while working in the console via the web interface.
 
I got feedback from my colleagues that the system has been running stably again for the last two weeks. The microcode update seems to be the solution.
 
These are servers for student lessons. They are more than ten years old. No money for fresh hardware :-( So the microcode update via the CLI was the only way to solve that. There are no more BIOS updates from the vendor.
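To double-check which microcode revision the CPUs are actually running after such an update, something like the following could be used. It is only a minimal sketch: it assumes a Linux host where /proc/cpuinfo lists a "microcode" field per logical CPU, and the function name microcode_revisions is made up for this example.

```python
import re


def microcode_revisions(cpuinfo_text: str) -> set[str]:
    """Return the distinct microcode revisions found in /proc/cpuinfo text.

    Assumes lines of the form "microcode : 0x42e" (one per logical CPU).
    """
    pattern = r"^microcode\s*:\s*(0x[0-9a-fA-F]+)"
    return set(re.findall(pattern, cpuinfo_text, flags=re.MULTILINE))
```

On the host itself this could be called as microcode_revisions(open("/proc/cpuinfo").read()). More than one value in the result would suggest that not all cores run the same revision.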
 
Unfortunately the hang occurred again. There are no hints in the syslog. The hang occurred during the backup.

Apr 05 05:01:59 pve01 kernel: nfs: server 172.18.99.4 OK
Apr 05 05:02:04 pve01 kernel: call_decode: 1184 callbacks suppressed
Apr 05 05:02:04 pve01 kernel: nfs: server 172.18.99.4 OK
Apr 05 05:02:04 pve01 kernel: nfs: server 172.18.99.4 OK
Apr 05 05:02:04 pve01 kernel: nfs: server 172.18.99.4 OK
Apr 05 05:02:04 pve01 kernel: nfs: server 172.18.99.4 OK
Apr 05 05:02:04 pve01 kernel: nfs: server 172.18.99.4 OK
Apr 05 05:02:04 pve01 kernel: nfs: server 172.18.99.4 OK
Apr 05 05:02:04 pve01 kernel: nfs: server 172.18.99.4 OK
Apr 05 05:02:04 pve01 kernel: nfs: server 172.18.99.4 OK
Apr 05 05:02:04 pve01 kernel: nfs: server 172.18.99.4 OK
Apr 05 05:02:04 pve01 kernel: nfs: server 172.18.99.4 OK
Apr 05 05:02:08 pve01 kernel: tap10003i0: left allmulticast mode
Apr 05 05:02:08 pve01 kernel: fwbr10003i0: port 2(tap10003i0) entered disabled state
Apr 05 05:02:08 pve01 kernel: fwbr10003i0: port 1(fwln10003i0) entered disabled state
Apr 05 05:02:08 pve01 kernel: vmbr24: port 2(fwpr10003p0) entered disabled state
Apr 05 05:02:08 pve01 kernel: fwln10003i0 (unregistering): left allmulticast mode
Apr 05 05:02:08 pve01 kernel: fwln10003i0 (unregistering): left promiscuous mode
Apr 05 05:02:08 pve01 kernel: fwbr10003i0: port 1(fwln10003i0) entered disabled state
Apr 05 05:02:08 pve01 kernel: fwpr10003p0 (unregistering): left allmulticast mode
Apr 05 05:02:08 pve01 kernel: fwpr10003p0 (unregistering): left promiscuous mode
Apr 05 05:02:08 pve01 kernel: vmbr24: port 2(fwpr10003p0) entered disabled state
Apr 05 05:02:08 pve01 qmeventd[2755]: read: Connection reset by peer
Apr 05 05:02:08 pve01 systemd[1]: 10003.scope: Deactivated successfully.
Apr 05 05:02:08 pve01 systemd[1]: 10003.scope: Consumed 9min 34.729s CPU time.
Apr 05 05:02:10 pve01 qmeventd[2516787]: Starting cleanup for 10003
Apr 05 05:02:10 pve01 qmeventd[2516787]: Finished cleanup for 10003
Apr 05 05:02:17 pve01 pvestatd[3342]: status update time (5.210 seconds)
Apr 05 05:02:26 pve01 pvescheduler[2280582]: INFO: Finished Backup of VM 10003 (00:47:04)
Apr 05 05:02:26 pve01 pvescheduler[2280582]: INFO: Starting Backup of VM 10004 (qemu)
Apr 05 05:02:32 pve01 systemd[1]: Started 10004.scope.
Apr 05 05:02:34 pve01 kernel: tap10004i0: entered promiscuous mode
Apr 05 05:02:34 pve01 kernel: vmbr24: port 2(fwpr10004p0) entered blocking state
Apr 05 05:02:34 pve01 kernel: vmbr24: port 2(fwpr10004p0) entered disabled state
Apr 05 05:02:34 pve01 kernel: fwpr10004p0: entered allmulticast mode
Apr 05 05:02:34 pve01 kernel: fwpr10004p0: entered promiscuous mode
Apr 05 05:02:34 pve01 kernel: vmbr24: port 2(fwpr10004p0) entered blocking state
Apr 05 05:02:34 pve01 kernel: vmbr24: port 2(fwpr10004p0) entered forwarding state
Apr 05 05:02:34 pve01 kernel: fwbr10004i0: port 1(fwln10004i0) entered blocking state
Apr 05 05:02:34 pve01 kernel: fwbr10004i0: port 1(fwln10004i0) entered disabled state
Apr 05 05:02:34 pve01 kernel: fwln10004i0: entered allmulticast mode
Apr 05 05:02:34 pve01 kernel: fwln10004i0: entered promiscuous mode
Apr 05 05:02:34 pve01 kernel: fwbr10004i0: port 1(fwln10004i0) entered blocking state
Apr 05 05:02:34 pve01 kernel: fwbr10004i0: port 1(fwln10004i0) entered forwarding state
Apr 05 05:02:34 pve01 kernel: fwbr10004i0: port 2(tap10004i0) entered blocking state
Apr 05 05:02:34 pve01 kernel: fwbr10004i0: port 2(tap10004i0) entered disabled state
Apr 05 05:02:34 pve01 kernel: tap10004i0: entered allmulticast mode
Apr 05 05:02:34 pve01 kernel: fwbr10004i0: port 2(tap10004i0) entered blocking state
Apr 05 05:02:34 pve01 kernel: fwbr10004i0: port 2(tap10004i0) entered forwarding state
-- Reboot --
Apr 05 19:18:34 pve01 kernel: microcode: updated early: 0x428 -> 0x42e, date = 2019-03-14
Apr 05 19:18:34 pve01 kernel: Linux version 6.5.13-3-pve (build@proxmox) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-3 (2024-03-20T10:45Z) ()
Apr 05 19:18:34 pve01 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-6.5.13-3-pve root=/dev/mapper/pve-root ro quiet
Apr 05 19:18:34 pve01 kernel: KERNEL supported cpus:
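To see how long the host was down and what the last entry before the hang was, the "-- Reboot --" markers in an exported log like the one above can be evaluated with a small script. This is just a sketch under some assumptions: the lines start with a "Mon DD HH:MM:SS" timestamp as in the excerpt, the year is not part of the log and has to be passed in, and the names parse_timestamp and reboot_gaps are invented for this example.

```python
from datetime import datetime

REBOOT_MARKER = "-- Reboot --"


def parse_timestamp(line: str, year: int) -> datetime:
    """Parse the leading "Mon DD HH:MM:SS" (15 characters); the year is assumed."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def reboot_gaps(lines, year):
    """Return (last_line_before, first_line_after, downtime) per reboot marker."""
    results = []
    previous = None  # last regular log line seen so far
    pending = None   # last line before a reboot marker, waiting for the next line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == REBOOT_MARKER:
            pending = previous
            continue
        if pending is not None:
            gap = parse_timestamp(line, year) - parse_timestamp(pending, year)
            results.append((pending, line, gap))
            pending = None
        previous = line
    return results
```

Fed with the excerpt above and 2024 as the assumed year, it should report the last bridge message at 05:02:34 as the final entry before the hang and the time until the first boot message at 19:18:34 as the downtime.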
 
Yes, the last BIOS was 2019.05.24, but other things like the iLO are still getting updates.
Unfortunately HP has removed the last three BIOS updates :-( (They were removed from the web because of a potential security vulnerability that can lead to arbitrary code execution.) The newest download is from 2015.07.01.
 
Now it's clear what is wrong. In the iLO/ILM I see the message that CPU 1 has an unrecoverable error. HP advisory: "Replace Processor". This is the first time that the error has been logged. Strange that the upgrade from 7 to 8 and this error occurred at nearly the same time. Thanks for the help!
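For anyone with similar symptoms, it may also be worth scanning the kernel log for hardware error reports, although in this case the syslog showed nothing and only the management log had the entry. A minimal sketch, assuming the kernel tags such messages with "[Hardware Error]", "mce:" or "Machine check" (the marker list and the function name hardware_error_lines are choices made for this example, not a complete list):

```python
# Substrings that typically show up in kernel hardware error reports (assumption).
HARDWARE_ERROR_MARKERS = ("[hardware error]", "machine check", "mce:")


def hardware_error_lines(lines):
    """Return all log lines that contain one of the markers (case-insensitive)."""
    found = []
    for line in lines:
        lowered = line.lower()
        if any(marker in lowered for marker in HARDWARE_ERROR_MARKERS):
            found.append(line.rstrip("\n"))
    return found
```

An empty result does not prove the hardware is fine, as this thread shows. A hard CPU fault can freeze the machine before anything is written to disk, so the iLO log should be checked as well.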
 
