Proxmox: continuous freezes every day

Nov 22, 2022
Good morning,

I have a data center with three nodes and identical machines:

CPU: Ryzen 9 5950X 16-Core Processor
RAM: 128GB RAM ECC
MB: ASROCK X570D4U-2L2T
PVE version: 8.3.3
latest kernel

For the past two weeks, one of the nodes has kept crashing. The node goes offline and the VMs become unreachable. When I connect via IPMI, I see the Proxmox console login screen frozen, with no way to type. I've searched the logs and the journal, but there are no errors at the time of the freeze (I've set up an alert on one of the servers that sends a message when a VM goes down). I have to restart the node from the IPMI console to bring it back online.
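One way to pin down when a freeze happened from a journal dump is to take the last entry logged before each "-- Boot <id> --" marker that journalctl prints between boots. A minimal Python sketch, assuming the dump is plain journalctl short-format text; the function name last_entries_before_boot is hypothetical, and the sample lines are taken from the journal excerpt in this post:

```python
import re

# Marker that journalctl prints between boots, e.g. "-- Boot 92ba... --"
BOOT_MARKER = re.compile(r"^-- Boot ([0-9a-f]+) --$")


def last_entries_before_boot(lines):
    """Return (boot_id, last_line_before_marker) for each boot marker found."""
    results = []
    previous = None  # most recent non-empty log line seen so far
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = BOOT_MARKER.match(line)
        if match:
            # The line just before the marker is the last thing logged
            # by the previous boot, i.e. the latest sign of life.
            if previous is not None:
                results.append((match.group(1), previous))
            previous = None
        else:
            previous = line
    return results


sample = [
    "Feb 20 13:36:56 NODE pmxcfs[5969]: [status] notice: received log",
    "Feb 20 13:38:56 NODE pmxcfs[5969]: [status] notice: received log",
    "-- Boot 92ba7d03663349a7a057423098e1fb10 --",
]

for boot_id, line in last_entries_before_boot(sample):
    print(f"last entry before boot {boot_id}: {line}")
```

The timestamp on the returned line only gives a lower bound for the freeze: the node was still logging at that moment, and nothing was written afterwards until the reboot.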

I noticed that every time it restarts, the following messages appear:

__common_interrupt: 1.55 No irq handler for vector
NODE kernel: __common_interrupt: 2.55 No irq handler for vector
NODE kernel: __common_interrupt: 3.55 No irq handler for vector
NODE kernel: __common_interrupt: 4.55 No irq handler for vector
NODE kernel: __common_interrupt: 5.55 No irq handler for vector
NODE kernel: __common_interrupt: 6.55 No irq handler for vector
NODE kernel: __common_interrupt: 7.55 No irq handler for vector
NODE kernel: __common_interrupt: 8.55 No irq handler for vector
NODE kernel: __common_interrupt: 9.55 No irq handler for vector
NODE kernel: __common_interrupt: 10.55 No irq handler for vector
NODE kernel: snd_hda_intel 0000:2e:00.4: no codecs found!
NODE pmxcfs[1694]: [quorum] crit: quorum_initialize failed: 2
NODE pmxcfs[1694]: [quorum] crit: can't initialize service
NODE pmxcfs[1694]: [confdb] crit: cmap_initialize failed: 2
NODE pmxcfs[1694]: [confdb] crit: can't initialize service
NODE pmxcfs[1694]: [dcdb] crit: cpg_initialize failed: 2
NODE pmxcfs[1694]: [dcdb] crit: can't initialize service
NODE pmxcfs[1694]: [status] crit: cpg_initialize failed: 2
NODE pmxcfs[1694]: [status] crit: can't initialize service

I think these appear because the cluster services haven't started yet at that point in the boot. Once the node is up, systemctl status pve-cluster shows the service as running.


I have already tried enabling the low C1 setting in the BIOS, changing the kernel version, and running a memory check with Memtest, but nothing has changed.

Could someone kindly help me?

Thank you in advance,

Lorenzo
 
Feb 20 07:17:01 NODE CRON[212702]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 20 07:17:01 NODE CRON[212703]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Feb 20 07:17:01 NODE CRON[212702]: pam_unix(cron:session): session closed for user root
Feb 20 07:49:39 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 08:09:21 NODE pmxcfs[5969]: [dcdb] notice: data verification successful
Feb 20 08:14:52 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 08:15:25 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 08:17:01 NODE CRON[224491]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 20 08:17:01 NODE CRON[224492]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Feb 20 08:17:01 NODE CRON[224491]: pam_unix(cron:session): session closed for user root
Feb 20 08:29:40 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 08:30:25 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 08:44:40 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 08:45:25 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 08:57:16 NODE systemd[1]: Starting apt-daily.service - Daily apt download activities...
Feb 20 08:57:16 NODE systemd[1]: apt-daily.service: Deactivated successfully.
Feb 20 08:57:16 NODE systemd[1]: Finished apt-daily.service - Daily apt download activities.
Feb 20 08:59:40 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 09:00:26 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 09:09:21 NODE pmxcfs[5969]: [dcdb] notice: data verification successful
Feb 20 09:14:40 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 09:15:26 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 09:15:40 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 09:17:01 NODE CRON[236344]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 20 09:17:01 NODE CRON[236345]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Feb 20 09:17:01 NODE CRON[236344]: pam_unix(cron:session): session closed for user root
Feb 20 09:29:40 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 09:30:26 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 09:30:57 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 09:44:40 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 09:45:26 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 09:46:57 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 09:49:16 NODE systemd[1]: Starting man-db.service - Daily man-db regeneration...
Feb 20 09:49:16 NODE systemd[1]: man-db.service: Deactivated successfully.
Feb 20 09:49:16 NODE systemd[1]: Finished man-db.service - Daily man-db regeneration.
Feb 20 09:59:40 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:00:26 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:01:58 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:09:21 NODE pmxcfs[5969]: [dcdb] notice: data verification successful
Feb 20 10:14:40 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:15:26 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:17:01 NODE CRON[248136]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 20 10:17:01 NODE CRON[248137]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Feb 20 10:17:01 NODE CRON[248136]: pam_unix(cron:session): session closed for user root
Feb 20 10:17:39 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:29:40 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:30:26 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:32:40 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:44:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:45:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:47:41 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 10:52:26 NODE pvedaemon[6121]: worker exit
Feb 20 10:52:26 NODE pvedaemon[6118]: worker 6121 finished
Feb 20 10:52:26 NODE pvedaemon[6118]: starting 1 worker(s)
Feb 20 10:52:26 NODE pvedaemon[6118]: worker 255083 started
Feb 20 11:00:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 11:01:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 11:02:41 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 11:09:21 NODE pmxcfs[5969]: [dcdb] notice: data verification successful
Feb 20 11:16:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 11:17:01 NODE CRON[260291]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 20 11:17:01 NODE CRON[260292]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Feb 20 11:17:01 NODE CRON[260291]: pam_unix(cron:session): session closed for user root
Feb 20 11:17:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 11:32:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 11:33:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 11:48:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 11:49:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 12:04:35 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 12:05:45 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 12:09:21 NODE pmxcfs[5969]: [dcdb] notice: data verification successful
Feb 20 12:17:01 NODE CRON[272150]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 20 12:17:01 NODE CRON[272151]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Feb 20 12:17:01 NODE CRON[272150]: pam_unix(cron:session): session closed for user root
Feb 20 12:19:50 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 12:20:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 12:34:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 12:36:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 12:50:35 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 12:52:39 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 13:05:50 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 13:07:54 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 13:09:21 NODE pmxcfs[5969]: [dcdb] notice: data verification successful
Feb 20 13:17:01 NODE CRON[283910]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Feb 20 13:17:01 NODE CRON[283911]: (root) CMD (cd / && run-parts --report /etc/cron.hourly)
Feb 20 13:17:01 NODE CRON[283910]: pam_unix(cron:session): session closed for user root
Feb 20 13:20:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 13:22:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 13:36:56 NODE pmxcfs[5969]: [status] notice: received log
Feb 20 13:38:56 NODE pmxcfs[5969]: [status] notice: received log
-- Boot 92ba7d03663349a7a057423098e1fb10 --