Good morning,
I have a data center with three nodes, all running on identical hardware:
CPU: AMD Ryzen 9 5950X 16-Core Processor
RAM: 128 GB ECC
Motherboard: ASRock X570D4U-2L2T
PVE version: 8.3.3
Kernel: latest available
For the past two weeks, one of the nodes has kept crashing. The node goes offline and its VMs become unreachable. When I connect via IPMI, the Proxmox console login screen is frozen and I cannot type anything. I have searched the logs and the journal, but there are no errors at the time of the freeze (I have set up an alert on one of the servers that sends a message when a VM goes down, so I know when the freeze happens). To bring the node back online, I have to reset it from the IPMI console.
I noticed that every time the node restarts, the following messages appear:
__common_interrupt: 1.55 No irq handler for vector
NODE kernel: __common_interrupt: 2.55 No irq handler for vector
NODE kernel: __common_interrupt: 3.55 No irq handler for vector
NODE kernel: __common_interrupt: 4.55 No irq handler for vector
NODE kernel: __common_interrupt: 5.55 No irq handler for vector
NODE kernel: __common_interrupt: 6.55 No irq handler for vector
NODE kernel: __common_interrupt: 7.55 No irq handler for vector
NODE kernel: __common_interrupt: 8.55 No irq handler for vector
NODE kernel: __common_interrupt: 9.55 No irq handler for vector
NODE kernel: __common_interrupt: 10.55 No irq handler for vector
NODE kernel: snd_hda_intel 0000:2e:00.4: no codecs found!
NODE pmxcfs[1694]: [quorum] crit: quorum_initialize failed: 2
NODE pmxcfs[1694]: [quorum] crit: can't initialize service
NODE pmxcfs[1694]: [confdb] crit: cmap_initialize failed: 2
NODE pmxcfs[1694]: [confdb] crit: can't initialize service
NODE pmxcfs[1694]: [dcdb] crit: cpg_initialize failed: 2
NODE pmxcfs[1694]: [dcdb] crit: can't initialize service
NODE pmxcfs[1694]: [status] crit: cpg_initialize failed: 2
NODE pmxcfs[1694]: [status] crit: can't initialize service
I think the pmxcfs errors appear only because the cluster services have not started yet at that point in the boot: once the node is up, running systemctl status pve-cluster shows that everything is active.
I have already tried enabling the low C1 option in the BIOS, changing the kernel version, and running a memory test with Memtest, but nothing has changed.
Could someone kindly help me?
Thank you in advance,
Lorenzo