Node Reboot/unresponsive

tejasthakur123

New Member
Oct 4, 2023
I have a 4-node Proxmox cluster on consumer-grade machines. One node keeps going unresponsive. How can I fix this?
I am still able to log in on the host,
and from what I have observed, if I kill some process the node turns green again.

root@pve02:~# pvecm status
Cluster information
-------------------
Name: pve-clus
Config Version: 6
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Thu Apr 25 08:21:10 2024
Quorum provider: corosync_votequorum
Nodes: 4
Node ID: 0x00000003
Ring ID: 1.372
Quorate: Yes

Votequorum information
----------------------
Expected votes: 4
Highest expected: 4
Total votes: 4
Quorum: 3
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 10.12.0.4
0x00000002 1 10.12.0.3
0x00000003 1 10.12.0.2 (local)
0x00000004 1 10.12.0.6
root@pve02:~#


==
root@pve02:~# systemctl status pvestatd.service
× pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; preset: enabled)
Active: failed (Result: signal) since Thu 2024-04-25 03:58:23 EDT; 4h 25min ago
Duration: 7h 2min 1.990s
Process: 22840 ExecReload=/usr/bin/pvestatd restart (code=exited, status=0/SUCCESS)
Main PID: 982 (code=killed, signal=SEGV)
CPU: 30min 8.321s

Apr 24 20:57:10 pve02 pvestatd[982]: modified cpu set for lxc/119: 0
Apr 24 21:08:59 pve02 systemd[1]: Reloading pvestatd.service - PVE Status Daemon...
Apr 24 21:09:00 pve02 pvestatd[22840]: send HUP to 982
Apr 24 21:09:00 pve02 pvestatd[982]: received signal HUP
Apr 24 21:09:00 pve02 pvestatd[982]: server shutdown (restart)
Apr 24 21:09:00 pve02 systemd[1]: Reloaded pvestatd.service - PVE Status Daemon.
Apr 24 21:09:01 pve02 pvestatd[982]: restarting server
Apr 25 03:58:23 pve02 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 25 03:58:23 pve02 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 25 03:58:23 pve02 systemd[1]: pvestatd.service: Consumed 30min 8.321s CPU time.
root@pve02:~# journalctl -b0 -u pvestatd.service
Apr 24 20:56:19 pve02 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Apr 24 20:56:20 pve02 pvestatd[982]: starting server
Apr 24 20:56:21 pve02 systemd[1]: Started pvestatd.service - PVE Status Daemon.
Apr 24 20:56:44 pve02 pvestatd[982]: modified cpu set for lxc/126: 0
Apr 24 20:57:07 pve02 pvestatd[982]: status update time (36.172 seconds)
Apr 24 20:57:10 pve02 pvestatd[982]: modified cpu set for lxc/119: 0
Apr 24 21:08:59 pve02 systemd[1]: Reloading pvestatd.service - PVE Status Daemon...
Apr 24 21:09:00 pve02 pvestatd[22840]: send HUP to 982
Apr 24 21:09:00 pve02 pvestatd[982]: received signal HUP
Apr 24 21:09:00 pve02 pvestatd[982]: server shutdown (restart)
Apr 24 21:09:00 pve02 systemd[1]: Reloaded pvestatd.service - PVE Status Daemon.
Apr 24 21:09:01 pve02 pvestatd[982]: restarting server
Apr 25 03:58:23 pve02 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Apr 25 03:58:23 pve02 systemd[1]: pvestatd.service: Failed with result 'signal'.
Apr 25 03:58:23 pve02 systemd[1]: pvestatd.service: Consumed 30min 8.321s CPU time.
root@pve02:~#
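To see how often this happens, the journal output above can be filtered for the lines where systemd reports the main process being killed by a signal. This is only a minimal sketch: the helper names `find_signal_exits` and `count_signal_exits` are made up for illustration, and the pattern is based solely on the "Main process exited, code=killed, status=11/SEGV" line shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Proxmox): read journal text on stdin
# and report lines where systemd says a main process was killed by a signal.

# Print the matching lines; "|| true" keeps the exit status 0 when none match.
find_signal_exits() {
    grep -E 'Main process exited, code=killed, status=[0-9]+/[A-Z]+' || true
}

# Print only the number of matching lines.
count_signal_exits() {
    find_signal_exits | wc -l | tr -d ' '
}

# Usage on the affected node (assumes the systemd journal, as in the output above):
#   journalctl -b0 -u pvestatd.service | count_signal_exits
#   journalctl -k -b0 | grep -i segfault   # kernel-side record of the crash, if one was logged
```

The kernel log line, if present, usually names the binary or library the crash happened in, which would be useful to include in a post like this.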




Solution: after restarting the pvestatd service, the node is back online:
root@pve02:~# systemctl start pvestatd.service
root@pve02:~# systemctl status pvestatd.service
● pvestatd.service - PVE Status Daemon
Loaded: loaded (/lib/systemd/system/pvestatd.service; enabled; preset: enabled)
Active: active (running) since Thu 2024-04-25 08:26:44 EDT; 3s ago
Process: 287846 ExecStart=/usr/bin/pvestatd start (code=exited, status=0/SUCCESS)
Main PID: 287847 (pvestatd)
Tasks: 1 (limit: 18944)
Memory: 96.5M
CPU: 745ms
CGroup: /system.slice/pvestatd.service
└─287847 pvestatd

Apr 25 08:26:43 pve02 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Apr 25 08:26:44 pve02 pvestatd[287847]: starting server
Apr 25 08:26:44 pve02 systemd[1]: Started pvestatd.service - PVE Status Daemon.
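Until the cause of the SEGV is found, the manual restart above could be scripted. The following is a rough sketch, not a fix: `restart_if_inactive` is a hypothetical helper name, and it only wraps the same `systemctl` calls used above, so the crash itself is left unexplained.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Proxmox): restart a systemd unit
# only when "systemctl is-active" reports that it is not running.
restart_if_inactive() {
    local unit="$1"
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is active, nothing to do"
    else
        echo "$unit is not active, restarting"
        systemctl restart "$unit"
    fi
}

# Example, run as root on the affected node:
#   restart_if_inactive pvestatd.service
```

This could be run by hand or from a timer, assuming it is acceptable for the node to show as unresponsive until the next check.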


Can someone please suggest a permanent fix for this?
 

Attachments

  • 1714047493878.png
