Hi,
I noticed that pvestatd was not running on one of my nodes. It turns out that the service had segfaulted:
Aug 31 16:39:55 pve1 kernel: pvestatd[1862]: segfault at 100000000000 ip 000063217ac22321 sp 00007ffc2cfa5140 error 4 in perl[95321,63217abd1000+1ae000] likely on CPU 3 (core 3, socket 1)
Aug 31 16:39:55 pve1 kernel: Code: 00 00 00 66 0f 1f 44 00 00 48 8d 4a 01 48 83 c0 08 49 89 0c 24 48 8b 75 00 48 3b 56 18 73 52 48 89 ca 48 8b 18 48 85 db 74 df <48> 8b 13 48 89 10 48 8b 45 00 48 83 68 10 01 83 b>
Aug 31 16:39:55 pve1 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Aug 31 16:39:55 pve1 systemd[1]: pvestatd.service: Failed with result 'signal'.
Aug 31 16:39:55 pve1 systemd[1]: pvestatd.service: Consumed 7h 27min 59.427s CPU time, 160.6M memory peak.
I was able to restart the service manually, and it seems to work fine now:
Sep 01 10:20:25 pve1 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Sep 01 10:20:27 pve1 pvestatd[2099919]: starting server
Sep 01 10:20:27 pve1 systemd[1]: Started pvestatd.service - PVE Status Daemon.
I checked for any other segfaults:
root@pve1:~# journalctl | grep segfault
Nov 03 23:02:28 pve1 kernel: dnsmasq[228401]: segfault at 5ea02dc81e1e ip 00007445fd6c42d5 sp 00007ffc327293a8 error 4 in libdbus-1.so.3.32.4[7445fd6a1000+30000] likely on CPU 0 (core 0, socket 0)
Nov 28 08:55:59 pve1 kernel: dnsmasq[1685]: segfault at 616309463d64 ip 00007d02c4f892d5 sp 00007ffe04781958 error 4 in libdbus-1.so.3.32.4[7d02c4f66000+30000] likely on CPU 3 (core 3, socket 0)
Dec 08 22:11:05 pve1 kernel: dnsmasq[1265]: segfault at 5941068861ac ip 00007c80e1a0a2d5 sp 00007fff0b538e28 error 4 in libdbus-1.so.3.32.4[7c80e19e7000+30000] likely on CPU 1 (core 1, socket 0)
Aug 10 04:24:01 pve1 kernel: task UPID:pve1:[2514843]: segfault at 2989dc5a8 ip 00007529bff58087 sp 00007ffc9b12f8c0 error 4 in libc.so.6[7529bfee9000+155000] likely on CPU 2 (core 2, socket 0)
Aug 21 14:29:43 pve1 kernel: python3[2469825]: segfault at ffffffffff8 ip 00007d1eaae60efa sp 00007ffcc1a05fa0 error 4 in libc.so.6[7d1eaadee000+155000] likely on CPU 0 (core 0, socket 0)
Aug 31 09:12:34 pve1 kernel: python3[1528251]: segfault at 100000000008 ip 00007d900a4b9653 sp 00007ffd7a3b4c90 error 4 in libcrypto.so.3[26d653,7d900a343000+381000] likely on CPU 0 (core 0, socket 1)
Aug 31 16:39:55 pve1 kernel: pvestatd[1862]: segfault at 100000000000 ip 000063217ac22321 sp 00007ffc2cfa5140 error 4 in perl[95321,63217abd1000+1ae000] likely on CPU 3 (core 3, socket 1)
And it looks like there was one more segfault a few hours earlier that day (with no reboot in between):
Aug 31 09:12:34 pve1 kernel: python3[1528251]: segfault at 100000000008 ip 00007d900a4b9653 sp 00007ffd7a3b4c90 error 4 in libcrypto.so.3[26d653,7d900a343000+381000] likely on CPU 0 (core 0, socket 1)
Aug 31 09:12:35 pve1 kernel: Code: 89 ee 48 89 f5 49 c1 e6 03 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 49 8b 07 4a 8b 1c 30 48 85 db 74 19 0f 1f 40 00 48 89 d8 <48> 8b 5b 08 48 89 ee 48 8b 38 41 ff d4 48 85 db 75 eb 41 83 ed 01
The node was upgraded from Proxmox 8.4 to 9 on 27 August.
A third segfault happened before that, while the node was still on Proxmox 8.4:
Aug 21 14:29:43 pve1 kernel: python3[2469825]: segfault at ffffffffff8 ip 00007d1eaae60efa sp 00007ffcc1a05fa0 error 4 in libc.so.6[7d1eaadee000+155000] likely on CPU 0 (core 0, socket 0)
Aug 21 14:29:43 pve1 kernel: Code: ac 2c 10 00 e8 f7 62 fe ff 0f 1f 80 00 00 00 00 48 85 ff 0f 84 bf 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d e6 8e 13 00 <48> 8b 47 f8 64 8b 2b a8 02 75 5b 48 8b 15 6c 8e 13 00 64 48 83 3a
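The `journalctl | grep segfault` output mixes several processes, so it can help to parse it and group the crashes. As a minimal sketch (the function and field names are hypothetical, and the regular expression assumes exactly the line layout shown in this thread), the lines can be parsed in Python to decode the x86 page-fault `error` code and to group crashes by the object and offset the instruction pointer was at. Done on these logs, the three `dnsmasq` crashes all fault at the same offset (`+0x232d5`) in `libdbus-1.so.3.32.4`, which looks more like one reproducible code path than random corruption, while the `pvestatd` and `python3` crashes are spread across `perl`, `libc.so.6` and `libcrypto.so.3`.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Matches kernel "segfault at" lines in the layout shown in the journal above.
# The "<file offset>," prefix inside the brackets is optional: some lines
# in the log carry it (newer kernels) and some do not.
SEGFAULT_RE = re.compile(
    r"kernel: (?P<comm>.+?)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp [0-9a-f]+ error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?:(?P<foff>[0-9a-f]+),)?(?P<base>[0-9a-f]+)\+[0-9a-f]+\]"
)


@dataclass
class Segfault:
    comm: str                # process name (the kernel truncates it to 15 chars)
    addr: int                # faulting data address ("segfault at ...")
    obj: str                 # binary or library containing the instruction pointer
    ip_off: int              # instruction pointer minus the start of that mapping
    file_off: Optional[int]  # file offset printed by newer kernels, if any
    err: int                 # x86 page-fault error code (printed in hex)


def decode_error(err: int) -> str:
    """Decode the low three bits of the x86 page-fault error code."""
    mode = "user" if err & 0x4 else "kernel"
    access = "write" if err & 0x2 else "read"
    cause = "protection violation" if err & 0x1 else "page not present"
    return f"{mode}-mode {access}, {cause}"


def parse(line: str) -> Optional[Segfault]:
    """Return a Segfault for a matching journal line, or None otherwise."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    return Segfault(
        comm=m["comm"],
        addr=int(m["addr"], 16),
        obj=m["obj"],
        ip_off=int(m["ip"], 16) - int(m["base"], 16),
        file_off=int(m["foff"], 16) if m["foff"] else None,
        err=int(m["err"], 16),
    )


def summarize(lines: Iterable[str]) -> List[str]:
    """Count crashes per (object, offset) pair, most frequent first."""
    faults = [f for f in map(parse, lines) if f is not None]
    counts = Counter((f.obj, f.ip_off) for f in faults)
    return [f"{n}x {obj} +{off:#x}" for (obj, off), n in counts.most_common()]
```

The offset used for grouping is simply `ip` minus the mapping start; the file offset that newer kernels print as the first bracketed number is kept separately in `file_off`. Saving the grep output to a file (for example a hypothetical `segfaults.txt`) and calling `summarize(open("segfaults.txt"))` would return one line per distinct object/offset pair.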
The CPU is quite old: AMD Embedded G-Series GX-420GI Radeon R7E.
I am not sure whether this is more likely due to faulty hardware or a software bug. I am happy to provide more details if that would help the investigation.
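On the hardware-versus-bug question, one quick check is to compare the faulting addresses themselves. For the three crashes quoted in detail, the addresses `100000000000`, `100000000008` and `ffffffffff8` all lie within 8 bytes of 2^44 rather than looking like unrelated pointers. The small Python sketch here makes that comparison explicit; the helper name `nearest_power_of_two` is hypothetical. A shared pattern like this can be consistent with a flipped bit in memory, but a software bug that corrupts the same kind of pointer could produce it just as well, so it is a hint rather than evidence either way.

```python
from typing import Tuple


def nearest_power_of_two(x: int) -> Tuple[int, int]:
    """Return (k, x - 2**k) for the power of two 2**k closest to x (x > 0)."""
    if x <= 0:
        raise ValueError("x must be positive")
    lo = x.bit_length() - 1  # 2**lo <= x < 2**(lo + 1)
    below, above = 1 << lo, 1 << (lo + 1)
    if x - below <= above - x:
        return lo, x - below
    return lo + 1, x - above


# Faulting addresses copied from the pvestatd (Aug 31 16:39), python3
# (Aug 31 09:12) and python3 (Aug 21) crashes quoted above.
for addr in (0x100000000000, 0x100000000008, 0xFFFFFFFFFF8):
    k, delta = nearest_power_of_two(addr)
    print(f"{addr:#x}: 2**{k} {delta:+d}")
```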