pvestatd segfaults

unterkomplex

Hi,


I noticed that pvestatd was not running on one of my nodes. It turns out that the service had segfaulted:
Aug 31 16:39:55 pve1 kernel: pvestatd[1862]: segfault at 100000000000 ip 000063217ac22321 sp 00007ffc2cfa5140 error 4 in perl[95321,63217abd1000+1ae000] likely on CPU 3 (core 3, socket 1)
Aug 31 16:39:55 pve1 kernel: Code: 00 00 00 66 0f 1f 44 00 00 48 8d 4a 01 48 83 c0 08 49 89 0c 24 48 8b 75 00 48 3b 56 18 73 52 48 89 ca 48 8b 18 48 85 db 74 df <48> 8b 13 48 89 10 48 8b 45 00 48 83 68 10 01 83 b>
Aug 31 16:39:55 pve1 systemd[1]: pvestatd.service: Main process exited, code=killed, status=11/SEGV
Aug 31 16:39:55 pve1 systemd[1]: pvestatd.service: Failed with result 'signal'.
Aug 31 16:39:55 pve1 systemd[1]: pvestatd.service: Consumed 7h 27min 59.427s CPU time, 160.6M memory peak.

I was able to restart the service manually, and it seems to work fine now:
Sep 01 10:20:25 pve1 systemd[1]: Starting pvestatd.service - PVE Status Daemon...
Sep 01 10:20:27 pve1 pvestatd[2099919]: starting server
Sep 01 10:20:27 pve1 systemd[1]: Started pvestatd.service - PVE Status Daemon.

I checked the journal for any other segfaults:
root@pve1:~# journalctl | grep segfault
Nov 03 23:02:28 pve1 kernel: dnsmasq[228401]: segfault at 5ea02dc81e1e ip 00007445fd6c42d5 sp 00007ffc327293a8 error 4 in libdbus-1.so.3.32.4[7445fd6a1000+30000] likely on CPU 0 (core 0, socket 0)
Nov 28 08:55:59 pve1 kernel: dnsmasq[1685]: segfault at 616309463d64 ip 00007d02c4f892d5 sp 00007ffe04781958 error 4 in libdbus-1.so.3.32.4[7d02c4f66000+30000] likely on CPU 3 (core 3, socket 0)
Dec 08 22:11:05 pve1 kernel: dnsmasq[1265]: segfault at 5941068861ac ip 00007c80e1a0a2d5 sp 00007fff0b538e28 error 4 in libdbus-1.so.3.32.4[7c80e19e7000+30000] likely on CPU 1 (core 1, socket 0)
Aug 10 04:24:01 pve1 kernel: task UPID:pve1:[2514843]: segfault at 2989dc5a8 ip 00007529bff58087 sp 00007ffc9b12f8c0 error 4 in libc.so.6[7529bfee9000+155000] likely on CPU 2 (core 2, socket 0)
Aug 21 14:29:43 pve1 kernel: python3[2469825]: segfault at ffffffffff8 ip 00007d1eaae60efa sp 00007ffcc1a05fa0 error 4 in libc.so.6[7d1eaadee000+155000] likely on CPU 0 (core 0, socket 0)
Aug 31 09:12:34 pve1 kernel: python3[1528251]: segfault at 100000000008 ip 00007d900a4b9653 sp 00007ffd7a3b4c90 error 4 in libcrypto.so.3[26d653,7d900a343000+381000] likely on CPU 0 (core 0, socket 1)
Aug 31 16:39:55 pve1 kernel: pvestatd[1862]: segfault at 100000000000 ip 000063217ac22321 sp 00007ffc2cfa5140 error 4 in perl[95321,63217abd1000+1ae000] likely on CPU 3 (core 3, socket 1)
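
To see at a glance which programs and libraries are affected, the grep output above can be condensed into counts. This is only a minimal sketch: the function name summarize_segfaults is made up for illustration, and it assumes the kernel line format shown in the journal output above.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): read journal lines on
# stdin and print "count program in module" for every kernel segfault line.
summarize_segfaults() {
    # Keep the program name before "[pid]: segfault" and the module name
    # that follows " in "; drop every line that is not a segfault report.
    sed -nE 's/.*kernel: (.*)\[[0-9]+\]: segfault at .* in ([^[]+)\[.*/\1 in \2/p' \
        | awk '{ count[$0]++ } END { for (k in count) print count[k], k }' \
        | sort -k1,1nr -k2
}

# Intended use on the node:
#   journalctl --no-pager | summarize_segfaults
```

If the crashes are spread across unrelated binaries and libraries rather than concentrated in one module, that is worth mentioning when asking for help.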

It looks like there was another segfault about seven and a half hours earlier the same day, with no reboot in between:
Aug 31 09:12:34 pve1 kernel: python3[1528251]: segfault at 100000000008 ip 00007d900a4b9653 sp 00007ffd7a3b4c90 error 4 in libcrypto.so.3[26d653,7d900a343000+381000] likely on CPU 0 (core 0, socket 1)
Aug 31 09:12:35 pve1 kernel: Code: 89 ee 48 89 f5 49 c1 e6 03 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 49 8b 07 4a 8b 1c 30 48 85 db 74 19 0f 1f 40 00 48 89 d8 <48> 8b 5b 08 48 89 ee 48 8b 38 41 ff d4 48 85 db 75 eb 41 83 ed 01


The node was upgraded from Proxmox 8.4 to 9 on 27 August.
A third segfault happened before that, while the node was still on Proxmox 8.4:
Aug 21 14:29:43 pve1 kernel: python3[2469825]: segfault at ffffffffff8 ip 00007d1eaae60efa sp 00007ffcc1a05fa0 error 4 in libc.so.6[7d1eaadee000+155000] likely on CPU 0 (core 0, socket 0)
Aug 21 14:29:43 pve1 kernel: Code: ac 2c 10 00 e8 f7 62 fe ff 0f 1f 80 00 00 00 00 48 85 ff 0f 84 bf 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d e6 8e 13 00 <48> 8b 47 f8 64 8b 2b a8 02 75 5b 48 8b 15 6c 8e 13 00 64 48 83 3a


The CPU is quite old: AMD Embedded G-Series GX-420GI Radeon R7E.

I am not sure whether this is more likely due to faulty hardware or a software bug. Happy to provide more details if that helps with the investigation.
 
Hi!

As multiple programs are segfaulting, I'd check for memory problems (memtest), filesystem or disk corruption (e.g. SMART tests), and package corruption (e.g. checking packages with debsums -c).
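
A rough sketch of how the package and disk checks could be scripted follows. The helper names run and check_node are made up for illustration, and the default disk /dev/sda is only an assumption, so replace it with the real device (see lsblk). It also assumes the debsums and smartmontools packages are installed. A memory test such as Memtest86+ has to be booted separately, so it is not part of the sketch.

```shell
#!/bin/sh
# Assumption: the disk to check; override with DISK=/dev/... before calling.
DISK="${DISK:-/dev/sda}"

# Print each command; execute it only when DRY_RUN is not set to 1.
run() {
    printf '+ %s\n' "$*"
    [ "${DRY_RUN:-0}" = "1" ] || "$@"
}

# Hypothetical wrapper around the package and disk checks.
check_node() {
    run debsums -c                 # list package files whose checksums changed
    run smartctl -H "$DISK"        # overall SMART health assessment
    run smartctl -t long "$DISK"   # start an extended self-test in the background
}

# Intended use on the node, as root:
#   check_node
# Once the self-test has finished, read the result with:
#   smartctl -l selftest "$DISK"
```

Setting DRY_RUN=1 first prints the commands without running them, which is a convenient way to confirm that the device name is right.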