Segfault error 7 in libpthread-2.24.so

Paul01

Good morning,

Yesterday a VM shut down because of a segfault (error 7) in libpthread-2.24.so.
We are using the newest version of proxmox-ve on Debian Stretch.

Code:
root@proxmox /home/john # pveversion  -v
proxmox-ve: 5.4-2 (running kernel: 4.15.18-21-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15: 5.4-9
pve-kernel-4.15.18-21-pve: 4.15.18-48
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-16-pve: 4.15.18-41
pve-kernel-4.15.18-13-pve: 4.15.18-37
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: not correctly installed
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-55
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-40
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-54
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3

The complete kernel log from that time:
Code:
Oct 22 02:06:53 proxmox pveupdate[11595]: <root@pam> starting task UPID:proxmox:00002D5F:0AE88289:5DAE481D:aptupdate::root@pam:
Oct 22 02:06:56 proxmox pveupdate[11595]: <root@pam> end task UPID:proxmox:00002D5F:0AE88289:5DAE481D:aptupdate::root@pam: OK
Oct 22 21:33:46 proxmox kernel: [1900106.581083] kvm[2308]: segfault at ffffffffffffff81 ip 00007fd4da90e1e0 sp 00007ffe52bbf768 error 7 in libpthread-2.24.so[7fd4da903000+18000]
Oct 22 21:33:46 proxmox kernel: [1900106.581094] mce: [Hardware Error]: Machine check events logged
Oct 22 21:33:46 proxmox kernel: [1900106.666663] fwbr102i0: port 2(tap102i0) entered disabled state
Oct 22 21:33:46 proxmox kernel: [1900106.666938] fwbr102i0: port 2(tap102i0) entered disabled state
Oct 22 21:33:47 proxmox kernel: [1900107.407575] fwbr102i0: port 1(fwln102i0) entered disabled state
Oct 22 21:33:47 proxmox kernel: [1900107.407634] vmbr0: port 3(fwpr102p0) entered disabled state
Oct 22 21:33:47 proxmox kernel: [1900107.407855] device fwln102i0 left promiscuous mode
Oct 22 21:33:47 proxmox kernel: [1900107.407886] fwbr102i0: port 1(fwln102i0) entered disabled state
Oct 22 21:33:47 proxmox kernel: [1900107.442702] device fwpr102p0 left promiscuous mode
Oct 22 21:33:47 proxmox kernel: [1900107.442737] vmbr0: port 3(fwpr102p0) entered disabled state
Oct 23 02:14:53 proxmox pveupdate[21930]: <root@pam> starting task UPID:proxmox:000055BE:0B6D1406:5DAF9B7D:aptupdate::root@pam:
Oct 23 02:14:56 proxmox pveupdate[21930]: <root@pam> end task UPID:proxmox:000055BE:0B6D1406:5DAF9B7D:aptupdate::root@pam: OK

Last update of Proxmox was done 3-4 weeks ago.

I would appreciate some ideas! ;)

Best regards
Paul
 
Check with debsums $(dpkg -S libpthread-2.24.so | cut -d: -f1) if the file is still ok. Check also if there are any entries in the journal/syslog relating to hardware issues.
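
The two checks suggested above could be wrapped roughly as follows. This is only a sketch: `verify_owner` and `hw_lines` are hypothetical helper names, the grep pattern is a guess at typical hardware-related keywords rather than an exhaustive list, and `verify_owner` needs `dpkg` and `debsums` to be installed.

```shell
#!/bin/sh
# Verify the checksums of the package(s) that own a given file name.
# Requires dpkg and debsums; $pkg is left unquoted on purpose so that
# several owning packages are passed as separate arguments.
verify_owner() {
    pkg=$(dpkg -S "$1" | cut -d: -f1 | sort -u)
    debsums $pkg
}

# Filter log lines on stdin for keywords that often accompany hardware faults.
# The keyword list is an assumption, adjust it to the platform.
hw_lines() {
    grep -iE 'mce|hardware error|machine check|edac'
}
```

Typical use would be `verify_owner libpthread-2.24.so` followed by `journalctl -k | hw_lines` or `hw_lines < /var/log/syslog`.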
 
The files are okay. Changed it again.
Can't find any issues in the logs on the Proxmox server.
 
Memory issues? Otherwise it is hard to tell without more information. :/
 
Bad news. Does not seem to be a memory issue.
What information do you need?

When the server crashes, I can see the following lines in the syslog on the Proxmox host and on the nodes (I don't know about the one Windows host):
Code:
Nov 23 05:06:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Nov 23 05:06:00 proxmox systemd[1]: Started Proxmox VE replication runner.
Nov 23 05:07:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Nov 23 05:07:00 proxmox systemd[1]: Started Check_MK (xxxxxxx:47910).
Nov 23 05:07:00 proxmox systemd[1]: Started Proxmox VE replication runner.
Nov 23 05:08:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Nov 23 05:08:01 proxmox systemd[1]: Started Proxmox VE replication runner.
Nov 23 05:08:01 proxmox systemd[1]: Started Check_MK (3xxxxxxx:47926).
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Nov 23 12:49:59 proxmox systemd[1]: Started Create list of required static device nodes for the current kernel.
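
The `^@` characters in the last line are how many pagers and editors display NUL bytes. Such runs commonly show up in a log file after an unclean shutdown, which fits a hard crash of the host. A minimal sketch for counting them and for reading the log without them, assuming only POSIX `tr` and `wc` (the function names are invented for this example):

```shell
#!/bin/sh
# Count the NUL bytes in a file by comparing its size with and without them.
nul_count() {
    total=$(wc -c < "$1")
    clean=$(tr -d '\000' < "$1" | wc -c)
    echo $((total - clean))
}

# Print the file with all NUL bytes removed so it is readable again.
strip_nuls() {
    tr -d '\000' < "$1"
}
```

For example, `nul_count /var/log/syslog` reports how many NUL bytes the file contains, and `strip_nuls /var/log/syslog | less` shows the entries around the crash without the padding.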
 
You mean the slab cache?
Mhhh, I don't think so. Looks like a total crash:

crash.png

Ahhh and FYI: CheckMK was installed after the first crashes.
 
I installed mcelog.
Let's have a look at the next crash..

Thank you for your great support!
 
Soo.. we got the next crash.

Syslog on the proxmox host:
Code:
Dec 11 06:42:56 proxmox systemd[1]: Started Check_MK (xxxxxxxxxx:48842).
Dec 11 06:43:00 proxmox systemd[1]: Starting Proxmox VE replication runner...
Dec 11 06:43:01 proxmox systemd[1]: Started Proxmox VE replication runner.
Dec 12 06:52:48 proxmox systemd-modules-load[327]: Inserted module 'vhost_net'
Dec 12 06:52:48 proxmox systemd[1]: Started Apply Kernel Variables.
Dec 12 06:52:48 proxmox systemd[1]: Mounted RPC Pipe File System.
Dec 12 06:52:48 proxmox systemd[1]: Started Remount Root and Kernel File Systems.

No "@@@@@@" ?

mcelog is empty but the daemon was running.

After server restart:
Code:
/var/log/syslog:Dec 12 06:52:48 proxmox mcelog: failed to prefill DIMM database from DMI data
/var/log/syslog:Dec 12 06:52:48 proxmox mcelog[847]: Starting Machine Check Exceptions decoder: mcelog.
/var/log/syslog:Dec 12 06:52:48 proxmox mcelog: warning: 32 bytes ignored in each record
/var/log/syslog:Dec 12 06:52:48 proxmox mcelog: consider an update

After mcelog restart:
Code:
Dec 12 19:03:41 proxmox systemd[1]: Starting LSB: Machine Check Exceptions (MCE) collector & decoder...
Dec 12 19:03:41 proxmox mcelog[18810]: failed to prefill DIMM database from DMI data
Dec 12 19:03:41 proxmox mcelog[18796]: Starting Machine Check Exceptions decoder: mcelog.
Dec 12 19:03:41 proxmox systemd[1]: Started LSB: Machine Check Exceptions (MCE) collector & decoder.
 
No "@@@@@@" ?
That isn't always written.

/var/log/syslog:Dec 12 06:52:48 proxmox mcelog: warning: 32 bytes ignored in each record
/var/log/syslog:Dec 12 06:52:48 proxmox mcelog: consider an update
It might not be able to parse the records on this platform. You can try to use EDAC (if not already enabled).
https://buttersideup.com/mediawiki/index.php/Main_Page

But besides that, the hardware has a fault and possibly the best solution is to replace parts to rule them out.
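
For the EDAC suggestion above: when an EDAC driver is loaded, the kernel exposes per-memory-controller error counters under `/sys/devices/system/edac/mc/`. A small sketch that prints the corrected (`ce_count`) and uncorrected (`ue_count`) counters for each controller; the function name `edac_report` is made up, and the optional argument exists only so the function can be pointed at a different directory.

```shell
#!/bin/sh
# Print corrected/uncorrected error counters for every EDAC memory controller.
# Prints nothing if no EDAC driver is loaded (no mc* directories exist).
edac_report() {
    base=${1:-/sys/devices/system/edac/mc}
    for mc in "$base"/mc*; do
        [ -d "$mc" ] || continue
        printf '%s ce=%s ue=%s\n' "${mc##*/}" "$(cat "$mc/ce_count")" "$(cat "$mc/ue_count")"
    done
}
```

Running `edac_report` periodically and watching for rising counters could help narrow down whether the memory is involved. Empty output most likely means that no EDAC module is loaded for this platform.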
 
