Segfault: error 4 in pmxcfs

erik.deneve · Sep 1, 2020

Hello,

This morning we had a major incident: all our Proxmox nodes were fenced at the same time.
In the log, this seems to be the problem:
messages.log

Sep 1 10:50:14 hostname kernel: [932130.006753] show_signal_msg: 6 callbacks suppressed
Sep 1 10:50:14 hostname kernel: [932130.006757] cfs_loop[1832]: segfault at 7f7300c07f99 ip 000055bf7f70e7b0 sp 00007f730634f318 error 4 in pmxcfs[55bf7f6f5000+1b000]
Sep 1 10:50:14 hostname kernel: [932130.006771] Code: 10 48 89 c6 48 89 ef 48 89 10 48 8b 53 08 48 89 50 08 48 89 c2 e8 50 74 fe ff b8 01 00 00 00 e9 4a ff ff ff 66 0f 1f 44 00 00 <8b> 47 0c 8b 56 0c 39 d0 75 0d 48 8b 47 10 48 8b 56 10 48 39 d0 74
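
For reference, the two numbers in the segfault line that usually matter most can be decoded by hand. The sketch below assumes the standard x86 page-fault error-code bits (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode) and computes the faulting offset inside the binary as `ip` minus the mapping base from the brackets. The helper names `decode_pf_error` and `fault_offset` are made up for illustration.

```shell
#!/bin/sh
# Decode the "error N" value of a kernel segfault line (x86 page-fault bits).
decode_pf_error() {
    code="$1"
    if [ $((code & 1)) -ne 0 ]; then cause="protection violation"; else cause="page not present"; fi
    if [ $((code & 2)) -ne 0 ]; then access="write"; else access="read"; fi
    if [ $((code & 4)) -ne 0 ]; then mode="user mode"; else mode="kernel mode"; fi
    echo "$access access, $cause, $mode"
}

# Offset of the faulting instruction inside the mapped binary:
# instruction pointer minus the base address shown in "pmxcfs[base+size]".
fault_offset() {
    printf '0x%x\n' $(( $1 - $2 ))
}

decode_pf_error 4
fault_offset 0x55bf7f70e7b0 0x55bf7f6f5000
```

For the line above this gives a user-mode read of an unmapped page at offset 0x197b0 in pmxcfs. With matching debug symbols installed, that offset could be passed to a tool such as `addr2line -e <path to pmxcfs>` to find the source location; whether symbols are available for this exact build is an assumption.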

This is very annoying, because it's our production environment.
Does anyone know what triggered this issue? Is this a bug?

I found some related threads with the same problem, but no real solution:
https://forum.proxmox.com/threads/segfault-on-all-cluster-nodes.51960/
https://forum.proxmox.com/threads/a...cluster-mit-12-nodes-segfault-cfs_loop.51913/

Any ideas what triggered this segfault and how to avoid it in the future?
I have attached the daemon.log with some more info.

Thanks!
Kind regards,
Erik

Proxmox version

Code:

# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-10 (running version: 6.2-10/a20769ed)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve2
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-2
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-9
pve-cluster: 6.1-8
pve-container: 3.1-12
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-11
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-11
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1

aaron · Sep 1, 2020

Do you have any coredumps in /var/lib/systemd/coredump? These could help tremendously to figure out what is going wrong.

If this is happening regularly, I recommend temporarily disabling the HA services to avoid any fencing.

To do so you can first stop the LRM on all nodes and then the CRM on all nodes.

Code:

systemctl stop pve-ha-lrm
systemctl stop pve-ha-crm

Just be aware that they will start again if the node reboots.
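
A minimal sketch of that order across a whole cluster, assuming root SSH access to every node. The node names (pve1, pve2, pve3), the helpers `run_on_node` and `stop_ha_services`, and the `DRY_RUN` switch are placeholders for illustration, not part of Proxmox. By default it only prints what it would run.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints the commands; set DRY_RUN=0 to really run them.
DRY_RUN="${DRY_RUN:-1}"

# Run (or print) a command on one node over SSH.
run_on_node() {
    target="$1"
    shift
    if [ "$DRY_RUN" = "1" ]; then
        echo "ssh root@$target $*"
    else
        ssh "root@$target" "$@"
    fi
}

# Stop the LRM on all given nodes first, then the CRM on all of them.
stop_ha_services() {
    for node in "$@"; do
        run_on_node "$node" systemctl stop pve-ha-lrm
    done
    for node in "$@"; do
        run_on_node "$node" systemctl stop pve-ha-crm
    done
}

# Example (hypothetical node names):
# stop_ha_services pve1 pve2 pve3
```

Running the two loops separately keeps the LRM-before-CRM order for the cluster as a whole, rather than per node.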

erik.deneve · Sep 2, 2020

We don't have coredumps enabled.
Can you tell us how to enable them (or point us to the documentation)? We will then do that.

Thanks,
Erik

aaron · Sep 3, 2020

They should be generated automatically. However, if you have the HA services enabled, it is possible that the nodes fence themselves before the coredump is safely written to disk.

Therefore, if the problem occurs repeatedly, I suggest disabling the HA services, both to avoid fencing the cluster and to get the coredumps.

As usual, the Arch Linux wiki has a good article about coredumps: https://wiki.archlinux.org/index.php/Core_dump
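
A quick way to check whether any dumps were written could look like the sketch below. It assumes dumps land in /var/lib/systemd/coredump with file names starting with `core.`, which is how systemd-coredump normally names them; `list_coredumps` is a made-up helper name.

```shell
#!/bin/sh
# List core dump files in a directory (default: the systemd-coredump location).
# Prints nothing if the directory is missing or holds no dumps.
list_coredumps() {
    dir="${1:-/var/lib/systemd/coredump}"
    find "$dir" -maxdepth 1 -type f -name 'core.*' 2>/dev/null | sort
}

# Example:
# list_coredumps
# cat /proc/sys/kernel/core_pattern   # shows where the kernel hands core dumps
```

If the directory stays empty after a crash, `/proc/sys/kernel/core_pattern` shows whether dumps are piped to systemd-coredump at all. Where that is the case, `coredumpctl list` gives the same overview with more detail.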
