Hello,
This morning we had a major incident, all our proxmox nodes where fenced at the same time.
In the log, this seems to be the problem:
messages.log
Sep 1 10:50:14 hostname kernel: [932130.006753] show_signal_msg: 6 callbacks suppressed
Sep 1 10:50:14 hostname kernel: [932130.006757] cfs_loop[1832]: segfault at 7f7300c07f99 ip 000055bf7f70e7b0 sp 00007f730634f318 error 4 in pmxcfs[55bf7f6f5000+1b000]
Sep 1 10:50:14 hostname kernel: [932130.006771] Code: 10 48 89 c6 48 89 ef 48 89 10 48 8b 53 08 48 89 50 08 48 89 c2 e8 50 74 fe ff b8 01 00 00 00 e9 4a ff ff ff 66 0f 1f 44 00 00 <8b> 47 0c 8b 56 0c 39 d0 75 0d 48 8b 47 10 48 8b 56 10 48 39 d0 74
This is very annoying, because it's our production environment.
Does anyone knows what was triggering this issue? Is this a bug?
I found some related threads with the same problem, but no real solution:
https://forum.proxmox.com/threads/segfault-on-all-cluster-nodes.51960/
https://forum.proxmox.com/threads/a...cluster-mit-12-nodes-segfault-cfs_loop.51913/
Any ideas what triggered this segfault and how to avoid it in the future?
Attached the daemon.log with some more info.
Thanks!
Kind regards,
Erik
Proxmox version
This morning we had a major incident, all our proxmox nodes where fenced at the same time.
In the log, this seems to be the problem:
messages.log
Sep 1 10:50:14 hostname kernel: [932130.006753] show_signal_msg: 6 callbacks suppressed
Sep 1 10:50:14 hostname kernel: [932130.006757] cfs_loop[1832]: segfault at 7f7300c07f99 ip 000055bf7f70e7b0 sp 00007f730634f318 error 4 in pmxcfs[55bf7f6f5000+1b000]
Sep 1 10:50:14 hostname kernel: [932130.006771] Code: 10 48 89 c6 48 89 ef 48 89 10 48 8b 53 08 48 89 50 08 48 89 c2 e8 50 74 fe ff b8 01 00 00 00 e9 4a ff ff ff 66 0f 1f 44 00 00 <8b> 47 0c 8b 56 0c 39 d0 75 0d 48 8b 47 10 48 8b 56 10 48 39 d0 74
This is very annoying, because it's our production environment.
Does anyone knows what was triggering this issue? Is this a bug?
I found some related threads with the same problem, but no real solution:
https://forum.proxmox.com/threads/segfault-on-all-cluster-nodes.51960/
https://forum.proxmox.com/threads/a...cluster-mit-12-nodes-segfault-cfs_loop.51913/
Any ideas what triggered this segfault and how to avoid it in the future?
Attached the daemon.log with some more info.
Thanks!
Kind regards,
Erik
Proxmox version
Code:
# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.44-2-pve)
pve-manager: 6.2-10 (running version: 6.2-10/a20769ed)
pve-kernel-5.4: 6.2-4
pve-kernel-helper: 6.2-4
pve-kernel-5.4.44-2-pve: 5.4.44-2
pve-kernel-5.4.44-1-pve: 5.4.44-1
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve2
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-2
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-5
libpve-guest-common-perl: 3.1-2
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.2-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-9
pve-cluster: 6.1-8
pve-container: 3.1-12
pve-docs: 6.2-5
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-3
pve-qemu-kvm: 5.0.0-11
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-11
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1