Hi,
I have a 4-node cluster running Proxmox/Ceph.
In the last week, two of the nodes have gone down multiple times. Each time, the node still seems responsive, but it disappears from the cluster.
On the console I see a message about a segfault in pmxcfs.
Here is the output of pveversion from one of the nodes as well:
Code:
# pveversion --verbose
proxmox-ve: 6.1-2 (running kernel: 5.3.18-3-pve)
pve-manager: 6.1-8 (running version: 6.1-8/806edfe1)
pve-kernel-helper: 6.1-8
pve-kernel-5.3: 6.1-6
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph: 14.2.8-pve1
ceph-fuse: 14.2.8-pve1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 2.0.1-1+pve8
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libpve-access-control: 6.0-6
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.0-19
libpve-guest-common-perl: 3.0-6
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-5
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.0-2
lxcfs: 4.0.2-pve1
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-3
pve-cluster: 6.1-5
pve-container: 3.1-1
pve-docs: 6.1-6
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.0-10
pve-firmware: 3.0-7
pve-ha-manager: 3.0-9
pve-i18n: 2.0-5
pve-qemu-kvm: 4.2.0-1
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-13
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1
Any ideas on what's going on?
(The underlying hardware is an AMD Rome-based (EPYC 7002) system, if that matters.)
Thanks,
Victor