Here are the Ceph logs and the outputs of the commands asked for earlier.
ceph -s:
Code:
  cluster:
    id:     18acc013-3ecb-4f72-a025-86882a2a39a4
    health: HEALTH_WARN
            no active mgr

  services:
    mon: 3 daemons, quorum sol-ceres-pve,sol-eris-pve,sol-pluto-pve (age 2d)
    mgr: no daemons active (since 38m)
    mds: 1/1 daemons up, 2 standby
    osd: 8 osds: 8 up (since 2d), 8 in (since 2d)

  data:
    volumes: 1/1 healthy
    pools:   5 pools, 129 pgs
    objects: 7.78k objects, 26 GiB
    usage:   75 GiB used, 14 TiB / 15 TiB avail
    pgs:     129 active+clean
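Side note: the "no active mgr" warning is the core problem here. On a standard Proxmox Ceph install the manager runs as a per-node systemd unit (ceph-mgr@<nodename>), so it can be checked and restarted along these lines; sol-ceres-pve is just one of the nodes from the quorum list above:
Code:
# check the manager unit on this node
systemctl status ceph-mgr@sol-ceres-pve

# try to bring it back up
systemctl restart ceph-mgr@sol-ceres-pve

# list any daemon crashes recorded cluster-wide
ceph crash ls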
pveversion -v:
Code:
proxmox-ve: 9.0.0 (running kernel: 6.14.8-2-pve)
pve-manager: 9.0.3 (running version: 9.0.3/025864202ebb6109)
proxmox-kernel-helper: 9.0.3
proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
proxmox-kernel-6.14: 6.14.8-2
ceph: 19.2.3-pve1
ceph-fuse: 19.2.3-pve1
corosync: 3.1.9-pve2
criu: 4.1.1-1
frr-pythontools: 10.3.1-1+pve4
ifupdown2: 3.3.0-1+pmx9
intel-microcode: 3.20250512.1
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libproxmox-acme-perl: 1.7.0
libproxmox-backup-qemu0: 2.0.1
libproxmox-rs-perl: 0.4.1
libpve-access-control: 9.0.3
libpve-apiclient-perl: 3.4.0
libpve-cluster-api-perl: 9.0.6
libpve-cluster-perl: 9.0.6
libpve-common-perl: 9.0.9
libpve-guest-common-perl: 6.0.2
libpve-http-server-perl: 6.0.4
libpve-network-perl: 1.1.6
libpve-rs-perl: 0.10.10
libpve-storage-perl: 9.0.13
libspice-server1: 0.15.2-1+b1
lvm2: 2.03.31-2
lxc-pve: 6.0.4-2
lxcfs: 6.0.4-pve1
novnc-pve: 1.6.0-3
proxmox-backup-client: 4.0.11-1
proxmox-backup-file-restore: 4.0.11-1
proxmox-backup-restore-image: 1.0.0
proxmox-firewall: 1.1.1
proxmox-kernel-helper: 9.0.3
proxmox-mail-forward: 1.0.2
proxmox-mini-journalreader: 1.6
proxmox-offline-mirror-helper: 0.7.0
proxmox-widget-toolkit: 5.0.5
pve-cluster: 9.0.6
pve-container: 6.0.9
pve-docs: 9.0.8
pve-edk2-firmware: 4.2025.02-4
pve-esxi-import-tools: 1.0.1
pve-firewall: 6.0.3
pve-firmware: 3.16-3
pve-ha-manager: 5.0.4
pve-i18n: 3.5.2
pve-qemu-kvm: 10.0.2-4
pve-xtermjs: 5.5.0-2
qemu-server: 9.0.16
smartmontools: 7.4-pve1
spiceterm: 3.4.0
swtpm: 0.8.0+pve2
vncterm: 1.9.0
zfsutils-linux: 2.3.3-pve1
dpkg --list | grep -e 'ceph' -e 'rbd' -e 'rados':
Code:
ii ceph 19.2.3-pve1 amd64 distributed storage and file system
ii ceph-base 19.2.3-pve1 amd64 common ceph daemon libraries and management tools
ii ceph-common 19.2.3-pve1 amd64 common utilities to mount and interact with a ceph storage cluster
ii ceph-fuse 19.2.3-pve1 amd64 FUSE-based client for the Ceph distributed file system
ii ceph-mds 19.2.3-pve1 amd64 metadata server for the ceph distributed file system
ii ceph-mgr 19.2.3-pve1 amd64 manager for the ceph distributed storage system
ii ceph-mgr-modules-core 19.2.3-pve1 all ceph manager modules which are always enabled
ii ceph-mon 19.2.3-pve1 amd64 monitor server for the ceph storage system
ii ceph-osd 19.2.3-pve1 amd64 OSD server for the ceph storage system
ii ceph-volume 19.2.3-pve1 all tool to facilitate OSD deployment
ii libcephfs2 19.2.3-pve1 amd64 Ceph distributed file system client library
ii librados2 19.2.3-pve1 amd64 RADOS distributed object store client library
ii librados2-perl 1.5.0 amd64 Perl bindings for librados
ii libradosstriper1 19.2.3-pve1 amd64 RADOS striping interface
ii librbd1 19.2.3-pve1 amd64 RADOS block device client library
ii libsqlite3-mod-ceph 19.2.3-pve1 amd64 SQLite3 VFS for Ceph
ii python3-ceph-argparse 19.2.3-pve1 all Python 3 utility libraries for Ceph CLI
ii python3-ceph-common 19.2.3-pve1 all Python 3 utility libraries for Ceph
ii python3-cephfs 19.2.3-pve1 amd64 Python 3 libraries for the Ceph libcephfs library
ii python3-rados 19.2.3-pve1 amd64 Python 3 libraries for the Ceph librados library
ii python3-rbd 19.2.3-pve1 amd64 Python 3 libraries for the Ceph librbd library
ps faxl | grep ceph:
Code:
1 0 23562 2 0 -20 0 0 rescue I< ? 0:00 \_ [kworker/R-ceph-msgr]
1 0 23592 2 0 -20 0 0 rescue I< ? 0:00 \_ [kworker/R-ceph-watch-notify]
1 0 23593 2 0 -20 0 0 rescue I< ? 0:00 \_ [kworker/R-ceph-completion]
0 0 2328367 2313577 20 0 6528 2108 pipe_r S+ pts/0 0:00 | \_ grep ceph
4 64045 8994 1 20 0 21984 14120 hrtime Ss ? 0:01 /usr/bin/python3 /usr/bin/ceph-crash
4 64045 9498 1 20 0 727416 491468 futex_ Ssl ? 17:22 /usr/bin/ceph-mon -f --cluster ceph --id sol-ceres-pve --setuser ceph --setgroup ceph
4 64045 10211 1 20 0 189748 41776 futex_ Ssl ? 0:44 /usr/bin/ceph-mds -f --cluster ceph --id sol-ceres-pve --setuser ceph --setgroup ceph
4 64045 12355 1 20 0 1249604 591580 futex_ Ssl ? 11:54 /usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
4 64045 14046 1 20 0 1209484 549916 futex_ Ssl ? 12:20 /usr/bin/ceph-osd -f --cluster ceph --id 1 --setuser ceph --setgroup ceph
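Worth noting: there is no ceph-mgr process anywhere in that list (mon, mds, and the OSDs are all running), which matches the HEALTH_WARN above. The reason the manager exited should be in the unit's journal on the node that was hosting the active mgr, e.g.:
Code:
# last journal entries from the manager unit on this node (example node name)
journalctl -u ceph-mgr@sol-ceres-pve -n 200 --no-pager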
EDIT: I just now saw the trash purge commands and, after running them, my managers seem to be back online. So that does seem to have been the issue. How can I prevent these crashes automatically in the future?
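One idea I'm considering for automating this (not tested on this cluster yet) is the trash purge scheduler built into the rbd_support mgr module, with a plain cron job as a fallback since the scheduler itself depends on the mgr staying up. <pool> below is a placeholder for the affected RBD pool:
Code:
# schedule an automatic trash purge once a day for the given pool
rbd trash purge schedule add --pool <pool> 1d
rbd trash purge schedule list --pool <pool>

# fallback: a plain cron entry in /etc/cron.d that does not rely on the mgr
# 0 3 * * * root rbd trash purge <pool>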