I have a three node proxmox ceph cluster. I needed to re-organize the hardware in the rack, which required removing all the hardware, including the ceph/proxmox nodes.
Before I powered off the proxmox/ceph nodes I set norecover, noout and norebalance. Then I powered off the nodes.
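For reference, these are (approximately, from memory) the commands I used to set those flags before the shutdown:

Code:
# pause recovery/rebalancing and keep OSDs from being marked out before powering off
ceph osd set noout
ceph osd set norecover
ceph osd set norebalance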
Now, a few days later, the rack rebuild is complete and I've booted the nodes back up. The proxmox cluster joined up without issue, but the ceph cluster is in a strange state:
* All `ceph ...` commands hang
* The ceph web UI throws a timeout
* `systemctl status "ceph*"` shows everything active except ceph-volume@lvm-UUID.service on each node
* journalctl for that ceph-volume service shows:
Code:
Nov 28 17:45:48 mill.mgmt.socozy.casa sh[1065]: Running command: /usr/sbin/ceph-volume lvm trigger 5-199f906e-6dae-4fa6-9c2f-2f2e927dafbf
* `ceph-volume lvm activate --all` hangs with:
Code:
root@mill:/var/log/ceph# ceph-volume lvm activate --all
--> Activating OSD ID 2 FSID 8fd557f9-52ed-48fd-9297-ab3ce3372841
Running command: /usr/bin/ceph --cluster ceph --name client.osd-lockbox.8fd557f9-52ed-48fd-9297-ab3ce3372841 --keyring /var/lib/ceph/osd/ceph-2/lockbox.keyring config-key get dm-crypt/osd/8fd557f9-52ed-48fd-9297-ab3ce3372841/luks
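Since every ceph CLI call hangs, including the `config-key get` that ceph-volume is running above, I suspect the monitors never formed quorum after the reboot. The only next step I can think of is querying each mon over its local admin socket, which doesn't need quorum (socket path and mon id below are assumed to be the defaults, e.g. mon.mill on this node):

Code:
# ask the local mon for its own view, bypassing the cluster (no quorum required);
# default admin socket path, mon id assumed to match the node name
ceph --admin-daemon /var/run/ceph/ceph-mon.mill.asok mon_status
# cluster-wide status that gives up after a few seconds instead of hanging forever
ceph -s --connect-timeout 10

If mon_status shows the mons stuck probing or electing, that would at least narrow it down to a mon/network problem rather than the OSDs.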
Here is a sample /etc/pve/ceph.conf from one of my three nodes:
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 10.9.10.13/23
fsid = ddc81dc1-c8a4-42b0-8005-ce22ef4d1635
mon_allow_pool_delete = true
mon_host = 10.9.10.18 10.9.10.16 10.9.10.13
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.9.10.13/23
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mon.ibnsina]
public_addr = 10.9.10.16
[mon.mill]
public_addr = 10.9.10.13
[mon.peirce]
public_addr = 10.9.10.18
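Given the mon_host line above, the other thing I can verify from each node is whether the mons are actually listening and reachable on those public addresses (the port below is the assumed default msgr2 port, 3300):

Code:
# on each node: confirm the local ceph-mon is bound to its public_addr
ss -tlnp | grep ceph-mon
# probe the other two mons from this node (addresses taken from mon_host, port 3300 assumed)
timeout 3 bash -c '</dev/tcp/10.9.10.16/3300' && echo "10.9.10.16:3300 reachable"
timeout 3 bash -c '</dev/tcp/10.9.10.18/3300' && echo "10.9.10.18:3300 reachable"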
I'm really not sure how to proceed with troubleshooting this.
PVE: 7.0-11