OK, I managed to export all my VMs and CTs. Before I wiped the servers, I marked all my OSDs down, set them out, and destroyed them. I then rebuilt my Proxmox cluster and recreated the OSDs, and everything went well... until I checked my Ceph cluster status. I have not even started any of the restores to Ceph yet. I have copied all the files that I backed up back over to local-ZFS on PVE1.
ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 768 pgs inactive
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
mds.pve5-NAS(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 1879 secs
[WRN] PG_AVAILABILITY: Reduced data availability: 768 pgs inactive
pg 6.cd is stuck inactive for 2m, current state unknown, last acting []
pg 6.ce is stuck inactive for 2m, current state unknown, last acting []
pg 6.cf is stuck inactive for 2m, current state unknown, last acting []
pg 6.d0 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d1 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d2 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d3 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d4 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d5 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d6 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d7 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d8 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d9 is stuck inactive for 2m, current state unknown, last acting []
pg 6.da is stuck inactive for 2m, current state unknown, last acting []
pg 6.db is stuck inactive for 2m, current state unknown, last acting []
pg 6.dc is stuck inactive for 2m, current state unknown, last acting []
pg 6.dd is stuck inactive for 2m, current state unknown, last acting []
pg 6.de is stuck inactive for 2m, current state unknown, last acting []
pg 6.df is stuck inactive for 2m, current state unknown, last acting []
pg 6.e0 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e1 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e2 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e3 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e4 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e5 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e6 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e7 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e8 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e9 is stuck inactive for 2m, current state unknown, last acting []
pg 6.ea is stuck inactive for 2m, current state unknown, last acting []
pg 6.eb is stuck inactive for 2m, current state unknown, last acting []
pg 6.ec is stuck inactive for 2m, current state unknown, last acting []
pg 6.ed is stuck inactive for 2m, current state unknown, last acting []
pg 6.ee is stuck inactive for 2m, current state unknown, last acting []
pg 6.ef is stuck inactive for 2m, current state unknown, last acting []
pg 6.f0 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f1 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f2 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f3 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f4 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f5 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f6 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f7 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f8 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f9 is stuck inactive for 2m, current state unknown, last acting []
pg 6.fa is stuck inactive for 2m, current state unknown, last acting []
pg 6.fb is stuck inactive for 2m, current state unknown, last acting []
pg 6.fc is stuck inactive for 2m, current state unknown, last acting []
pg 6.fd is stuck inactive for 2m, current state unknown, last acting []
pg 6.fe is stuck inactive for 2m, current state unknown, last acting []
pg 6.ff is stuck inactive for 2m, current state unknown, last acting []
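Before pasting the rest, here is a sketch of the read-only checks I plan to run to narrow this down (pg 6.cd is just the first entry from the list above; nothing here changes cluster state):

```shell
# Which CRUSH rule does each pool use? Pool 6 is the one with the stuck PGs.
ceph osd pool ls detail

# Query one of the unknown PGs directly; with "last acting []" this may hang
# or report that no OSDs are acting for it.
ceph pg 6.cd query

# Dump the CRUSH rules to see how each rule tries to pick OSDs.
ceph osd crush rule dump
```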
ceph -s
  cluster:
    id:     564cbcd9-4e27-4c54-a46d-214f262b503d
    health: HEALTH_WARN
            1 MDSs report slow metadata IOs
            Reduced data availability: 768 pgs inactive

  services:
    mon: 3 daemons, quorum pve7-NAS,pve5-NAS,pve1 (age 33m)
    mgr: pve7-NAS(active, since 3m), standbys: pve1, pve5-NAS
    mds: 1/1 daemons up, 1 standby
    osd: 24 osds: 24 up (since 34m), 24 in (since 58m)

  data:
    volumes: 1/1 healthy
    pools:   7 pools, 801 pgs
    objects: 24 objects, 1.9 MiB
    usage:   833 MiB used, 164 TiB / 164 TiB avail
    pgs:     95.880% pgs unknown
             768 unknown
             33  active+clean
cat /etc/pve/ceph.conf
[global]
	auth_client_required = cephx
	auth_cluster_required = cephx
	auth_service_required = cephx
	cluster_network = 172.30.250.0/24
	fsid = 564cbcd9-4e27-4c54-a46d-214f262b503d
	mon_allow_pool_delete = true
	mon_host = 10.10.104.13 10.10.104.14 10.10.104.16 10.10.104.10
	ms_bind_ipv4 = true
	ms_bind_ipv6 = false
	osd_pool_default_min_size = 2
	osd_pool_default_size = 3
	public_network = 10.10.104.0/24

[client]
	keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
	keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
	keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve5-NAS]
	host = pve5-NAS
	mds_standby_for_name = pve

[mds.pve7-NAS]
	host = pve7-NAS
	mds_standby_for_name = pve

[mon.pve1]
	public_addr = 10.10.104.10

[mon.pve5-NAS]
	public_addr = 10.10.104.14

[mon.pve7-NAS]
	public_addr = 10.10.104.16
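One thing I noticed while pasting this: mon_host lists four addresses, but there are only three [mon.X] sections and three mons in quorum, so 10.10.104.13 looks like a leftover from before the rebuild (I'm not certain). A quick read-only way to compare the live monmap against the config:

```shell
# List the monitors the cluster actually knows about; any address in mon_host
# that does not appear here (presumably 10.10.104.13) is stale and could be
# removed from /etc/pve/ceph.conf
ceph mon dump
```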
CRUSH map:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class nvme
device 1 osd.1 class nvme
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class nvme
device 6 osd.6 class nvme
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class ssd
device 10 osd.10 class ssd
device 11 osd.11 class ssd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
device 21 osd.21 class hdd
device 22 osd.22 class nvme
device 23 osd.23 class nvme
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host pve5-NAS {
	id -3		# do not change unnecessarily
	id -4 class nvme	# do not change unnecessarily
	id -5 class ssd		# do not change unnecessarily
	id -10 class hdd	# do not change unnecessarily
	# weight 80.04298
	alg straw2
	hash 0	# rjenkins1
	item osd.0 weight 1.81940
	item osd.1 weight 1.81940
	item osd.2 weight 3.63869
	item osd.3 weight 3.63869
	item osd.4 weight 3.63869
	item osd.10 weight 3.63869
	item osd.11 weight 3.63869
	item osd.14 weight 14.55269
	item osd.15 weight 14.55269
	item osd.16 weight 14.55269
	item osd.17 weight 14.55269
}
host pve7-NAS {
	id -7		# do not change unnecessarily
	id -8 class nvme	# do not change unnecessarily
	id -9 class ssd		# do not change unnecessarily
	id -11 class hdd	# do not change unnecessarily
	# weight 83.68178
	alg straw2
	hash 0	# rjenkins1
	item osd.5 weight 1.81940
	item osd.6 weight 1.81940
	item osd.7 weight 3.63869
	item osd.8 weight 3.63869
	item osd.9 weight 3.63869
	item osd.12 weight 3.63869
	item osd.13 weight 3.63869
	item osd.18 weight 14.55269
	item osd.19 weight 14.55269
	item osd.20 weight 14.55269
	item osd.21 weight 14.55269
	item osd.22 weight 1.81940
	item osd.23 weight 1.81940
}
root default {
	id -1		# do not change unnecessarily
	id -2 class nvme	# do not change unnecessarily
	id -6 class ssd		# do not change unnecessarily
	id -12 class hdd	# do not change unnecessarily
	# weight 163.72476
	alg straw2
	hash 0	# rjenkins1
	item pve5-NAS weight 80.04298
	item pve7-NAS weight 83.68178
}

# rules
rule replicated_rule {
	id 0
	type replicated
	step take default
	step chooseleaf firstn 0 type host
	step emit
}
rule NVME {
	id 1
	type replicated
	step take default class nvme
	step chooseleaf firstn 0 type root
	step emit
}
rule SSD {
	id 2
	type replicated
	step take default class ssd
	step chooseleaf firstn 0 type root
	step emit
}
rule Spinner {
	id 3
	type replicated
	step take default class hdd
	step chooseleaf firstn 0 type root
	step emit
}
# end crush map
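Looking at this again: the NVME, SSD, and Spinner rules all use `step chooseleaf firstn 0 type root`, while the working replicated_rule uses `type host`. If the inactive pool uses one of the class rules, that `root` failure domain might explain the stuck PGs: there are no root-type buckets below `default`, so CRUSH could be unable to map any OSDs, leaving PGs `unknown` with `last acting []` (the 33 active+clean PGs would then be the pools on replicated_rule). A sketch of how I might recreate the rules with a host failure domain (the `-host` rule names are mine; `<pool-name>` is whichever pool turns out to be affected):

```shell
# Recreate the device-class rules with a host failure domain:
# ceph osd crush rule create-replicated <name> <root> <failure-domain> <class>
ceph osd crush rule create-replicated NVME-host default host nvme
ceph osd crush rule create-replicated SSD-host default host ssd
ceph osd crush rule create-replicated Spinner-host default host hdd

# Then point each affected pool at the corrected rule, e.g.:
# ceph osd pool set <pool-name> crush_rule SSD-host
```

If the PGs then peer and go active+clean, the old rules could be deleted with `ceph osd crush rule rm`, but I would confirm no pool still references them first.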