No VMs with CEPH Storage will start after update to 8.3.2 and CEPH Squid

You're not a pain, I should have anticipated the question. Open a shell prompt to proceed:

rbd -p [poolname] ls will get you a list of the virtual disks, which will be named vm-[vmid]-disk-[n].

For each disk you want to back up:
1. Take a snapshot: rbd snap create [poolname]/vm-xxx-disk-n@[name] (the name can be anything; I'd use something like $(date +%Y%m%d-%Hh%M)).
2. Next, write it out, like so:
rbd export [poolname]/vm-xxx-disk-n@snapname - | nice zstd -T4 -o /path/to/backup/location/vm-xxx-disk-n.zst
3. Copy the VM config file:
cp /etc/pve/nodes/[node hosting the vm]/qemu-server/[vmid].conf /path/to/backup/location/

To restore:
zstd -d -c /path/to/backup/location/vm-xxx-disk-n.zst | rbd import - [newpoolname]/vm-xxx-disk-n (the target pool can be any pool; keep the vm-[vmid]-disk-[n] image name so Proxmox can find the disk)
cp /path/to/backup/location/[vmid].conf /etc/pve/nodes/[new node to host]/qemu-server/

The above is, more or less, what Proxmox's own vzdump tool does. You should probably do this now in any case, since you have no other backups.
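
Putting that together for one VM, here is a minimal sketch assuming a pool named rbd-ssd, VM 101, and /mnt/backup as the destination (all three names are placeholders, not from this thread):

POOL=rbd-ssd; VMID=101; DEST=/mnt/backup; SNAP=$(date +%Y%m%d-%Hh%M)
for DISK in $(rbd -p $POOL ls | grep "^vm-$VMID-disk-"); do
    rbd snap create $POOL/$DISK@$SNAP                            # point-in-time view of the disk
    rbd export $POOL/$DISK@$SNAP - | nice zstd -T4 -o $DEST/$DISK.zst
    rbd snap rm $POOL/$DISK@$SNAP                                # optional: drop the snapshot once exported
done
cp /etc/pve/nodes/$(hostname)/qemu-server/$VMID.conf $DEST/      # adjust the node name if the VM lives on another node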
 
Thank you so much for your help and patience!
I am going to give this a shot. It's going to take a while... I'm looking at about 10 TB of usage across the three pools...
 
One last thing... sorry...
PBS is probably the only place I have enough storage to back everything up to. I have it added to my cluster, but I am not sure how to browse it or point these exports at it from PVE... is there an easy way to do that in PVE?
 
OK, I managed to export all my VMs and CTs. Before I wiped the servers I downed, outed, and destroyed all my OSDs. I then rebuilt my Proxmox cluster, recreated the OSDs, and everything went well... until I checked my Ceph cluster status. I have not even started any of the restores to Ceph yet; I have only copied the files I backed up back over to local-ZFS on PVE1.


ceph health detail
HEALTH_WARN 1 MDSs report slow metadata IOs; Reduced data availability: 768 pgs inactive
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
mds.pve5-NAS(mds.0): 1 slow metadata IOs are blocked > 30 secs, oldest blocked for 1879 secs
[WRN] PG_AVAILABILITY: Reduced data availability: 768 pgs inactive
pg 6.cd is stuck inactive for 2m, current state unknown, last acting []
pg 6.ce is stuck inactive for 2m, current state unknown, last acting []
pg 6.cf is stuck inactive for 2m, current state unknown, last acting []
pg 6.d0 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d1 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d2 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d3 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d4 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d5 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d6 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d7 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d8 is stuck inactive for 2m, current state unknown, last acting []
pg 6.d9 is stuck inactive for 2m, current state unknown, last acting []
pg 6.da is stuck inactive for 2m, current state unknown, last acting []
pg 6.db is stuck inactive for 2m, current state unknown, last acting []
pg 6.dc is stuck inactive for 2m, current state unknown, last acting []
pg 6.dd is stuck inactive for 2m, current state unknown, last acting []
pg 6.de is stuck inactive for 2m, current state unknown, last acting []
pg 6.df is stuck inactive for 2m, current state unknown, last acting []
pg 6.e0 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e1 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e2 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e3 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e4 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e5 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e6 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e7 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e8 is stuck inactive for 2m, current state unknown, last acting []
pg 6.e9 is stuck inactive for 2m, current state unknown, last acting []
pg 6.ea is stuck inactive for 2m, current state unknown, last acting []
pg 6.eb is stuck inactive for 2m, current state unknown, last acting []
pg 6.ec is stuck inactive for 2m, current state unknown, last acting []
pg 6.ed is stuck inactive for 2m, current state unknown, last acting []
pg 6.ee is stuck inactive for 2m, current state unknown, last acting []
pg 6.ef is stuck inactive for 2m, current state unknown, last acting []
pg 6.f0 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f1 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f2 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f3 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f4 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f5 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f6 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f7 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f8 is stuck inactive for 2m, current state unknown, last acting []
pg 6.f9 is stuck inactive for 2m, current state unknown, last acting []
pg 6.fa is stuck inactive for 2m, current state unknown, last acting []
pg 6.fb is stuck inactive for 2m, current state unknown, last acting []
pg 6.fc is stuck inactive for 2m, current state unknown, last acting []
pg 6.fd is stuck inactive for 2m, current state unknown, last acting []
pg 6.fe is stuck inactive for 2m, current state unknown, last acting []
pg 6.ff is stuck inactive for 2m, current state unknown, last acting []


ceph -s
cluster:
id: 564cbcd9-4e27-4c54-a46d-214f262b503d
health: HEALTH_WARN
1 MDSs report slow metadata IOs
Reduced data availability: 768 pgs inactive

services:
mon: 3 daemons, quorum pve7-NAS,pve5-NAS,pve1 (age 33m)
mgr: pve7-NAS(active, since 3m), standbys: pve1, pve5-NAS
mds: 1/1 daemons up, 1 standby
osd: 24 osds: 24 up (since 34m), 24 in (since 58m)

data:
volumes: 1/1 healthy
pools: 7 pools, 801 pgs
objects: 24 objects, 1.9 MiB
usage: 833 MiB used, 164 TiB / 164 TiB avail
pgs: 95.880% pgs unknown
768 unknown
33 active+clean


cat /etc/pve/ceph.conf
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = 172.30.250.0/24
fsid = 564cbcd9-4e27-4c54-a46d-214f262b503d
mon_allow_pool_delete = true
mon_host = 10.10.104.13 10.10.104.14 10.10.104.16 10.10.104.10
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = 10.10.104.0/24

[client]
keyring = /etc/pve/priv/$cluster.$name.keyring

[client.crash]
keyring = /etc/pve/ceph/$cluster.$name.keyring

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

[mds.pve5-NAS]
host = pve5-NAS
mds_standby_for_name = pve

[mds.pve7-NAS]
host = pve7-NAS
mds_standby_for_name = pve

[mon.pve1]
public_addr = 10.10.104.10

[mon.pve5-NAS]
public_addr = 10.10.104.14

[mon.pve7-NAS]
public_addr = 10.10.104.16

CRUSH map:

# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class nvme
device 1 osd.1 class nvme
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class nvme
device 6 osd.6 class nvme
device 7 osd.7 class ssd
device 8 osd.8 class ssd
device 9 osd.9 class ssd
device 10 osd.10 class ssd
device 11 osd.11 class ssd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
device 21 osd.21 class hdd
device 22 osd.22 class nvme
device 23 osd.23 class nvme

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host pve5-NAS {
id -3 # do not change unnecessarily
id -4 class nvme # do not change unnecessarily
id -5 class ssd # do not change unnecessarily
id -10 class hdd # do not change unnecessarily
# weight 80.04298
alg straw2
hash 0 # rjenkins1
item osd.0 weight 1.81940
item osd.1 weight 1.81940
item osd.2 weight 3.63869
item osd.3 weight 3.63869
item osd.4 weight 3.63869
item osd.10 weight 3.63869
item osd.11 weight 3.63869
item osd.14 weight 14.55269
item osd.15 weight 14.55269
item osd.16 weight 14.55269
item osd.17 weight 14.55269
}
host pve7-NAS {
id -7 # do not change unnecessarily
id -8 class nvme # do not change unnecessarily
id -9 class ssd # do not change unnecessarily
id -11 class hdd # do not change unnecessarily
# weight 83.68178
alg straw2
hash 0 # rjenkins1
item osd.5 weight 1.81940
item osd.6 weight 1.81940
item osd.7 weight 3.63869
item osd.8 weight 3.63869
item osd.9 weight 3.63869
item osd.12 weight 3.63869
item osd.13 weight 3.63869
item osd.18 weight 14.55269
item osd.19 weight 14.55269
item osd.20 weight 14.55269
item osd.21 weight 14.55269
item osd.22 weight 1.81940
item osd.23 weight 1.81940
}
root default {
id -1 # do not change unnecessarily
id -2 class nvme # do not change unnecessarily
id -6 class ssd # do not change unnecessarily
id -12 class hdd # do not change unnecessarily
# weight 163.72476
alg straw2
hash 0 # rjenkins1
item pve5-NAS weight 80.04298
item pve7-NAS weight 83.68178
}

# rules
rule replicated_rule {
id 0
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule NVME {
id 1
type replicated
step take default class nvme
step chooseleaf firstn 0 type root
step emit
}
rule SSD {
id 2
type replicated
step take default class ssd
step chooseleaf firstn 0 type root
step emit
}
rule Spinner {
id 3
type replicated
step take default class hdd
step chooseleaf firstn 0 type root
step emit
}

# end crush map
 

Attachment: Capture.PNG
I understand it is not recommended to run Ceph on just two nodes. This will be fixed soon, but for the moment it is what I have to work with. We have another server that will become pve6-NAS, but it is still running our old VMware environment, which hosted critical systems we could not migrate; fortunately it was still up, and we were able to revert some of our systems back to the old VMware when Proxmox went down.
 
This makes no sense. I cannot find where these 384 "unknown" PGs are; nothing shows which OSD(s) they map to. I cannot create new VMs or migrate storage to any of my Ceph pools.

There is literally no data in my Ceph pools at this point. I have tried destroying all my OSDs and rebuilding them again, making sure I did a full disk wipe before adding them back in. This is happening on a fresh install of Proxmox 8.3.2 with Ceph Squid.
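
For reference, a minimal sketch of the standard Ceph queries for locating PGs (the pool name is a placeholder; 6.cd is one of the PG IDs from the health output above):

ceph pg dump_stuck inactive       # list the stuck/inactive PGs and their current state
ceph pg map 6.cd                  # show the up/acting OSD set for one PG ID taken from ceph health detail
ceph osd pool ls detail           # show each pool's crush_rule, size, and pg_num
ceph pg ls-by-pool [poolname]     # list every PG in a pool with its acting OSDs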


root@pve5-NAS:~# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
8 hdd 14.55269 1.00000 15 TiB 73 MiB 11 MiB 1 KiB 61 MiB 15 TiB 0 0.36 0 up
9 hdd 14.55269 1.00000 15 TiB 73 MiB 11 MiB 1 KiB 61 MiB 15 TiB 0 0.36 0 up
10 hdd 14.55269 1.00000 15 TiB 73 MiB 11 MiB 1 KiB 61 MiB 15 TiB 0 0.36 0 up
11 hdd 14.55269 1.00000 15 TiB 73 MiB 12 MiB 1 KiB 61 MiB 15 TiB 0 0.37 1 up
0 nvme 1.81940 1.00000 1.8 TiB 37 MiB 11 MiB 1 KiB 26 MiB 1.8 TiB 0.00 1.50 0 up
1 nvme 1.81940 1.00000 1.8 TiB 37 MiB 11 MiB 1 KiB 26 MiB 1.8 TiB 0.00 1.50 0 up
2 nvme 1.81940 1.00000 1.8 TiB 37 MiB 11 MiB 1 KiB 26 MiB 1.8 TiB 0.00 1.50 0 up
3 ssd 3.63869 1.00000 3.6 TiB 37 MiB 11 MiB 1 KiB 26 MiB 3.6 TiB 0 0.75 0 up
4 ssd 3.63869 1.00000 3.6 TiB 37 MiB 11 MiB 1 KiB 26 MiB 3.6 TiB 0 0.75 0 up
5 ssd 3.63869 1.00000 3.6 TiB 37 MiB 11 MiB 1 KiB 26 MiB 3.6 TiB 0 0.75 0 up
6 ssd 3.63869 1.00000 3.6 TiB 37 MiB 11 MiB 1 KiB 26 MiB 3.6 TiB 0 0.75 0 up
7 ssd 3.63869 1.00000 3.6 TiB 37 MiB 11 MiB 1 KiB 26 MiB 3.6 TiB 0 0.75 0 up
20 hdd 14.55269 1.00000 15 TiB 439 MiB 12 MiB 4 KiB 27 MiB 15 TiB 0.00 2.20 1 up
21 hdd 14.55269 1.00000 15 TiB 438 MiB 11 MiB 4 KiB 27 MiB 15 TiB 0.00 2.19 0 up
22 hdd 14.55269 1.00000 15 TiB 38 MiB 11 MiB 4 KiB 27 MiB 15 TiB 0 0.19 0 up
23 hdd 14.55269 1.00000 15 TiB 438 MiB 11 MiB 4 KiB 27 MiB 15 TiB 0.00 2.19 0 up
12 nvme 1.81940 1.00000 1.8 TiB 38 MiB 11 MiB 4 KiB 27 MiB 1.8 TiB 0.00 1.52 0 up
13 nvme 1.81940 1.00000 1.8 TiB 38 MiB 11 MiB 4 KiB 27 MiB 1.8 TiB 0.00 1.52 0 up
14 nvme 1.81940 1.00000 1.8 TiB 38 MiB 11 MiB 4 KiB 27 MiB 1.8 TiB 0.00 1.52 0 up
15 ssd 3.63869 1.00000 3.6 TiB 38 MiB 11 MiB 4 KiB 27 MiB 3.6 TiB 0 0.76 0 up
16 ssd 3.63869 1.00000 3.6 TiB 38 MiB 11 MiB 4 KiB 27 MiB 3.6 TiB 0 0.76 0 up
17 ssd 3.63869 1.00000 3.6 TiB 38 MiB 11 MiB 4 KiB 27 MiB 3.6 TiB 0 0.76 0 up
18 ssd 3.63869 1.00000 3.6 TiB 38 MiB 11 MiB 4 KiB 27 MiB 3.6 TiB 0 0.76 0 up
19 ssd 3.63869 1.00000 3.6 TiB 38 MiB 11 MiB 4 KiB 27 MiB 3.6 TiB 0 0.76 0 up
TOTAL 164 TiB 2.2 GiB 267 MiB 70 KiB 779 MiB 164 TiB 0.00

root@pve7-NAS:~# ceph osd df
ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS
8 hdd 14.55269 1.00000 15 TiB 73 MiB 11 MiB 1 KiB 61 MiB 15 TiB 0 0.36 0 up
9 hdd 14.55269 1.00000 15 TiB 73 MiB 11 MiB 1 KiB 61 MiB 15 TiB 0 0.36 0 up
10 hdd 14.55269 1.00000 15 TiB 73 MiB 11 MiB 1 KiB 61 MiB 15 TiB 0 0.36 0 up
11 hdd 14.55269 1.00000 15 TiB 73 MiB 12 MiB 1 KiB 61 MiB 15 TiB 0 0.37 1 up
0 nvme 1.81940 1.00000 1.8 TiB 37 MiB 11 MiB 1 KiB 26 MiB 1.8 TiB 0.00 1.50 0 up
1 nvme 1.81940 1.00000 1.8 TiB 37 MiB 11 MiB 1 KiB 26 MiB 1.8 TiB 0.00 1.50 0 up
2 nvme 1.81940 1.00000 1.8 TiB 37 MiB 11 MiB 1 KiB 26 MiB 1.8 TiB 0.00 1.50 0 up
3 ssd 3.63869 1.00000 3.6 TiB 37 MiB 11 MiB 1 KiB 26 MiB 3.6 TiB 0 0.75 0 up
4 ssd 3.63869 1.00000 3.6 TiB 37 MiB 11 MiB 1 KiB 26 MiB 3.6 TiB 0 0.75 0 up
5 ssd 3.63869 1.00000 3.6 TiB 37 MiB 11 MiB 1 KiB 26 MiB 3.6 TiB 0 0.75 0 up
6 ssd 3.63869 1.00000 3.6 TiB 37 MiB 11 MiB 1 KiB 26 MiB 3.6 TiB 0 0.75 0 up
7 ssd 3.63869 1.00000 3.6 TiB 37 MiB 11 MiB 1 KiB 26 MiB 3.6 TiB 0 0.75 0 up
20 hdd 14.55269 1.00000 15 TiB 439 MiB 12 MiB 4 KiB 27 MiB 15 TiB 0.00 2.20 1 up
21 hdd 14.55269 1.00000 15 TiB 438 MiB 11 MiB 4 KiB 27 MiB 15 TiB 0.00 2.19 0 up
22 hdd 14.55269 1.00000 15 TiB 38 MiB 11 MiB 4 KiB 27 MiB 15 TiB 0 0.19 0 up
23 hdd 14.55269 1.00000 15 TiB 438 MiB 11 MiB 4 KiB 27 MiB 15 TiB 0.00 2.19 0 up
12 nvme 1.81940 1.00000 1.8 TiB 38 MiB 11 MiB 4 KiB 27 MiB 1.8 TiB 0.00 1.52 0 up
13 nvme 1.81940 1.00000 1.8 TiB 38 MiB 11 MiB 4 KiB 27 MiB 1.8 TiB 0.00 1.52 0 up
14 nvme 1.81940 1.00000 1.8 TiB 38 MiB 11 MiB 4 KiB 27 MiB 1.8 TiB 0.00 1.52 0 up
15 ssd 3.63869 1.00000 3.6 TiB 38 MiB 11 MiB 4 KiB 27 MiB 3.6 TiB 0 0.76 0 up
16 ssd 3.63869 1.00000 3.6 TiB 38 MiB 11 MiB 4 KiB 27 MiB 3.6 TiB 0 0.76 0 up
17 ssd 3.63869 1.00000 3.6 TiB 38 MiB 11 MiB 4 KiB 27 MiB 3.6 TiB 0 0.76 0 up
18 ssd 3.63869 1.00000 3.6 TiB 38 MiB 11 MiB 4 KiB 27 MiB 3.6 TiB 0 0.76 0 up
19 ssd 3.63869 1.00000 3.6 TiB 38 MiB 11 MiB 4 KiB 27 MiB 3.6 TiB 0 0.76 0 up
TOTAL 164 TiB 2.2 GiB 267 MiB 70 KiB 779 MiB 164 TiB 0.00

root@pve7-NAS:~# ceph -s
cluster:
id: 564cbcd9-4e27-4c54-a46d-214f262b503d
health: HEALTH_WARN
Reduced data availability: 384 pgs inactive

services:
mon: 3 daemons, quorum pve7-NAS,pve5-NAS,pve1 (age 43m)
mgr: pve1(active, since 12h), standbys: pve5-NAS, pve7-NAS
osd: 24 osds: 24 up (since 12h), 24 in (since 13h)

data:
pools: 4 pools, 385 pgs
objects: 2 objects, 577 KiB
usage: 2.2 GiB used, 164 TiB / 164 TiB avail
pgs: 99.740% pgs unknown
384 unknown
1 active+clean
 
This makes no sense. I cannot find where these 384 "unknown" PGs are.
Yeah, agreed, it makes no sense, but we have no way of knowing what you did when you set up the cluster.

My advice (a rough command sketch follows the list):
1. Delete all Ceph daemons.
2. pveceph purge
3. pveceph init. Use the SAME interface for the public and cluster network; don't try to "fix it".
4. Create the monitors (via pveceph).
5. Create the OSDs (via pveceph); ceph health should now show HEALTH_OK.
6. Create ONE pool. Pick whatever OSD class you want to start with.
7. Map the RBD pool as storage in Proxmox.
8. Create a test VM. Is everything working?
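
Roughly, the same steps as pveceph commands; this is a sketch only, using the 10.10.104.0/24 network from your ceph.conf, with the device path and pool name as placeholders:

# on every node: remove the Ceph daemons, then purge the leftover config
pveceph purge

# on the first node: re-initialize with ONE network for both public and cluster traffic
pveceph init --network 10.10.104.0/24

# on each node that should run a monitor
pveceph mon create

# on each OSD node, once per disk (device name is a placeholder)
pveceph osd create /dev/sdX

# check before continuing; this should report HEALTH_OK
ceph health

# create one pool and register it as a Proxmox RBD storage
pveceph pool create testpool --add_storages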

A word of advice on your configuration: since you're on 10Gb interconnects, don't bother separating the SSD and NVMe drives into different classes; your network bandwidth neutralizes any practical difference, and you're better off having more OSD targets per pool than fewer. Also, redistribute your disks so each OSD node has the same capacity per OSD class; there is no point in one node having more capacity, since it cannot be used anyway (a sketch of merging the classes follows).
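
If you do decide to fold NVMe into the ssd class, a minimal sketch (osd.0 is just an example ID; the old class has to be removed before a new one can be set):

ceph osd crush rm-device-class osd.0         # clear the auto-assigned nvme class
ceph osd crush set-device-class ssd osd.0    # re-tag the OSD as ssd; repeat per NVMe OSD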
 

alexskysilk, I know you said to forget about the NVMe vs. SSD split; however, my NVMe drives are 1 TB each and my SSDs are 4 TB each, so for grins I spun up a pool on NVMe only: no issue, Ceph was healthy and happy. I then created an SSD pool and instantly got unknown-PG issues again; removing the SSD pool and its OSDs made the unknown PGs go away. Adding an HDD pool made even more unknown PGs appear; removing that pool made them go away too.

When trying to create a new CT or VM I get the following:

TASK ERROR: unable to create CT 100 - rbd error: 'storage-NVME'-locked command timed out - aborting

**This is after another completely fresh install of Proxmox on all nodes, to be safe.

It seems like the only real option I have at the moment is NVMe.

 
It looks like there is an issue with the class-based CRUSH rules. When I put a pool on "replicated_rule" I can build VMs and everything seems happy; if I put the pool on one of my class-based rules like NVME, everything goes haywire. What did I do wrong?

Commands run to create the class-based rules:
ceph osd crush rule create-replicated NVME default root nvme
ceph osd crush rule create-replicated SSD default root ssd
ceph osd crush rule create-replicated Spinner default root hdd


# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class nvme
device 1 osd.1 class nvme
device 2 osd.2 class nvme
device 3 osd.3 class nvme
device 4 osd.4 class nvme
device 5 osd.5 class nvme
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class ssd
device 11 osd.11 class ssd
device 12 osd.12 class ssd
device 13 osd.13 class ssd
device 14 osd.14 class ssd
device 15 osd.15 class ssd
device 16 osd.16 class ssd
device 17 osd.17 class ssd
device 18 osd.18 class hdd
device 19 osd.19 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root

# buckets
host pve7-NAS {
id -3 # do not change unnecessarily
id -4 class nvme # do not change unnecessarily
id -7 class ssd # do not change unnecessarily
id -10 class hdd # do not change unnecessarily
# weight 52.75700
alg straw2
hash 0 # rjenkins1
item osd.0 weight 1.81940
item osd.1 weight 1.81940
item osd.2 weight 1.81940
item osd.7 weight 3.63869
item osd.14 weight 3.63869
item osd.15 weight 3.63869
item osd.16 weight 3.63869
item osd.17 weight 3.63869
item osd.8 weight 14.55269
item osd.9 weight 14.55269
}
host pve5-NAS {
id -5 # do not change unnecessarily
id -6 class nvme # do not change unnecessarily
id -8 class ssd # do not change unnecessarily
id -11 class hdd # do not change unnecessarily
# weight 52.75700
alg straw2
hash 0 # rjenkins1
item osd.3 weight 1.81940
item osd.4 weight 1.81940
item osd.5 weight 1.81940
item osd.6 weight 3.63869
item osd.10 weight 3.63869
item osd.11 weight 3.63869
item osd.12 weight 3.63869
item osd.13 weight 3.63869
item osd.18 weight 14.55269
item osd.19 weight 14.55269
}
root default {
id -1 # do not change unnecessarily
id -2 class nvme # do not change unnecessarily
id -9 class ssd # do not change unnecessarily
id -12 class hdd # do not change unnecessarily
# weight 105.51401
alg straw2
hash 0 # rjenkins1
item pve7-NAS weight 52.75700
item pve5-NAS weight 52.75700
}

# rules
rule replicated_rule {
id 0
type replicated
step take default
step chooseleaf firstn 0 type host
step emit
}
rule NVME {
id 1
type replicated
step take default class nvme
step chooseleaf firstn 0 type root
step emit
}
rule SSD {
id 2
type replicated
step take default class ssd
step chooseleaf firstn 0 type root
step emit
}
rule Spinner {
id 3
type replicated
step take default class hdd
step chooseleaf firstn 0 type root
step emit
}

# end crush map
 
FIXED IT!!
This did not work...
ceph osd crush rule create-replicated NVME default root nvme
ceph osd crush rule create-replicated SSD default root ssd
ceph osd crush rule create-replicated Spinner default root hdd

Created new rules using host instead of root, and now my pools work and Ceph is healthy...
ceph osd crush rule create-replicated NVME-1 default host nvme
ceph osd crush rule create-replicated SSD-1 default host ssd
ceph osd crush rule create-replicated Spinner-1 default host hdd
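
That matches the command syntax: in ceph osd crush rule create-replicated <name> <root> <failure-domain> [<class>], the third argument is the failure domain. With "root" there are no root-type buckets below default for CRUSH to descend into, so no PG could ever be mapped and they all stayed unknown; with "host" each replica is placed on a different host. A quick sanity check for a new rule (the pool name is a placeholder):

ceph osd crush rule dump NVME-1                   # the chooseleaf step should show "type": "host"
ceph osd pool set [poolname] crush_rule NVME-1    # point an existing pool at the new rule
ceph pg ls-by-pool [poolname] | head              # PGs should now list real acting OSDs instead of []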
 
Any suggestions/advice would be greatly appreciated!
For starters, your cluster planning is backwards: you're trying to fit pools onto disks. Let's pretend you don't have ANY existing disks. What are you trying to accomplish? What are the use cases you're serving with the storage subsystem? Write down your considerations, like so:

1. Required usable capacity per application, plus growth projections in TB/month.
2. Required MINIMUM throughput, in IOPS or MB/s depending on the application.
3. Fault-tolerance requirements, including their impact on #2 above; rebalance/rebuild can and will affect throughput. Think of this in terms of the revenue impact of slowness or an outage, which can range from a nuisance, to a customer complaint, to revenue loss, all the way to customer loss or a lawsuit.

We can then discuss possible solutions based on the above.
 
