No VMs with CEPH Storage will start after update to 8.3.2 and CEPH Squid

When I say "production," understand that there is very little that is external customer facing. 98% of the VMs and CTs are running monitoring and management tools for our Tech Support and Customer Service teams (GenieACS, LibreNMS, etc.). These are critical for my teams to be able to function and provide quality service to our customers, but they do not have any real impact on network performance. The bulk of our usage is for DevOps and internal production. The most resource-intensive things that we run are:

LibreNMS - Monitoring
FreeNAS - Local SMB Shares
HSNM - Hotspot Manager for Customer Pay-2-Play venues
GenieACS - Customer Router Configuration and management
Mikrotik CHRs for DUDE, VPN Concentrators and Lab Environment

That being said, I understand that when we built this environment it was done with extreme overkill on system resources. We have not even begun to touch the storage, RAM, and CPU capabilities of the cluster. In the future we are planning on hosting for customers, but we are not there yet. I would very much like to pick your brain on specifics, but not on an open forum; would you mind taking this conversation offline?
You're not a pain; I should have anticipated the question. Open a shell prompt to proceed:

rbd -p [poolname] ls will get you a list of virtual disks, which will be named vm-[vmid]-disk-[n]
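
For example (assuming a pool named NVME, which is what shows up later in this thread; yours may differ), you could list the pools first and then the images in the one backing your VM disks:

ceph osd pool ls
rbd -p NVME ls

The second command should print names like vm-1001-disk-0, one per virtual disk.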

for each disk you want to back up (a scripted sketch that loops over all of a VM's disks follows these steps):
1. take a snapshot: rbd snap create [poolname]/vm-xxx-disk-n@[name] (the name can be anything; I'd use something like $(date +%Y%m%d-%Hh%M))
2. next, write it out, like so:
rbd export poolname/vm-xxx-disk-n@snapname - | nice zstd -T4 -o /path/to/backup/location/vm-xxx-disk-n.zst
3. copy the vm config file:
cp /etc/pve/nodes/[node hosting the vm]/qemu-server/vmid.conf /path/to/backup/location/
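
As a rough, untested sketch (pool name, VM ID, and backup path are placeholders based on what appears in this thread), the three steps above could be scripted for every disk of a single VM like this:

#!/bin/bash
# Sketch only: snapshot and export every RBD disk of one VM, then copy its config.
POOL=NVME                          # assumption: your Ceph pool name
VMID=1001                          # assumption: the VM to back up
NODE=$(hostname)                   # assumption: this node currently hosts the VM
DEST=/mnt/pve/ColdStorage/backup   # assumption: your backup target path
SNAP=$(date +%Y%m%d-%Hh%M)

for disk in $(rbd -p "$POOL" ls | grep "^vm-${VMID}-disk-"); do
    rbd snap create "${POOL}/${disk}@${SNAP}"
    rbd export "${POOL}/${disk}@${SNAP}" - | nice zstd -T4 -o "${DEST}/${disk}.zst"
done

cp "/etc/pve/nodes/${NODE}/qemu-server/${VMID}.conf" "${DEST}/"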

To restore:
zstd -d -c /path/to/backup/location/vm-xxx-disk-n.zst | rbd import - newpoolname/vm-xxx-disk-n@snapname (can be anything)
cp /path/to/backup/location/vmid.conf /etc/pve/nodes/[new node to host]/qemu-server/
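
One sanity check worth doing after the import (the names below are only examples): the image name as it exists in Ceph has to match the vm-xxx-disk-n name referenced in the .conf file exactly, or the VM will not find its disk at start. Something like:

rbd -p newpoolname ls
grep -- '-disk-' /path/to/backup/location/vmid.conf

should show the same disk names on both sides.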

The above is, more or less, what Proxmox's own vzdump tool does; you should probably do this now in any case, since you have no other backups.
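
For reference, once you want regular backups, the equivalent with Proxmox's own tool would be roughly this (assuming a storage named ColdStorage that is configured to hold backups):

vzdump 1001 --storage ColdStorage --compress zstd --mode snapshot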
When I run the above restore command, I get this:
zstd -d -c /mnt/pve/ColdStorage/backup/backup/vm-1001-disk-0-NVME.zst | rbd import - NVME/vm-1001-disk-0-NVME@restored
rbd: destination snapshot name specified for a command that doesn't use it

If I leave the @restored off, it works and the drive shows up under VM Disks in Ceph:
zstd -d -c /mnt/pve/ColdStorage/backup/backup/vm-1001-disk-0-NVME.zst | rbd import - NVME/vm-1001-disk-0-NVME
Importing image: 100% complete...done.

but then the VM will not start and I get the following error:
kvm: -drive file=rbd:NVME/vm-1001-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/NVME.keyring,if=none,id=drive-ide0,format=raw,cache=none,aio=io_uring,detect-zeroes=on: error reading header from vm-1001-disk-0: No such file or directory
TASK ERROR: start failed: QEMU exited with code 1

What am I missing?
Any help with this would be great...
 
nevermind!! I found it...

zstd -d -c /mnt/pve/ColdStorage/backup/backup/vm-1001-disk-0-NVME.zst | rbd import - NVME/vm-1001-disk-0-NVME
vs.
zstd -d -c /mnt/pve/ColdStorage/backup/backup/vm-1001-disk-0-NVME.zst | rbd import - NVME/vm-1001-disk-0

Now, however, I have an issue: all the extra imports with -NVME at the end are still there, but I cannot delete them...
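
If it's the usual rbd behaviour, deleting them should be something along these lines (image names taken from my imports above), though rbd rm will refuse while an image still has snapshots or is mapped/in use by a running VM, which may be what I'm hitting:

rbd snap purge NVME/vm-1001-disk-0-NVME    # clear any snapshots on the stray image first
rbd rm NVME/vm-1001-disk-0-NVME            # then remove the image itself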
 