[SOLVED] Experimenting with a non-Proxmox Ceph cluster as VM store

mgaudette112

Member
Dec 21, 2023
I've been toying/experimenting with a non-Proxmox Ceph cluster (Reef), with some success.

I do have a Proxmox-related question about the interaction between Proxmox and the RBD store - I've set up the Ceph cluster as an RBD store and created a test VM on it. Then, after much cloning, moving and copying, I shut down the Ceph cluster for the night, after having shut down the relevant VMs on Proxmox (but keeping the other VMs running).

When the cluster was turned back on this AM, as I was getting ready for more learning, Proxmox (pve-manager/8.2.4) couldn't quite connect to it. Here is what's happening.

1) I can see the "Summary" page of the Ceph storage, and it's updating properly as far as I can tell from the graphs and the known activity going on. So at some level the connection is being made.

2) If I click on the "VM Disks" page, it times out and Proxmox becomes unresponsive, to the point where I need to restart services to log in and have a working Proxmox UI.

How do I cleanly tell Proxmox to wake up and reestablish a proper connection to RBD, short of a reboot? (A reboot did work the first time it happened, but I'm trying to simulate a production situation in which I want Ceph to start functioning as a store again when the cluster comes back up.)
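
For reference, the services I restart to get the UI back are something like this (I'm honestly not sure this is the minimal set, and it only revives the UI - it doesn't fix the RBD connection itself):

Code:
# rough guess at the services to kick so the UI responds again
systemctl restart pvestatd pvedaemon pveproxy
# then check whether the RBD storage is reported as active
pvesm status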

Here is a snippet of my storage.cfg

Code:
rbd: ceph
        content images
        krbd 0
        monhost 10.0.8.51,10.0.8.52
        pool proxmoxv2
        username proxmox
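
(For completeness: the keyring for client.proxmox sits where PVE expects it for external clusters - under /etc/pve/priv/ceph/, named after the storage ID - so for the entry above that is:)

Code:
# keyring for the "ceph" storage entry above (naming follows the storage ID)
/etc/pve/priv/ceph/ceph.keyring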
 
The Ceph cluster reports HEALTH_OK? Can the PVE node reach (ping) the Ceph MONs and other potential Ceph nodes?
You don't by any chance use a large MTU that might not work as expected anymore?
 
The Ceph cluster reports being healthy. I haven't touched the MTU for weeks (months?).

pvesm shows the ceph storage to be active. I can ping it.

At this point I figured I'd poke Ceph a little more - I created a Ceph namespace in the same pool (something I hadn't done before) and connected Proxmox to it as a new Ceph storage.
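
Roughly what I did, for the record (the namespace and new storage names below are just the ones I picked as examples):

Code:
# on the Ceph cluster: create a namespace inside the existing pool
rbd namespace create --pool proxmoxv2 --namespace lab1

# new entry in storage.cfg on the PVE side, same pool but pointing at the namespace
rbd: ceph-ns
        content images
        krbd 0
        monhost 10.0.8.51,10.0.8.52
        namespace lab1
        pool proxmoxv2
        username proxmox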

Situation now - I have two Ceph storages, one with a namespace (the new one) and one without (the broken one).

The new/namespaced storage works. The broken one is still broken. Same Ceph pool. Since they use the same pool, the same Ceph cluster user, the same OSDs, same everything, I can't explain it.

A reboot of both the cluster and Proxmox did not help. I am not in any panic, as these VMs were only meant for learning Ceph, but getting to the bottom of this would give me a little boost of confidence in my planned prod system.
 
Ok, here is what seems to have happened. There were images in the pool that were in a broken state. I would love an explanation of exactly what happened, but to help others, this was my solution:

These commands were run on the Ceph cluster (proxmoxv2 is the name of the pool)
Code:
rbd ls -l proxmoxv2
2024-09-08T10:18:14.624-0400 7f3597fff6c0 -1 librbd::image::OpenRequest: failed to retrieve initial metadata: (2) No such file or directory
rbd: error opening vm-3005-disk-0: (2) No such file or directory
rbd: error opening vm-3004-disk-1: (2) No such file or directory
NAME  SIZE  PARENT  FMT  PROT  LOCK
rbd: listing images failed: (2) No such file or directory
root@ceph-test1:~# rbd rm vm-3005-disk-0 -p proxmoxv2
2024-09-08T10:18:33.077-0400 7f63a3fff6c0 -1 librbd::image::OpenRequest: failed to retrieve initial metadata: (2) No such file or directory
2024-09-08T10:18:33.081-0400 7f63a3fff6c0 -1 librbd::image::OpenRequest: failed to retrieve initial metadata: (2) No such file or directory
2024-09-08T10:18:33.209-0400 7f63a3fff6c0 -1 librbd::image::OpenRequest: failed to retrieve initial metadata: (2) No such file or directory
Removing image: 100% complete...done.

So, from my very limited understanding, there was metadata indicating an image should have been there, and it wasn't. When formally removing the images from the CLI (the Ceph dashboard did not list those images, but it did throw errors on the image page), it all started working again.
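
If anyone wants to dig further, my assumption is that the leftovers are visible as raw RADOS objects; something along these lines should show whether the bookkeeping objects survived without their data (this is just how I understand RBD's internals, not a verified procedure):

Code:
# rbd_directory, rbd_id.* and rbd_header.* are where rbd ls / image open get their metadata
rados -p proxmoxv2 ls | grep -E 'rbd_directory|rbd_id|rbd_header'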

The why is my remaining question.
 
It happened again, but this time the rbd list of images is indeed empty. So that solution might not be a universal one.

What do I do with a ceph storage that seems to work, but won't even get me a list of images in the Proxmox UI? (let alone do any I/O for Proxmox)

The cluster is healthy and pvesm shows "Active" for the ceph storage.
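
To be concrete, the checks I mean are roughly these (as far as I understand, the listing is the same operation the "VM Disks" page performs):

Code:
pvesm status        # the ceph storage is reported as "active" here
pvesm list ceph     # this should enumerate the vm-XXXX-disk-N images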
 
Wait a second, what if you go to the web UI and list the contents of the storage? Do you see the disks? Because in a non-HCI variant, you need to provide all the necessary info to access the external cluster to the RBD command. The RBD storage plugin of PVE does the same: https://git.proxmox.com/?p=pve-stor...6b252da45c47b6bb1be22dc07fd952084;hb=HEAD#l85
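
Roughly, that boils down to a call like the following, filling in the values from your storage.cfg (keyring path assumed to follow the storage-ID naming):

Code:
rbd ls -l -p proxmoxv2 -m 10.0.8.51,10.0.8.52 \
    -n client.proxmox --keyring /etc/pve/priv/ceph/ceph.keyring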

(edited as I pressed enter by mistake)

I do see the disks. I'm not sure what's going on right now, but nothing writes properly to the Ceph storage. If I clone or create a new VM onto the Ceph storage, the I/O dies almost immediately.

The ceph cluster shows healthy. One thing I have now noticed is that running "rbd bench" freezes from Proxmox, but works fine directly on the ceph cluster.

Code:
rbd bench --io-type write bench  -p proxmoxv4 -m 10.0.8.51 --auth_supported cephx -n client.proxmox2 --keyring /etc/pve/priv/ceph/proxmoxv5.keyring
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     16896   16928.4    66 MiB/s
    2     26032   12991.1    51 MiB/s
    3     30880   10298.3    40 MiB/s
    4     32448   8095.49    32 MiB/s

(yes, it dies right here, no error, nothing)

You can see it slow down from one line to the next - on this run it's not as obvious, but on other runs it goes from 80 MiB/s to 7 MiB/s before freezing. I can even hear it start to work, then stop.

If I run the command (well, without the external auth) on the ceph cluster, it works perfectly.

The thing is, two days ago I could clone Proxmox VMs with 500 GB of data onto it with no issues other than the wait (1 Gb network, 15K SFF HDDs - but this is just a lab, so I don't mind the wait). Now it's as if it tires itself out and gives up.
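
My own checklist for the next time it stalls (just things I plan to look at, not a known fix):

Code:
# on the Ceph cluster, while the bench is hanging:
ceph health detail      # any slow ops / laggy PGs?
ceph osd perf           # per-OSD latencies

# on the PVE node (mostly relevant if krbd were enabled, which it isn't here):
dmesg | grep -iE 'libceph|rbd'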
 
Again, for posterity and in case it helps someone else: the problem seems to have been on the Proxmox side, not Ceph's.

I got the following failure in my nightly backup:
ERROR: Backup of VM 2001 failed - VM 2001 qmp command 'backup' failed - Node 'drive-scsi0' is busy: block device is in use by block job: mirror

I don't know at all what this was, and multiple reboots of the PVE node didn't help. As soon as I stopped the VM (right-click > Stop), my backups started working properly, and a subsequent test of the Ceph storage showed it working well.
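
For anyone who hits the same error and would rather not stop the VM: my understanding is that the leftover 'mirror' job can be inspected (and possibly cancelled) through the QEMU monitor, something like the following - though I have not tried this myself:

Code:
qm monitor 2001
# at the qm> prompt:
info block-jobs              # should show the stuck mirror job on drive-scsi0
block_job_cancel drive-scsi0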

I don't have an explanation for how it got into that state, but I have been experimenting - cloning, clicking, and erasing images - a lot more than I would on a production system.

Leaving this here in case it helps someone. The open question, if someone can help answer it, is: what does this error mean, and how do we get into that state?
 
