Help needed: VMs stuck on "Booting from Hard Disk" with Ceph RBD

petwri

New Member
Jan 27, 2025
I am using Ceph RBD as my VM image storage backend. Recently one of my Ceph OSDs crashed and I had to replace it. A recovery is now running, and a few PGs are degraded.
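In case it helps, this is roughly how I have been checking the recovery state (standard Ceph CLI, nothing cluster-specific):

```
# Overall cluster health and recovery/backfill progress
ceph -s

# Details on which PGs are degraded/undersized and why
ceph health detail

# Per-OSD view, to confirm the replacement OSD is up and in
ceph osd tree
```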

This coincided with VMs no longer being able to boot. I don't know whether it's just a coincidence that it happened at the same time or whether it's related to Ceph doing a recovery, but some VMs are stuck at "Booting from Hard Disk" and nothing happens. I already tried playing around with the HDD options of the VMs (skip replication, boot order, caching), but it makes no difference. I can migrate VMs between nodes (which happens instantly - makes sense, since the images are on Ceph RBD), but regardless of the node I use, I cannot start any of them.
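From what I understand, degraded PGs alone should still serve I/O; it's PGs that are inactive, down, or incomplete (for example because they dropped below the pool's min_size) that block reads and writes, which would match a VM hanging at the boot loader. This is what I ran to check for that:

```
# PGs that are not serving client I/O at all
ceph pg dump_stuck inactive

# Slow or blocked requests are another sign that client I/O is hanging
ceph health detail | grep -i -e slow -e blocked
```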

I am a little stuck here, since I don't really know how to proceed or which logs to check. I don't want to restart any other VMs because I am afraid none of them will come back up. The RBD images are definitely still there.
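For anyone else who lands here, these are the places I would check first (assuming a standard Proxmox + Ceph setup; the OSD id, hostname and VM id are placeholders):

```
# OSD log on the node that hosted the failed disk
journalctl -u ceph-osd@<osd-id> --since "1 hour ago"

# Monitor log for cluster-wide events
journalctl -u ceph-mon@<hostname> --since "1 hour ago"

# State of a hanging VM from the Proxmox side
qm status <vmid> --verbose
```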

Update: I just tried to export the image to local storage, and the export gets stuck after a few MB - it seems like my Ceph RBD storage really is affected. I'll let Ceph finish the recovery and then check again.
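For reference, the export test was nothing more than this (pool and image names are placeholders - substitute your own):

```
# Copy the raw image out of the pool; this hangs after a few MB,
# which fits a PG that is not serving I/O
rbd export <pool>/vm-<vmid>-disk-0 /tmp/test-export.img

# Check whether anything still holds a watch on the image
rbd status <pool>/vm-<vmid>-disk-0
```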
 
Hi, 12 OSDs across 3 nodes. smartmontools reported that the disk was about to die, so I set the OSD to out, let the recovery finish, and then took it down. That's when the PGs became degraded.
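In hindsight, before stopping the OSD I should probably have verified that all data had drained off it - if I read the docs right, something like this (osd id is a placeholder):

```
# Reports an error as long as PGs still depend on this OSD;
# only stop/destroy it once this says it is safe
ceph osd safe-to-destroy osd.<id>

# Or, for stopping temporarily rather than removing
ceph osd ok-to-stop osd.<id>
```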