Ceph problem

Doubleclic

Hello, we have a Proxmox server cluster with Ceph storage. Today one of the nodes (1 of 3) crashed; we rebooted it physically, but one of the VMs no longer boots because its disk is not visible in the Ceph pool. With the command rbd ls "poolname" the disk doesn't appear. Any ideas?
 
A bit more info would be useful. Config of the VM (qm config {vmid}).
Is Ceph healthy or does it show any warnings?
What is the actual error in the task log of the VM start?
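If it helps, the usual checks look something like this (the VMID below is a placeholder):

# VM configuration
qm config <vmid>
# Ceph status and health warnings, if any
ceph -s
ceph health detail
# Starting from the CLI prints the same error as the GUI task log
qm start <vmid>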
 
agent: 0
balloon: 8192
bios: ovmf
boot: order=net0
cores: 4
cpu: x86-64-v2-AES
description: <div align='center'>%0A<a href='https%3A//double-clic.eu' target='_blank' rel='noopener noreferrer'>%0A<img src='https%3A//get.teamviewer.com/common/logo/get.ashx?configID=jvz35bz&systemName=GetTeamviewerCom' />%0A</a>%0A%0A # Double-clic%0A</div>%0Ascsi0%3A ceph-hybrid-storage%3Avm-80624-disk-0,aio=threads,backup=0,cache=writeback,size=80G,ssd=1
efidisk0: ssd-pool:vm-80624-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
memory: 16384
meta: creation-qemu=8.1.5,ctime=1717483719
name: VM-DEB12-TACTICALPROD
net0: virtio=BC:24:11:D1:2B:72,bridge=vmbr0
numa: 0
ostype: l26
scsi0: ssd-pool:vm-80624-disk-0,size=80G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=7d3cc15f-9097-40a2-ab18-27c7a34cf896
sockets: 2
tags: prod;debian12
vga: virtio
vmgenid: 0449becc-17d2-4277-ab99-a4d8b95e4c4c
 
base-60424-disk-0
base-60424-disk-1
base-956250724-disk-0
vm-100624-disk-0
vm-100624-disk-1
vm-10724-disk-1
vm-110624-disk-0
vm-110624-disk-1
vm-200924-disk-0
vm-230924-disk-0
vm-241024-disk-0
vm-251024-disk-0
vm-251024-disk-1
vm-261024-disk-0
vm-261024-disk-1
vm-270524-disk-0
vm-270524-disk-1
vm-30624-disk-0
vm-30624-disk-1
vm-30624-disk-2
vm-30624-disk-3
vm-31024-disk-0
vm-40624-disk-0
vm-40624-disk-2
vm-80624-disk-0
vm-80624-disk-1
 
Well, your disks are present. What makes you say
"one of the VMs doesn't boot anymore because the disk is not visible in the CEPH pool"?
Relevant logs would be helpful.

--edit
efidisk0: ssd-pool:vm-80624-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
...
scsi0: ssd-pool:vm-80624-disk-0,size=80G,ssd=1
You have your EFI disk and your boot disk pointing at the same image. This is probably why it won't boot.
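To tell the two images apart, something along these lines should work (assuming the underlying Ceph pool is also called ssd-pool; adjust if your storage name differs from the pool name):

# Long listing shows image sizes: the EFI vars disk is tiny (~528K), the data disk is 80G
rbd ls -l ssd-pool
rbd info ssd-pool/vm-80624-disk-0
rbd info ssd-pool/vm-80624-disk-1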
 
Hello Alex, thank you for your help. The disk is there, but there is no data on it anymore, and the Ceph system is healthy. I don't understand why it looks as if the disk had been destroyed.
 
We restored the VM from an imported backup, because the virtual disk no longer works and there is no data on it.
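For anyone following along, a restore from a vzdump archive on the CLI looks roughly like this (archive path, target VMID, and storage name are placeholders):

# Restore the vzdump archive to a VMID on the given storage
qmrestore /path/to/vzdump-qemu-80624-<timestamp>.vma.zst <new-vmid> --storage ssd-pool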
 
There must have been some misconfiguration at some point, because in your configuration the EFI disk and the data disk are using the same Ceph image, and that shouldn't be possible. I'm pretty sure this problem isn't related to Ceph: I've had all kinds of problems with powered-down nodes and clusters in many situations and never lost a bit.
Check the restored VM configuration and which RBD image is used for each disk.
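To double-check, something like this should show it (the restored VMID is a placeholder; pool and image names assume the ones from the listing above):

# Which RBD image each disk of the restored VM points to
qm config <restored-vmid>
# Provisioned vs. actually used space; an emptied image shows little to no usage
rbd du ssd-pool/vm-80624-disk-0
rbd du ssd-pool/vm-80624-disk-1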
 
There are (were?) two disks for this VM; my guess is that vm-80624-disk-0 is the EFI disk.

I would attach vm-80624-disk-1 to the VM and see what happens. It may just boot correctly, in which case you would replace scsi0 with that image.
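A rough sketch of that, assuming vm-80624-disk-1 really is the OS disk (adjust the VMID and names to your setup):

# Pick up unreferenced images of this VM as unused disks
qm rescan --vmid 80624
# Attach the second image and try booting from it
qm set 80624 --scsi1 ssd-pool:vm-80624-disk-1
qm set 80624 --boot order=scsi1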

SOMEONE was clearly editing that VM's config.
 
