Ceph problem

Doubleclic

Hello, we have a Proxmox server cluster with Ceph storage. Today one of the nodes (1 of 3) crashed; we rebooted it physically, but one of the VMs no longer boots because its disk is not visible in the Ceph pool. With the command rbd ls "poolname" the disk doesn't appear. Any ideas?
 
A bit more info would be useful. Config of the VM (qm config {vmid}).
Is Ceph healthy or does it show any warnings?
What is the actual error in the task log of the VM start?
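If it helps, the usual checks look something like this (the VMID below is a placeholder):

# VM configuration
qm config <vmid>
# Ceph status and health warnings, if any
ceph -s
ceph health detail
# Starting from the CLI prints the same error as the GUI task log
qm start <vmid>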
 
agent: 0
balloon: 8192
bios: ovmf
boot: order=net0
cores: 4
cpu: x86-64-v2-AES
description: <div align='center'>%0A<a href='https%3A//double-clic.eu' target='_blank' rel='noopener noreferrer'>%0A<img src='https%3A//get.teamviewer.com/common/logo/get.ashx?configID=jvz35bz&systemName=GetTeamviewerCom' />%0A</a>%0A%0A # Double-clic%0A</div>%0Ascsi0%3A ceph-hybrid-storage%3Avm-80624-disk-0,aio=threads,backup=0,cache=writeback,size=80G,ssd=1
efidisk0: ssd-pool:vm-80624-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
memory: 16384
meta: creation-qemu=8.1.5,ctime=1717483719
name: VM-DEB12-TACTICALPROD
net0: virtio=BC:24:11:D1:2B:72,bridge=vmbr0
numa: 0
ostype: l26
scsi0: ssd-pool:vm-80624-disk-0,size=80G,ssd=1
scsihw: virtio-scsi-pci
smbios1: uuid=7d3cc15f-9097-40a2-ab18-27c7a34cf896
sockets: 2
tags: prod;debian12
vga: virtio
vmgenid: 0449becc-17d2-4277-ab99-a4d8b95e4c4c
 
base-60424-disk-0
base-60424-disk-1
base-956250724-disk-0
vm-100624-disk-0
vm-100624-disk-1
vm-10724-disk-1
vm-110624-disk-0
vm-110624-disk-1
vm-200924-disk-0
vm-230924-disk-0
vm-241024-disk-0
vm-251024-disk-0
vm-251024-disk-1
vm-261024-disk-0
vm-261024-disk-1
vm-270524-disk-0
vm-270524-disk-1
vm-30624-disk-0
vm-30624-disk-1
vm-30624-disk-2
vm-30624-disk-3
vm-31024-disk-0
vm-40624-disk-0
vm-40624-disk-2
vm-80624-disk-0
vm-80624-disk-1
 
Well, your disks are present. What makes you say
"one of the VMs doesn't boot anymore because the disk is not visible in the CEPH pool"?
Relevant logs would be helpful.

--edit
efidisk0: ssd-pool:vm-80624-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
...
scsi0: ssd-pool:vm-80624-disk-0,size=80G,ssd=1
You have your EFI disk and your boot disk pointing at the same image. This is probably why it won't boot.
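To tell the two images apart, something along these lines should work (assuming the underlying Ceph pool is also called ssd-pool; adjust if your storage name differs from the pool name):

# Long listing shows image sizes: the EFI vars disk is tiny (~528K), the data disk is 80G
rbd ls -l ssd-pool
rbd info ssd-pool/vm-80624-disk-0
rbd info ssd-pool/vm-80624-disk-1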
 
Hello Alex, thank you for your help. The disk is there, but there is no data on it anymore, and the Ceph system is healthy. I don't understand why it looks as if the disk had been destroyed.
 
We restored the VM from an imported backup, because the virtual disk no longer works and there is no data on it.
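For anyone following along, a restore from a vzdump archive on the CLI looks roughly like this (archive path, target VMID, and storage name are placeholders):

# Restore the vzdump archive to a VMID on the given storage
qmrestore /path/to/vzdump-qemu-80624-<timestamp>.vma.zst <new-vmid> --storage ssd-pool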
 
There must have been some misconfiguration at some point, because in your configuration the EFI disk and the data disk are using the same Ceph image, and that shouldn't be possible. I'm pretty sure this problem isn't related to Ceph: I've had all kinds of problems with powered-down nodes and clusters in many situations and never lost a bit.
Check the restored VM configuration and which RBD image is used for each disk.
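To double-check, something like this should show it (the restored VMID is a placeholder; pool and image names assume the ones from the listing above):

# Which RBD image each disk of the restored VM points to
qm config <restored-vmid>
# Provisioned vs. actually used space; an emptied image shows little to no usage
rbd du ssd-pool/vm-80624-disk-0
rbd du ssd-pool/vm-80624-disk-1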
 
There are (were?) two disks for this VM; my guess is that vm-80624-disk-0 is the EFI disk.

I would attach vm-80624-disk-1 to the VM and see what happens. It may just boot correctly, in which case you would replace scsi0 with that image.
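A rough sketch of that, assuming vm-80624-disk-1 really is the OS disk (adjust the VMID and names to your setup):

# Pick up unreferenced images of this VM as unused disks
qm rescan --vmid 80624
# Attach the second image and try booting from it
qm set 80624 --scsi1 ssd-pool:vm-80624-disk-1
qm set 80624 --boot order=scsi1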

SOMEONE was clearly editing that VM's config.
 
