KVM disk has disappeared from Ceph pool

gkovacs
I have a small 5-node Ceph (Hammer) test cluster. Every node runs Proxmox, a Ceph MON, and 1 or 2 OSDs. There are two pools defined: one keeping 2 copies of the data (pool2) and one keeping 3 copies (pool3). Ceph has a dedicated 1 Gbps network. A few RAW disks are currently stored on pool2, belonging to an OpenMediaVault KVM guest and some others.

Last night one of the nodes spontaneously rebooted during backups (probably because of memory exhaustion under ZFS; I don't really know, as nothing is in the logs), and since then the RAW disk on Ceph pool2 that's attached to a KVM guest has vanished.

When I try to start the KVM guest, it gives a "no such file" error:
Code:
root@proxmox:~# qm start 126
kvm: -drive file=rbd:rbd/vm-126-disk-1:mon_host=192.168.0.7;192.168.0.6;192.168.0.5:id=admin:auth_supported=cephx:keyring=/etc/pve/priv/ceph/pool2.keyring,if=none,id=drive-virtio2,cache=writeback,format=raw,aio=threads,detect-zeroes=on: error reading header from vm-126-disk-1: No such file or directory

When checking Ceph from the Proxmox web interface, everything looks fine: all monitors running with quorum, all OSDs Up and In, all PGs active+clean. Ceph status also shows no errors:
Code:
root@proxmox:~# ceph status
    cluster 98c9d762-ea24-4e28-88b9-a0a585d53cfd
     health HEALTH_WARN
            too many PGs per OSD (537 > max 300)
     monmap e5: 5 mons at {0=192.168.0.3:6789/0,1=192.168.0.4:6789/0,2=192.168.0.5:6789/0,3=192.168.0.6:6789/0,4=192.168.0.7:6789/0}
            election epoch 368, quorum 0,1,2,3,4 0,1,2,3,4
     osdmap e329: 5 osds: 5 up, 5 in
      pgmap v1053656: 1088 pgs, 3 pools, 2764 GB data, 691 kobjects
            5536 GB used, 4679 GB / 10216 GB avail
                1088 active+clean


If I check the pool2 storage from the web interface, it shows the correct usage of 54.19% (5.41 TiB of 9.98 TiB), yet under Content no RAW disks are visible:

[Screenshot: ceph-wtf.jpg]


Ceph logs show nothing in particular. Has anyone seen anything like this? Where is my data?
 
What does "rbd ls pool2" show in a terminal?

Code:
root@proxmox:~# rbd ls pool2
vm-102-disk-1
vm-112-disk-1
vm-126-disk-1

Isn't that interesting? The virtual disks are there, yet Proxmox does not see them.

I have since updated and restarted all cluster nodes, but the problem persists: Proxmox does not see the contents of the Ceph pools.
 
Can you paste the contents of your VM's config file?

It is located at /etc/pve/qemu-server.
 
Sure, here you go:

Code:
root@proxmox:/etc/pve/qemu-server# cat 126.conf
balloon: 0
bootdisk: virtio0
cores: 2
cpu: SandyBridge
ide2: none,media=cdrom
memory: 1024
name: OMV3
net0: virtio=3A:A9:18:97:A9:75,bridge=vmbr2
net1: virtio=0A:57:30:EB:EF:01,bridge=vmbr0
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-single
smbios1: uuid=0bdc04c1-e194-4431-9e6d-9997edf432b0
sockets: 1
virtio0: local-zfs:vm-126-disk-1,cache=writeback,size=32G
virtio1: lvm3tb:vm-126-disk-2,backup=0,cache=writeback,size=2760G
virtio2: pool2:vm-126-disk-1,cache=writeback,size=6T

BTW, I have since created a new VM with its disk on pool3, and it shows up correctly in the web interface. These old disks do not.
 
Also, here is my ceph.conf; I remember disabling the rbd_cache_writethrough_until_flush setting (probably not connected to this issue). This is clearly a Proxmox problem, since the data is there in the RBD pools; Proxmox just does not seem to see it.

Code:
root@proxmox:~# cat /etc/pve/ceph.conf
[global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = 192.168.0.0/24
         filestore xattr use omap = true
         fsid = 98c9d762-ea24-4e28-88b9-a0a585d53cfd
         keyring = /etc/pve/priv/$cluster.$name.keyring
         osd journal size = 5120
         osd pool default min size = 1
         osd pool default size = 2
         public network = 192.168.0.0/24

[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.1]
         host = proxmox2
         mon addr = 192.168.0.4:6789

[mon.4]
         host = proxmox5
         mon addr = 192.168.0.7:6789

[mon.3]
         host = proxmox4
         mon addr = 192.168.0.6:6789

[mon.0]
         host = proxmox
         mon addr = 192.168.0.3:6789

[mon.2]
         host = proxmox3
         mon addr = 192.168.0.5:6789

[client]
        rbd_cache = true
        rbd_cache_writethrough_until_flush = false
        rbd_cache_size = 67108864
        rbd_cache_max_dirty = 50331648
        rbd_cache_target_dirty = 33554432
        rbd_cache_max_dirty_age = 1
 
I'll leave this to the devs; it may be a bug or just a one-off.

The main thing is that, as you have proven, your storage is still there and usable. To me it does not look like a Ceph issue, otherwise you would not be able to create a new disk and make use of it.
 
How long does "rbd ls -l" take for the pool where you don't see content? We need the extra metadata to get the size and the full volid, but unfortunately this is very expensive in Ceph. The GUI runs this "list content" API call synchronously, which AFAIR has a 30-second timeout.
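For example, something along these lines (just a sketch, assuming the node is named proxmox as in your prompts and the storage is named pool2):
Code:
# time the metadata-heavy listing that the content view relies on
time rbd ls -l pool2
# trigger the same "list content" API call the GUI uses
pvesh get /nodes/proxmox/storage/pool2/content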
 
This took about 1 second:
Code:
root@proxmox:~# rbd ls -l pool2
NAME             SIZE PARENT FMT PROT LOCK
vm-102-disk-1  43008M          2
vm-112-disk-1 102400M          2
vm-126-disk-1   6144G          2

The GUI list has been empty since the spontaneous node reboot on Saturday night, and neither OSD repairs nor full reboots of the whole cluster have helped.
 
what does "pvesm list STORAGENAME" output?
 
what does "pvesm list STORAGENAME" output?

It does not give any output (the PVE storage is named the same as the RBD pool):
Code:
root@proxmox:~# pvesm list pool2
root@proxmox:~#

I have even tried to remove and re-add the RBD storage with different monitor IPs, to the same effect: Proxmox does not see the content of the RBD pools.
 
Then something must be wrong with your storage configuration or your Ceph configuration. PVE does not do anything special: it generates an "rbd ls" command line and executes it. It uses the Ceph config and keyring from /etc/pve/priv/ceph/STORAGE.conf (and .keyring), so make sure those two are identical to whatever is used when running the commands manually. The monitor addresses, username and pool name are taken from the storage configuration.
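Roughly, the generated command for a storage named "pool2" is equivalent to something like this (a sketch only; the monitor addresses here are taken from the error in your first post, so substitute whatever your storage configuration actually uses):
Code:
rbd ls -p pool2 \
    -m 192.168.0.7,192.168.0.6,192.168.0.5 \
    --id admin \
    --keyring /etc/pve/priv/ceph/pool2.keyring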
 
Okay, so I re-added pool2 in the storage UI (did not touch the Ceph pool itself), and checked keyrings:
Code:
root@proxmox:~# cat /etc/pve/priv/ceph/pool2.keyring
[client.admin]
        key = PQDDRU9YX9u7HhAAEo3wLAFVCgVL+JsrEcs6HA==
root@proxmox:~# cat /etc/pve/priv/ceph/pool3.keyring
[client.admin]
        key = PQDDRU9YX9u7HhAAEo3wLAFVCgVL+JsrEcs6HA==
root@proxmox:~# cat /etc/ceph/ceph.client.admin.keyring
[client.admin]
        key = PQDDRU9YX9u7HhAAEo3wLAFVCgVL+JsrEcs6HA==


Then I added a new disk to a VM on pool2; it shows up on pool2 in the web interface:
[Screenshot: ceph-wtf2.jpg]



Unfortunately, it does not show up in rbd ls pool2:
Code:
root@proxmox:~# rbd ls pool2
vm-102-disk-1
vm-112-disk-1
vm-126-disk-1


However, it does show up in a plain rbd ls (I'm not sure what that lists):
Code:
root@proxmox:~# rbd ls
vm-120-disk-1


So it looks like pool2 for Proxmox is not the same as pool2 for Ceph. Any idea how I can get my data back?
 
What is the content of /etc/pve/storage.cfg?
 
As far as I can see from the first post, it tries to access the images in the Ceph pool 'rbd' and not pool2 (in the error line, file=rbd:rbd/vm-126-disk-1 means pool "rbd", image "vm-126-disk-1"), so I would check that the correct pool is set in the storage config.
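A quick way to check (just a sketch):
Code:
# each stanza's "pool" line is the Ceph pool PVE will actually use
grep -A 6 '^rbd:' /etc/pve/storage.cfg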
 
What is the content of /etc/pve/storage.cfg?

OK, that solved it. For some unknown reason, the pool entries of the RBD storage definitions in storage.cfg had been overwritten with "rbd":
Code:
rbd: pool3
        monhost 192.168.0.6,192.168.0.7,192.168.0.5
        krbd 0
        username admin
        content images
        pool rbd

rbd: pool2
        monhost 192.168.0.3,192.168.0.6,192.168.0.4
        krbd 0
        username admin
        content images
        pool rbd

I corrected these back to the original pool names (pool2 and pool3; the corrected entries are below), and I can access my storage again.
Any idea how these could get overwritten without user intervention?
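For reference, the corrected entries (same stanzas, only the pool lines changed back):
Code:
rbd: pool3
        monhost 192.168.0.6,192.168.0.7,192.168.0.5
        krbd 0
        username admin
        content images
        pool pool3

rbd: pool2
        monhost 192.168.0.3,192.168.0.6,192.168.0.4
        krbd 0
        username admin
        content images
        pool pool2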

Thanks for your help.
 
Okay, so I re-added pool2 in the storage UI (did not touch the Ceph pool itself), and checked keyrings:
You wrote that you re-added the pool, and the problem existed directly afterwards.
Maybe you just re-added the storages at some point and forgot to enter the right pool?
 
No, actually the problem surfaced right after one of our nodes unexpectedly rebooted on Saturday night. During that reboot storage.cfg somehow got modified, because the pools worked just fine the previous day and no one touched the configuration. (And I only re-added pool2, so even if I made an error, there is no explanation for how pool3 got edited to rbd.) Weird.
 
