Proxmox 6.0/CEPH OSD disk hot-plug detection

maff1989

New Member
Jul 25, 2019
I'm running a Dell R710 with disk hot-plug capability, running the latest Proxmox 6.0 and Ceph Nautilus. All of the disks are configured as OSDs via the latest GUI tools.

The problem I'm seeing, however, is that when unplugging a drive from the hot-plug bay, the OSD is not detected as "down". It isn't until I manually `down` a different disk that the original hot-plugged disk is detected as "down". Waiting the default 10m for an OSD to detect a down disk never actually sets the OSD as down. Even after waiting multiple hours, the OSD doesn't automatically "down" itself.

I'm curious why the OSD would not detect when a hot-plugged disk is removed and set itself "down" after the default 10m interval. I notice that there are "Input/Output errors" and that the device-mapper (i.e. `/dev/mapper/*`) devices aren't automatically removed to accommodate the hot-plug event. The old OSD format (i.e. using `ceph-disk`) didn't have these issues since the OSD interfaced directly with disk partitions, and not through device-mapper/LVM, which I think is part of the issue here.

Any help and/or insight is appreciated.
 
I just tested this here and it works without problems.
Do you have any activity on the Ceph cluster while you are doing those tests?
With ceph-disk replaced by ceph-volume, it may be that such errors are only detected when there is activity on the disks.
 
do you have any activity on the ceph cluster while you are doing those tests?
it is possible that with the replacement of ceph-disk with ceph-volume it may be that such errors are only detected when there is activity on the disks

Thank you for the reply. This was the exact reason for my issue. I did some further testing yesterday after creating and running a VM and the disks were indeed marked "down" immediately after unplugging the drive due to the I/O error. It makes sense that an OSD will only be marked "down" once I/O fails.
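
For anyone repeating this test, here is a minimal sketch of how one might check which OSDs the cluster currently reports as "down" after pulling a drive while I/O is running. It assumes the JSON produced by `ceph osd tree -f json` contains a `nodes` list whose OSD entries carry `name`, `type` and `status` fields. The helper names (`down_osds`, `fetch_osd_tree`) and the sample data are made up for illustration and are not taken from this thread.

```python
import json
import subprocess


def down_osds(tree):
    """Return the names of OSDs reported as "down" in a parsed OSD tree."""
    osds = [
        node for node in tree.get("nodes", [])
        if node.get("type") == "osd" and node.get("status") == "down"
    ]
    # Sort by numeric OSD id so osd.2 is listed before osd.10.
    return [node["name"] for node in sorted(osds, key=lambda n: n["id"])]


def fetch_osd_tree():
    """Run `ceph osd tree -f json` on a cluster node and parse the output."""
    result = subprocess.run(
        ["ceph", "osd", "tree", "-f", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Hand-written sample shaped like the command's output (assumed, not real data).
SAMPLE_TREE = {
    "nodes": [
        {"id": -1, "name": "default", "type": "root", "children": [-2]},
        {"id": -2, "name": "pve1", "type": "host", "children": [0, 1]},
        {"id": 0, "name": "osd.0", "type": "osd", "status": "up"},
        {"id": 1, "name": "osd.1", "type": "osd", "status": "down"},
    ]
}

if __name__ == "__main__":
    print(down_osds(SAMPLE_TREE))
```

On a real node you would call `down_osds(fetch_osd_tree())` instead of using the sample. As discussed above, expect the pulled disk to show up here only once something actually reads from or writes to the cluster.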
 
