Ceph: using UUID for OSD instead of /dev/sdX?

Belokan

Apr 27, 2016
Hello,

I have a 3-node Proxmox VE 4.4 cluster running Ceph, with a dedicated physical disk as the OSD on each node (home/lab usage).

root@pve2:~# pveversion
pve-manager/4.4-12/e71b7a74 (running kernel: 4.4.44-1-pve)
root@pve2:~# ceph --version
ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)

I've defined my OSDs using ceph-disk prepare/activate /dev/sdX, but "sometimes", at boot, the disks get mixed up and, for instance, the /dev/sdb1 XFS OSD partition becomes /dev/sdc1 or something else.

As I'd prefer to avoid using a static udev rule to "lock" the sdb name, would it be possible to use something more "reliable" than /dev/sdX when creating/activating OSDs? For instance /dev/disk/by-uuid/*?
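
To check what the persistent names under /dev/disk currently point to, something like the following could help. It is only a minimal sketch: the function name list_stable_names is made up for illustration, and it assumes the usual udev-managed directories such as /dev/disk/by-partuuid or /dev/disk/by-id exist on the node.

```shell
# Print "<stable-name> -> <current kernel device>" for every symlink in a
# directory such as /dev/disk/by-partuuid or /dev/disk/by-id.
# list_stable_names is a hypothetical helper, not part of Ceph or Proxmox.
list_stable_names() {
    dir="${1:-/dev/disk/by-partuuid}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
}
```

Running `list_stable_names /dev/disk/by-id` before and after a reboot should show the same left-hand names even when the sdX names on the right have changed.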

Thanks a lot in advance,

Olivier
 
Ceph already uses the UUID to identify the OSD ID.
(You can even unplug a disk from one server and plug it into another; the OSD daemon will be started with the correct ID.)

What exactly is your problem? (Yes, the kernel can assign /dev/sdX names in arbitrary order, but that shouldn't affect Ceph.)
 
Hi Spirit,

Thanks for taking some time to answer.
In order to give you an idea of my home lab here is a quick description:

I have 3 physical PVE nodes, 2 NUCs, one HP µserver G8, and a Synology NAS.

Each physical node has a boot SSD (Proxmox) and an attached USB3 disk used for Ceph (3/1). The G8 has 4 extra HDDs passed through to an OMV3 VM used for some replication of the Syno volumes and so on.

All my VM disks (including the OMV VM's boot disk) are located on Ceph. I've set up a dedicated switch/subnet for Ceph operations + backups.

I know it's only USB3 and 1Gbps but it's enough for my usage. I don't need high performance but more "high" availability ...

Now regarding the logs: I know (as a sysadmin) that I'm trying to work around the problem rather than solve it, but ...

For "some reason", the USB disk (/dev/sdb, the OSD) disconnected on one NUC:

Jul 1 22:06:04 pve2 kernel: [833838.186658] usb 4-5: Disable of device-initiated U1 failed.
Jul 1 22:06:09 pve2 kernel: [833843.186322] usb 4-5: Disable of device-initiated U2 failed.
Jul 1 22:06:14 pve2 kernel: [833848.313882] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Jul 1 22:06:19 pve2 kernel: [833853.529561] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Jul 1 22:06:25 pve2 kernel: [833858.937142] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Jul 1 22:06:30 pve2 kernel: [833864.152772] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Jul 1 22:06:35 pve2 kernel: [833869.560434] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Jul 1 22:06:40 pve2 kernel: [833874.776008] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command
Jul 1 22:06:41 pve2 kernel: [833875.577354] usb 4-5: reset SuperSpeed USB device number 2 using xhci_hcd
Jul 1 22:06:41 pve2 kernel: [833875.615973] usb 4-5: USB disconnect, device number 2
Jul 1 22:06:41 pve2 kernel: [833875.623947] sd 0:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 1 22:06:41 pve2 kernel: [833875.623951] sd 0:0:0:0: [sdb] tag#0 CDB: Write(10) 2a 00 0f 45 05 a0 00 00 30 00
Jul 1 22:06:41 pve2 kernel: [833875.625849] XFS (sdb1): xfs_do_force_shutdown(0x2) called from line 1197 of file fs/xfs/xfs_log.c. Return address = 0xffffffffc0ad6888
Jul 1 22:06:41 pve2 kernel: [833875.625866] XFS (sdb1): xfs_log_force: error -5 returned.
Jul 1 22:06:41 pve2 kernel: [833875.627991] sd 0:0:0:0: [sdb] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
Jul 1 22:06:41 pve2 kernel: [833875.627995] sd 0:0:0:0: [sdb] tag#0 CDB: Write(10) 2a 00 00 20 bf 50 00 00 18 00
Jul 1 22:06:41 pve2 kernel: [833875.629138] XFS (sdb1): xfs_log_force: error -5 returned.
Jul 1 22:06:42 pve2 kernel: [833876.465576] sd 0:0:0:0: [sdb] Synchronizing SCSI cache
Jul 1 22:06:42 pve2 kernel: [833876.465602] sd 0:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK

Right after that, it reconnected, but as /dev/sdc:

Jul 1 22:06:42 pve2 kernel: [833876.652036] usb 4-5: new SuperSpeed USB device number 4 using xhci_hcd
Jul 1 22:06:42 pve2 kernel: [833876.670310] usb 4-5: New USB device found, idVendor=0480, idProduct=a202
Jul 1 22:06:42 pve2 kernel: [833876.670313] usb 4-5: New USB device strings: Mfr=1, Product=2, SerialNumber=3
Jul 1 22:06:42 pve2 kernel: [833876.670314] usb 4-5: Product: External USB 3.0
Jul 1 22:06:42 pve2 kernel: [833876.670316] usb 4-5: Manufacturer: TOSHIBA
Jul 1 22:06:42 pve2 kernel: [833876.670317] usb 4-5: SerialNumber: 20161030000181C
Jul 1 22:06:42 pve2 kernel: [833876.671090] usb-storage 4-5:1.0: USB Mass Storage device detected
Jul 1 22:06:42 pve2 kernel: [833876.671145] scsi host7: usb-storage 4-5:1.0
Jul 1 22:06:43 pve2 kernel: [833877.670830] scsi 7:0:0:0: Direct-Access TOSHIBA External USB 3.0 5438 PQ: 0 ANSI: 6
Jul 1 22:06:43 pve2 kernel: [833877.671067] sd 7:0:0:0: Attached scsi generic sg2 type 0
Jul 1 22:06:43 pve2 kernel: [833877.672451] sd 7:0:0:0: [sdc] 976773164 512-byte logical blocks: (500 GB/466 GiB)
Jul 1 22:06:43 pve2 kernel: [833877.672780] sd 7:0:0:0: [sdc] Write Protect is off
Jul 1 22:06:43 pve2 kernel: [833877.673096] sd 7:0:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jul 1 22:06:45 pve2 kernel: [833879.576518] sdc: sdc1 sdc2
Jul 1 22:06:45 pve2 kernel: [833879.578834] sd 7:0:0:0: [sdc] Attached SCSI disk

The problem is that Ceph tried to mount the filesystem (by UUID, you're right!) while the old sdb1 filesystem was probably still mounted in its failed state:

Jul 1 22:06:46 pve2 kernel: [833880.145999] XFS (sdc1): Filesystem has duplicate UUID 7f15baec-8e84-4442-9e80-5650a8419548 - can't mount
Jul 1 22:06:46 pve2 kernel: [833880.430268] XFS (sdc1): Filesystem has duplicate UUID 7f15baec-8e84-4442-9e80-5650a8419548 - can't mount
Jul 1 22:06:58 pve2 kernel: [833891.926829] XFS (sdb1): xfs_log_force: error -5 returned.
Jul 1 22:07:28 pve2 kernel: [833922.004716] XFS (sdb1): xfs_log_force: error -5 returned.
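
One way to confirm that the old mount is still hanging around is to look for mount entries whose /dev node has disappeared. This is only a rough sketch: find_stale_mounts is a made-up helper name, and it assumes the kernel device node is removed when the disk disconnects while the mount entry stays listed in /proc/mounts.

```shell
# Print mount entries whose /dev source node no longer exists.
# Reads a table in /proc/mounts format (default: /proc/mounts).
# find_stale_mounts is a hypothetical helper name.
find_stale_mounts() {
    table="${1:-/proc/mounts}"
    while read -r src target fstype _rest; do
        case "$src" in
            /dev/*)
                # A mounted source with no device node suggests a stale mount.
                [ -e "$src" ] || printf '%s %s %s\n' "$src" "$target" "$fstype"
                ;;
        esac
    done < "$table"
}
```

If /dev/sdb1 shows up in the output, unmounting that stale entry (for example with `umount -l` on its mount point) and then retrying `ceph-disk activate` on the new device name might clear the duplicate-UUID error. I have not verified that on this setup.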

Now maybe the only solution, other than understanding why it disconnects and reconnects, would be to use a static udev rule to make sure the disk keeps its device name after all ...
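
For reference, a udev rule of the kind mentioned above could be generated as sketched here. The assumptions are loud ones: the function name make_symlink_rule and the symlink name osd-usb are made up, the rule adds a stable symlink rather than pinning the sdX name itself, and matching on ID_SERIAL_SHORT assumes udev reports the serial shown in the log (worth checking with `udevadm info` first). It would also do nothing about the duplicate-UUID mount failure.

```shell
# Emit a udev rule that adds a stable symlink for a disk, matched by serial.
# make_symlink_rule and the default name "osd-usb" are hypothetical.
make_symlink_rule() {
    serial="$1"
    name="${2:-osd-usb}"
    # %%n becomes %n in the rule, which udev expands to the kernel number
    # (empty for the whole disk, 1 for the first partition, and so on).
    printf 'SUBSYSTEM=="block", ENV{ID_SERIAL_SHORT}=="%s", SYMLINK+="%s%%n"\n' \
        "$serial" "$name"
}

# Serial number taken from the kernel log above.
make_symlink_rule 20161030000181C
```

The output could be saved under /etc/udev/rules.d/ in a file that sorts late (for example one starting with 99-), so that the serial has already been imported when the rule runs.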

Regards
 
