Ceph upgrade to Nautilus - error mount point and no "uuid"

Kaboom (Active Member, joined Mar 5, 2019)
Dear all,

I am in the middle of upgrading Ceph to Nautilus, following https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus

But I get this error when running the ceph-volume simple scan:

root@node002:/dev/disk/by-partuuid# ceph-volume simple scan /dev/sdc1
Running command: /sbin/cryptsetup status /dev/sdc1
Running command: /bin/mount -v /dev/sdc1 /tmp/tmpC53VLj
stderr: mount: /tmp/tmpC53VLj: /dev/sdc1 already mounted or mount point busy.
--> RuntimeError: command returned non-zero exit status: 32

root@node002:/dev/disk/by-partuuid# ceph-volume simple activate --all
--> activating OSD specified in /etc/ceph/osd/2-9fef792d-e0fd-4d9f-9b99-3040e636cf16.json
--> RuntimeError: Unable to activate OSD None - no "uuid" key found for data

Can anyone help me out?

Thanks!
 
root@node002:/dev/disk/by-partuuid# cat /etc/ceph/osd/2-9fef792d-e0fd-4d9f-9b99-3040e636cf16.json
{
    "active": "ok",
    "block": {
        "path": "/dev/disk/by-partuuid/8755dd67-fee5-46f2-b0eb-e9fd75725722",
        "uuid": "8755dd67-fee5-46f2-b0eb-e9fd75725722"
    },
    "block_uuid": "8755dd67-fee5-46f2-b0eb-e9fd75725722",
    "bluefs": 1,
    "ceph_fsid": "09935360-cfe7-48d4-ac76-c02e0fdd95de",
    "cluster_name": "ceph",
    "data": {
        "path": "../dm-8",
        "uuid": ""
    },
    "fsid": "9fef792d-e0fd-4d9f-9b99-3040e636cf16",
    "keyring": "AQBAUPVaHbCsFhAAkHybC1sITAfeFsCJTshPHA==",
    "kv_backend": "rocksdb",
    "magic": "ceph osd volume v026",
    "mkfs_done": "yes",
    "ready": "ready",
    "require_osd_release": 12,
    "systemd": "",
    "type": "bluestore",
    "whoami": 2
}
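The empty "uuid" under "data" is exactly what the activate error complains about. As a hedged sketch (the /etc/ceph/osd path is taken from the output above), one could scan all of the simple-mode JSON files to find which OSDs are affected before activating:

```python
import glob
import json

def osds_missing_data_uuid(pattern="/etc/ceph/osd/*.json"):
    """Return the metadata files whose 'data' section has an empty or
    missing 'uuid' -- these are the OSDs that fail to activate."""
    broken = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            meta = json.load(f)
        if not meta.get("data", {}).get("uuid"):
            broken.append(path)
    return broken

if __name__ == "__main__":
    for path in osds_missing_data_uuid():
        print("no data uuid:", path)
```

Any file this prints corresponds to an OSD that `ceph-volume simple activate --all` will refuse with the "no \"uuid\" key found for data" error.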
 
Code:
ceph-volume simple scan
ceph-volume simple activate --all
Run these commands to activate all OSDs. You can re-run them after the reboot as well. The only thing is that the OSDs won't start if that step was missed.
 
Thanks for your fast answer, but I get this error. Or is it not a problem?

root@node003:/dev# ceph-volume simple activate --all
--> activating OSD specified in /etc/ceph/osd/9-168c72e2-02a2-480e-818b-861f3e2dff0c.json
--> RuntimeError: Unable to activate OSD None - no "uuid" key found for data
 
--> activating OSD specified in /etc/ceph/osd/9-168c72e2-02a2-480e-818b-861f3e2dff0c.json
--> RuntimeError: Unable to activate OSD None - no "uuid" key found for data
OSD.9 doesn't seem to have a UUID for its partition. Check with lsblk -l -o NAME,UUID whether it has one like the others. If not, it may be easier to destroy and re-create the OSD.
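To make that lsblk check concrete, here is a small sketch that filters the `lsblk -l -o NAME,UUID` listing down to entries with no UUID. The sample listing is invented for illustration; on the node you would pipe the real command output in:

```shell
# Filter `lsblk -l -o NAME,UUID` output down to entries with no UUID.
# The sample below is made up; on a live node run instead:
#   lsblk -l -o NAME,UUID | awk 'NR > 1 && NF < 2 { print $1 }'
sample='NAME UUID
sdc1 8755dd67-fee5-46f2-b0eb-e9fd75725722
sdd1
sdd2 168c72e2-02a2-480e-818b-861f3e2dff0c'

echo "$sample" | awk 'NR > 1 && NF < 2 { print $1 }'
# any name printed (sdd1 in this sample) is a partition without a UUID
```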

Also check if you didn't miss a step on our upgrade guide.
https://pve.proxmox.com/wiki/Ceph_Luminous_to_Nautilus#Restart_the_OSD_daemon_on_all_nodes
 
I have destroyed the OSD (with hdparm, wipefs and zap), but when I try to add it again, Proxmox says 'No disks unused'. Should I partition it first?

=====

fdisk =>

Unpartitioned space /dev/sdd: 447.1 GiB, 480102932480 bytes, 937701040 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes

Start End Sectors Size
2048 937703087 937701040 447.1G

=====

pveceph createosd /dev/sdd =>

device '/dev/sdd' is already in use

=====

ls -la /dev/sdd =>

brw-rw---- 1 root disk 8, 48 Nov 5 10:12 sdd

There is no sdd1 or sdd2

=====

pvesm status =>

Name Type Status Total Used Available %
NFS008 nfs disabled 0 0 0 N/A
ceph_ssd rbd active 3883469120 3333127424 550341696 85.83%
local dir active 1120317312 536248448 584068864 47.87%
local-thin-lvm lvmthin disabled 0 0 0 N/A
local-zfs zfspool disabled 0 0 0

=====

Thanks!
 
Check with lsblk if the disk still shows partitions. The kernel might not have picked up the change. If so, run partprobe to tell the kernel that the layout changed.
 
lsblk =>

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 1.1T 0 disk
├─sda1 8:1 0 1007K 0 part
├─sda2 8:2 0 1.1T 0 part
└─sda9 8:9 0 8M 0 part
sdb 8:16 0 1.1T 0 disk
├─sdb1 8:17 0 1007K 0 part
├─sdb2 8:18 0 1.1T 0 part
└─sdb9 8:25 0 8M 0 part
sdc 8:32 0 447.1G 0 disk
└─355cd2e414e2e5527 253:0 0 447.1G 0 mpath
sdd 8:48 0 447.1G 0 disk
└─355cd2e414e2e5ec1 253:1 0 447.1G 0 mpath
sde 8:64 0 447.1G 0 disk
└─355cd2e414f491349 253:2 0 447.1G 0 mpath
sdf 8:80 0 447.1G 0 disk
└─355cd2e414f491f34 253:3 0 447.1G 0 mpath
sdg 8:96 0 447.1G 0 disk
└─355cd2e414f492d87 253:4 0 447.1G 0 mpath
sdh 8:112 0 447.1G 0 disk
└─355cd2e414f482739 253:5 0 447.1G 0 mpath
zd0 230:0 0 8G 0 disk [SWAP]

======

I ran partprobe, but Proxmox still shows 'No disks unused'. I want to add the SSDs sdc through sdh.
 
Please post such output in CODE tags (triple dot); it's hard to read otherwise.

sdd 8:48 0 447.1G 0 disk
└─355cd2e414e2e5ec1 253:1 0 447.1G 0 mpath
Our tooling doesn't allow iSCSI devices to be used. It is in any case not a good idea to use a SAN/NAS for OSDs. You need to use ceph-volume by itself.
 
└─355cd2e414f482739 253:5 0 447.1G 0 mpath
These seem to be multipathed. This usually originates from a SAN/NAS multipath disk. What does ls -lah /sys/block/sdd show?
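If these are in fact local SATA/SAS disks that multipathd has claimed by mistake, one common fix (a sketch only; verify the WWIDs with `multipath -ll` first) is to blacklist them in /etc/multipath.conf and flush the stale maps:

```
# /etc/multipath.conf -- keep multipathd away from the local OSD disks
# (WWIDs taken from the lsblk output above; confirm with `multipath -ll`)
blacklist {
    wwid "355cd2e414e2e5527"
    wwid "355cd2e414e2e5ec1"
}
```

After editing, flush the maps (`multipath -F`, or `multipath -f <name>` per map) and restart multipathd. lsblk should then show the disks without the mpath child, and pveceph/ceph-volume should be able to claim them.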
 
lrwxrwxrwx 1 root root 0 Nov 5 14:49 /sys/block/sdd -> ../devices/pci0000:ae/0000:ae:00.0/0000:af:00.0/host0/port-0:3/end_device-0:3/target0:0:3/0:0:3:0/block/sdd
 
And sfdisk -l /dev/sdd? If the disk is empty, does OSD creation on the CLI work: pveceph osd create /dev/sdd?

EDIT: otherwise run a sgdisk -Z /dev/sdd to remove any GPT or MBR leftover.
 
Code:
sgdisk -Z /dev/sdd

Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.

====

Code:
pveceph osd create /dev/sdd

device '/dev/sdd' is already in use

====

Code:
sfdisk -l /dev/sdd

Disk /dev/sdd: 447.1 GiB, 480103981056 bytes, 937703088 sectors
Disk model: INTEL SSDSC2KB48
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes

It keeps saying it is already in use.
 
Please reboot the node and try again; the kernel may have gotten stuck with an old partition layout.
 
I did that already several times, even tried to load an older kernel. Now running
Linux 5.0.21-3-pve #1 SMP PVE 5.0.21-7 (Mon, 30 Sep 2019 09:11:02 +0200)

Any other ideas?
 
Try to use ceph-volume directly. What version of Ceph are you running (ceph versions)?
 
Code:
ceph versions
{
    "mon": {
        "ceph version 14.2.4 (65249672c6e6d843510e7e01f8a4b976dcac3db1) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.4 (65249672c6e6d843510e7e01f8a4b976dcac3db1) nautilus (stable)": 3
    },
    "osd": {
        "ceph version 14.2.4 (65249672c6e6d843510e7e01f8a4b976dcac3db1) nautilus (stable)": 30
    },
    "mds": {},
    "overall": {
        "ceph version 14.2.4 (65249672c6e6d843510e7e01f8a4b976dcac3db1) nautilus (stable)": 36
    }
}

=====

Code:
Running command: /sbin/vgcreate -s 1G --force --yes ceph-e0618136-83c9-4bfd-b0a0-139ce1c72c39 /dev/sdd
 stderr: Device /dev/sdd excluded by a filter.
-->  RuntimeError: command returned non-zero exit status: 5

=====

Is this helpful?

Code:
ceph-volume inventory /dev/sdd

====== Device report /dev/sdd ======

     available                 False
     rejected reasons          locked
     path                      /dev/sdd
     scheduler mode            mq-deadline
     rotational                0
     vendor                    ATA
     human readable size       447.13 GB
     sas address               0x4433221103000000
     removable                 0
     model                     INTEL SSDSC2KB48
     ro                        0
 
stderr: Device /dev/sdd excluded by a filter.
So Ceph is blocking the creation for some reason. Is there anything more in the journal/syslog and ceph logs?
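"Device /dev/sdd excluded by a filter" is actually an LVM message (ceph-volume drives vgcreate under the hood), so besides the multipath angle it may be worth looking at the devices { } filter in /etc/lvm/lvm.conf. A purely illustrative excerpt (values are not taken from this node) of how a too-strict filter looks and how an explicit accept entry reads:

```
# /etc/lvm/lvm.conf (excerpt) -- illustrative values only
devices {
    # a global_filter like this would reject all plain sd* disks:
    #   global_filter = [ "r|/dev/sd.*|" ]
    # an explicit accept for the OSD disk, placed before any reject rule:
    global_filter = [ "a|/dev/sdd|", "r|.*|" ]
}
```

Note too that LVM's built-in multipath component detection rejects any disk that is part of an active mpath map, so the filter message can disappear on its own once the multipath maps are flushed.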
 
