Unable to create OSDs in new Proxmox/Ceph cluster - RADOS object not found (error connecting to the cluster)

victorhooi

I've set up a new 4-node Proxmox/Ceph cluster.

I have run pveceph install on each node.

I have also set up a Ceph mon and mgr on each node.
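
For reference, the per-node setup was roughly the following (a sketch only; the exact pveceph subcommand syntax varies between Proxmox versions):
Code:
# run on each node (assumed Proxmox 6.x syntax)
pveceph install        # install the Ceph packages
pveceph mon create     # create a monitor on this node
pveceph mgr create     # create a manager on this node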

Here is the output of /etc/pve/ceph.conf:
Code:
# cat /etc/pve/ceph.conf
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.7.15.3/24
     fsid = f17ee24c-0562-44c3-80ab-e7ba8366db86
     mon_allow_pool_delete = true
     mon_host = 10.7.15.3 10.7.15.4 10.7.15.5 10.7.15.6
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.7.15.3/24

[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring

[mon.examplemtv-vm01]
     public_addr = 10.7.15.3

[mon.examplemtv-vm02]
     public_addr = 10.7.15.4

[mon.examplemtv-vm03]
     public_addr = 10.7.15.5

[mon.examplemtv-vm04]
     public_addr = 10.7.15.6

My OSD tree:
Code:
# ceph osd tree
ID  CLASS  WEIGHT  TYPE NAME     STATUS  REWEIGHT  PRI-AFF
-1              0  root default
Ceph status:
Code:
# ceph status
  cluster:
    id:     f17ee24c-0562-44c3-80ab-e7ba8366db86
    health: HEALTH_WARN
            Module 'volumes' has failed dependency: No module named 'distutils.util'
            Reduced data availability: 1 pg inactive
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 4 daemons, quorum examplemtv-vm01,examplemtv-vm02,examplemtv-vm03,examplemtv-vm04 (age 3h)
    mgr: examplemtv-vm02(active, since 3h), standbys: examplemtv-vm03, examplemtv-vm01, examplemtv-vm04
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:     100.000% pgs unknown
             1 unknown
Here are the available disks for the first node:
Code:
# ceph-volume inventory

Device Path               Size         rotates available Model name
/dev/nvme0n1              894.25 GB    False   True      INTEL SSDPED1D960GAY
/dev/nvme2n1              3.64 TB      False   True      INTEL SSDPE2KX040T7
/dev/nvme3n1              3.64 TB      False   True      INTEL SSDPE2KX040T7
/dev/nvme4n1              3.64 TB      False   True      INTEL SSDPE2KX040T7
/dev/nvme5n1              3.64 TB      False   True      INTEL SSDPE2KX040T7
/dev/nvme6n1              3.64 TB      False   True      INTEL SSDPE2KX040T7
/dev/nvme7n1              3.64 TB      False   True      INTEL SSDPE2KX040T7
/dev/nvme1n1              931.51 GB    False   False     Samsung SSD 960 EVO 1TB

I am now trying to create OSDs on the first node. However, I get a "RADOS object not found (error connecting to the cluster)" error:

Code:
# ceph-volume lvm batch --osds-per-device 4 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1 /dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 --db-devices /dev/nvme0n1

Total OSDs: 24

Solid State VG:
  Targets:   block.db                  Total size: 893.00 GB
  Total LVs: 96                        Size per LV: 37.21 GB
  Devices:   /dev/nvme0n1

  Type            Path                                                    LV Size         % of device
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                                            931.25 GB       25.0%
  [block.db]      vg: vg/lv                                               37.21 GB        4%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                                            931.25 GB       25.0%
  [block.db]      vg: vg/lv                                               37.21 GB        4%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                                            931.25 GB       25.0%
  [block.db]      vg: vg/lv                                               37.21 GB        4%
----------------------------------------------------------------------------------------------------
  [data]          /dev/nvme2n1                                            931.25 GB       25.0%
  [block.db]      vg: vg/lv                                               37.21 GB        4%
----------------------------------------------------------------------------------------------------
  [... the same four [data]/[block.db] entries repeat for each of /dev/nvme3n1, /dev/nvme4n1, /dev/nvme5n1, /dev/nvme6n1 and /dev/nvme7n1 ...]
--> The above OSDs would be created if the operation continues
--> do you want to proceed? (yes/no) y
Running command: /usr/sbin/vgcreate --force --yes ceph-block-f4847ec1-5108-4438-9bd6-3c7ecdf496d4 /dev/nvme2n1
 stdout: Physical volume "/dev/nvme2n1" successfully created.
 stdout: Volume group "ceph-block-f4847ec1-5108-4438-9bd6-3c7ecdf496d4" successfully created
Running command: /usr/sbin/vgcreate --force --yes ceph-block-cf6ae065-c2c2-45bd-a9fa-96ddc70d654f /dev/nvme3n1
 stdout: Physical volume "/dev/nvme3n1" successfully created.
 stdout: Volume group "ceph-block-cf6ae065-c2c2-45bd-a9fa-96ddc70d654f" successfully created
Running command: /usr/sbin/vgcreate --force --yes ceph-block-fd3bf495-1907-4eb3-8122-df5ec83c2c10 /dev/nvme4n1
 stdout: Physical volume "/dev/nvme4n1" successfully created.
 stdout: Volume group "ceph-block-fd3bf495-1907-4eb3-8122-df5ec83c2c10" successfully created
Running command: /usr/sbin/vgcreate --force --yes ceph-block-a3526a34-d438-41ce-9d03-3e101b84dfb7 /dev/nvme5n1
 stdout: Physical volume "/dev/nvme5n1" successfully created.
 stdout: Volume group "ceph-block-a3526a34-d438-41ce-9d03-3e101b84dfb7" successfully created
Running command: /usr/sbin/vgcreate --force --yes ceph-block-657a14c8-c9d9-48ed-9537-5bd1354b93b4 /dev/nvme6n1
 stdout: Physical volume "/dev/nvme6n1" successfully created.
 stdout: Volume group "ceph-block-657a14c8-c9d9-48ed-9537-5bd1354b93b4" successfully created
Running command: /usr/sbin/vgcreate --force --yes ceph-block-0c4edee0-3221-456a-9bec-49128de4f2b5 /dev/nvme7n1
 stdout: Physical volume "/dev/nvme7n1" successfully created.
 stdout: Volume group "ceph-block-0c4edee0-3221-456a-9bec-49128de4f2b5" successfully created
Running command: /usr/sbin/vgcreate --force --yes ceph-block-dbs-6d1c858b-b673-4cc5-a0ce-f32e6791c96c /dev/nvme0n1
 stdout: Physical volume "/dev/nvme0n1" successfully created.
 stdout: Volume group "ceph-block-dbs-6d1c858b-b673-4cc5-a0ce-f32e6791c96c" successfully created
Running command: /usr/sbin/lvcreate --yes -l 238465 -n osd-block-e43407fc-85c4-47a6-9549-c5cfd1275e06 ceph-block-f4847ec1-5108-4438-9bd6-3c7ecdf496d4
 stdout: Logical volume "osd-block-e43407fc-85c4-47a6-9549-c5cfd1275e06" created.
Running command: /usr/sbin/lvcreate --yes -l 9538 -n osd-block-db-efc00ed9-9c8a-4745-8e7b-3c73952b77f5 ceph-block-dbs-6d1c858b-b673-4cc5-a0ce-f32e6791c96c
 stdout: Logical volume "osd-block-db-efc00ed9-9c8a-4745-8e7b-3c73952b77f5" created.
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 859045ef-e2fc-4798-bf4c-b437b2b21eea
 stderr: [errno 2] RADOS object not found (error connecting to the cluster)
-->  RuntimeError: Unable to create a new OSD id

Does anybody know what the above error means?

Thanks,
Victor
 
Yes, there is a symlink there (from /etc/ceph/ceph.conf to /etc/pve/ceph.conf):
Code:
# ls -lah /etc/ceph
total 23K
drwxr-xr-x  2 ceph ceph   5 Nov 12 11:29 .
drwxr-xr-x 90 root root 179 Nov 20 05:58 ..
-rw-------  1 ceph ceph 151 Nov 12 11:29 ceph.client.admin.keyring
lrwxrwxrwx  1 root root  18 Nov 12 11:29 ceph.conf -> /etc/pve/ceph.conf
-rw-r--r--  1 root root  92 Aug 28  2019 rbdmap
However, I think I found the issue: it was actually covered in one of my older threads (link); thanks for helping me there =).

To save time, the fix was this:
Code:
# ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring
exported keyring for client.bootstrap-osd
It seems that when Proxmox sets up Ceph, it doesn't create the above keyring, so standard Ceph commands like ceph-volume won't work out of the box.
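
In case it helps, here is a slightly fuller version of the fix (a sketch; the ownership and permission values below are my assumptions based on the usual Ceph defaults, and this needs to be done on every node where you want to run ceph-volume directly):
Code:
# check whether the bootstrap-osd keyring exists on this node
ls -l /var/lib/ceph/bootstrap-osd/ceph.keyring

# if it is missing, export it from the cluster auth database
ceph auth get client.bootstrap-osd > /var/lib/ceph/bootstrap-osd/ceph.keyring

# assumed ownership/permissions, matching the usual Ceph defaults
chown ceph:ceph /var/lib/ceph/bootstrap-osd/ceph.keyring
chmod 600 /var/lib/ceph/bootstrap-osd/ceph.keyring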

Does it make sense to integrate setting up the keyring as part of the normal Proxmox Ceph install process?
 
It looks like both the symlink shown by @victorhooi AND running apt install python3-distutils are needed to make out-of-the-box Ceph on Proxmox work flawlessly.
 
The symlink is needed (and automatically created) by Ceph itself, since all the tooling looks for it. The distutils issue is a missing dependency that was only recently introduced by upstream.
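
For anyone hitting the mgr "volumes" module warning from the health output above, the remedy is to install the missing package (a sketch; restarting the manager daemons afterwards is my assumption, and a reboot would work as well):
Code:
# install the missing Python dependency on each node
apt install python3-distutils
# restart the manager daemons so the module reloads (assumption)
systemctl restart ceph-mgr.target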