Ceph OSD not being created

watnow101

Active Member
Apr 9, 2018
Hi

I am trying to add additional OSDs to my cluster, but they are not being created. I do not get any errors; after the createosd command it runs through everything and then stops/freezes at "The operation has completed successfully". See below.

After that, "Ceph OSD sdc - Create" just keeps running under the task status without a result, and the OSD never gets created.

The only change I made recently was that I removed CephFS from the cluster. Do you perhaps have any advice on where I can look further?

Thank you.

create OSD on /dev/sdc (bluestore)
wipe disk/partition: /dev/sdc
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.71704 s, 122 MB/s
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
Setting name!
partNum is 1
REALLY setting name!
The operation has completed successfully.
The operation has completed successfully.
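
A rough sketch of commands that can be used to inspect what the stalled task left behind (the device name /dev/sdc is taken from the output above; ceph-disk is the tooling used on Luminous):

Code:
# Sketch: inspect the state the stalled create left behind (Luminous / PVE 5.x).
lsblk /dev/sdc                               # were the two GPT partitions created?
ceph-disk list /dev/sdc                      # how ceph-disk classifies the partitions
ps aux | grep -E 'ceph-disk|ceph-osd'        # is a prepare/activate process still running?
journalctl -b -u 'ceph-osd@*' --no-pager | tail -n 50   # recent OSD unit messages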


Code:
proxmox-ve: 5.4-2 (running kernel: 4.15.18-12-pve)
pve-manager: 5.4-3 (running version: 5.4-3/0a6eaa62)
pve-kernel-4.15: 5.3-3
pve-kernel-4.15.18-12-pve: 4.15.18-36
ceph: 12.2.12-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-56
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-7
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-41
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-54
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

OSD Tree

Code:
root@clusternode8:~# ceph osd tree
ID  CLASS WEIGHT   TYPE NAME             STATUS REWEIGHT PRI-AFF
 -1       24.01205 root default                                 
 -3        3.63837     host clusternode1                         
  0   hdd  0.90959         osd.0             up  1.00000 1.00000
  2   hdd  0.90959         osd.2             up  1.00000 1.00000
  1   ssd  0.90959         osd.1             up  1.00000 1.00000
  3   ssd  0.90959         osd.3             up  1.00000 1.00000
 -7        3.63837     host clusternode2                         
  7   hdd  0.90959         osd.7             up  1.00000 1.00000
 10   hdd  0.90959         osd.10            up  1.00000 1.00000
  4   ssd  0.90959         osd.4             up  1.00000 1.00000
  5   ssd  0.90959         osd.5             up  1.00000 1.00000
-10        2.91037     host clusternode3                         
  8   hdd  0.27280         osd.8             up  1.00000 1.00000
  9   hdd  0.27280         osd.9             up  1.00000 1.00000
 14   hdd  0.27280         osd.14            up  1.00000 1.00000
 15   hdd  0.27280         osd.15            up  1.00000 1.00000
  6   ssd  0.90959         osd.6             up  1.00000 1.00000
 12   ssd  0.90959         osd.12            up  1.00000 1.00000
-13        2.91055     host clusternode4                         
 16   hdd  0.54568         osd.16            up  1.00000 1.00000
 17   hdd  0.54568         osd.17            up  1.00000 1.00000
 11   ssd  0.90959         osd.11            up  1.00000 1.00000
 13   ssd  0.90959         osd.13            up  1.00000 1.00000
-16        2.54665     host clusternode5                         
 18   hdd  0.54568         osd.18            up  1.00000 1.00000
 19   hdd  0.54568         osd.19            up  1.00000 1.00000
 20   hdd  0.54568         osd.20            up  1.00000 1.00000
 21   ssd  0.90959         osd.21            up  1.00000 1.00000
-19        2.54665     host clusternode6                         
 23   hdd  0.54568         osd.23            up  1.00000 1.00000
 24   hdd  0.54568         osd.24            up  1.00000 1.00000
 25   hdd  0.54568         osd.25            up  1.00000 1.00000
 22   ssd  0.90959         osd.22            up  1.00000 1.00000
-22        2.18274     host clusternode7                         
 26   hdd  0.54568         osd.26            up  1.00000 1.00000
 27   hdd  0.54568         osd.27            up  1.00000 1.00000
 28   hdd  0.54568         osd.28            up  1.00000 1.00000
 29   hdd  0.54568         osd.29            up  1.00000 1.00000
-25        3.63837     host clusternode8                         
 30   hdd  0.90959         osd.30            up  1.00000 1.00000
 31   hdd  0.90959         osd.31            up  1.00000 1.00000
 32   ssd  0.90959         osd.32            up  1.00000 1.00000
 33   ssd  0.90959         osd.33            up  1.00000 1.00000

Ceph.conf

Code:
root@clusternode8:~# cat /etc/pve/ceph.conf
[global]
         auth client required = cephx
         auth cluster required = cephx
         auth service required = cephx
         cluster network = 10.10.10.0/24
         fsid = 6e3be20d-062e-4a0f-af3f-83c4bd06f048
         keyring = /etc/pve/priv/$cluster.$name.keyring
         mon allow pool delete = true
         osd journal size = 5120
         osd pool default min size = 2
         osd pool default size = 3
         public network = 10.10.10.0/24

[mds]
         keyring = /var/lib/ceph/mds/ceph-$id/keyring

[osd]
         keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.clusternode2]
         host = clusternode2
         mon addr = 10.10.10.2:6789

[mon.clusternode3]
         host = clusternode3
         mon addr = 10.10.10.3:6789

[mon.clusternode1]
         host = clusternode1
         mon addr = 10.10.10.1:6789


Ceph-osd.admin.log

Code:
2019-12-12 07:28:39.180396 7f148b402e00  0 ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable), process ceph-osd, pid 240667
2019-12-12 07:28:39.180792 7f148b402e00  1 journal _open /dev/sdc2 fd 4: 1000098959360 bytes, block size 4096 bytes, directio = 0, aio = 0
2019-12-12 07:28:39.180958 7f148b402e00  1 journal close /dev/sdc2
2019-12-12 07:28:39.180992 7f148b402e00  0 probe_block_device_fsid /dev/sdc2 is filestore, 00000000-0000-0000-0000-000000000000
2019-12-12 07:28:40.131988 7fe1ded89e00  0 ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable), process ceph-osd, pid 240766
2019-12-12 07:28:40.132350 7fe1ded89e00  1 journal _open /dev/sdc2 fd 4: 1000098959360 bytes, block size 4096 bytes, directio = 0, aio = 0
2019-12-12 07:28:40.132502 7fe1ded89e00  1 journal close /dev/sdc2
2019-12-12 07:28:40.132551 7fe1ded89e00  0 probe_block_device_fsid /dev/sdc2 is filestore, 00000000-0000-0000-0000-000000000000
2019-12-12 07:28:40.746901 7f89b9ed1e00  0 ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable), process ceph-osd, pid 240810
2019-12-12 07:28:40.747260 7f89b9ed1e00  1 journal _open /dev/sdc2 fd 4: 1000098959360 bytes, block size 4096 bytes, directio = 0, aio = 0
2019-12-12 07:28:40.747420 7f89b9ed1e00  1 journal close /dev/sdc2
2019-12-12 07:28:40.747464 7f89b9ed1e00  0 probe_block_device_fsid /dev/sdc2 is filestore, 00000000-0000-0000-0000-000000000000
2019-12-12 07:28:41.272088 7f3187667e00  0 ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable), process ceph-osd, pid 240848
2019-12-12 07:28:41.272463 7f3187667e00  1 journal _open /dev/sdc2 fd 4: 1000098959360 bytes, block size 4096 bytes, directio = 0, aio = 0
2019-12-12 07:28:41.272644 7f3187667e00  1 journal close /dev/sdc2
2019-12-12 07:28:41.272688 7f3187667e00  0 probe_block_device_fsid /dev/sdc2 is filestore, 00000000-0000-0000-0000-000000000000
2019-12-12 07:28:41.789697 7f7525e72e00  0 ceph version 12.2.12 (39cfebf25a7011204a9876d2950e4b28aba66d11) luminous (stable), process ceph-osd, pid 240929
2019-12-12 07:28:41.790097 7f7525e72e00  1 journal _open /dev/sdc2 fd 4: 1000098959360 bytes, block size 4096 bytes, directio = 0, aio = 0
2019-12-12 07:28:41.790254 7f7525e72e00  1 journal close /dev/sdc2
2019-12-12 07:28:41.790294 7f7525e72e00  0 probe_block_device_fsid /dev/sdc2 is filestore, 00000000-0000-0000-0000-000000000000
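
The repeated "probe_block_device_fsid /dev/sdc2 is filestore" lines with an all-zero fsid suggest stale signatures on the partitions are being probed over and over. A hedged sketch of how such a disk could be wiped completely before retrying (only if /dev/sdc really holds nothing you need):

Code:
# Sketch: fully wipe a disk that only holds stale OSD leftovers. DATA-DESTRUCTIVE,
# double-check the device name before running any of this.
ceph-disk zap /dev/sdc             # Luminous tooling: clears partitions and signatures
# or manually:
wipefs -a /dev/sdc
sgdisk --zap-all /dev/sdc
dd if=/dev/zero of=/dev/sdc bs=1M count=200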
 
Did you reboot the node and check whether the issue persists? And afterwards, can you try to create an OSD on the CLI with pveceph osd create?
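
Something along these lines, as a sketch (on PVE 5.x the subcommand is spelled pveceph createosd; pveceph osd create is the equivalent on newer releases):

Code:
# Sketch: retry the creation from the shell and verify the result.
pveceph createosd /dev/sdc
ceph osd tree | grep -A 4 clusternode8   # did a new OSD appear under this host?
ceph -s                                  # overall cluster health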
 
I am not in a position to reboot the node yet because it is in production and there is no space on the other nodes. I am planning to add another node in the coming week. I was just wondering whether there is a service I can restart that might resolve this, as my current pool utilization is at 90%. I will report back once I have rebooted the node.
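
Not a recommendation for this particular situation, just a sketch of how to see which Ceph services are running on the node before deciding on a full reboot (restarting OSDs on a cluster at ~90% utilization should be done with care):

Code:
# Sketch: list the Ceph and PVE units on the node and check their state.
systemctl list-units 'ceph*'
systemctl status ceph-osd.target
systemctl status pvedaemon pveproxy      # the PVE services that execute GUI tasks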
 
I am not in a position to reboot the node yet because it is in production and there is no space on the other nodes.
This sounds alarming, regardless of the current issue. IIUC, if one node dies, there would not be enough space to replicate the undersized and new data.
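
A quick way to quantify that headroom concern is to look at raw and per-OSD utilization, for example:

Code:
# Sketch: check how much space would be left for recovery if a whole host failed.
ceph df
ceph osd df tree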
 
I have added a new node and created additional OSDs, migrated all the VMs, and rebooted the problematic node; the reboot fixed my issue. Thank you.
 
