Failing to add an OSD

fibo_fr

Member
Apr 21, 2014
40
2
8
Remoulins, France
I have a repeated failure installing an OSD on one node:
- installing through the GUI seems to work... but the OSD is not visible
- installing through the command line seems to work... but the OSD is not visible either
e.g.:
# ceph-disk zap /dev/sdb
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.

Then
# fdisk -l
...
Disk /dev/sdb: 3.7 TiB, 4000225165312 bytes, 7812939776 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: xxxxxxxxx
....
Then
# pveceph createosd /dev/sdb
command '/sbin/zpool list -HPLv' failed: open3: exec of /sbin/zpool list -HPLv failed: No such file or directory at /usr/share/perl5/PVE/Tools.pm line 429.

create OSD on /dev/sdb (bluestore)
wipe disk: /dev/sdb
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 0.205136 s, 1.0 GB/s
Creating new GPT entries.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
Setting name!
partNum is 1
REALLY setting name!
The operation has completed successfully.
The operation has completed successfully.
meta-data=/dev/sdb1 isize=2048 agcount=4, agsize=6400 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=0, rmapbt=0, reflink=0
data = bsize=4096 blocks=25600, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal log bsize=4096 blocks=864, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Warning: The kernel is still using the old partition table.
The new table will be used at the next reboot or after you
run partprobe(8) or kpartx(8)
The operation has completed successfully.

I reboot (same with partprobe)... and the OSD is still not visible in the GUI

Additional info:
The cluster has 3 nodes.
This node has one additional HDD to create the OSD on.
The other 2 nodes each have 2 additional HDDs, each one being an OSD.

Any suggestion / hint / question?
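Some checks I could run next, if that helps, are sketched below (assuming the Luminous/ceph-disk tooling shown in the log above, and /dev/sdb as in the log):

```shell
# Check whether the OSD was actually registered in the cluster map
ceph osd tree

# Show what ceph-disk thinks of the partitions on this node
ceph-disk list

# Check whether an OSD service was started at all on this node
systemctl status 'ceph-osd@*'

# Look for activation errors in the journal
journalctl -u 'ceph-osd@*' --since today
```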
 

tom

Proxmox Staff Member
Staff member
Aug 29, 2006
15,659
973
173
please add your:

> pveversion -v
 

fibo_fr

# pveversion -v
proxmox-ve: 5.3-1 (running kernel: 4.15.18-9-pve)
pve-manager: 5.3-5 (running version: 5.3-5/97ae681d)
pve-kernel-4.15: 5.2-12
pve-kernel-4.13: 5.2-2
pve-kernel-4.15.18-9-pve: 4.15.18-30
pve-kernel-4.15.18-1-pve: 4.15.18-19
pve-kernel-4.15.17-3-pve: 4.15.17-14
pve-kernel-4.15.17-2-pve: 4.15.17-10
pve-kernel-4.15.17-1-pve: 4.15.17-9
pve-kernel-4.13.16-4-pve: 4.13.16-51
pve-kernel-4.13.16-3-pve: 4.13.16-50
pve-kernel-4.13.16-2-pve: 4.13.16-48
pve-kernel-4.13.16-1-pve: 4.13.16-46
pve-kernel-4.13.13-6-pve: 4.13.13-42
pve-kernel-4.13.13-5-pve: 4.13.13-38
pve-kernel-4.4.98-5-pve: 4.4.98-105
pve-kernel-4.4.98-3-pve: 4.4.98-103
pve-kernel-4.4.83-1-pve: 4.4.83-96
pve-kernel-4.4.59-1-pve: 4.4.59-87
pve-kernel-4.4.40-1-pve: 4.4.40-82
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.24-1-pve: 4.4.24-72
pve-kernel-4.4.15-1-pve: 4.4.15-60
pve-kernel-4.4.13-2-pve: 4.4.13-58
pve-kernel-4.4.13-1-pve: 4.4.13-56
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.2.8-1-pve: 4.2.8-41
ceph: 12.2.8-pve1
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-3
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-43
libpve-guest-common-perl: 2.0-18
libpve-http-server-perl: 2.0-11
libpve-storage-perl: 5.0-33
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.0.2+pve1-5
lxcfs: 3.0.2-2
novnc-pve: 1.0.0-2
proxmox-widget-toolkit: 1.0-22
pve-cluster: 5.0-31
pve-container: 2.0-31
pve-docs: 5.3-1
pve-edk2-firmware: 1.20181023-1
pve-firewall: 3.0-16
pve-firmware: 2.0-6
pve-ha-manager: 2.0-5
pve-i18n: 1.0-9
pve-libspice-server1: 0.14.1-1
pve-qemu-kvm: 2.12.1-1
pve-xtermjs: 1.0-5
qemu-server: 5.0-43
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
 

Alwin

Proxmox Retired Staff
Retired Staff
Aug 1, 2017
4,617
458
88
Are you using a raid controller?
 

fibo_fr

The machine has 2 SSDs in hardware RAID 10.
It also has an extra HDD which is connected in non-RAID mode to the RAID controller.
 

fibo_fr

The other 2 machines in the cluster have a similar config (but different hardware models): 2 SSDs in RAID 10 on a hardware controller, and 2 extra HDDs connected in non-RAID mode on the RAID controller, each HDD hosting an OSD (so the cluster is 3 hosts and aims at 5 OSDs; currently 2x2 = 4 OSDs).
 

fibo_fr

Thx for the link.
My understanding of this text was "do not use RAID disk arrays as Ceph disks", not "do not use RAID controllers even in non-RAID mode".

It so happens that previously I had configured and used Ceph with this computer and disk, plus 2 other hosts each with 1 additional HDD. One of them failed, so the Ceph cluster became unusable; hence I changed the "2 other" hosts to their present config with 2 extra disks each. The Ceph cluster seems to work with these 2+2 HDDs, but I don't want to use it, because if any problem occurs on one of those 2 hosts, Ceph will become unusable again...

Any other track I might explore to get this HDD visible as an OSD (since it seems that formatting it as an OSD works)?
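One track might be manual activation, sketched below (assuming the Luminous ceph-disk tooling from the log above; /dev/sdb1 would be the data partition created there):

```shell
# Show what ceph-disk thinks of the disk and its partitions
ceph-disk list

# Try activating the freshly prepared data partition by hand,
# so any error is printed instead of getting lost in udev
ceph-disk activate /dev/sdb1

# Then check whether the OSD registered in the CRUSH map
ceph osd tree
```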
 

Alwin

The whole paragraph in the link has the bottom line "Avoid RAID controller, use host bus adapter (HBA) instead." While the controller may present the disk as a non-RAID device, it may very well keep optimizing in the background. We have other people on the forum reporting the same behavior with their disks on RAID controllers, even when they are in non-RAID mode.
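A quick way to see whether the controller is still sitting in the data path (device name as in your log; smartmontools is already installed per your pveversion output):

```shell
# On a real HBA / true pass-through, the drive's own SMART identity
# is visible directly. Behind a RAID controller this usually fails
# or needs a controller-specific device type, e.g. -d megaraid,N.
smartctl -i /dev/sdb
```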
 
