Proxmox 4.2 & CEPH Hammer, create OSD failed

leihnix
Dear Community,

I have been using Proxmox for many years, and our DC has grown, so I decided to implement Ceph to get better density on my Proxmox nodes. We are currently running 7 nodes, and I have installed Ceph on 4 of them. Each of those 4 nodes has 2 OSDs, with a single SSD serving as the journal device for both OSD disks.

What was bound to happen eventually happened: one OSD failed. The cluster did what it was supposed to do and started redistributing the data so that the 1:3 ratio set on my pool is kept. So far so good.

I removed the OSD using the Proxmox GUI; the cluster then started redistributing even more data, causing some VMs to become unreachable.

Now I want to bring the OSD back, so I replaced and formatted the failed disk, then ran:

pveceph createosd /dev/sdd -journal_dev /dev/sdc2
(/dev/sdd is the spinning drive and /dev/sdc2 is the second partition of the SSD drive)
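(For reference, the partition layout of both devices can be double-checked first; a quick sketch using standard tools, assuming /dev/sdc is the journal SSD as described above:)

Code:
# show size, type and current usage of the journal SSD and the new data disk
lsblk -o NAME,SIZE,TYPE,MOUNTPOINT /dev/sdc /dev/sdd
# print the GPT partition table of the journal SSD
sgdisk -p /dev/sdc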

I get the following error:

=== SNIP ===
create OSD on /dev/sdd (xfs)
using device '/dev/sdc2' for journal
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
WARNING:ceph-disk:Journal /dev/sdc2 was not prepared with ceph-disk. Symlinking directly.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
meta-data=/dev/sdd1 isize=2048 agcount=4, agsize=122094597 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=488378385, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal log bsize=4096 blocks=238466, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
mount: wrong fs type, bad option, bad superblock on /dev/sdd1,
missing codepage or helper program, or other error

In some cases useful info is found in syslog - try
dmesg | tail or so.
ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', 'xfs', '-o', 'rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M', '--', '/dev/sdd1', '/var/lib/ceph/tmp/mnt.yZ_7xR']' returned non-zero exit status 32
command 'ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid 07b29f85-04b6-4788-89e6-703bda8f0f33 --journal-dev /dev/sdd /dev/sdc2' failed: exit code 1

=== SNAP ===

I wonder if I have to zap the journal partition of the SSD to get this working. There are two partitions on that journal SSD, so I am not sure whether zapping would wipe the whole SSD and make me lose another OSD :(
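(What I would like to avoid is wiping the whole device. If only the one partition needs clearing, something along these lines should leave the other journal partition alone; a sketch, only applicable if /dev/sdc2 really belonged to the failed OSD:)

Code:
# clear old filesystem/journal signatures on the journal partition only
wipefs -a /dev/sdc2
# alternatively, zero the start of that partition; this writes only inside /dev/sdc2
dd if=/dev/zero of=/dev/sdc2 bs=1M count=100 oflag=direct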

Can somebody please help me? I am a bit desperate.

Leihnix
 
I have been reading the following thread:
https://forum.proxmox.com/threads/ceph-create-osd.27154/

symmcom states:
=
1. Stop OSD : ceph osd down osd.1 (it was already down)
2. Out OSD : ceph osd out osd.1 (it was already out)
3. Remove OSD : ceph osd rm osd.1 (was removed before)
4. Remove Authentication : ceph auth del osd.1 (was deleted already)
In some cases i had to manually delete the old OSD folder in /var/lib/ceph/<osd folder> (I removed the folder and retried creating the OSD; the whole sequence, consolidated as a script, is below the quote)
=
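The same cleanup, consolidated into a small script (the OSD id and the data directory path are placeholders for whatever the failed OSD was called):

Code:
#!/bin/bash
# clean out a failed OSD before re-creating it; set OSD to the real id
OSD=1
ceph osd down osd.$OSD                # mark the OSD down (may already be down)
ceph osd out osd.$OSD                 # mark it out (may already be out)
ceph osd rm osd.$OSD                  # remove it from the OSD map
ceph auth del osd.$OSD                # drop its cephx key
rm -rf /var/lib/ceph/osd/ceph-$OSD    # remove any leftover data directory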

I tried that without success; I get the same error message.

Meanwhile, thanks to Udo, I implemented some Ceph settings to reduce the impact during recovery (see the runtime sketch after the snippet) :)

[osd]
osd max backfills = 1
osd recovery max active = 1
osd_disk_threads = 1
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7
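The same throttling can apparently also be injected into the running OSDs without a restart; a sketch (I have not verified that every option takes effect at runtime):

Code:
# push the recovery throttling to all running OSDs
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
ceph tell osd.* injectargs '--osd-disk-thread-ioprio-class idle --osd-disk-thread-ioprio-priority 7'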
 
What does
Code:
dmesg | tail
print directly after the error?
 
Dominik, thank you for answering.

/var/log# dmesg | tail
[868527.894393] sdd:
[868560.021687] Alternate GPT is invalid, using primary GPT.
[868560.021697] sdd:
[868562.520757] sdd:
[868562.571841] sdd:
[868562.611262] sdd:
[868562.769870] sdd:
[868563.899798] sdd: sdd1
[868564.027464] sdd: sdd1
[868575.317410] XFS (sdd1): unknown mount option [delaylog].

Best regards,
Leihnix
 
Could you please post your versions with
Code:
pveversion -v
 
/var/log# pveversion -v
proxmox-ve: 4.2-52 (running kernel: 4.4.8-1-pve)
pve-manager: 4.2-11 (running version: 4.2-11/2c626aa1)
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.2.8-1-pve: 4.2.8-41
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-40
qemu-server: 4.0-79
pve-firmware: 1.1-8
libpve-common-perl: 4.0-67
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-51
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-67
pve-firewall: 2.0-29
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
fence-agents-pve: not correctly installed
ceph: 0.94.7-1~bpo80+1
 
Could you also post your Ceph config?
AFAIK the delaylog mount option was removed for XFS, but I don't know where it comes from, since I cannot reproduce it here.
 
[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 172.20.44.0/22
filestore xattr use omap = true
fsid = 07b29f85-04b6-4788-89e6-703bda8f0f33
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 172.20.44.0/22

#osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
#osd_op_threads = 4
#osd_disk_threads = 4

# Disable in-memory logs
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

[osd]
osd max backfills = 1
osd recovery max active = 1
osd_disk_threads = 1
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7
osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
osd_op_threads = 4

keyring = /var/lib/ceph/osd/ceph-$id/keyring
# debug lockdep = 0/0
# debug context = 0/0
# debug crush = 0/0
# debug buffer = 0/0
# debug timer = 0/0
# debug journaler = 0/0
# debug osd = 0/0
# debug optracker = 0/0
# debug objclass = 0/0
# debug filestore = 0/0
# debug journal = 0/0
# debug ms = 0/0
# debug monc = 0/0
# debug tp = 0/0
# debug auth = 0/0
# debug finisher = 0/0
# debug heartbeatmap = 0/0
# debug perfcounter = 0/0
# debug asok = 0/0
# debug throttle = 0/0
[mon.0]
host = vmhost01
mon addr = 172.20.44.40:6789

debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0
[mon.2]
host = vmhost03
mon addr = 172.20.44.42:6789

debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0

[mon.1]
host = vmhost02
mon addr = 172.20.44.41:6789

debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0
 
Hello Dominik,

You were right.

I have changed

FROM:
osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

TO:
osd mount options xfs = rw,noatime,inode64,logbsize=256k,allocsize=4M

It's working now. Ceph happily accepted the new OSD and is redistributing data to it.

Thank you very much for your time and the hint!
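For anyone hitting the same error: a quick way to check whether a given XFS mount option string is accepted by the running kernel before writing it into ceph.conf (a sketch; /dev/sdX1 stands for any XFS partition that is not currently mounted):

Code:
mkdir -p /mnt/xfstest
# try the exact option string from ceph.conf; the kernel log names any option it rejects
mount -t xfs -o rw,noatime,inode64,logbsize=256k,allocsize=4M /dev/sdX1 /mnt/xfstest
dmesg | tail
umount /mnt/xfstest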
 
Hi, with the latest PVE and Ceph 0.94.9 I cannot create an OSD either.

Code:
# pveversion -v
proxmox-ve: 4.2-64 (running kernel: 4.4.16-1-pve)
pve-manager: 4.2-18 (running version: 4.2-18/158720b9)
pve-kernel-4.4.16-1-pve: 4.4.16-64
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-44
qemu-server: 4.0-86
pve-firmware: 1.1-9
libpve-common-perl: 4.0-72
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-57
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.1-2
pve-container: 1.0-73
pve-firewall: 2.0-29
pve-ha-manager: 1.0-33
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.4-1
lxcfs: 2.0.3-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
zfsutils: 0.6.5.7-pve10~bpo80
openvswitch-switch: 2.5.0-1
ceph: 0.94.9-1~bpo80+1

# ceph -v
ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)

Code:
# cat /etc/pve/ceph.conf
[global]
  auth client required = cephx
  auth cluster required = cephx
  auth service required = cephx
  auth supported = cephx
  cluster network = 192.168.0.0/16
  filestore xattr use omap = true
  fsid = babd2e4d-a6b9-4c21-9b46-98bc87cbe28d
  keyring = /etc/pve/priv/$cluster.$name.keyring
  max open files = 131072
  mon clock drift allowed = 1
  mon clock drift warn backoff = 30
  mon osd down out interval = 600
  mon osd full ratio = .95
  mon osd nearfull ratio = .75
  mon osd report timeout = 300
  osd journal size = 20480
  osd pool default min size = 1
  osd pool default size = 2
  public network = 192.168.0.0/16

[osd]
  filestore max sync interval = 15
  filestore min sync interval = 10
  filestore queue committing max bytes = 10485760000
  filestore queue committing max ops = 5000
  filestore queue max bytes = 10485760
  filestore queue max ops = 25000
  journal max write bytes = 1073714824
  journal max write entries = 10000
  journal queue max bytes = 10485760000
  journal queue max ops = 50000
  keyring = /var/lib/ceph/osd/ceph-$id/keyring
  osd client message size cap = 2147483648
  osd deep scrub stride = 131072
  osd disk threads = 4
  osd map cache bl size = 128
  osd map cache size = 1024
  osd max backfills = 4
  osd max write size = 512
  osd mkfs options xfs = -f
  osd mkfs type = xfs
  osd mount options xfs = "rw,noatime,inode64,logbsize=256k,allocsize=4M,nodiratime,nobarrier"
  osd op threads = 8
  osd recovery max active = 10
  osd recovery op priority = 4
  rbd cache = true
  rbd cache max dirty = 134217728
  rbd cache max dirty age = 5
  rbd cache size = 268435456
  rbd cache writethrough until flush = false

[mon.2]
  host = test03
  mon addr = 192.168.7.5:6789

[mon.0]
  host = test01
  mon addr = 192.168.7.1:6789

[mon.1]
  host = test02
  mon addr = 192.168.7.3:6789
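(Without the actual error output it is hard to tell whether this is the same mount-option problem as above; the diagnostic that helped earlier in the thread would be a reasonable first step. A sketch with placeholder device names:)

Code:
# retry the create and check the kernel log right afterwards
pveceph createosd /dev/sdX -journal_dev /dev/sdY2
dmesg | tail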
 
