Proxmox 4.2 & CEPH Hammer, create OSD failed

leihnix · Jun 27, 2016

Dear Community,

I have been using Proxmox for many years, and our DC grew. So I decided to get CEPH implemented to have a better density on my Proxmox nodes. Currently wer are running 7 Nodes, and on 4 Nodes i have installed CEPH, ecah of thos 4 Nodes has 2 OSDs, the journal device is one SSD for the two OSD disks.

It happend what was prone to happen and one OSD failed. The cluster did wat is was supposed to do, it started distributing the data so the 1:3 ration wich is set in my pool is kept. so far so good.

i removed the OSD usind the proxmox gui, then the cluster started to distribute data even more, causing some vms to be unreachable.

Now i want to get the OSD back, so i replaced and formated the failed disk. then is did:

pveceph createosd /dev/sdd -journal_dev /dev/sdc2
(/dev/sdd is the spinning drive and /dev/sdc2 is the second partition of the SSD drive)

I get the following error:

=== SNIP ===
create OSD on /dev/sdd (xfs)
using device '/dev/sdc2' for journal
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.

****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
WARNING:ceph-disk:Journal /dev/sdc2 was not prepared with ceph-disk. Symlinking directly.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
meta-data=/dev/sdd1 isize=2048 agcount=4, agsize=122094597 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=488378385, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal log bsize=4096 blocks=238466, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
mount: wrong fs type, bad option, bad superblock on /dev/sdd1,
missing codepage or helper program, or other error

In some cases useful info is found in syslog - try
dmesg | tail or so.
ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', 'xfs', '-o', 'rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M', '--', '/dev/sdd1', '/var/lib/ceph/tmp/mnt.yZ_7xR']' returned non-zero exit status 32
command 'ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid 07b29f85-04b6-4788-89e6-703bda8f0f33 --journal-dev /dev/sdd /dev/sdc2' failed: exit code 1

=== SNAP ===

I wonder if i have to zap the journal partition of the SSD, to get this working. I have two partitions on that journald SSD, so i am not sure if I zap, then thw whole SSD gets zapped and i loose another OSD

Can somebody help me please i am a bit desperate?

Leihnix

leihnix · Jun 27, 2016

I have been reading the following thread:
https://forum.proxmox.com/threads/ceph-create-osd.27154/

symmcom states:
=
1. Stop OSD : ceph osd down osd.1 (it was already down)
2. Out OSD : ceph osd out osd.1 (it was already out)
3. Remove OSD : ceph osd rm osd.1 (was removed before)
4. Remove Authentication : ceph auth del osd.1 (was deleted already)
In some cases i had to manually delete the old Osd folder in /var/lib/ceph/<osd folder> (removed folder an retired creating the OSD)
=

tried that, no success, I get the same error message.

meanwhile thanks to Udo, I implemented some ceph configs to reduce the impact while recovering

[osd]
osd max backfills = 1
osd recovery max active = 1
osd_disk_threads = 1
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7

dcsapak · Jun 28, 2016

what prints

Code:

dmesg | tail

directly after the error?

leihnix · Jun 28, 2016

Dominik, thank you for answering.

/var/log# dmesg | tail
[868527.894393] sdd:
[868560.021687] Alternate GPT is invalid, using primary GPT.
[868560.021697] sdd:
[868562.520757] sdd:
[868562.571841] sdd:
[868562.611262] sdd:
[868562.769870] sdd:
[868563.899798] sdd: sdd1
[868564.027464] sdd: sdd1
[868575.317410] XFS (sdd1): unknown mount option [delaylog].

Best regards,
Leihnix

dcsapak · Jun 28, 2016

could you please post your versions with

Code:

pveversion -v

leihnix · Jun 28, 2016

/var/log# pveversion -v
proxmox-ve: 4.2-52 (running kernel: 4.4.8-1-pve)
pve-manager: 4.2-11 (running version: 4.2-11/2c626aa1)
pve-kernel-4.4.8-1-pve: 4.4.8-52
pve-kernel-4.2.8-1-pve: 4.2.8-41
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-40
qemu-server: 4.0-79
pve-firmware: 1.1-8
libpve-common-perl: 4.0-67
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-51
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-19
pve-container: 1.0-67
pve-firewall: 2.0-29
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
fence-agents-pve: not correctly installed
ceph: 0.94.7-1~bpo80+1

dcsapak · Jun 28, 2016

could you also post your ceph config?
afaik the delaylog mount option was removed for xfs, but i don't know where it comes from since i cannot reproduce it here

leihnix · Jun 28, 2016

[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 172.20.44.0/22
filestore xattr use omap = true
fsid = 07b29f85-04b6-4788-89e6-703bda8f0f33
keyring = /etc/pve/priv/$cluster.$name.keyring
osd journal size = 5120
osd pool default min size = 1
public network = 172.20.44.0/22

#osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
#osd_op_threads = 4
#osd_disk_threads = 4

# Disable in-memory logs
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0

[osd]
osd max backfills = 1
osd recovery max active = 1
osd_disk_threads = 1
osd_disk_thread_ioprio_class = idle
osd_disk_thread_ioprio_priority = 7
osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M
osd_op_threads = 4

keyring = /var/lib/ceph/osd/ceph-$id/keyring
# debug lockdep = 0/0
# debug context = 0/0
# debug crush = 0/0
# debug buffer = 0/0
# debug timer = 0/0
# debug journaler = 0/0
# debug osd = 0/0
# debug optracker = 0/0
# debug objclass = 0/0
# debug filestore = 0/0
# debug journal = 0/0
# debug ms = 0/0
# debug monc = 0/0
# debug tp = 0/0
# debug auth = 0/0
# debug finisher = 0/0
# debug heartbeatmap = 0/0
# debug perfcounter = 0/0
# debug asok = 0/0
# debug throttle = 0/0
[mon.0]
host = vmhost01
mon addr = 172.20.44.40:6789

debug lockdep = 0/0
debug context = 0/0
debug crush = 0/0
debug buffer = 0/0
debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0
[mon.2]
host = vmhost03
mon addr = 172.20.44.42:6789

debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0

[mon.1]
host = vmhost02
mon addr = 172.20.44.41:6789

debug timer = 0/0
debug journaler = 0/0
debug osd = 0/0
debug optracker = 0/0
debug objclass = 0/0
debug filestore = 0/0
debug journal = 0/0
debug ms = 0/0
debug monc = 0/0
debug tp = 0/0
debug auth = 0/0
debug finisher = 0/0
debug heartbeatmap = 0/0
debug perfcounter = 0/0
debug asok = 0/0
debug throttle = 0/0

leihnix · Jun 28, 2016

Hello Dominik,

you were right.

I have changed

FROM:
osd mount options xfs = rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M

TO:
osd mount options xfs = rw,noatime,inode64,logbsize=256k,allocsize=4M

Its working now. Ceph is happily accepting the new OSD and distributing the data to the new OSD.

Thank you very much for your time and the hint!

lynn_yudi · Sep 13, 2016

hi, with lastest pve and ceph 0.94.9, can't create osd, too.

Code:

# pveversion -v
proxmox-ve: 4.2-64 (running kernel: 4.4.16-1-pve)
pve-manager: 4.2-18 (running version: 4.2-18/158720b9)
pve-kernel-4.4.16-1-pve: 4.4.16-64
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-44
qemu-server: 4.0-86
pve-firmware: 1.1-9
libpve-common-perl: 4.0-72
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-57
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-qemu-kvm: 2.6.1-2
pve-container: 1.0-73
pve-firewall: 2.0-29
pve-ha-manager: 1.0-33
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u2
lxc-pve: 2.0.4-1
lxcfs: 2.0.3-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
novnc-pve: 0.5-8
zfsutils: 0.6.5.7-pve10~bpo80
openvswitch-switch: 2.5.0-1
ceph: 0.94.9-1~bpo80+1

# ceph -v
ceph version 0.94.9 (fe6d859066244b97b24f09d46552afc2071e6f90)

Code:

# cat /etc/pve/ceph.conf
[global]
  auth client required = cephx
  auth cluster required = cephx
  auth service required = cephx
  auth supported = cephx
  cluster network = 192.168.0.0/16
  filestore xattr use omap = true
  fsid = babd2e4d-a6b9-4c21-9b46-98bc87cbe28d
  keyring = /etc/pve/priv/$cluster.$name.keyring
  max open files = 131072
  mon clock drift allowed = 1
  mon clock drift warn backoff = 30
  mon osd down out interval = 600
  mon osd full ratio = .95
  mon osd nearfull ratio = .75
  mon osd report timeout = 300
  osd journal size = 20480
  osd pool default min size = 1
  osd pool default size = 2
  public network = 192.168.0.0/16

[osd]
  filestore max sync interval = 15
  filestore min sync interval = 10
  filestore queue committing max bytes = 10485760000
  filestore queue committing max ops = 5000
  filestore queue max bytes = 10485760
  filestore queue max ops = 25000
  journal max write bytes = 1073714824
  journal max write entries = 10000
  journal queue max bytes = 10485760000
  journal queue max ops = 50000
  keyring = /var/lib/ceph/osd/ceph-$id/keyring
  osd client message size cap = 2147483648
  osd deep scrub stride = 131072
  osd disk threads = 4
  osd map cache bl size = 128
  osd map cache size = 1024
  osd max backfills = 4
  osd max write size = 512
  osd mkfs options xfs = -f
  osd mkfs type = xfs
  osd mount options xfs = "rw,noatime,inode64,logbsize=256k,allocsize=4M,nodiratime,nobarrier"
  osd op threads = 8
  osd recovery max active = 10
  osd recovery op priority = 4
  rbd cache = true
  rbd cache max dirty = 134217728
  rbd cache max dirty age = 5
  rbd cache size = 268435456
  rbd cache writethrough until flush = false

[mon.2]
  host = test03
  mon addr = 192.168.7.5:6789

[mon.0]
  host = test01
  mon addr = 192.168.7.1:6789

[mon.1]
  host = test02
  mon addr = 192.168.7.3:6789

jeffwadsworth · Sep 13, 2016

lynn_yudi said:
hi, with lastest pve and ceph 0.94.9, can't create osd, too.

[]

Hello. You should create a new thread for this issue.

lynn_yudi · Sep 13, 2016

jeffwadsworth said:
Hello. You should create a new thread for this issue.

thanks

Search

Search

Proxmox 4.2 & CEPH Hammer, create OSD failed

leihnix

Member

leihnix

Member

dcsapak

Proxmox Staff Member

leihnix

Member

dcsapak

Proxmox Staff Member

leihnix

Member

dcsapak

Proxmox Staff Member

leihnix

Member

leihnix

Member

lynn_yudi

Active Member

jeffwadsworth

Member

lynn_yudi

Active Member