Dear Community,
I have been using Proxmox for many years, and our data center has grown, so I decided to implement Ceph to get better density on my Proxmox nodes. We are currently running 7 nodes; Ceph is installed on 4 of them, each of those 4 nodes has 2 OSDs, and the journal device is a single SSD shared by the two OSD disks.
What was bound to happen eventually did: one OSD failed. The cluster did what it was supposed to do and started redistributing the data so that the 1:3 ratio set on my pool is kept. So far, so good.
I removed the OSD using the Proxmox GUI, and then the cluster started redistributing even more data, causing some VMs to become unreachable.
Now I want to get the OSD back, so I replaced and formatted the failed disk. Then I ran:
pveceph createosd /dev/sdd -journal_dev /dev/sdc2
(/dev/sdd is the spinning drive and /dev/sdc2 is the second partition of the SSD drive)
I get the following error:
=== SNIP ===
create OSD on /dev/sdd (xfs)
using device '/dev/sdc2' for journal
Caution: invalid backup GPT header, but valid main header; regenerating
backup header from main header.
****************************************************************************
Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
verification and recovery are STRONGLY recommended.
****************************************************************************
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
Creating new GPT entries.
The operation has completed successfully.
WARNING:ceph-disk:OSD will not be hot-swappable if journal is not the same device as the osd data
WARNING:ceph-disk:Journal /dev/sdc2 was not prepared with ceph-disk. Symlinking directly.
Setting name!
partNum is 0
REALLY setting name!
The operation has completed successfully.
meta-data=/dev/sdd1 isize=2048 agcount=4, agsize=122094597 blks
= sectsz=512 attr=2, projid32bit=1
= crc=0 finobt=0
data = bsize=4096 blocks=488378385, imaxpct=5
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=0
log =internal log bsize=4096 blocks=238466, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
mount: wrong fs type, bad option, bad superblock on /dev/sdd1,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so.
ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', 'xfs', '-o', 'rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M', '--', '/dev/sdd1', '/var/lib/ceph/tmp/mnt.yZ_7xR']' returned non-zero exit status 32
command 'ceph-disk prepare --zap-disk --fs-type xfs --cluster ceph --cluster-uuid 07b29f85-04b6-4788-89e6-703bda8f0f33 --journal-dev /dev/sdd /dev/sdc2' failed: exit code 1
=== SNAP ===
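In case it helps, this is roughly what I have been running to inspect the state of the two drives after the failed attempt (assuming the device names /dev/sdd and /dev/sdc have not changed):

lsblk /dev/sdd /dev/sdc
sgdisk -p /dev/sdd    # partition table of the replaced spinning disk
sgdisk -p /dev/sdc    # partition table of the journal SSD with its two partitions
dmesg | tail -n 20    # the kernel messages the mount error points to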
I wonder if I have to zap the journal partition of the SSD to get this working. There are two partitions on that journal SSD, so I am not sure whether zapping would wipe the whole SSD, in which case I would lose another OSD.
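What I am considering, instead of a full zap, is clearing only the second partition and leaving the first journal partition (which belongs to the healthy OSD) untouched. This is just a rough sketch of what I have in mind, assuming /dev/sdc1 is still the journal of the OSD that is in use; I would appreciate confirmation that it is safe before I try it:

wipefs -a /dev/sdc2    # remove any old signatures from the second journal partition only
# or, alternatively, overwrite the beginning of that partition:
dd if=/dev/zero of=/dev/sdc2 bs=1M count=100
pveceph createosd /dev/sdd -journal_dev /dev/sdc2    # then retry creating the OSD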
Can somebody please help me? I am a bit desperate.
Leihnix