I'm pretty new to Ceph, so bear with me. I set up a 3-node hyperconverged cluster on Proxmox 6.1 with four 900 GB 10k disks in each node, and the nodes communicate over a 10 Gb mesh network in broadcast mode. I recently noticed a ton of errors on a specific disk (osd.2), and Ceph automatically marked it down and out. So I clicked Destroy in the GUI and swapped in a new disk. When I try to create a new OSD from the GUI, it keeps failing. I'm not sure what's wrong or what to do next.
I also noticed that my status page still shows 12 disks and references osd.2, even though I destroyed it.
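In case it's useful context, what I've read suggests that a destroyed OSD can leave entries behind in the CRUSH map and OSD map, which would explain the stale osd.2 reference. This is a sketch of the standard Ceph cleanup commands, assuming osd.2 is already stopped — I haven't run these yet, so please double-check before suggesting them:

```shell
# Remove lingering traces of a destroyed OSD (run on any node with a monitor).
ceph osd crush remove osd.2   # drop the entry from the CRUSH map
ceph auth del osd.2           # delete its cephx key, if one remains
ceph osd rm osd.2             # remove it from the OSD map
ceph osd tree                 # verify osd.2 no longer appears
```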
Here's the error I'm getting:
Code:
create OSD on /dev/sde (bluestore)
wipe disk/partition: /dev/sde
/bin/dd: fdatasync failed for '/dev/sde': Input/output error
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 3.15714 s, 66.4 MB/s
command '/bin/dd 'if=/dev/zero' 'bs=1M' 'conv=fdatasync' 'count=200' 'of=/dev/sde'' failed: exit code 1
Running command: /bin/ceph-authtool --gen-print-key
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new 77126751-3c89-4057-abc7-dcc9e7db2fce
Running command: /usr/sbin/vgcreate -s 1G --force --yes ceph-9f7f3565-994e-493a-8ac3-3df60cbb55fd /dev/sde
stderr: Error writing device /dev/sde at 4096 length 4096.
stderr: bcache_invalidate: block (5, 0) still dirty
Failed to wipe new metadata area on /dev/sde at 4096 len 4096
Failed to add metadata area for new physical volume /dev/sde
Failed to setup physical volume "/dev/sde".
--> Was unable to complete a new OSD, will rollback changes
Running command: /bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.12 --yes-i-really-mean-it
stderr: 2020-02-24 22:01:09.260 7fb65483f700 -1 auth: unable to find a keyring on /etc/pve/priv/ceph.client.bootstrap-osd.keyring: (2) No such file or directory
2020-02-24 22:01:09.260 7fb65483f700 -1 AuthRegistry(0x7fb650080e78) no keyring found at /etc/pve/priv/ceph.client.bootstrap-osd.keyring, disabling cephx
stderr: purged osd.12
--> RuntimeError: command returned non-zero exit status: 5
TASK ERROR: command 'ceph-volume lvm create --cluster-fsid 31cf4c55-e5ad-42b3-b122-f5af2b6f4d1d --data /dev/sde' failed: exit code 1
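The part that worries me is the `fdatasync failed for '/dev/sde': Input/output error` line near the top — that happens before Ceph does anything LVM-specific, so maybe the replacement disk, cabling, or controller is the real problem? These are the diagnostics I'm considering, assuming the new disk really is /dev/sde as the task log says (the zap command is destructive, so only after confirming the device):

```shell
dmesg | grep -i sde        # check for kernel-level I/O or SCSI errors on the device
smartctl -a /dev/sde       # SMART health of the new disk (smartmontools is installed per pveversion below)
wipefs --all /dev/sde      # clear any stale filesystem/RAID signatures if the disk is healthy
ceph-volume lvm zap /dev/sde --destroy   # Ceph's own wipe before retrying the OSD create
```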
Here's my `pveversion -v` output:
root@pxmx1:~# pveversion -v
proxmox-ve: 6.1-2 (running kernel: 5.3.13-2-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-2
pve-kernel-helper: 6.1-2
pve-kernel-5.3.13-2-pve: 5.3.13-2
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph: 14.2.6-pve1
ceph-fuse: 14.2.6-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 2.0.1-1+pve2
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-10
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-2
pve-cluster: 6.1-3
pve-container: 3.0-18
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191002-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-4
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1