Hi all
My cluster consists of 6 nodes with 3 OSDs each (18 OSDs total), running PVE 6.2-6 and Ceph 14.2.9. BTW, it's been up and running fine for 7 months now and went through all updates flawlessly so far.
However, after rebooting the nodes one after the other upon updating to 6.2-6, the 3 OSDs on one node didn't come up again. Once Ceph was back to clean, with the 3 OSDs marked "out", I decided to destroy them, and waited for the clean state again. Then (on the respective node) I tried
- ceph-volume lvm zap /dev/sda --destroy
Code:
Traceback (most recent call last):
  File "/usr/sbin/ceph-volume", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 83, in <module>
    __import__('pkg_resources.extern.packaging.specifiers')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/extern/__init__.py", line 43, in load_module
    __import__(extant)
ValueError: bad marshal data (unknown type code)
The attempt to add the OSD anyway using
- pveceph osd create /dev/sda
also failed, returning:
Code:
wipe disk/partition: /dev/sda
200+0 records in
200+0 records out
209715200 bytes (210 MB, 200 MiB) copied, 1.0918 s, 192 MB/s
Traceback (most recent call last):
  File "/sbin/ceph-volume", line 6, in <module>
    from pkg_resources import load_entry_point
  File "/usr/lib/python2.7/dist-packages/pkg_resources/__init__.py", line 83, in <module>
    __import__('pkg_resources.extern.packaging.specifiers')
  File "/usr/lib/python2.7/dist-packages/pkg_resources/extern/__init__.py", line 43, in load_module
    __import__(extant)
ValueError: bad marshal data (unknown type code)
command 'ceph-volume lvm create --cluster-fsid a8d6705a-74c4-4904-9000-0db5742043fc --data /dev/sda' failed: exit code 1
The same happens with the other two HDDs in that node (/dev/sdb and /dev/sdc). So I'm kind of stuck, and I'd appreciate any hints or help on this.
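In case it helps with narrowing this down: Python raises "bad marshal data" when it cannot unmarshal a compiled .pyc file, so one possibility (an assumption on my side, not something I have confirmed) is damaged bytecode next to the pkg_resources modules named in the traceback. Below is a minimal sketch for listing .pyc files that fail to unmarshal. The function names `is_pyc_loadable` and `find_broken_pyc` are made up for illustration, and the script has to be run with the same interpreter that wrote the files (python2.7 here), since the marshal format differs between versions.

```python
import marshal
import os
import sys

# Size of the .pyc header that precedes the marshalled code object:
# 8 bytes on Python 2.7, 12 on 3.3-3.6, 16 on 3.7 and later.
if sys.version_info < (3, 3):
    HEADER_SIZE = 8
elif sys.version_info < (3, 7):
    HEADER_SIZE = 12
else:
    HEADER_SIZE = 16


def is_pyc_loadable(path):
    """Return True if the payload of a .pyc file can be unmarshalled."""
    with open(path, "rb") as handle:
        payload = handle.read()[HEADER_SIZE:]
    try:
        marshal.loads(payload)
    except (ValueError, EOFError, TypeError):
        # Corrupt or truncated bytecode ends up here.
        return False
    return True


def find_broken_pyc(root):
    """Walk root and return a sorted list of .pyc files that fail to load."""
    broken = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".pyc"):
                path = os.path.join(dirpath, name)
                if not is_pyc_loadable(path):
                    broken.append(path)
    return sorted(broken)
```

Calling `find_broken_pyc("/usr/lib/python2.7/dist-packages/pkg_resources")` from python2.7 would list any files under the directory from the traceback that cannot be loaded. The sketch only reads files and changes nothing on disk.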
Kind regards
lucentwolf