Ceph not working and RBD locked problem

Mar 23, 2024
Hello!

We recently installed a three-node PVE cluster and want to use Ceph. The hosts are (almost) identical; only host 3 has a different network interface.

We have a public network and a dedicated cluster network for Ceph.

Each host has four discs for OSDs.

Creating a VM or LXC container fails with:
Code:
TASK ERROR: unable to create CT 100 - rbd error: 'storage-ceph-vsan'-locked command timed out - aborting

The created OSDs also go down from time to time and then come up again after a few minutes. There are some errors in ceph health detail and the status is HEALTH_WARN.

Attached some files and screenshots of config and logs.

Can someone tell me where to start troubleshooting? I've read quite a bit here in the forum and elsewhere on the net, but I can't find a clue. I would be very grateful for any help, as we want to put the system into use soon. Thank you!
 

Attachments

  • some_ceph_log.txt (14.1 KB)
  • ceph_health_detail.txt (5.3 KB)
  • interfaces.txt (1.9 KB)
  • ceph_config.txt (1.1 KB)
  • CleanShot 2024-03-23 at 18.25.07@2x.png (208.8 KB)
  • CleanShot 2024-03-23 at 18.24.51@2x.png (118.9 KB)
  • CleanShot 2024-03-23 at 18.24.12@2x.png (141.9 KB)
  • CleanShot 2024-03-23 at 18.24.00@2x.png (274.7 KB)
  • CleanShot 2024-03-23 at 18.23.48@2x.png (167.7 KB)
  • CleanShot 2024-03-23 at 18.18.55@2x.png (186.1 KB)

My first guess: your network is too slow.

What speed are the NICs on the Ceph network?
Did you run a network speed test?
 
Thanks!

Yeah, I also think it's a network issue, but I wasn't sure. It should be 20G (2x10G), but I have to check this. Any idea how to test the network speed on the cluster network?
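
One common way to measure this is iperf3 between the nodes, run over the cluster network addresses. The following is only a minimal sketch, not something taken from the attached config: it assumes iperf3 is installed on every node and that `iperf3 -s` is already running on each peer. The IP addresses, the `PEERS` and `DRY_RUN` variables and the `bench_cmd` helper are placeholders made up for illustration.

```shell
#!/bin/sh
# Sketch: measure TCP throughput to each peer on the Ceph cluster network.
# Assumption: `iperf3 -s` is already running on every peer listed below.
# The addresses are hypothetical; replace them with your cluster network IPs.
PEERS="${PEERS:-10.10.10.2 10.10.10.3}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

# Print the iperf3 client command for one peer:
# 4 parallel streams (-P 4) for 30 seconds (-t 30).
bench_cmd() {
    echo "iperf3 -c $1 -P 4 -t 30"
}

for peer in $PEERS; do
    if [ "$DRY_RUN" = "1" ]; then
        bench_cmd "$peer"
    else
        # Word splitting of the printed command is intended here.
        $(bench_cmd "$peer")
    fi
done
```

Run it once with the default `DRY_RUN=1` to see the commands, then with `DRY_RUN=0` on each node in turn. On a 2x10G bond a single TCP stream normally uses only one link, so roughly 10 Gbit/s per stream is the most to expect, which is why the sketch uses parallel streams. Depending on the bond's hash policy even those may land on the same link. A result far below 10 Gbit/s, or very different numbers for host 3, would point to the network.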