Ceph - Basic Question

sannsio

Active Member
Dec 8, 2015
Hi all,

we are running a small Ceph cluster with 5 nodes as shared storage within our Proxmox cluster. Currently we are running about 40 VMs and containers. Everything works nicely. But I noticed that several VMs stopped working after one node was shut down, and some VMs became very slow.

From my understanding, the Ceph cluster should continue working without interruption. I thought a 5-node Ceph cluster could keep running properly even with 2 nodes down. But it seems I got something wrong somewhere. Can anybody give me a hint?

Thanks,
Sandra
 
The Ceph nodes are connected with 1 Gbit. Would it help to move to 10 Gbit to shorten the recovery time?
1 Gbit and Ceph is not a good idea. We have 20 Gbit here and it works fine: VM cloning at about 1300 MB/s with 3 nodes. But this depends on your disk speed. What disks have you built into each node, and how many?
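A rough sketch of why the link speed dominates recovery time. The numbers here are purely illustrative assumptions (1 TB of data to re-replicate, ~80% usable link throughput), not measurements from this cluster:

```python
# Back-of-envelope recovery-time estimate after a node failure.
# Assumptions (hypothetical, not from this thread):
#   - data_tb: amount of PG data Ceph must re-replicate
#   - link_gbit: raw network link speed in Gbit/s
#   - efficiency: usable fraction of the raw link (~80%)

def recovery_hours(data_tb: float, link_gbit: float, efficiency: float = 0.8) -> float:
    data_bytes = data_tb * 1e12                       # TB -> bytes
    bytes_per_s = link_gbit * 1e9 / 8 * efficiency    # Gbit/s -> usable bytes/s
    return data_bytes / bytes_per_s / 3600            # seconds -> hours

print(round(recovery_hours(1.0, 1.0), 1))   # ~2.8 h on 1 Gbit
print(round(recovery_hours(1.0, 10.0), 1))  # ~0.3 h on 10 Gbit
```

So, all else equal, a 10 Gbit link cuts the rebalance window by roughly a factor of ten; during that window client I/O competes with recovery traffic, which matches the slow VMs you saw.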
 

The OS and the journal run on a Samsung 850 Pro. In addition, I have 2 OSDs on small 500 GB disks.
 
Hi,
I would say the 850 Pro is the problem (and perhaps osd_max_backfills and osd_recovery_max_active are set too high).
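To throttle recovery so it competes less with client I/O, both options can be lowered. A sketch, with illustrative values (check the defaults for your Ceph release):

```shell
# Apply at runtime to all OSDs (no restart needed):
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

# Or persist in /etc/ceph/ceph.conf under the [osd] section:
#   osd_max_backfills = 1
#   osd_recovery_max_active = 1
```

Lower values slow recovery down but keep latency for the VMs more predictable while it runs.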

Udo
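Udo's hunch about the 850 Pro can be checked with a synchronous-write benchmark, since a Ceph journal does small sync writes; consumer SSDs often collapse under that pattern. A common fio invocation (the file path is an example; point it at a scratch file on the SSD, never at a raw device that holds data):

```shell
# Small synchronous writes, the access pattern of a Ceph journal.
# /mnt/ssd/fio-test is a hypothetical scratch file on the 850 Pro.
fio --name=journal-test --filename=/mnt/ssd/fio-test --size=1G \
    --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based
```

Datacenter SSDs with power-loss protection typically sustain thousands of sync IOPS here, while consumer drives may manage only a few hundred.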

Thanks Udo, I will check osd_max_backfills and osd_recovery_max_active.
What SSD do you recommend?