Ceph - Basic Question

sannsio

Dec 8, 2015
Hi all,

we are running a small Ceph cluster with 5 nodes as shared storage within our Proxmox cluster. Currently we are running about 40 VMs and containers. Everything works nicely, but I noticed that several VMs stopped working after one node was shut down, and some VMs became very slow.

From my understanding, the Ceph cluster should continue working without interruption. I thought a 5-node Ceph cluster could keep running properly even with 2 nodes down, but it seems I got something wrong. Can anybody give me a hint?

Thanks,
Sandra
The Ceph nodes are connected with 1 Gbit. Would it help to move to 10 Gbit to shorten the recovery time?
1 Gbit and Ceph is not a good idea. We have 20 Gbit here and it works fine; VM cloning runs at about 1300 MB/s with 3 nodes. But this depends on your HDD speed. What disks do you have in each node, and how many?
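Beside the network, it is worth checking whether your pools can actually tolerate two node failures; that is governed by the pool's size/min_size and the failure domain. A rough diagnostic sketch (the pool name rbd is an assumption, substitute your own):

```
# Whether client I/O continues after node failures depends on
# the pool's replication settings:
ceph osd pool get rbd size      # number of replicas kept
ceph osd pool get rbd min_size  # below this many replicas, I/O blocks
ceph osd tree                   # confirm the failure domain is "host"
```

With size=3 and min_size=2, a single node down should keep I/O running while recovery proceeds; losing a second node can drop some placement groups below min_size and stall the affected VMs.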

The OS and the journal run on a Samsung 850 Pro. Then I have 2 OSDs on small 500 GB disks.
Hi,
I would say the 850 Pro is the problem (and perhaps too-high osd_max_backfills + osd_recovery_max_active settings).
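For reference, both settings can be lowered at runtime to reduce the impact of recovery on client I/O; a sketch with illustrative values, not a recommendation for every cluster:

```
# Throttle backfill/recovery so client I/O stays responsive
# during rebalancing (applies to all OSDs at runtime):
ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
```

To make the change permanent, the same options can be set in the [osd] section of ceph.conf.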

Udo

Thanks Udo, I will check osd_max_backfills + osd_recovery_max_active.
What SSD do you recommend?