Ceph - Basic Question

sannsio

Active Member
Dec 8, 2015
Hi all,

we are running a small Ceph cluster with 5 nodes as shared storage within our Proxmox cluster. Currently we are running about 40 VMs and containers. Everything works nicely. But I noticed that several VMs stopped working after one node was shut down, and some VMs became very slow.

From my understanding, the Ceph cluster should continue working without interruption. I thought a 5-node Ceph cluster could keep running properly even with 2 nodes down. But it seems I got something wrong somewhere. Can anybody give me a hint?

Thanks,
Sandra
 
The Ceph nodes are connected with 1 Gbit. Would it help to move to 10 Gbit to shorten the recovery time?
1 Gbit and Ceph is not a good idea. We have 20 Gbit here and it works fine: VM cloning at about 1300 MB/s with 3 nodes. But this depends on your disk speed. What disks have you built into each node, and how many?
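A rough sketch of why the link speed dominates recovery time. The numbers here are purely illustrative assumptions (1 TB of data to re-replicate, ~80% usable link throughput), not measurements from this cluster:

```python
# Back-of-envelope recovery-time estimate after a node failure.
# Assumptions (hypothetical, not from this thread):
#   - data_tb: amount of PG data Ceph must re-replicate
#   - link_gbit: raw network link speed in Gbit/s
#   - efficiency: usable fraction of the raw link (~80%)

def recovery_hours(data_tb: float, link_gbit: float, efficiency: float = 0.8) -> float:
    data_bytes = data_tb * 1e12                       # TB -> bytes
    bytes_per_s = link_gbit * 1e9 / 8 * efficiency    # Gbit/s -> usable bytes/s
    return data_bytes / bytes_per_s / 3600            # seconds -> hours

print(round(recovery_hours(1.0, 1.0), 1))   # ~2.8 h on 1 Gbit
print(round(recovery_hours(1.0, 10.0), 1))  # ~0.3 h on 10 Gbit
```

So, all else equal, a 10 Gbit link cuts the rebalance window by roughly a factor of ten; during that window client I/O competes with recovery traffic, which matches the slow VMs you saw.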
 

The OS and the journal run on a Samsung 850 Pro. In addition, I have 2 OSDs on small 500 GB disks.
 
Hi,
I would say the 850 Pro is the problem (and perhaps osd_max_backfills and osd_recovery_max_active are set too high).
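To throttle recovery so it competes less with client I/O, both options can be lowered. A sketch, with illustrative values (check the defaults for your Ceph release):

```shell
# Apply at runtime to all OSDs (no restart needed):
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

# Or persist in /etc/ceph/ceph.conf under the [osd] section:
#   osd_max_backfills = 1
#   osd_recovery_max_active = 1
```

Lower values slow recovery down but keep latency for the VMs more predictable while it runs.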

Udo
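Udo's hunch about the 850 Pro can be checked with a synchronous-write benchmark, since a Ceph journal does small sync writes; consumer SSDs often collapse under that pattern. A common fio invocation (the file path is an example; point it at a scratch file on the SSD, never at a raw device that holds data):

```shell
# Small synchronous writes, the access pattern of a Ceph journal.
# /mnt/ssd/fio-test is a hypothetical scratch file on the 850 Pro.
fio --name=journal-test --filename=/mnt/ssd/fio-test --size=1G \
    --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based
```

Datacenter SSDs with power-loss protection typically sustain thousands of sync IOPS here, while consumer drives may manage only a few hundred.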

Thanks Udo, I will check osd_max_backfills and osd_recovery_max_active.
What SSD do you recommend?