7 Nodes? OSD Issue

Andrew Sutton

New Member
Jan 7, 2016
Hi,

The theoretical limit for a Ceph HA cluster is 32 nodes.

We have 7 nodes, and our OSD setting is 2 minimum, 3 maximum.

We can't add additional OSD drives beyond 7 nodes, but we can have more nodes.

What are we doing wrong?

Is this, in fact, 'wrong'?

Thanks.
 
When you refer to the 32-node theoretical limit, you are talking about the Proxmox cluster limit of 32, right?
As referenced here:
https://pve.proxmox.com/wiki/Proxmox_VE_4.x_Cluster
A Proxmox VE Cluster consists of several nodes (up to 32 physical nodes, probably more, dependent on network latency).
That sounds like a suggestion, though. Not sure whether this really means latency or actually bandwidth. AFAIK it's about Corosync limitations.

You do have 7 Proxmox nodes with pveceph running.
They have configs ranging from 2 to 3 OSDs each, so fewer than 21 OSDs??

We can't add additional OSD drives beyond 7 nodes, but we can have more nodes.
Is that an error you are encountering?

What are we doing wrong?

Is this, in fact, 'wrong'?


What is the error message you are receiving? "How" are you adding those OSD drives?
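For comparison, this is roughly how an OSD is normally added on a PVE 4.x node (the device path below is just an example; use whatever disk is actually free on your node):

Code:
pveceph createosd /dev/sdb   # create an OSD on this node's /dev/sdb
ceph osd tree                # verify the new OSD shows up in the CRUSH tree
ceph -s                      # overall cluster health and OSD count

If one of these steps is what throws the error, please paste it verbatim.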


For reference, the biggest Proxmox/Ceph cluster we (at work) have in production right now is
30 nodes spread over 3 datacenters, with 8 NVMe (Samsung 950) OSDs and 48 HDD OSDs each.

We have in fact not tested beyond 30 nodes, though, as we have had no need for it yet.
 
In order:

Yes.
Yes.
Yes. 14 OSDs.
We can add OSD drives; however, they cannot be used as shared storage because of the limit of 7 monitors in Ceph.
We do not use SSDs, FWIW.

The issue isn't that we can't have more OSDs; it's that we can't use shared storage for HA because each of the first 7 nodes requires a monitor.

Make sense?
 
it's that we can't use shared storage for HA because each of the first 7 nodes requires a monitor.
Make sense?

It doesn't.
I assume that when you say "shared storage" you mean a pool that has been added under Proxmox -> Datacenter -> Storage.

You do not need a monitor on every node that hosts OSDs. In fact, you can have a single node with a mon and no OSDs, and all OSDs on non-mon nodes.
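As a rough sketch of what I mean (node names are placeholders), on a PVE 4.x cluster you would only run pveceph createmon on a few nodes and create OSDs wherever the disks live:

Code:
# on three nodes of your choosing (e.g. node1, node2, node3), and only there:
pveceph createmon
# on every node that has data disks, whether or not it also runs a mon:
pveceph createosd /dev/sdb    # device path is an example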


Not sure where you are hitting this "limitation"; can you give a step-by-step example of the process?
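As a starting point, you can see what the cluster itself thinks about the monitors with the standard Ceph commands:

Code:
ceph mon stat        # monmap epoch, the list of monitors, and which of them are in quorum
ceph quorum_status   # the same information in more detail, as JSON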
 
Right, but that isn't the problem. :)

He was having issues setting up more than 7 nodes due to a misconfiguration of monitors, now fixed.
 
limited around 32 nodes because of corosync

That reads like a suggestion in the wiki and on the corosync web resources, though. It seems to be tied to the latency that 32 nodes generate on your network, which apparently can be "configured" to allow for higher latencies and, by implication, more nodes.

Too bad I do not have some additional 20-ish non-production nodes floating around; I'd really like to test this.
 
He was having issues setting up more than 7 nodes due to a misconfiguration of monitors, now fixed.


Did he use wrong IPs, or just not "know" that you do not need a monitor per Ceph node? Kinda wondering how this manifested itself as a symptom. I am curious, so I can put it on the company's "good to know" list (new-employee manual) of what not to do, as I personally never stumbled on this on my own.
 
That reads like a suggestion in the wiki and on the corosync web resources, though. It seems to be tied to the latency that 32 nodes generate on your network, which apparently can be "configured" to allow for higher latencies and, by implication, more nodes.

Too bad I do not have some additional 20-ish non-production nodes floating around; I'd really like to test this.

I'm sure that corosync supports 64 nodes, but in the past (2013) this was a hardcoded limit (corosync 1.x).

Now we use corosync 2, so it can manage more nodes.
I know that corosync 2 has a dynamic token timeout feature, auto-increasing with the number of cluster nodes, so it should require less tuning than corosync 1.
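For what it's worth, the relevant knobs live in the totem section of corosync.conf; the values below are only the corosync 2 defaults, shown for illustration rather than as a recommendation:

Code:
totem {
    version: 2
    # token: base timeout in ms before a token loss is declared
    token: 1000
    # token_coefficient: extra ms added per node above two; this is the
    # "dynamic token timeout" mentioned above (corosync 2 default is 650)
    token_coefficient: 650
}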


(Personally, I'm using big nodes in my cluster (384 GB RAM / 2x Intel 3.1 GHz 10-core).)


Just about Ceph: if you need a lot of Ceph nodes, I think it's better not to use Proxmox to manage them. It's not too difficult to manage them from the command line + ceph-deploy.
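Roughly what that looks like with ceph-deploy (hostnames and disk names below are placeholders, and the exact syntax varies a bit between ceph-deploy versions):

Code:
ceph-deploy new mon1 mon2 mon3          # write an initial ceph.conf with these monitors
ceph-deploy install node1 node2 node3   # install the Ceph packages on the target nodes
ceph-deploy mon create-initial          # bootstrap the monitors and gather the keys
ceph-deploy osd create node1:sdb        # create an OSD on node1's sdb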
 
I understand this, and we too use "big nodes" in our clusters (and we do run some "Ceph only" nodes), but we like to maintain the ability to utilize the spare CPU capacity for VMs.

So I'm wondering if that 32 Proxmox node limit is really a hard limit, or just a legacy limit no one has ever tried breaking.

As I said, I am about 20 physical test nodes short of giving this a try. Not sure if virtual Proxmox nodes would work for this test scenario.
 
