Ceph - Multiple OSDs and Pools

You need at least 3 monitors and 3 managers for everything to work transparently.

If you have 2 of each, like you have now, there will be no majority quorum when it's time to vote for a new mon/mgr leader.
They all need to be on the same network; this can be achieved with a VPN if the third node is somewhere else.

tinc (allows multicast packets and is a P2P VPN, so if it is connected to several endpoints the traffic always gets through) or WireGuard (full speed, but not a P2P VPN, and it is doubtful whether the multicast that the corosync HA manager needs would work).
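
A quick way to check whether your monitors actually have quorum (a sketch, assuming the standard Ceph CLI is available on one of the cluster nodes):
Code:
ceph mon stat                               # one-line summary: number of mons and who is in quorum
ceph quorum_status --format json-pretty     # full detail, including the current quorum leader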
 

I fully understand.

Just to be very clear:
- All three nodes have their cluster IP in the 192.168.7.0/24 network. This is where the Proxmox cluster setup has been done.
- Two of the three nodes have their Ceph mon / initialization in the 192.168.10.0/24 network. This is a non-switched network, as there is a crossover cable between the two hosts. Nothing in the middle.

So, if I understand correctly, I must create another Ceph monitor on the third node (makes sense), and it MUST also be in the same Ceph network, correct? For this, I have only 2 solutions:
- Adding a network switch with 10 Gb ports, plus an additional USB-to-RJ45 adapter for my NUC (as it has only 1 network port)
- Creating a VPN tunnel (even though the machines are in the same rack)

Is that correct?

Thanks

EDIT: I guess you are referring to http://wiki.csnu.org/index.php/Clustering_proxmox_avec_tinc ?
 
The NUC (mon + mgr) is fine with one RJ45 jack; the 10 Gb ports are only needed if you want to put an OSD on the NUC.
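
For reference, adding just the mon and mgr on the NUC should only take the pveceph tooling (a sketch; the exact subcommand names differ slightly between Proxmox VE versions, older releases use createmon/createmgr):
Code:
pveceph install        # install the Ceph packages on the NUC
pveceph mon create     # add a monitor on this node
pveceph mgr create     # add a manager on this node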
 

Yeah, but the thing is that the first two nodes are connected together with a direct cable, so it's impossible to put the NUC in between... except with a switch :( So I definitely need a switch. I should have one tomorrow.

Many thanks !
 
- All three nodes have their cluster IP in the 192.168.7.0/24 network. This is where the Proxmox cluster setup has been done.
- Two of the three nodes have their Ceph mon / initialization in the 192.168.10.0/24 network. This is a non-switched network, as there is a crossover cable between the two hosts. Nothing in the middle.

- Mon, mgr and the Proxmox HTTPS GUI should all reside on the .10 network, i.e. the PUBLIC network on 1 Gb.
- CLUSTER is your SAN, where the OSDs talk to each other, as well as ring1 in corosync for HA migrations, on .7, i.e. the CLUSTER network on 10 Gb.

But getting a switch WILL make life much easier for you. I got one with 24x 1 Gb ports and 4x 10 Gbit. Wasn't cheap, but nothing is when it comes to 10 Gb. Let me know if I should pull up the model number for this.
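
In ceph.conf terms that split is just the two network options in the [global] section (the subnets below are placeholders; use whichever of your networks plays which role):

public network = 192.168.x.0/24
cluster network = 192.168.y.0/24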
 

I have built the corosync cluster on the .7 network. It's still a LAN / private network. The .10 network is exclusively used by the Ceph setup.

I also think that having an additional switch will help. I chose a GS110MX, which has 2x 10 Gb and 8x 1 Gb ports. Which model did you choose?
 
You should switch it then, as I mentioned above; it's simple...
Change ceph.conf and do
Code:
systemctl stop ceph\*.service ceph\*.target
systemctl start ceph.target
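
After the restart it is worth confirming that the daemons came back up and picked up the new addresses, for example:
Code:
systemctl status ceph\*.service ceph\*.target   # check that the mon/mgr/osd units are active again
ceph -s                                          # cluster health, mon quorum and OSD status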

TP-Link T1700G-28TQ is what I have; I'm eyeing the cheap MikroTiks with 5x SFP+ 10 Gbit ports as the next buy, to tie the hobby room to the house.
 
Hello Alexlup,

Currently I have:

[global]
...
cluster network = 192.168.10.0/24
...
public network = 192.168.10.0/24

[mon.host1]
host = host1
mon addr = 192.168.10.21:6789

[mon.host2]
host = host2
mon addr = 192.168.10.20:6789

So if I understand correctly, I should change the public network to 192.168.7.0/24 (in fact this subnet is my LAN and only accessible to me), also change the mon host addresses to something in the 192.168.7 subnet, and leave the cluster network in 192.168.10.0/24, correct?

By doing that I would make the monitors also reachable from my NUC, which is in 192.168.7.0/24, and could configure it as the third Ceph node?

If that is right, how will the first 2 hosts know which IPs to talk to on the storage subnet, since I only specify the subnet itself (cluster network)?

I hope my question is clear...

Thanks :)
 
- If .10 is the 1 Gb network accessible by all, then that is your public network.
- The 10 Gb network .7 that is daisy-chained is then your OSD/storage network.
They know how to find each other because, *drumroll*, of the mons. Mons act like a torrent tracker!

If that doesn't work you could of course specify addresses, but I'd not recommend it, since the conf file gets so cluttered then...
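
If you want to see exactly which addresses the mons are handing out, the cluster maps can be dumped with a couple of standard read-only commands:
Code:
ceph mon dump                 # mon map: the public addresses of all monitors
ceph osd dump | grep "^osd"   # osd map: each OSD with its public and cluster address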
 

Hello!

So, with such a configuration I may not need an additional switch then? As the 3 nodes will be able to talk on the public network... Right?

Cheers
 
Exactly, but a switch makes a world of difference when you add a fourth node! :)

A
 

This is nice.

I tried to modify the Ceph config, but now everything is broken, so I have to figure this out. No more Ceph storage :) Luckily I only have a test machine on it.
 
To be sure, should this entry be on the PUBLIC or the storage LAN?

[mon.host2]
host = host2
mon addr = 192.168.10.20:6789
 
Mons and MGRs are always on the public net; they need to be reachable by all.

OK, but Ceph is broken with this configuration:

[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 192.168.10.0/24
fsid = 532deb0f-4b17-4343-9112-g26f78ce6125
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
public network = 192.168.7.0/24

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.host1]
host = host1
mon addr = 192.168.7.21:6789

[mon.host2]
host = host2
mon addr = 192.168.7.20:6789

Any obvious reason?
 
First off, it doesn't seem like you have joined the NUC to the Proxmox cluster? Once joined, install Ceph / the Ceph mon and mgr according to the wiki on that NUC node and it will automagically sync with the other nodes, adding a mon entry in ceph.conf.
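
In case it helps, joining is done from the NUC itself, pointing it at one of the existing nodes (the IP below is just an example):
Code:
pvecm add 192.168.7.20   # run on the NUC: join the existing Proxmox cluster
pvecm status             # confirm all three nodes are members and the cluster is quorate
After that, the pveceph install / mon create / mgr create steps on the NUC as sketched earlier.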

Second off, remove the "host = ..." lines in the mon sections, they are unnecessary. Also, host1 is .21 and host2 is .20? My OCD just gave me shivers down the spine, omg! :p

Third off, please explain what you mean by Ceph being broken. Run

ceph -w
ceph osd tree

and show us the output, along with your crush map and your corosync.conf.

Also, once the NUC node joins, change to the following so as not to expect a third OSD node:
osd pool default min size = 1
osd pool default size = 2
and reboot all nodes.
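
One thing to keep in mind: the "osd pool default ..." settings only apply to pools created afterwards. For a pool that already exists you would change it at runtime, e.g. for a hypothetical pool named rbd:
Code:
ceph osd pool set rbd size 2
ceph osd pool set rbd min_size 1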

Lastly,
- If .10 is the 1 Gb network accessible by all, then that is your public network.
- The 10 Gb network .7 that is daisy-chained is then your OSD/cluster network.
You swapped them around, so your 2 daisy-chained nodes are not seeing the third, unless you do vSwitch trickery, and that's no easy task.
 
Thank you a lot for your time and patience, AlexLup.

I have started the Ceph configuration over. I ran a "pveceph purge", re-created the Ceph cluster and initialized it on the PUBLIC network this time, so I now have the three nodes. See the new configuration:

[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 192.168.10.0/24
fsid = a389d639-f04f-4042-g154-85v3172af759
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 1
osd pool default size = 2
public network = 192.168.7.0/24

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mon.host3]
mon addr = 192.168.7.22:6789

[mon.host1]
mon addr = 192.168.7.20:6789

[mon.host2]
mon addr = 192.168.7.21:6789

I modified the "cluster network = 192.168.10.0/24" line in /etc/ceph/ceph.conf just before posting this, so how can I be sure that the new OSDs I create will use the cluster network I want for REPLICATION (192.168.10.0/24)?

Is there anything special to do? I am trying to google that but haven't found my answer yet.

Cheers
 
If it's defined in ceph.conf there is nothing more you need to do. The monitors will tell the clients where to find the OSDs!
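
If you want to double-check, once an OSD is up its registered addresses can be queried (osd.0 here is just an example id):
Code:
ceph osd metadata 0 | grep -E "front_addr|back_addr"   # front = public network, back = cluster/replication network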

Welcome to the ceph world - courtesy of the fine folks of Proxmox! Happy camping!

PS. Check out CephFS, it's like Samba but on all hosts! DS.
 

Super! Many, many thanks again for all your answers. This was really helpful.
Now I will look into how to present network interfaces in different subnets without binding an IP on each.

Cheers !
 
PS. Don't forget to do the corosync.conf totem magic to get faster HA speeds! Check the Proxmox wiki! DS.

You talked about the totem magic; is that related to this KB: https://pve.proxmox.com/wiki/Cluster_Manager#_cluster_network ?
Are you talking about "Separate After Cluster Creation", with this example?

totem {
  cluster_name: thomas-testcluster
  config_version: 3
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 192.168.30.50
    ringnumber: 0
  }
}
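
For what it's worth, the "magic" is essentially giving corosync its own ring (or a second one) on a dedicated network. As a rough sketch in corosync 2.x syntax (as shipped with Proxmox VE 5.x; the addresses are placeholders, config_version must be bumped, and every node needs a matching ringX_addr entry in the nodelist):

totem {
  ...
  rrp_mode: passive
  interface {
    ringnumber: 0
    bindnetaddr: 192.168.7.0
  }
  interface {
    ringnumber: 1
    bindnetaddr: 192.168.10.0
  }
}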
 