Multiple ceph networks - 10G nics directly connected

handy · Apr 17, 2019

Hi there.

Has anyone used Proxmox with directly connected 10G network cards, or with multiple subnets / networks for the Ceph network?

We have a working Ceph cluster of 5 nodes. The dual 10G NICs were all connected to a 1G switch and set up as slaves to a bond interface, which had the IP assigned (10.0.0.1/24 etc).

We had 3 monitors configured on 10.0.0.1, 10.0.0.2 and 10.0.0.3, i.e. the IP of bond0 on each of the first 3 nodes.

We want to directly connect the dual 10G interfaces and are using quagga and OSPF to advertise routes. We are able to ping every IP address from any of the hosts, and routes for all of the 5x /30 networks show up in each node's routing table.

We've removed monitors 2 and 3 and have left the 1 existing monitor, which was created when node 1 was using bond0 on 10.0.0.1/24.

I have edited ceph.conf directly and added all 5x networks, for both the public and cluster networks:

[global]
auth client required = cephx
auth cluster required = cephx
auth service required = cephx
cluster network = 10.0.0.0/30,10.0.0.4/30,10.0.0.8/30,10.0.0.12/30,10.0.0.16/30
fsid = 953d5cc2-aa41-4c28-8382-2f2ba943
keyring = /etc/pve/priv/$cluster.$name.keyring
mon allow pool delete = true
osd journal size = 5120
osd pool default min size = 2
osd pool default size = 3
public network = 10.0.0.0/30,10.0.0.4/30,10.0.0.8/30,10.0.0.12/30,10.0.0.16/30
debug mon = 10

[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring

[osd]
keyring = /var/lib/ceph/osd/ceph-$id/keyring

[mds.wms-oc-01]
host = wms-oc-01
mds standby for name = pve

[mon.wms-oc-01]
host = wms-oc-01
mon addr = 10.0.0.1:6789
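
As a quick sanity check on a configuration like the one above, the monitor address should fall inside one of the listed public networks. A minimal Python sketch using only the standard `ipaddress` module; the helper name `matching_network` is made up for illustration and is not part of Ceph or Proxmox:

```python
import ipaddress

# Values copied from the ceph.conf in this post.
PUBLIC_NETWORKS = "10.0.0.0/30,10.0.0.4/30,10.0.0.8/30,10.0.0.12/30,10.0.0.16/30"
MON_ADDR = "10.0.0.1:6789"


def matching_network(addr, networks):
    """Return the first listed network that contains addr, or None.

    addr is "ip:port" as in `mon addr`; networks is the comma-separated
    list as written in `public network`.
    """
    ip = ipaddress.ip_address(addr.rsplit(":", 1)[0])
    for cidr in networks.split(","):
        network = ipaddress.ip_network(cidr.strip())
        if ip in network:
            return network
    return None


print(matching_network(MON_ADDR, PUBLIC_NETWORKS))
```

The same helper can be pointed at the peer addresses seen in netstat to confirm each one belongs to one of the five /30 links.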

The firewall has been turned off.

IP forward is enabled on the hosts and on the quagga router.

We are unable to add new monitors because we have no quorum, even after specifying the mon-address on one of the other nodes. It seems the existing monitor is now unreachable from the other nodes. We haven't tried removing the existing monitor yet, but I suspect Proxmox won't let us.

Snippet of netstat:

tcp 0 0 10.0.0.1:6789 10.0.0.6:45796 ESTABLISHED
tcp 0 0 10.0.0.1:6789 10.0.0.2:59510 TIME_WAIT
tcp 0 0 10.0.0.1:6789 10.0.0.17:55994 TIME_WAIT
tcp 0 0 10.0.0.1:6789 10.0.0.10:46776 TIME_WAIT
tcp 0 0 10.0.0.1:6789 10.0.0.2:59432 TIME_WAIT

/var/log/ceph/ceph...log

2019-04-17 09:39:18.122357 7f7294af2100 4 rocksdb: [/home/builder/source/ceph-12.2.11/src/rocksdb/db/db_impl.cc:343] Shutdown complete
2019-04-17 09:39:18.122374 7f7294af2100 -1 rocksdb: IO error: lock /var/lib/ceph/mon/ceph-wms-oc-01/store.db/LOCK: Resource temporarily unavailable
2019-04-17 09:39:18.122377 7f7294af2100 -1 error opening mon data directory at '/var/lib/ceph/mon/ceph-hostname': (22) Invalid argument

We now have a lot of these in the messages log:

Apr 17 13:33:27 hostname kernel: [ 5334.428085] libceph: mon0 10.0.0.1:6789 socket closed (con state OPEN)
Apr 17 13:33:32 hostname kernel: [ 5339.428808] libceph: mon0 10.0.0.1:6789 socket closed (con state OPEN)
Apr 17 13:33:37 hostname kernel: [ 5344.429202] libceph: mon0 10.0.0.1:6789 socket closed (con state OPEN)

Has anyone had any experience with this sort of setup?

Many Thanks
Andy

wolfgang · Apr 17, 2019

Hi,

I would make one subnet.
Then do host routing with two entries:
one entry with high priority in the short direction and the other with low priority in the long direction.

handy · Apr 23, 2019

Hi, just an update on this: it does actually work, and very well at this point.

The issue we had was that the one remaining monitor thought it still had other monitor friends, even though we had deleted them during the IP switcharound.

Once we had removed config related to the other monitors, ceph was happy and we could talk to it again.
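
One way to spot leftover monitor entries is to list the `[mon.*]` sections still present in ceph.conf. The sketch below is an illustration only: the sample text and the second monitor `wms-oc-02` are hypothetical, `monitor_sections` is not a Ceph tool, and it only inspects ceph.conf text, not the monitor map held by the running monitor.

```python
import configparser

# Hypothetical ceph.conf excerpt; wms-oc-02 stands in for a deleted monitor
# whose section was left behind.
SAMPLE = """
[global]
public network = 10.0.0.0/30,10.0.0.4/30

[mon.wms-oc-01]
host = wms-oc-01
mon addr = 10.0.0.1:6789

[mon.wms-oc-02]
host = wms-oc-02
mon addr = 10.0.0.2:6789
"""


def monitor_sections(conf_text):
    """Map each [mon.<name>] section to its `mon addr` value (or None)."""
    # interpolation=None keeps values such as $cluster.$name untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {
        name: parser.get(name, "mon addr", fallback=None)
        for name in parser.sections()
        if name.startswith("mon.")
    }


for name, addr in monitor_sections(SAMPLE).items():
    print(name, addr)
```

Any section listed here for a monitor that no longer exists is a candidate for removal.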

We created bond interfaces on each node using one of that node's eventual IP addresses (e.g. bond0 on node 2 was given 10.0.0.6/24), then added a monitor for each bond interface.

Once this was done we deleted the bond0 interfaces and assigned the physical interfaces the IP scheme above, using /30s. We re-installed quagga and ospfd, enabled IPv4 forwarding on each host and in the quagga router config, rebooted, and the cluster is now happy.
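
For reference, each /30 leaves exactly two usable addresses, one for each end of a direct link. A small Python sketch that lists them for the five networks in this thread; `link_endpoints` is a hypothetical helper name, and which node sits at which end of a link is not something the sketch knows:

```python
import ipaddress

# The five point-to-point networks from the ceph.conf earlier in the thread.
LINKS = [
    "10.0.0.0/30",
    "10.0.0.4/30",
    "10.0.0.8/30",
    "10.0.0.12/30",
    "10.0.0.16/30",
]


def link_endpoints(cidr):
    """Return the usable host addresses of a network as strings."""
    return [str(host) for host in ipaddress.ip_network(cidr).hosts()]


for cidr in LINKS:
    print(cidr, link_endpoints(cidr))
```

For 10.0.0.0/30 this gives 10.0.0.1 and 10.0.0.2, and for 10.0.0.4/30 it gives 10.0.0.5 and 10.0.0.6.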

iperf results are 9.4Gb/s between all nodes on each 10G interface, with 8% CPU usage on the gruntier nodes and 17% on the older nodes during the tests.

Config for quagga is below if anyone else is interested:

interface enp8s0f0
ip address 10.0.0.1/30
!
interface enp8s0f1
ip address 10.0.0.5/30
!
interface lo
link-detect
ip address 10.200.1.1/32
!
interface vmbr0
!
ip forwarding

Each node has its own individual lo IP.

Andy
