Changing the ceph public network

MoreDakka

Active Member
May 2, 2019
Howdy,
First post here. I've tried to figure this out on my own, but it's not working out great for me.

I have a 3-node Proxmox setup (R710s with quad 1Gb onboard NICs and an add-on single-port 10Gb NIC, eventually two).
Only one of the 1Gb NICs is connected, and the 10Gb NIC is connected to a 10Gb switch (eventually two separate switches).
Currently still learning and testing so breaking this isn't a big deal right now.

Each node has an H700 controller (which is a mistake; I'll replace them with H200s so I can give the drives to Ceph without using the dreaded R0 configuration).
There are two 147GB SAS drives in R1 for the Proxmox OS, and I added 2x 3TB NAS drives to test Ceph.

Public network 10.0.0.0/16
Storage network 10.1.1.0/24

Nodes - prox01, prox02, prox03

When I originally set up the Ceph cluster I used the IP of the 1Gb NIC as the public network and the 10Gb as the storage network. While this all worked and the Ceph cluster was up and running, I did a speed test on a container and a VM built in Proxmox, both stored on Ceph. The speeds were OK but not as fast as I would have expected, like maxing out a 1Gb link.

That's when I got to thinking that Proxmox is using the public network to connect to Ceph for its VMs/CTs. So I attempted to change the public network to 10.1.1.x.

I created the new monitors with the following commands:

Code:
cd /home
mkdir tmp
# grab the mon keyring and the current monmap
ceph auth get mon. -o tmp/key-ceph-prox01a
ceph mon getmap -o tmp/map-ceph-prox01a
# build the new monitor's data directory from them
ceph-mon -i prox01a --mkfs --monmap tmp/map-ceph-prox01a --keyring tmp/key-ceph-prox01a
chown ceph:ceph -Rf /var/lib/ceph/mon/ceph-prox01a/
# start the new monitor on the 10Gb network address
ceph-mon -i prox01a --public-addr 10.1.1.1:6789

I did that on all three nodes. Once Proxmox showed "Quorum: Yes", I removed the old monitors that were on the 10.0.0.0/16 network. Then I edited the ceph.conf file, removed the old monitor entries and added the new ones.
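For reference, the mon-related part of the edited ceph.conf now looks roughly like this (the prox02a/prox03a names and the .2/.3 addresses are just the same pattern applied to the other two nodes):

Code:
[global]
     # public network moved over to the 10Gb subnet
     public network = 10.1.1.0/24
     cluster network = 10.1.1.0/24

[mon.prox01a]
     host = prox01
     mon addr = 10.1.1.1:6789

# entries for the other two nodes follow the same pattern (names/addresses assumed)
[mon.prox02a]
     host = prox02
     mon addr = 10.1.1.2:6789

[mon.prox03a]
     host = prox03
     mon addr = 10.1.1.3:6789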

Since then Ceph hasn't connected properly. The cluster looks to be running, as I can still access the VMs/CTs that are stored on Ceph, but Proxmox doesn't show anything other than "HEALTH_WARN - no active mgr".

Snip of error from log file:
mon.prox01a mon.0 10.1.1.1:6789/0 59228 : cluster [WRN] overall HEALTH_WARN no active mgr


Now that the back story is out of the way, 2 questions:

How can I change the setup so that both the OSDs and the Proxmox 'public' connection use the 10Gb NICs?
If there is no way to alter this nicely, should I just destroy all the nodes and rebuild to save time?

Thanks!
 
Well, it looks like I lied a bit about being able to access the CephFS. The CTs/VMs are running fine, but when I tried to create a new container from the templates I have stored on CephFS, the templates are not there anymore. So I tried to browse CephFS to see the content, but I get "mount error: exit code 16 (500)".
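In case it helps anyone else, these are the quickest checks I know of for the CephFS mount on a node (Proxmox mounts CephFS storage under /mnt/pve/<storage-id>):

Code:
# mount | grep ceph
# dmesg | tail -n 20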
 
Since then Ceph hasn't connected properly. The cluster looks to be running, as I can still access the VMs/CTs that are stored on Ceph, but Proxmox doesn't show anything other than "HEALTH_WARN - no active mgr".

Snip of error from log file:
mon.prox01a mon.0 10.1.1.1:6789/0 59228 : cluster [WRN] overall HEALTH_WARN no active mgr

Have you tried to simply restart the Ceph Manager?
Code:
# systemctl restart ceph-mgr@prox01a
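If the manager starts up cleanly, a quick way to confirm it registered with the cluster (plain systemd/Ceph commands, nothing Proxmox-specific):
Code:
# systemctl status ceph-mgr@prox01a
# ceph -s
The mgr: line in the ceph -s output should then show an active manager instead of the "no active mgr" warning.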

I am in a similar situation where I started with Ceph using the same network for public and cluster. Now I want to change the public network, but I'm not completely clear on how to get that accomplished. The information I have found is, for the most part, at least a year old (sometimes more), and with the new GUI improvements in Proxmox PVE 5.4 I'm wondering if there is an easier way.
Mike
 
Well, it looks like I lied a bit about being able to access the CephFS. The CTs/VMs are running fine, but when I tried to create a new container from the templates I have stored on CephFS, the templates are not there anymore. So I tried to browse CephFS to see the content, but I get "mount error: exit code 16 (500)".

Thanks for the info, however that comes back with nothing. Here is the error output:

Code:
root@prox01:/home# systemctl restart ceph-mgr@prox01a
root@prox01:/home# systemctl status ceph-mgr@prox01a
ceph-mgr@prox01a.service - Ceph cluster manager daemon
Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; disabled; vendor preset: enabled)
Drop-In: /lib/systemd/system/ceph-mgr@.service.d
└─ceph-after-pve-cluster.conf
Active: activating (auto-restart) (Result: exit-code) since Fri 2019-05-03 14:09:36 MDT; 7s ago
Process: 1138120 ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id prox01a --setuser ceph --setgroup ceph (code=exited, status=254)
Main PID: 1138120 (code=exited, status=254)

May 03 14:09:36 prox01 systemd[1]: ceph-mgr@prox01a.service: Unit entered failed state.
May 03 14:09:36 prox01 systemd[1]: ceph-mgr@prox01a.service: Failed with result 'exit-code'.
root@prox01:/home#
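If more detail from the daemon itself would help, I can pull the unit's recent log entries with journalctl:
Code:
# journalctl -u ceph-mgr@prox01a -n 50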

Any ideas?
 
Did you enable and start the new monitors via systemd?
Code:
# systemctl stop ceph-mon@<old-mon-id>
# systemctl disable ceph-mon@<old-mon-id>
# systemctl restart ceph-mon@<new-mon-id>
# systemctl enable ceph-mon@<new-mon-id>

You'd need to do this on each machine that hosts a monitor. Also note that Proxmox (v5.4 anyway) uses a symlink from /etc/ceph/ceph.conf that points to /etc/pve/ceph.conf.
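A quick way to double-check both the symlink and whether the cluster actually picked up the new monitors (standard commands, nothing specific to this setup):
Code:
# ls -l /etc/ceph/ceph.conf
# ceph mon dump
ceph mon dump prints the current monmap, so it should list only the new 10.1.1.x monitors once the change has gone through.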

Mike
 
