Changing the ceph public network

MoreDakka

Active Member
May 2, 2019
Howdy,
First post here. I've tried to figure this out on my own, but it isn't working out great for me.

I have a 3-node Proxmox setup (R710s with quad 1Gb onboard NICs and an add-on single-port 10Gb NIC, eventually two).
Only one 1Gb NIC is connected, and the 10Gb NIC is connected to a 10Gb switch (eventually two separate switches).
I'm still learning and testing, so breaking this isn't a big deal right now.

Each node has an H700 controller (which was a mistake; I'll replace them with H200s so I can give the drives to Ceph without using the dreaded RAID 0 configuration).
There are 2x 147GB SAS drives in RAID 1 for the Proxmox OS, and I added 2x 3TB NAS drives to test Ceph.

Public network 10.0.0.0/16
Storage network 10.1.1.0/24

Nodes - prox01, prox02, prox03
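
For reference, the network section of ceph.conf for a layout like this would normally look something along these lines (just a sketch with the subnets above filled in, not my exact file):

Code:
[global]
    public network = 10.0.0.0/16
    cluster network = 10.1.1.0/24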

When I originally set up the Ceph cluster, I used the IP of the 1Gb NIC as the public network and the 10Gb as the storage network. While this all worked and the Ceph cluster was up and running, I did a speed test on a container and a VM built in Proxmox, both stored on Ceph. The speeds were OK, but not as fast as I would have expected; they were maxing out at about a 1Gb link.

That's when I got to thinking that Proxmox is using the public network to connect to Ceph for its VMs/CTs. So I attempted to change the public network to 10.1.1.x.
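
(You can see which addresses the cluster is actually advertising to clients; Proxmox talks to whatever public addresses the monitors and OSDs report. Roughly along these lines:)

Code:
# ceph mon dump     # public address of each monitor
# ceph osd dump     # per-OSD lines include the public and cluster addresses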

I created the new monitors with the following commands:

Code:
cd /home
mkdir tmp
# grab the mon. keyring and the current monmap
ceph auth get mon. -o tmp/key-ceph-prox01a
ceph mon getmap -o tmp/map-ceph-prox01a
# build the new monitor's data directory from that map and key
ceph-mon -i prox01a --mkfs --monmap tmp/map-ceph-prox01a --keyring tmp/key-ceph-prox01a
chown ceph:ceph -Rf /var/lib/ceph/mon/ceph-prox01a/
# start the new monitor on the 10Gb address
ceph-mon -i prox01a --public-addr 10.1.1.1:6789

I did that on all three nodes. Once Proxmox showed Quorum: Yes, I removed the old monitors that were on the 10.0.0.0/16 network. Then I edited the ceph.conf file, removed the old monitor entries, and added the new ones.
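
Roughly, the monitor entries in ceph.conf now look like this (a sketch following the naming above, not a copy of my actual file; similar stanzas exist for prox02a and prox03a):

Code:
[global]
    mon_host = 10.1.1.1 10.1.1.2 10.1.1.3

[mon.prox01a]
    host = prox01
    mon_addr = 10.1.1.1:6789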

Since then Ceph hasn't connected properly. The cluster looks to be running, since I can still access the VMs/CTs that are stored on Ceph, but Proxmox doesn't show anything other than "HEALTH_WARN - no active mgr".

Snip of error from log file:
mon.prox01a mon.0 10.1.1.1:6789/0 59228 : cluster [WRN] overall HEALTH_WARN no active mgr


Now that the back story is out of the way, two questions:

How can I change the setup to use the 10Gb NICs for both the OSDs and the Proxmox "public" connection?
If there is no way to change this cleanly, should I just destroy all the nodes and rebuild to save time?

Thanks!
 
Well, it looks like I lied a bit about being able to access CephFS. The CTs/VMs are running fine, but when I tried to create a new container from the templates I have stored on CephFS, the templates were not there anymore. So I tried to browse to CephFS to see the content, and I get "mount error: exit code 16 (500)".
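
(For what it's worth, Proxmox mounts CephFS storage under /mnt/pve/<storage-id>, so a rough check of what is actually mounted might show a bit more, e.g.:)

Code:
# mount | grep ceph
# dmesg | tail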
 
Since then Ceph hasn't connected properly. The cluster looks to be running, since I can still access the VMs/CTs that are stored on Ceph, but Proxmox doesn't show anything other than "HEALTH_WARN - no active mgr".

Snip of error from log file:
mon.prox01a mon.0 10.1.1.1:6789/0 59228 : cluster [WRN] overall HEALTH_WARN no active mgr

Have you tried to simply restart the Ceph Manager?
Code:
# systemctl restart ceph-mgr@prox01a
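
Afterwards something like the following should show whether a mgr has become active again (just a quick check):

Code:
# ceph -s | grep mgr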

I am in a similar situation: I started with Ceph using the same network for public and cluster. Now I want to change the public network, but I'm not completely clear on how to get that accomplished. The information I have found is, for the most part, at least a year old (sometimes more), and with the new GUI improvements in Proxmox VE 5.4 I'm wondering if there is an easier way.
Mike
 

Thanks for the info; however, that comes back with nothing, and here is the error output:

Code:
root@prox01:/home# systemctl restart ceph-mgr@prox01a
root@prox01:/home# systemctl status ceph-mgr@prox01a
ceph-mgr@prox01a.service - Ceph cluster manager daemon
   Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; disabled; vendor preset: enabled)
  Drop-In: /lib/systemd/system/ceph-mgr@.service.d
           └─ceph-after-pve-cluster.conf
   Active: activating (auto-restart) (Result: exit-code) since Fri 2019-05-03 14:09:36 MDT; 7s ago
  Process: 1138120 ExecStart=/usr/bin/ceph-mgr -f --cluster ${CLUSTER} --id prox01a --setuser ceph --setgroup ceph (code=exited, status=254)
 Main PID: 1138120 (code=exited, status=254)

May 03 14:09:36 prox01 systemd[1]: ceph-mgr@prox01a.service: Unit entered failed state.
May 03 14:09:36 prox01 systemd[1]: ceph-mgr@prox01a.service: Failed with result 'exit-code'.
root@prox01:/home#
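
(Status 254 on its own doesn't say much; presumably the journal or the mgr's own log file has the actual error, e.g. something like:)

Code:
# journalctl -u ceph-mgr@prox01a -n 50
# tail -n 50 /var/log/ceph/ceph-mgr.prox01a.log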

Any ideas?
 
Did you enable and start the new monitors via systemd?
Code:
# systemctl stop ceph-mon@<old-mon-id>
# systemctl disable ceph-mon@<old-mon-id>
# systemctl restart ceph-mon@<new-mon-id>
# systemctl enable ceph-mon@<new-mon-id>

You'd need to do this on each machine that hosts a monitor. Also note that Proxmox (v5.4 anyway) uses a symlink from /etc/ceph/ceph.conf that points to /etc/pve/ceph.conf.
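
A quick way to double-check that on a node is something like:

Code:
# ls -l /etc/ceph/ceph.conf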

Mike
 
