[SOLVED] Ceph Cluster vlan migration help requested

Pouch6867

New Member
May 18, 2024
EDIT: SOLVED! https://forum.proxmox.com/threads/ceph-cluster-vlan-migration-help-requested.177942/post-824528



So I've been trying to move the Ceph network to its own dedicated vlan11 from vlan1, where I initially stood it up. Every time I add a monitor IP from vlan11, it doesn't work.

First some background: I was running Ceph over the same 1G interfaces as my PVE management IP (and the guests' vNICs). Vlan11 was created on 2 core switches only. The firewall does not have this vlan, nor is traffic sent from the switches to/from vlan11... there is no gateway. The switch ports are configured as access ports and were restricted to vlan11. I had to shift them back to vlan1 to get the monitors back up on the old addresses using the 10G ports, so at least I think Ceph is now using the dedicated interface, even if it's on the wrong vlan.

Current setup:

Ceph Version = 19.2.3
PVE Version = 8.4.14
Repositories = pve-no-subscription

ceph.conf snippet:
Code:
cluster_network = 192.168.1.0/24,192.168.11.0/24
mon_host = 192.168.1.202 192.168.1.203 192.168.1.201
public_network = 192.168.1.0/24,192.168.11.0/24


[mon.HOST1]
    public_addr = 192.168.1.201

[mon.HOST2]
    public_addr = 192.168.1.202

[mon.HOST3]
    public_addr = 192.168.1.203

interfaces snippet (as applicable to each host):
Code:
iface bond2 inet static
    address 192.168.11.104/24
    bond-slaves enp131s0f1
    bond-miimon 100
    bond-mode active-backup
    bond-primary enp131s0f1
#Ceph (new) HOST1

iface bond2 inet static
    address 192.168.11.108/24
    bond-slaves enp131s0f1
    bond-miimon 100
    bond-mode active-backup
    bond-primary enp131s0f1
#Ceph (new) HOST2

iface bond2 inet static
    address 192.168.11.112/24
    bond-slaves enp131s0f1
    bond-miimon 100
    bond-mode active-backup
    bond-primary enp131s0f1
#Ceph (new) HOST3


Notes:

1. All relevant OSDs are pointed to the applicable cluster and public address for their host in ceph.conf, to keep Ceph traffic off the 1G interface.
2. In the GUI, the Configuration Database shows only 192.168.1.0/24 for both cluster_network and public_network... I'm thinking this is why it complains every time I try to use a 192.168.11.0/24 address for a monitor.
3. I can ping the vlan11 addresses from all 3 hosts and from both switches without issue.
4. When I go to destroy/create a monitor, it shows me a message stating that I need to use mon-address because it sees two addresses (i.e. HOST1 says it sees 192.168.1.104 and 192.168.1.201).
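
If the guess in note 2 is right, the complaint comes down to a subnet membership test: a monitor address is accepted only if it falls inside a network the cluster already knows about. The sketch below illustrates that test in plain POSIX shell using the addresses from this thread. The helpers `ip_to_int` and `ip_in_cidr` are hypothetical names made up for this example; they are not part of Ceph or PVE, and this is not a claim about how either tool implements its check.

```shell
# Illustration only: ip_to_int and ip_in_cidr are hypothetical helpers,
# not Ceph or PVE commands.

ip_to_int() {
    # Split a dotted quad on "." and fold the four octets into one integer.
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

ip_in_cidr() {
    # $1 = address, $2 = network in CIDR form (e.g. 192.168.1.0/24).
    # Returns 0 when the address lies inside the network, 1 otherwise.
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
}

# Compare an old vlan1 monitor address and a new vlan11 address against
# the only network the Configuration Database listed.
for candidate in 192.168.1.201 192.168.11.104; do
    if ip_in_cidr "$candidate" 192.168.1.0/24; then
        echo "$candidate is inside 192.168.1.0/24"
    else
        echo "$candidate is outside 192.168.1.0/24"
    fi
done
```

The vlan11 address falls outside the only network listed in the Configuration Database, which would fit the behaviour described in note 2.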

As I said above, I'm just trying to move ceph out of vlan1... without blowing away the data or going into another long night until 4am to get the monitors back up while feeling my stomach turn in fear that I screwed everything up.
 
This thread may help you

 
Ok so I figured it out on my own....

First (and really the only thing that was stopping me from doing this)... editing ceph.conf does not update all the settings. To ACTUALLY change the cluster or public network after the Ceph cluster has first been set up, you have to use the following commands:

ceph config set global public_network "X.X.X.X/Y, X.X.X.X/Y, X.X.X.X/Y"

ceph config set global cluster_network "X.X.X.X/Y, X.X.X.X/Y, X.X.X.X/Y"

Literally as soon as you do this, everything just works as you would expect. Just make sure your ceph.conf matches these values to prevent any potential mismatches.
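
For anyone following along, here is a minimal sketch of the same two commands filled in with the subnets quoted earlier in this thread, plus a read-back step. The function names `apply_networks` and `show_networks` and the `NETS` variable are made up for this example. It assumes a node with the `ceph` CLI and an admin keyring, and that you want both subnets listed on both networks, as in the ceph.conf snippet above.

```shell
# Sketch only: the subnets are the ones from this thread; adjust for your cluster.
# apply_networks, show_networks and NETS are hypothetical names for this example.
NETS="192.168.1.0/24,192.168.11.0/24"

apply_networks() {
    # Write both settings into the cluster's central configuration database.
    ceph config set global public_network "$NETS"
    ceph config set global cluster_network "$NETS"
}

show_networks() {
    # Read the values back as the monitors and OSDs see them.
    ceph config get mon public_network
    ceph config get osd cluster_network
    # List the monitors and the addresses they are currently bound to.
    ceph mon dump
}
```

Run `apply_networks` once, then `show_networks` to confirm that the configuration database now lists both subnets before trying to create a monitor on a 192.168.11.x address.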