[SOLVED] Moving cluster operation traffic to another network

kobuki

I made a mistake when I created a 5-node cluster: I added all nodes to it by their IPs, and `pvecm status` now shows:

Code:
Cluster information
-------------------
Name:             cluster2
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Mon Sep 25 14:42:15 2023
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000001
Ring ID:          1.4d
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 10.15.15.11 (local)
0x00000002          1 10.15.15.12
0x00000003          1 10.15.15.13
0x00000004          1 10.15.15.15
0x00000005          1 10.15.15.16

The cluster is operational. The problem is that the IPs on the 10.15.15.0/24 subnet don't resolve to the nodes' actual host names; each node has a separate host name assigned for that network. This is deliberate: I want to separate the cluster traffic from the admin traffic, and they are on different physical interfaces.

The IPs and host names look like the table below.

Host names for the cluster/corosync IPs:
Code:
pve1c.whatever.local -> 10.15.15.11
pve2c.whatever.local -> 10.15.15.12
etc.

"Real" host names for the admin access network (10.15.2.0/24):
Code:
pve1.whatever.local -> 10.15.2.11
pve2.whatever.local -> 10.15.2.12
etc.

The latter is what `hostname -f` returns on the nodes. Consequently, Proxmox uses that host name and the associated 10.15.2.0/24 network for every cluster operation apart from corosync. This is not good: the admin network is not intended for cluster operations like migrations and replication, and it is only gigabit, whereas the dedicated cluster network is 10G.
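
For reference, this is easy to verify on each node: compare what the node's FQDN resolves to with the address corosync is actually using (a quick sketch assuming standard glibc tooling; `getent` follows the same /etc/hosts and DNS lookup order):

Code:
# FQDN of this node and the address it resolves to
hostname -f
getent hosts "$(hostname -f)"

# address corosync/knet is actually using for this node
pvecm status | grep -i local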

Now, my question is: can I simply change the host names so that pveX.whatever.local (the host names without the trailing 'c') point to the cluster IPs shown in `pvecm status`? Would that cause any issues? There is also a Ceph cluster configured on the nodes, on its own dedicated networks, but Ceph has separate network settings, so I don't think it would be affected. Production VMs are already running on the cluster, unfortunately; worst case, I could schedule a full downtime.
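
For completeness, the Ceph networks can be confirmed from its config (assuming a standard PVE-managed Ceph install, where the config lives in /etc/pve/ceph.conf):

Code:
# show which subnets Ceph uses for public and cluster traffic
grep -E 'cluster.network|public.network' /etc/pve/ceph.conf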
 
Well, for whoever finds this thread among the other similar ones: the issue was solved by fixing /etc/hosts so that the node host names resolve to the desired subnet, and then restarting pve-cluster.service followed by corosync.service. Replication and other heavy traffic is now flowing on the network it was destined for. NB: this also fixed the IPs in /etc/pve/.members.
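
For illustration, with the layout above, the change on pve1 would look roughly like this (a sketch only; substitute your own names and IPs, and repeat on every node):

Code:
# /etc/hosts on pve1
# old entry pointed the node name at the 1G admin network:
#   10.15.2.11    pve1.whatever.local pve1
# new entry points it at the 10G cluster network instead:
10.15.15.11   pve1.whatever.local pve1

# then restart the services, in this order:
systemctl restart pve-cluster.service
systemctl restart corosync.service

After the restarts, `pvecm status` and /etc/pve/.members should show the 10.15.15.x addresses against the real node names.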
 