[SOLVED] Moving one node to a different subnet

bly

Member
Mar 15, 2024
Hi all,
I have to move a 4-node cluster (nodes 1, 2, 3, 4) from subnet A to subnet B. So far this is what I did:

- added firewall rules so that A and B can see each other fully (tested, OK); the rules will be removed after the change succeeds

on node 1:
- edited the network configuration in the web interface of node 1, moving its IP from A to B
- edited /etc/pve/corosync.conf, changing the IP of node 1 and incrementing config_version by 1 (a sketch of the edit is below)
- applied the configuration
- reattached to the web interface on the new IP address and requested a reboot of the node
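
For reference, the corosync.conf change was roughly the following (a sketch only; node name, ID and IPs are the ones that show up later in the thread):

Bash:
# Edit the cluster-wide copy; pmxcfs replicates it to the other nodes while they are quorate
nano /etc/pve/corosync.conf
# In the nodelist section, the moved node's ring0_addr changes to the new IP:
#   node {
#     name: rsthost2
#     nodeid: 2
#     quorum_votes: 1
#     ring0_addr: 192.168.1.22   # was 192.168.165.22
#   }
# and config_version in the totem section is incremented by 1.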

After the reboot I can reach the node 1 web interface and the node is up, but all the other nodes are shown as offline.
[screenshot: cluster view from node 1 with all other nodes shown as offline]
But if I navigate to an "offline" node, I can still open, for example, its shell:
[screenshot: shell of an "offline" node opened from node 1]

What am I still missing or have overlooked? Is some service restricted to the node's own network only?
TIA

edit:
I noticed that, even after rebooting another node, when it tries to connect to node 1 it still uses the old IP.
Did I miss updating some other config?
[screenshot: another node still trying to reach node 1 on the old IP]
 
I manually set the hosts file on all 4 nodes to be sure.

This is the /etc/hosts of a good node (fuji2):
[screenshot: /etc/hosts on fuji2]

This is the /etc/hosts of node 1 (rsthost2):
[screenshot: /etc/hosts on rsthost2]
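
In short, each node's /etc/hosts now maps every node name to its current IP, roughly like this (the domain suffix is only a placeholder; the IPs are the ones reported by pvecm further down):

Bash:
# /etc/hosts entries (same on all four nodes); ".lan" is just an example domain
192.168.165.24  rsthost4.lan  rsthost4
192.168.1.22    rsthost2.lan  rsthost2
192.168.165.25  rsthost5.lan  rsthost5
192.168.165.19  fuji2.lan     fuji2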
 
Have you restarted the services with `systemctl restart pveproxy.service pvestatd.service`?

Are you still unable to SSH to the other nodes by hostname even after you modified /etc/hosts?

Could you please provide us with the syslog from `rsthost2`?
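
For example, something along these lines on rsthost2 should capture the relevant part (adjust the time window as needed):

Bash:
journalctl -u corosync -u pve-cluster --since "1 hour ago" > /tmp/rsthost2-syslog.txt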
 
Have you restarted the services with `systemctl restart pveproxy.service pvestatd.service`?
Do I need to do that on a node of the "good" group? After setting the hosts file on all nodes I rebooted rsthost2.

Are you still unable to SSH to the other nodes by hostname even after you modified /etc/hosts?

After the reboot I can SSH to the other nodes from node 1, but it still sees all of them as offline in the cluster.


Could you please provide us with the syslog from `rsthost2`?

From rsthost2:
Jan 27 15:53:09 rsthost2 pvescheduler[5403]: jobs: cfs-lock 'file-jobs_cfg' error: no quorum!
Jan 27 15:53:09 rsthost2 pvescheduler[5402]: replication: cfs-lock 'file-replication_cfg' error: no quorum!

From the fuji2 node I see a flood of:
Jan 27 15:57:16 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
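
For reference, the corosync ring/link status can also be checked directly on a node:

Bash:
corosync-cfgtool -s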
 
OK!
On one of the quorate nodes I see this repeating:

Jan 27 16:34:00 fuji2 corosync[1358]: [QUORUM] Sync members[3]: 1 3 4
Jan 27 16:34:00 fuji2 corosync[1358]: [TOTEM ] A new membership (1.f96) was formed. Members
Jan 27 16:34:00 fuji2 corosync[1358]: [QUORUM] Members[3]: 1 3 4
Jan 27 16:34:00 fuji2 corosync[1358]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 27 16:34:00 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
Jan 27 16:34:02 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
Jan 27 16:34:03 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
Jan 27 16:34:04 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
Jan 27 16:34:05 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
Jan 27 16:34:06 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
Jan 27 16:34:08 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
Jan 27 16:34:09 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
Jan 27 16:34:10 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
Jan 27 16:34:11 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
Jan 27 16:34:12 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405
Jan 27 16:34:14 fuji2 corosync[1358]: [KNET ] rx: Packet rejected from 192.168.1.22:5405


While on rsthost2, the node in the new subnet, I see this repeating:

Jan 27 16:35:28 rsthost2 corosync[1090]: [QUORUM] Sync members[1]: 2
Jan 27 16:35:28 rsthost2 corosync[1090]: [TOTEM ] A new membership (2.fde) was formed. Members
Jan 27 16:35:28 rsthost2 corosync[1090]: [QUORUM] Members[1]: 2
Jan 27 16:35:28 rsthost2 corosync[1090]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 27 16:35:31 rsthost2 corosync[1090]: [TOTEM ] Token has not been received in 3226 ms
Jan 27 16:35:35 rsthost2 corosync[1090]: [TOTEM ] Token has not been received in 7527 ms
Jan 27 16:35:42 rsthost2 corosync[1090]: [QUORUM] Sync members[1]: 2
Jan 27 16:35:42 rsthost2 corosync[1090]: [TOTEM ] A new membership (2.fea) was formed. Members
Jan 27 16:35:42 rsthost2 corosync[1090]: [QUORUM] Members[1]: 2
Jan 27 16:35:42 rsthost2 corosync[1090]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 27 16:35:45 rsthost2 corosync[1090]: [TOTEM ] Token has not been received in 3226 ms
Jan 27 16:35:49 rsthost2 corosync[1090]: [TOTEM ] Token has not been received in 7527 ms
Jan 27 16:35:55 rsthost2 corosync[1090]: [QUORUM] Sync members[1]: 2
Jan 27 16:35:55 rsthost2 corosync[1090]: [TOTEM ] A new membership (2.ff6) was formed. Members
Jan 27 16:35:55 rsthost2 corosync[1090]: [QUORUM] Members[1]: 2
Jan 27 16:35:55 rsthost2 corosync[1090]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 27 16:35:59 rsthost2 corosync[1090]: [TOTEM ] Token has not been received in 3225 ms
Jan 27 16:36:03 rsthost2 corosync[1090]: [TOTEM ] Token has not been received in 7526 ms
Jan 27 16:36:09 rsthost2 corosync[1090]: [QUORUM] Sync members[1]: 2
Jan 27 16:36:09 rsthost2 corosync[1090]: [TOTEM ] A new membership (2.1002) was formed. Members
Jan 27 16:36:09 rsthost2 corosync[1090]: [QUORUM] Members[1]: 2
Jan 27 16:36:09 rsthost2 corosync[1090]: [MAIN ] Completed service synchronization, ready to provide service.
Jan 27 16:36:12 rsthost2 corosync[1090]: [TOTEM ] Token has not been received in 3226 ms
Jan 27 16:36:17 rsthost2 corosync[1090]: [TOTEM ] Token has not been received in 7527 ms
Jan 27 16:36:23 rsthost2 corosync[1090]: [QUORUM] Sync members[1]: 2
Jan 27 16:36:23 rsthost2 corosync[1090]: [TOTEM ] A new membership (2.100e) was formed. Members
Jan 27 16:36:23 rsthost2 corosync[1090]: [QUORUM] Members[1]: 2

pvecm status on rsthost2:

root@rsthost2:~# pvecm status
Cluster information
-------------------
Name: restore
Config Version: 11
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Mon Jan 27 16:38:05 2025
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000002
Ring ID: 2.1062
Quorate: No

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 1
Quorum: 3 Activity blocked
Flags:

Membership information
----------------------
Nodeid Votes Name
0x00000002 1 192.168.1.22 (local)
root@rsthost2:~#

pvecm status on fuji2:
root@fuji2:~# pvecm status
Cluster information
-------------------
Name: restore
Config Version: 13
Transport: knet
Secure auth: on

Quorum information
------------------
Date: Mon Jan 27 16:39:48 2025
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000004
Ring ID: 1.10c2
Quorate: Yes

Votequorum information
----------------------
Expected votes: 5
Highest expected: 5
Total votes: 4
Quorum: 3
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.165.24
0x00000003 2 192.168.165.25
0x00000004 1 192.168.165.19 (local)
root@fuji2:~#
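
Side note: the two outputs show different Config Versions (11 on rsthost2 vs 13 on fuji2), so the isolated node apparently never picked up the updated corosync.conf. The local copy and the cluster-wide copy can be compared with:

Bash:
diff /etc/corosync/corosync.conf /etc/pve/corosync.conf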
 
If I try to SSH from fuji2 to rsthost2 via the web interface, it tries the OLD IP address: ssh: connect to host 192.168.165.22 port 22: No route to host

It feels like the old IP is still lingering somewhere.
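
To see where fuji2 gets that address from, a quick check like this should show whether it comes from /etc/hosts:

Bash:
getent hosts rsthost2
grep -n rsthost2 /etc/hosts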
 
Thank you for the logs. Could you please disable the firewall temporarily to see whether the issue is related to the firewall config, especially for the Corosync traffic on UDP ports 5404 and 5405?
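
For a quick test it can also be stopped from the CLI on each node (and re-enabled afterwards):

Bash:
pve-firewall stop
pve-firewall status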
 
OK, firewall disabled on all nodes. On the non-quorate node I had to force the change and reboot it.

[screenshot: firewall disabled on the nodes]

After the rsthost2 reboot the traffic is still rejected; the logs are the same.
 
I did a netstat to see if the port is listening:

root@rsthost2:~# netstat -pln | grep 5405
udp 0 0 192.168.1.22:5405 0.0.0.0:* 1094/corosync
 
Thank you for testing!

Could you please run the following command on the `rsthost2` node and provide us with the output?
Bash:
grep -r "192.168.165.22" /etc/

I would also check which IP Corosync is using on the `rsthost2` node; you can run `ss` with the following command:
Bash:
ss -tulpn | grep corosync

Additionally, please provide us with the output of the following:

Bash:
cat /etc/pve/.members
 
Here are the results:

root@rsthost2:~# grep -r "192.168.165.22" /etc/
/etc/pve/priv/known_hosts:192.168.165.22 ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQCc7x8Dyy0mtB5KQiftSZHlUzIr/HgrFolsr96r6ClfUma96T7BIK21G4bX/lhZ3Wt3oIw4XCsQbU2CXVKb+rl0iPJWmH0hLqJQS3jgrMdGuLccWbHRNKW59t5UBAlBo1tWiy6LrqNCteg0m2JCWy/rFgm7+HW2mU6QCA9PS/WiZyABii13/QYB7iw1tqT1PDmMGH+3mnZNG35RvCCx6DHmf3jmEiUo5aAIsAct6grTovMTIiIKCHyaxC29V+q3x6i8GTzdLAxP5l/AZ85oUD4MD+Wn4Us94T6gMxOmGcwwWKkSJPwMw9SAh2EaSIAo+etLLwkJc+gMXSt7hTJe+HfVYqz0qJtbgJDSJpYrxz8G1Z5l97mGIUJTnaE1Mh6XcIclXCFC3sPaFFnKvnAL4xzbkCtyL9tT1jE4CmfnYWQFNZA2je4YSRk1pCxQiFNvrI9IzlkPoquYWcfUC+wkmrMJ/fFiWsCRVRvbR5oExenELywLdPOjNcOSXIRPo05CoQU=
root@rsthost2:~#

root@rsthost2:~# ss -tulpn | grep corosync
udp UNCONN 0 0 192.168.1.22:5405 0.0.0.0:* users:(("corosync",pid=1094,fd=28))

root@rsthost2:~# cat /etc/pve/.members
{
"nodename": "rsthost2",
"version": 3,
"cluster": { "name": "restore", "version": 11, "nodes": 4, "quorate": 0 },
"nodelist": {
"rsthost2": { "id": 2, "online": 1, "ip": "192.168.1.22"},
"fuji2": { "id": 4, "online": 0},
"rsthost4": { "id": 1, "online": 0},
"rsthost5": { "id": 3, "online": 0}
}
}
root@rsthost2:~#
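
So the old IP only survives in the cluster-wide known_hosts. If that stale entry should go, something like this ought to remove it once the node is quorate again (while there is no quorum, /etc/pve is read-only):

Bash:
# Remove the stale host key entry for the old IP (illustrative; adjust as needed)
ssh-keygen -R 192.168.165.22 -f /etc/pve/priv/known_hosts
# Refresh node certificates and related cluster key files
pvecm updatecerts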
 
As a side note, I moved all VMs off the node before the changes, so unloading Ceph and removing/rebuilding the node is not a problem.
 
I decided to remove its OSD from Ceph and destroy the node, then recreate it in its intended subnet, just to be sure I have a healthy node; I cannot rule out that something else went wrong on that node that I still haven't noticed :)

If anything comes up I'll update the thread! Thanks a lot for the help so far!
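
For the record, the rough sequence is something like this (a sketch; the OSD id here is hypothetical and the real one comes from `ceph osd tree`):

Bash:
# Hypothetical OSD id of the node being removed
OSD_ID=2
ceph osd out ${OSD_ID}
systemctl stop ceph-osd@${OSD_ID}
ceph osd purge ${OSD_ID} --yes-i-really-mean-it
# Then, from a quorate node, remove the node from the cluster
pvecm delnode rsthost2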
 