Dear all,
Big mess in Chinatown after changing the IP on one node of a 2-node cluster; the cluster is now broken.
Cluster: node "faxmox" (the one where I changed the IP) + node "faxmoxout"
A bit of context:
- amended the corosync.conf files on both nodes with the new IP
- after some unsuccessful attempts, copied the corosync directories and config files from the IP-untouched node (faxmoxout) over to the IP-modified node (faxmox)
- network-wise, the servers can reach each other
- on the IP-untouched node (faxmoxout), oddly enough, the UI shell (noVNC) is not working ("code 1006"); VMs won't start due to lack of quorum; SSH works fine
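For reference, here is a minimal sketch of the edit step I attempted, run against a scratch copy of the config rather than the live one (the file contents below are illustrative; on a real node the file to edit is /etc/pve/corosync.conf, which pmxcfs then distributes). The key detail is that every edit must also bump config_version, or corosync keeps using the old membership:

```shell
# Work on a scratch copy; /tmp paths here are purely illustrative.
cat > /tmp/corosync.conf.example <<'EOF'
totem {
  cluster_name: FaxProxCluster1
  config_version: 6
}
EOF

# Bump config_version so corosync treats the edited file as newer.
awk '/config_version:/ { $2 = $2 + 1 } { print }' \
    /tmp/corosync.conf.example > /tmp/corosync.conf.bumped

grep config_version /tmp/corosync.conf.bumped
# prints: config_version: 7
```

Copying the corosync directories around by hand (as I did) can leave the two nodes with different config_version values, which may be part of my problem.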
*) node "faxmox" (192.168.70.2)
systemctl status corosync.service
Code:
systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2023-06-04 00:52:49 CEST; 7min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 1035 (corosync)
      Tasks: 9 (limit: 9298)
     Memory: 131.6M
        CPU: 2.939s
     CGroup: /system.slice/corosync.service
             └─1035 /usr/sbin/corosync -f
Jun 04 00:55:34 faxmox corosync[1035]: [QUORUM] Sync joined[1]: 2
Jun 04 00:55:34 faxmox corosync[1035]: [TOTEM ] A new membership (1.23e) was formed. Members joined: 2
Jun 04 00:55:34 faxmox corosync[1035]: [QUORUM] Sync members[1]: 1
Jun 04 00:55:34 faxmox corosync[1035]: [QUORUM] Sync left[1]: 2
Jun 04 00:55:34 faxmox corosync[1035]: [TOTEM ] A new membership (1.242) was formed. Members left: 2
Jun 04 00:55:34 faxmox corosync[1035]: [QUORUM] Members[1]: 1
Jun 04 00:55:34 faxmox corosync[1035]: [MAIN ] Completed service synchronization, ready to provide service.
Jun 04 00:55:35 faxmox corosync[1035]: [KNET ] link: host: 2 link: 0 is down
Jun 04 00:55:35 faxmox corosync[1035]: [KNET ] host: host: 2 (passive) best link: 0 (pri: 1)
Jun 04 00:55:35 faxmox corosync[1035]: [KNET ] host: host: 2 has no active links
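The log above shows node 2 joining and immediately dropping ("host: 2 has no active links"), so I also checked the knet link state and that corosync's UDP port is actually bound on the new address (commands below are standard corosync tooling; output omitted since it changes as I fiddle):

```shell
# Show local knet link status for each known host
corosync-cfgtool -s

# Confirm corosync is listening on UDP (knet defaults to port 5405)
ss -ulpn | grep corosync
```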
pvecm status
Code:
pvecm status
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Cluster information
-------------------
Name: FaxProxCluster1
Config Version: 6
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Sun Jun 4 01:04:31 2023
Quorum provider: corosync_votequorum
Nodes: 1
Node ID: 0x00000001
Ring ID: 1.242
Quorate: No
Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 1
Quorum: 2 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.70.2 (local)
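Since faxmox is stuck at 1 of 2 expected votes ("Activity blocked"), I understand one can temporarily lower the expected vote count to regain quorum on the surviving node and at least start VMs (assumption on my part: this is safe only while the other node stays down, to avoid split-brain):

```shell
# On faxmox, while faxmoxout is out of the cluster:
pvecm expected 1
```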
/etc/pve/corosync.conf
Code:
root@faxmox:~# more /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: FaxmoxOUT
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.60.6
  }
  node {
    name: faxmox
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.70.2
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: FaxProxCluster1
  config_version: 6
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
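Comparing the two pastes, faxmox reports Config Version: 6 while faxmoxout's pvecm reports Config Version: 2, so the nodes are clearly running diverged configs (and the node name "FaxmoxOUT" differs in case from the hostname). My guess is the fix is to pick one file as authoritative and push it to the other node before restarting; a hedged sketch using the standard paths and the IPs from above:

```shell
# On faxmox, assuming its corosync.conf is the correct one:
scp /etc/corosync/corosync.conf root@192.168.60.6:/etc/corosync/corosync.conf

# Then restart the cluster stack on faxmoxout:
ssh root@192.168.60.6 'systemctl restart corosync pve-cluster'
```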
*) node "faxmoxout" (192.168.60.6)
systemctl status corosync.service:
Code:
root@FaxmoxOUT:~# systemctl status corosync.service
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Sun 2023-06-04 00:55:35 CEST; 12min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
    Process: 920 ExecStart=/usr/sbin/corosync -f $COROSYNC_OPTIONS (code=exited, status=0/SUCCESS)
    Process: 987 ExecStop=/usr/sbin/corosync-cfgtool -H --force (code=exited, status=1/FAILURE)
   Main PID: 920 (code=exited, status=0/SUCCESS)
        CPU: 115ms
Jun 04 00:55:34 FaxmoxOUT corosync[920]: [QB ] withdrawing server sockets
Jun 04 00:55:34 FaxmoxOUT corosync[920]: [SERV ] Service engine unloaded: corosync cluster quorum service v0.1
Jun 04 00:55:34 FaxmoxOUT corosync[920]: [SERV ] Service engine unloaded: corosync profile loading service
Jun 04 00:55:34 FaxmoxOUT corosync[920]: [SERV ] Service engine unloaded: corosync resource monitoring service
Jun 04 00:55:34 FaxmoxOUT corosync[920]: [SERV ] Service engine unloaded: corosync watchdog service
Jun 04 00:55:35 FaxmoxOUT corosync[920]: [KNET ] link: Resetting MTU for link 0 because host 1 joined
Jun 04 00:55:35 FaxmoxOUT corosync[920]: [KNET ] link: Resetting MTU for link 0 because host 2 joined
Jun 04 00:55:35 FaxmoxOUT corosync[920]: [MAIN ] Corosync Cluster Engine exiting normally
Jun 04 00:55:35 FaxmoxOUT systemd[1]: corosync.service: Control process exited, code=exited, status=1/FAILURE
Jun 04 00:55:35 FaxmoxOUT systemd[1]: corosync.service: Failed with result 'exit-code'.
pvecm status
Code:
root@FaxmoxOUT:~# pvecm status
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = (unset),
LC_ALL = (unset),
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
Cluster information
-------------------
Name: FaxProxCluster1
Config Version: 2
Transport: knet
Secure auth: on
Cannot initialize CMAP service
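As I understand it, "Cannot initialize CMAP service" just means pvecm cannot talk to a running corosync, consistent with the failed service status above. So once the config files on the two nodes match, I suppose the next step on faxmoxout is:

```shell
# On faxmoxout, after the corosync.conf files agree on both nodes:
systemctl restart corosync
systemctl restart pve-cluster

# Watch the membership form (or fail) in real time:
journalctl -u corosync -f
```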
After several hours of failures, I'm trying to find some solace here among the experts!
How can I make the nodes talk to each other again and restore the cluster?
Thanks in advance!
cheers