Hello,
We have a production cluster of 6 Proxmox VE 4.4-13 servers (no HA) with DRBD9 backend storage, running 20 VMs and 5 LXC containers. Our public IP addresses sit on bridge vmbr0, and the cluster was built on those addresses, so the cluster was reachable on the public network.
At some point we decided to switch the cluster from the public IP addresses (41.213.15.x) to private addresses (10.146.10.x) that are reachable internally through a VPN. We created a second bridge, vmbr1, on each server as an OVS bridge with a VLAN; this bridge carries the private IP address. We modified /etc/hosts and /etc/network/interfaces accordingly, but we forgot to change the IP in the corosync configuration file, which still points to the old address, as shown below.
totem {
  cluster_name: cluster-run
  config_version: 6
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 41.213.15.10
    ringnumber: 0
  }
}
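For completeness, the vmbr1 stanza in /etc/network/interfaces on node1 looks roughly like the sketch below; the physical port name (eth1) and the VLAN tag (10) are only illustrative, and each node uses its own 10.146.10.x address.

# physical uplink attached to the OVS bridge (port name is an example)
allow-vmbr1 eth1
iface eth1 inet manual
    ovs_type OVSPort
    ovs_bridge vmbr1

# the OVS bridge itself
allow-ovs vmbr1
iface vmbr1 inet manual
    ovs_type OVSBridge
    ovs_ports eth1 vlan10

# internal port carrying the private address (VLAN tag is an example)
allow-vmbr1 vlan10
iface vlan10 inet static
    ovs_type OVSIntPort
    ovs_bridge vmbr1
    ovs_options tag=10
    address 10.146.10.1
    netmask 255.255.255.0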
As the corosync configuration shows, the cluster is bound to IP address 41.213.15.10, which is no longer in use. One server in the cluster (node2, IP address 41.213.15.7) was rebooted and now cannot rejoin the cluster.
On this node (node2):
root@node2:~# pvecm status
Cannot initialize CMAP service
Syslog says:
node2 pmxcfs[26779]: [quorum] crit: quorum_initialize failed: 2
node2 pmxcfs[26779]: [confdb] crit: cmap_initialize failed: 2
node2 pmxcfs[26779]: [dcdb] crit: cpg_initialize failed: 2
node2 pmxcfs[26779]: [status] crit: cpg_initialize failed: 2
On the other hand, all of the remaining Proxmox servers are still working in the cluster. We will not reboot them for fear of breaking the cluster; we know that if we reboot them there will be a problem.
How is that possible? Is there a cache or a written copy of the configuration somewhere?
On all other nodes, the command pvecm status gives us the following:
Quorum information
------------------
Date:             Tue Jan 30 13:21:20 2018
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000004
Ring ID:          1/13628
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   6
Highest expected: 6
Total votes:      5
Quorum:           4
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 41.213.15.6
0x00000003          1 41.213.15.8
0x00000004          1 41.213.15.74 (local)
0x00000005          1 41.213.15.75
0x00000006          1 41.213.15.76
We are planning to correct the /etc/hosts file on all the Proxmox servers as shown below:
10.146.10.1 node1.cluster.local node1 pvelocalhost
10.146.10.2 node2.cluster.local node2
10.146.10.3 node3.cluster.local node3
10.146.10.4 node4.cluster.local node4
10.146.10.5 node5.cluster.local node5
10.146.10.6 node6.cluster.local node6
Then we would modify the corosync configuration file, changing bindnetaddr to an address in the new private network (10.146.10.x) and incrementing config_version; a sketch of the resulting totem section follows.
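For illustration, the totem section after the change would look roughly like this; only bindnetaddr and config_version change, and 10.146.10.1 here simply stands for an address in the new private network:

totem {
  cluster_name: cluster-run
  config_version: 7    # incremented from 6
  ip_version: ipv4
  secauth: on
  version: 2
  interface {
    bindnetaddr: 10.146.10.1    # example address in the new private network
    ringnumber: 0
  }
}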
We have 25 guests (VMs and containers) running here in production and we would not like to break the whole cluster.
Is there any other configuration we need to change to make sure the cluster keeps working without any problems?
Thanks for your help
Best regards
Shafeek