Hi,
Short story:
I have 2 different versions of corosync configuration on my cluster and now "pvecm status" gives me this ugly error:
"Can't use an undefined value as a HASH reference at /usr/share/perl5/PVE/CLI/pvecm.pm line 479, <DATA> line 755."
My cluster is totally broken.
Long story:
I've been running Proxmox for some years now and, while looking at /etc/pve/corosync.conf to troubleshoot a networking issue, I saw this:
Code:
totem {
  cluster_name: pm5-cluster-01
  config_version: 43
  interface {
    bindnetaddr: 192.168.10.21 <========== old node IP
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
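To check which values a node actually has, the `section { key: value }` layout above can be read with a small parser. The Python sketch below is a hypothetical helper (`parse_corosync` is a made-up name), not an official Proxmox or corosync tool. It assumes one `key: value` per line, treats text after `#` as a comment, and turns a section name that repeats in the same parent (such as several `node` blocks in a `nodelist`) into a list. The arrow annotation in the post is not corosync syntax, so it is left out of the sample.

```python
def parse_corosync(text):
    """Parse corosync.conf-style text into nested dicts (minimal sketch).

    Handles `name {` ... `}` sections and `key: value` lines. A section
    name that repeats in the same parent (e.g. several `node` blocks
    inside `nodelist`) becomes a list of dicts. Text after '#' is ignored.
    """
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if line.endswith("{"):
            name = line[:-1].strip()
            section = {}
            parent = stack[-1]
            if name not in parent:
                parent[name] = section
            elif isinstance(parent[name], list):
                parent[name].append(section)
            else:
                parent[name] = [parent[name], section]
            stack.append(section)
        elif line == "}":
            if len(stack) == 1:
                raise ValueError("unbalanced '}' in config")
            stack.pop()
        else:
            key, _, value = line.partition(":")
            stack[-1][key.strip()] = value.strip()
    if len(stack) != 1:
        raise ValueError("unclosed section in config")
    return root


sample = """\
totem {
  cluster_name: pm5-cluster-01
  config_version: 43
  interface {
    bindnetaddr: 192.168.10.21
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
"""

totem = parse_corosync(sample)["totem"]
print(totem["cluster_name"], totem["config_version"], totem["interface"]["bindnetaddr"])
# -> pm5-cluster-01 43 192.168.10.21
```

Values are kept as strings. A real validator would also need to know which keys are numeric.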
This IP belongs to an old node that I removed from this cluster a long time ago.
To clean things up, I decided to change it to the IP of a current node (192.168.10.101) and to rename the cluster (cluster_name) from "pm5-cluster-01" to "cluster-01".
Spoiler alert: do not do this at home...
To do so, I did:
- service pve-cluster stop (on each of my 11 nodes)
- service pve-cluster start (on the node 192.168.10.101)
- cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
- Edited /etc/pve/corosync.conf.new to change bindnetaddr: 192.168.10.21 -> 192.168.10.101, cluster_name: pm5-cluster-01 -> cluster-01 and config_version: 43 -> 44
- mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
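These steps follow the copy, edit, move pattern for /etc/pve/corosync.conf. The edit also bumps config_version (43 -> 44), which the Proxmox documentation asks for on every change. Before the final `mv`, a quick comparison of the original file and the `.new` copy can confirm that the version went up and list exactly which keys changed.

The Python sketch below is a hypothetical helper (`flat_keys` and `review_edit` are made-up names). It deliberately ignores section nesting and only keeps the first value of each key.

```python
import re

# Matches `key: value` lines; the value is the first whitespace-free token,
# so trailing annotations on the same line are ignored.
KEY_VALUE = re.compile(r"^[ \t]*(\w+)[ \t]*:[ \t]*(\S+)", re.MULTILINE)


def flat_keys(text):
    """Return {key: first value seen}, ignoring which section a key is in.

    Good enough for keys that appear once (config_version, cluster_name),
    but it would hide differences in repeated keys such as per-node ring0_addr.
    """
    values = {}
    for key, value in KEY_VALUE.findall(text):
        values.setdefault(key, value)
    return values


def review_edit(old_text, new_text):
    """Return (version_bumped, changes) for an edited copy of corosync.conf."""
    old, new = flat_keys(old_text), flat_keys(new_text)
    changes = {
        key: (old.get(key), new.get(key))
        for key in old.keys() | new.keys()
        if old.get(key) != new.get(key)
    }
    version_bumped = int(new.get("config_version", 0)) > int(old.get("config_version", 0))
    return version_bumped, changes


old_conf = "totem {\n  cluster_name: pm5-cluster-01\n  config_version: 43\n}\n"
new_conf = "totem {\n  cluster_name: cluster-01\n  config_version: 44\n}\n"
bumped, changes = review_edit(old_conf, new_conf)
print(bumped, sorted(changes))  # -> True ['cluster_name', 'config_version']
```

This only checks the file contents. It says nothing about whether the cluster is in a state where /etc/pve can be written.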
Code:
pm6-01:~# pvecm status
Cluster information
-------------------
Name: cluster-01 <==== NEW
Config Version: 44 <==== NEW
Transport: knet
Secure auth: on
...
Votequorum information
----------------------
Expected votes: 11
Highest expected: 11
Total votes: 1
Quorum: 6 Activity blocked
Flags:
Membership information
----------------------
Nodeid Votes Name
0x00000004 1 192.168.10.101 (local)
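The votequorum numbers in this output follow corosync's default majority rule: quorum = floor(expected_votes / 2) + 1. So 11 expected votes give a quorum of 6, while this node only sees its own vote (Total votes: 1), hence "Activity blocked".

The Python sketch below just reproduces that arithmetic. It assumes one vote per node and ignores votequorum options such as two_node, wait_for_all or last_man_standing.

```python
def quorum(expected_votes: int) -> int:
    """Default votequorum majority: more than half of the expected votes."""
    return expected_votes // 2 + 1


def is_quorate(total_votes: int, expected_votes: int) -> bool:
    """True when the votes this node can see reach the quorum threshold."""
    return total_votes >= quorum(expected_votes)


print(quorum(11), is_quorate(1, 11))  # values from the output above -> 6 False
```

The same arithmetic explains why lowering the expected votes, as `pvecm expected 1` does in the recovery attempt further down, lets a single node become quorate: quorum(1) is 1.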
Note: at this point I hadn't noticed the "Activity blocked" status.
I did the same on 2 other nodes and it seemed to work fine.
But...
On one node, however, the "mv" command that replaces the corosync configuration never completed and hung.
I couldn't get my prompt back, even with Ctrl+C or anything else.
Now my cluster is totally broken on every node...
Code:
# pvecm status
Can't use an undefined value as a HASH reference at /usr/share/perl5/PVE/CLI/pvecm.pm line 479, <DATA> line 755.
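The nodes now run two different corosync configurations. One way to see which node has which version is to gather each node's local copy, /etc/corosync/corosync.conf (the file corosync itself reads), and group the nodes by config_version.

The Python sketch below covers only the grouping step. How the files are collected is left out, and in the example every node name other than pm6-01 is hypothetical.

```python
import re
from collections import defaultdict

CONFIG_VERSION = re.compile(r"^[ \t]*config_version[ \t]*:[ \t]*(\d+)", re.MULTILINE)


def group_by_version(configs):
    """Map config_version -> sorted node names.

    `configs` is {node_name: corosync.conf text}. Nodes whose text has no
    config_version line are grouped under None.
    """
    groups = defaultdict(list)
    for node, text in configs.items():
        match = CONFIG_VERSION.search(text)
        version = int(match.group(1)) if match else None
        groups[version].append(node)
    return {version: sorted(nodes) for version, nodes in groups.items()}


# Hypothetical example: pm6-02 and pm6-03 are made-up node names.
configs = {
    "pm6-01": "totem {\n  config_version: 44\n}\n",
    "pm6-02": "totem {\n  config_version: 43\n}\n",
    "pm6-03": "totem {\n  config_version: 44\n}\n",
}
print(group_by_version(configs))
# -> {44: ['pm6-01', 'pm6-03'], 43: ['pm6-02']}
```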
I tried to do this:
- service pveproxy stop
- service pvedaemon stop
- service corosync stop
- pvecm expected 1
- edit /etc/pve/corosync.conf ==> this hangs (same with cp / mv)
- service pve-cluster stop
- and then start everything again
This is a production cluster and I really don't know what to do.
Could you please help?
Best regards