I tried to join a new node to an existing cluster. The main IP of the nodes is different from the IP/subnet the cluster nodes are supposed to communicate on.
I forgot to specify the -link0 <ip> parameter, yet the node joined the cluster anyway. It then kept hanging at "waiting for quorum...", so I stopped the process, hoping the installation was essentially finished and I only had to change the IP (and config_version) in /etc/pve/corosync.conf.
Only to find out that this didn't work.
I have already tried a reboot, but without luck. I'm also a bit worried that the other nodes have become very slow and their GUI is not always working.
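For reference, if I understand the pvecm docs correctly, the join should have looked roughly like this, with 192.168.1.203 being the address hv03 should use on the cluster subnet and 192.168.1.201 an existing member (illustrative, not what I actually ran):
Code:
# join hv03 to the cluster, binding corosync link 0 to the dedicated cluster network
root@hv03:~# pvecm add 192.168.1.201 --link0 192.168.1.203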
On the new node (hv03):
Code:
root@hv03:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: VRT18
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.1.71
  }
  node {
    name: hv01
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 192.168.1.201
  }
  node {
    name: hv02
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 192.168.1.202
  }
  node {
    name: hv03
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.203
  }
  node {
    name: vrt12
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.66
  }
  node {
    name: vrt13
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.1.67
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: vrt
  config_version: 19
  interface {
    bindnetaddr: 192.168.1.62
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
root@hv03:/etc# pvecm status
Cluster information
-------------------
Name:             vrt
Config Version:   19
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jul 25 20:08:42 2023
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.6d0
Quorate:          No

Votequorum information
----------------------
Expected votes:   6
Highest expected: 6
Total votes:      1
Quorum:           4 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.203 (local)
root@hv03:~# service pvecm status
Unit pvecm.service could not be found.
root@hv03:~# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-07-25 20:50:58 CEST; 7min ago
    Process: 42465 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 42466 (pmxcfs)
      Tasks: 7 (limit: 618671)
     Memory: 16.0M
        CPU: 248ms
     CGroup: /system.slice/pve-cluster.service
             └─42466 /usr/bin/pmxcfs

Jul 25 20:50:57 hv03 systemd[1]: Starting The Proxmox VE cluster filesystem...
Jul 25 20:50:57 hv03 pmxcfs[42466]: [status] notice: update cluster info (cluster name vrt, version = 19)
Jul 25 20:50:58 hv03 systemd[1]: Started The Proxmox VE cluster filesystem.
Jul 25 20:51:02 hv03 pmxcfs[42466]: [dcdb] notice: members: 1/42466
Jul 25 20:51:02 hv03 pmxcfs[42466]: [dcdb] notice: all data is up to date
Jul 25 20:51:02 hv03 pmxcfs[42466]: [status] notice: members: 1/42466
Jul 25 20:51:02 hv03 pmxcfs[42466]: [status] notice: all data is up to date
root@hv03:~# service corosync status
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-07-25 20:50:52 CEST; 7min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 42357 (corosync)
      Tasks: 9 (limit: 618671)
     Memory: 138.5M
        CPU: 7.993s
     CGroup: /system.slice/corosync.service
             └─42357 /usr/sbin/corosync -f

Jul 25 20:58:16 hv03 corosync[42357]: [QUORUM] Members[1]: 1
Jul 25 20:58:16 hv03 corosync[42357]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 25 20:58:22 hv03 corosync[42357]: [QUORUM] Sync members[1]: 1
Jul 25 20:58:22 hv03 corosync[42357]: [TOTEM ] A new membership (1.cda) was formed. Members
Jul 25 20:58:22 hv03 corosync[42357]: [QUORUM] Members[1]: 1
Jul 25 20:58:22 hv03 corosync[42357]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 25 20:58:29 hv03 corosync[42357]: [QUORUM] Sync members[1]: 1
Jul 25 20:58:29 hv03 corosync[42357]: [TOTEM ] A new membership (1.cde) was formed. Members
Jul 25 20:58:29 hv03 corosync[42357]: [QUORUM] Members[1]: 1
Jul 25 20:58:29 hv03 corosync[42357]: [MAIN ] Completed service synchronization, ready to provide service.
Part of the syslog:
Code:
Jul 25 20:40:26 hv03 pveproxy[31928]: worker exit
Jul 25 20:40:26 hv03 pveproxy[31929]: worker exit
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 31928 finished
Jul 25 20:40:26 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 32012 started
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 31929 finished
Jul 25 20:40:26 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 32013 started
Jul 25 20:40:26 hv03 pveproxy[31930]: worker exit
Jul 25 20:40:26 hv03 pveproxy[32012]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.
Jul 25 20:40:26 hv03 pveproxy[32013]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 31930 finished
Jul 25 20:40:26 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 32014 started
Jul 25 20:40:26 hv03 pveproxy[32014]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.
Jul 25 20:40:28 hv03 pvestatd[4125]: authkey rotation error: cfs-lock 'authkey' error: pve cluster filesystem not online.
Jul 25 20:40:30 hv03 corosync[3726]: [QUORUM] Sync members[1]: 1
Jul 25 20:40:30 hv03 corosync[3726]: [TOTEM ] A new membership (1.af9) was formed. Members
Jul 25 20:40:30 hv03 corosync[3726]: [QUORUM] Members[1]: 1
Jul 25 20:40:30 hv03 corosync[3726]: [MAIN ] Completed service synchronization, ready to provide service.
Jul 25 20:40:30 hv03 pve-ha-lrm[4213]: unable to write lrm status file - unable to open file '/etc/pve/nodes/hv03/lrm_status.tmp.4213' - No such file or directory
Jul 25 20:40:31 hv03 pveproxy[32012]: worker exit
Jul 25 20:40:31 hv03 pveproxy[32013]: worker exit
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32012 finished
Jul 25 20:40:31 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32112 started
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32013 finished
Jul 25 20:40:31 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32113 started
Jul 25 20:40:31 hv03 pveproxy[32014]: worker exit
Jul 25 20:40:31 hv03 pveproxy[32112]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.
Jul 25 20:40:31 hv03 pveproxy[32113]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32014 finished
Jul 25 20:40:31 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32114 started
Jul 25 20:40:31 hv03 pveproxy[32114]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.
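To make the edit I mentioned above concrete: after stopping the join, this is roughly the change I attempted in /etc/pve/corosync.conf on hv03 (a sketch, not the literal diff; <main-ip> stands for the node's original main IP, and I'm assuming the version was bumped 18 -> 19 since the file now shows 19):
Code:
# sketch of the attempted change in /etc/pve/corosync.conf
# (bump the version and move hv03's ring0_addr to the cluster subnet)
-  config_version: 18
+  config_version: 19
-  ring0_addr: <main-ip>
+  ring0_addr: 192.168.1.203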
On an existing node in the cluster:
Code:
root@hv01:~# pvecm status
Cluster information
-------------------
Name:             vrt
Config Version:   19
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jul 25 20:36:13 2023
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000006
Ring ID:          2.739
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.1.66
0x00000003          1 192.168.1.71
0x00000004          1 192.168.1.67
0x00000006          1 192.168.1.201 (local)
0x00000007          1 192.168.1.202
root@hv01:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: VRT18
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.1.71
  }
  node {
    name: hv01
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 192.168.1.201
  }
  node {
    name: hv02
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 192.168.1.202
  }
  node {
    name: hv03
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.203
  }
  node {
    name: vrt12
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.66
  }
  node {
    name: vrt13
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.1.67
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: vrt
  config_version: 19
  interface {
    bindnetaddr: 192.168.1.62
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
root@hv01:~# service corosync status
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-07-25 20:11:21 CEST; 49min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 3534777 (corosync)
      Tasks: 9 (limit: 629145)
     Memory: 235.4M
        CPU: 1min 920ms
     CGroup: /system.slice/corosync.service
             └─3534777 /usr/sbin/corosync -f

Jul 25 21:00:24 hv01 corosync[3534777]: [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:00:24 hv01 corosync[3534777]: [TOTEM ] A new membership (2.d22) was formed. Members
Jul 25 21:00:31 hv01 corosync[3534777]: [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:00:31 hv01 corosync[3534777]: [TOTEM ] A new membership (2.d26) was formed. Members
Jul 25 21:00:38 hv01 corosync[3534777]: [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:00:38 hv01 corosync[3534777]: [TOTEM ] A new membership (2.d2a) was formed. Members
Jul 25 21:00:45 hv01 corosync[3534777]: [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:00:45 hv01 corosync[3534777]: [TOTEM ] A new membership (2.d2e) was formed. Members
Jul 25 21:00:51 hv01 corosync[3534777]: [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:00:51 hv01 corosync[3534777]: [TOTEM ] A new membership (2.d32) was formed. Members
root@hv01:~# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2023-04-02 20:36:38 CEST; 3 months 22 days ago
   Main PID: 3702 (pmxcfs)
      Tasks: 10 (limit: 629145)
     Memory: 69.5M
        CPU: 4h 5min 53.111s
     CGroup: /system.slice/pve-cluster.service
             └─3702 /usr/bin/pmxcfs

Jul 25 21:00:56 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 10
Jul 25 21:00:56 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 10
Jul 25 21:00:57 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 20
Jul 25 21:00:57 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 20
Jul 25 21:00:58 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 30
Jul 25 21:00:58 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 30
Jul 25 21:00:59 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 40
Jul 25 21:00:59 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 40
Jul 25 21:01:00 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 50
Jul 25 21:01:00 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 50
Part of the syslog:
Code:
Jul 25 21:01:12 hv01 corosync[3534777]: [TOTEM ] A new membership (2.d3e) was formed. Members
Jul 25 21:01:12 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 70
Jul 25 21:01:12 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 70
Jul 25 21:01:13 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 80
Jul 25 21:01:13 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 80
Jul 25 21:01:14 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 90
Jul 25 21:01:14 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 90
Jul 25 21:01:15 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 100
Jul 25 21:01:15 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retried 100 times
Jul 25 21:01:15 hv01 pmxcfs[3702]: [status] crit: cpg_send_message failed: 6
Jul 25 21:01:15 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 100
Jul 25 21:01:15 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retried 100 times
Jul 25 21:01:15 hv01 pmxcfs[3702]: [dcdb] crit: cpg_send_message failed: 6
Jul 25 21:01:15 hv01 pvescheduler[1438111]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jul 25 21:01:15 hv01 pve-firewall[3823]: firewall update time (200.246 seconds)
Jul 25 21:01:16 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 10
Jul 25 21:01:17 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 20
Jul 25 21:01:18 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 30
Jul 25 21:01:19 hv01 corosync[3534777]: [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:01:19 hv01 corosync[3534777]: [TOTEM ] A new membership (2.d42) was formed. Members
Jul 25 21:01:19 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 40
Jul 25 21:01:20 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 50
Jul 25 21:01:21 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 60
Jul 25 21:01:22 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 70
Jul 25 21:01:23 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 80
Jul 25 21:01:24 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 90
Jul 25 21:01:25 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 100
Jul 25 21:01:25 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retried 100 times
Jul 25 21:01:25 hv01 pmxcfs[3702]: [status] crit: cpg_send_message failed: 6
Jul 25 21:01:25 hv01 corosync[3534777]: [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:01:25 hv01 corosync[3534777]: [TOTEM ] A new membership (2.d46) was formed. Members
Jul 25 21:01:25 hv01 pve-firewall[3823]: firewall update time (10.010 seconds)
Jul 25 21:01:27 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 10
Jul 25 21:01:28 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 20
Jul 25 21:01:29 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 30