Join cluster failed, how to solve

vmswtje

I tried to join a new node to an existing cluster. The nodes' main IPs are on a different subnet than the one the existing nodes communicate on.
I forgot to specify the --link0 <ip> parameter, but the node joined the cluster anyway. It then hung at "waiting for quorum...", so I stopped the process, hoping the join was essentially done and that I only had to correct the IP (and bump the config_version) in /etc/pve/corosync.conf.

Only to find out that this didn't work.

I already tried rebooting, but without luck. I'm also a bit worried that the other nodes have become very slow and the GUI is not always working.
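
For reference, the join command I should have run on the new node was something like this (192.168.1.201 stands in for one of the existing cluster members; the exact addresses depend on your setup):
Code:
# run on the joining node hv03; --link0 sets its address on the cluster network
pvecm add 192.168.1.201 --link0 192.168.1.203
Also worth noting: when editing /etc/pve/corosync.conf by hand, the config_version in the totem section has to be incremented, otherwise the change is not applied.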

On the new node (hv03):
Code:
root@hv03:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: VRT18
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.1.71
  }
  node {
    name: hv01
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 192.168.1.201
  }
  node {
    name: hv02
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 192.168.1.202
  }
  node {
    name: hv03
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.203
  }
  node {
    name: vrt12
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.66
  }
  node {
    name: vrt13
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.1.67
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: vrt
  config_version: 19
  interface {
    bindnetaddr: 192.168.1.62
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
root@hv03:/etc# pvecm status
Cluster information
-------------------
Name:             vrt
Config Version:   19
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jul 25 20:08:42 2023
Quorum provider:  corosync_votequorum
Nodes:            1
Node ID:          0x00000001
Ring ID:          1.6d0
Quorate:          No

Votequorum information
----------------------
Expected votes:   6
Highest expected: 6
Total votes:      1
Quorum:           4 Activity blocked
Flags:

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.203 (local)
root@hv03:~# service pvecm status
Unit pvecm.service could not be found.
root@hv03:~# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-07-25 20:50:58 CEST; 7min ago
    Process: 42465 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 42466 (pmxcfs)
      Tasks: 7 (limit: 618671)
     Memory: 16.0M
        CPU: 248ms
     CGroup: /system.slice/pve-cluster.service
             └─42466 /usr/bin/pmxcfs

Jul 25 20:50:57 hv03 systemd[1]: Starting The Proxmox VE cluster filesystem...
Jul 25 20:50:57 hv03 pmxcfs[42466]: [status] notice: update cluster info (cluster name  vrt, version = 19)
Jul 25 20:50:58 hv03 systemd[1]: Started The Proxmox VE cluster filesystem.
Jul 25 20:51:02 hv03 pmxcfs[42466]: [dcdb] notice: members: 1/42466
Jul 25 20:51:02 hv03 pmxcfs[42466]: [dcdb] notice: all data is up to date
Jul 25 20:51:02 hv03 pmxcfs[42466]: [status] notice: members: 1/42466
Jul 25 20:51:02 hv03 pmxcfs[42466]: [status] notice: all data is up to date
root@hv03:~# service corosync status
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-07-25 20:50:52 CEST; 7min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 42357 (corosync)
      Tasks: 9 (limit: 618671)
     Memory: 138.5M
        CPU: 7.993s
     CGroup: /system.slice/corosync.service
             └─42357 /usr/sbin/corosync -f


Jul 25 20:58:16 hv03 corosync[42357]:   [QUORUM] Members[1]: 1
Jul 25 20:58:16 hv03 corosync[42357]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul 25 20:58:22 hv03 corosync[42357]:   [QUORUM] Sync members[1]: 1
Jul 25 20:58:22 hv03 corosync[42357]:   [TOTEM ] A new membership (1.cda) was formed. Members
Jul 25 20:58:22 hv03 corosync[42357]:   [QUORUM] Members[1]: 1
Jul 25 20:58:22 hv03 corosync[42357]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul 25 20:58:29 hv03 corosync[42357]:   [QUORUM] Sync members[1]: 1
Jul 25 20:58:29 hv03 corosync[42357]:   [TOTEM ] A new membership (1.cde) was formed. Members
Jul 25 20:58:29 hv03 corosync[42357]:   [QUORUM] Members[1]: 1
Jul 25 20:58:29 hv03 corosync[42357]:   [MAIN  ] Completed service synchronization, ready to provide service.

part of the syslog:
Jul 25 20:40:26 hv03 pveproxy[31928]: worker exit
Jul 25 20:40:26 hv03 pveproxy[31929]: worker exit
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 31928 finished
Jul 25 20:40:26 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 32012 started
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 31929 finished
Jul 25 20:40:26 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 32013 started
Jul 25 20:40:26 hv03 pveproxy[31930]: worker exit
Jul 25 20:40:26 hv03 pveproxy[32012]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.
Jul 25 20:40:26 hv03 pveproxy[32013]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 31930 finished
Jul 25 20:40:26 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:26 hv03 pveproxy[4205]: worker 32014 started
Jul 25 20:40:26 hv03 pveproxy[32014]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.
Jul 25 20:40:28 hv03 pvestatd[4125]: authkey rotation error: cfs-lock 'authkey' error: pve cluster filesystem not online.
Jul 25 20:40:30 hv03 corosync[3726]:   [QUORUM] Sync members[1]: 1
Jul 25 20:40:30 hv03 corosync[3726]:   [TOTEM ] A new membership (1.af9) was formed. Members
Jul 25 20:40:30 hv03 corosync[3726]:   [QUORUM] Members[1]: 1
Jul 25 20:40:30 hv03 corosync[3726]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul 25 20:40:30 hv03 pve-ha-lrm[4213]: unable to write lrm status file - unable to open file '/etc/pve/nodes/hv03/lrm_status.tmp.4213' - No such file or directory
Jul 25 20:40:31 hv03 pveproxy[32012]: worker exit
Jul 25 20:40:31 hv03 pveproxy[32013]: worker exit
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32012 finished
Jul 25 20:40:31 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32112 started
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32013 finished
Jul 25 20:40:31 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32113 started
Jul 25 20:40:31 hv03 pveproxy[32014]: worker exit
Jul 25 20:40:31 hv03 pveproxy[32112]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.
Jul 25 20:40:31 hv03 pveproxy[32113]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32014 finished
Jul 25 20:40:31 hv03 pveproxy[4205]: starting 1 worker(s)
Jul 25 20:40:31 hv03 pveproxy[4205]: worker 32114 started
Jul 25 20:40:31 hv03 pveproxy[32114]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1996.

On an existing node in the cluster:
Code:
root@hv01:~# pvecm status
Cluster information
-------------------
Name:             vrt
Config Version:   19
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Tue Jul 25 20:36:13 2023
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000006
Ring ID:          2.739
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000002          1 192.168.1.66
0x00000003          1 192.168.1.71
0x00000004          1 192.168.1.67
0x00000006          1 192.168.1.201 (local)
0x00000007          1 192.168.1.202
root@hv01:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: VRT18
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.1.71
  }
  node {
    name: hv01
    nodeid: 6
    quorum_votes: 1
    ring0_addr: 192.168.1.201
  }
  node {
    name: hv02
    nodeid: 7
    quorum_votes: 1
    ring0_addr: 192.168.1.202
  }
  node {
    name: hv03
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.1.203
  }
  node {
    name: vrt12
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.66
  }
  node {
    name: vrt13
    nodeid: 4
    quorum_votes: 1
    ring0_addr: 192.168.1.67
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: vrt
  config_version: 19
  interface {
    bindnetaddr: 192.168.1.62
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}
root@hv01:~# service corosync status
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-07-25 20:11:21 CEST; 49min ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 3534777 (corosync)
      Tasks: 9 (limit: 629145)
     Memory: 235.4M
        CPU: 1min 920ms
     CGroup: /system.slice/corosync.service
             └─3534777 /usr/sbin/corosync -f

Jul 25 21:00:24 hv01 corosync[3534777]:   [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:00:24 hv01 corosync[3534777]:   [TOTEM ] A new membership (2.d22) was formed. Members
Jul 25 21:00:31 hv01 corosync[3534777]:   [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:00:31 hv01 corosync[3534777]:   [TOTEM ] A new membership (2.d26) was formed. Members
Jul 25 21:00:38 hv01 corosync[3534777]:   [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:00:38 hv01 corosync[3534777]:   [TOTEM ] A new membership (2.d2a) was formed. Members
Jul 25 21:00:45 hv01 corosync[3534777]:   [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:00:45 hv01 corosync[3534777]:   [TOTEM ] A new membership (2.d2e) was formed. Members
Jul 25 21:00:51 hv01 corosync[3534777]:   [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:00:51 hv01 corosync[3534777]:   [TOTEM ] A new membership (2.d32) was formed. Members
root@hv01:~# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2023-04-02 20:36:38 CEST; 3 months 22 days ago
   Main PID: 3702 (pmxcfs)
      Tasks: 10 (limit: 629145)
     Memory: 69.5M
        CPU: 4h 5min 53.111s
     CGroup: /system.slice/pve-cluster.service
             └─3702 /usr/bin/pmxcfs

Jul 25 21:00:56 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 10
Jul 25 21:00:56 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 10
Jul 25 21:00:57 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 20
Jul 25 21:00:57 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 20
Jul 25 21:00:58 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 30
Jul 25 21:00:58 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 30
Jul 25 21:00:59 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 40
Jul 25 21:00:59 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 40
Jul 25 21:01:00 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 50
Jul 25 21:01:00 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 50

part of the syslog:
Jul 25 21:01:12 hv01 corosync[3534777]:   [TOTEM ] A new membership (2.d3e) was formed. Members
Jul 25 21:01:12 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 70
Jul 25 21:01:12 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 70
Jul 25 21:01:13 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 80
Jul 25 21:01:13 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 80
Jul 25 21:01:14 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 90
Jul 25 21:01:14 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 90
Jul 25 21:01:15 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 100
Jul 25 21:01:15 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retried 100 times
Jul 25 21:01:15 hv01 pmxcfs[3702]: [status] crit: cpg_send_message failed: 6
Jul 25 21:01:15 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retry 100
Jul 25 21:01:15 hv01 pmxcfs[3702]: [dcdb] notice: cpg_send_message retried 100 times
Jul 25 21:01:15 hv01 pmxcfs[3702]: [dcdb] crit: cpg_send_message failed: 6
Jul 25 21:01:15 hv01 pvescheduler[1438111]: jobs: cfs-lock 'file-jobs_cfg' error: got lock request timeout
Jul 25 21:01:15 hv01 pve-firewall[3823]: firewall update time (200.246 seconds)
Jul 25 21:01:16 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 10
Jul 25 21:01:17 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 20
Jul 25 21:01:18 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 30
Jul 25 21:01:19 hv01 corosync[3534777]:   [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:01:19 hv01 corosync[3534777]:   [TOTEM ] A new membership (2.d42) was formed. Members
Jul 25 21:01:19 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 40
Jul 25 21:01:20 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 50
Jul 25 21:01:21 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 60
Jul 25 21:01:22 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 70
Jul 25 21:01:23 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 80
Jul 25 21:01:24 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 90
Jul 25 21:01:25 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 100
Jul 25 21:01:25 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retried 100 times
Jul 25 21:01:25 hv01 pmxcfs[3702]: [status] crit: cpg_send_message failed: 6
Jul 25 21:01:25 hv01 corosync[3534777]:   [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 25 21:01:25 hv01 corosync[3534777]:   [TOTEM ] A new membership (2.d46) was formed. Members
Jul 25 21:01:25 hv01 pve-firewall[3823]: firewall update time (10.010 seconds)
Jul 25 21:01:27 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 10
Jul 25 21:01:28 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 20
Jul 25 21:01:29 hv01 pmxcfs[3702]: [status] notice: cpg_send_message retry 30
 
I found the potential source of the problem.

Of course I had checked /etc/hosts before joining, but I overlooked that there were two lines for the server itself: one with the wrong IP address (10.1.51.43) and one with the right one (192.168.1.203). The wrong entry came first, and that is the one Linux apparently uses.
Maybe this was the source of the problem, because I never had these issues before when joining other nodes with exactly the same routine (even when I had to correct corosync.conf, it was no problem).
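
For illustration, the two self-entries in /etc/hosts on hv03 looked roughly like this (the FQDNs are my guess; the IPs are the real ones). Linux resolves the hostname via the first matching line, so the wrong address won:
Code:
10.1.51.43    hv03.mydomain hv03   # wrong, and listed first, so it was used
192.168.1.203 hv03.mydomain hv03   # correct cluster address
Removing the wrong line (or putting the correct one first) makes the node resolve its own name to the cluster address.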

Even so, I still hope to fix it so that I don't have to reinstall the server.
I guess the most important finding now is that corosync only sees itself and the /etc/pve folder is incomplete: the local symlink points to nodes/hv03, but that nodes folder does not exist on this node. Also, the pveproxy service requires /etc/pve/local/pve-ssl.key, which does not exist either.

(hv03 is the node that needs to join but has some problems)

Code:
root@hv03:/etc/pve# ls -lh
total 512
-r--r----- 1 root www-data 817 Jul 25 20:07 corosync.conf
lr-xr-xr-x 1 root www-data   0 Jan  1  1970 local -> nodes/hv03
lr-xr-xr-x 1 root www-data   0 Jan  1  1970 lxc -> nodes/hv03/lxc
lr-xr-xr-x 1 root www-data   0 Jan  1  1970 openvz -> nodes/hv03/openvz
lr-xr-xr-x 1 root www-data   0 Jan  1  1970 qemu-server -> nodes/hv03/qemu-server
dr-xr-xr-x 2 root www-data   0 Jul 25 20:19 virtual-guest

root@hv03:/etc/pve# service pve-cluster status
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-07-27 11:29:48 CEST; 3min 54s ago
    Process: 2309040 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
   Main PID: 2309041 (pmxcfs)
      Tasks: 6 (limit: 618671)
     Memory: 16.0M
        CPU: 178ms
     CGroup: /system.slice/pve-cluster.service
             └─2309041 /usr/bin/pmxcfs

Jul 27 11:29:51 hv03 pmxcfs[2309041]: [quorum] crit: quorum_dispatch failed: 2
Jul 27 11:29:51 hv03 pmxcfs[2309041]: [dcdb] crit: cpg_dispatch failed: 2
Jul 27 11:29:51 hv03 pmxcfs[2309041]: [dcdb] crit: cpg_leave failed: 2
Jul 27 11:29:52 hv03 pmxcfs[2309041]: [status] crit: cpg_dispatch failed: 2
Jul 27 11:29:52 hv03 pmxcfs[2309041]: [status] crit: cpg_leave failed: 2
Jul 27 11:29:53 hv03 pmxcfs[2309041]: [status] notice: update cluster info (cluster name  vrt, version = 19)
Jul 27 11:29:53 hv03 pmxcfs[2309041]: [dcdb] notice: members: 1/2309041
Jul 27 11:29:53 hv03 pmxcfs[2309041]: [dcdb] notice: all data is up to date
Jul 27 11:29:53 hv03 pmxcfs[2309041]: [status] notice: members: 1/2309041
Jul 27 11:29:53 hv03 pmxcfs[2309041]: [status] notice: all data is up to date


root@hv03:/etc/corosync# systemctl status corosync

● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-07-27 11:25:38 CEST; 2min 36s ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 2304681 (corosync)
      Tasks: 9 (limit: 618671)
     Memory: 139.0M
        CPU: 2.849s
     CGroup: /system.slice/corosync.service
             └─2304681 /usr/sbin/corosync -f

Jul 27 11:27:57 hv03 corosync[2304681]:   [QUORUM] Members[1]: 1
Jul 27 11:27:57 hv03 corosync[2304681]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul 27 11:28:03 hv03 corosync[2304681]:   [QUORUM] Sync members[1]: 1
Jul 27 11:28:03 hv03 corosync[2304681]:   [TOTEM ] A new membership (1.14c65) was formed. Members
Jul 27 11:28:03 hv03 corosync[2304681]:   [QUORUM] Members[1]: 1
Jul 27 11:28:03 hv03 corosync[2304681]:   [MAIN  ] Completed service synchronization, ready to provide service.
Jul 27 11:28:10 hv03 corosync[2304681]:   [QUORUM] Sync members[1]: 1
Jul 27 11:28:10 hv03 corosync[2304681]:   [TOTEM ] A new membership (1.14c69) was formed. Members
Jul 27 11:28:10 hv03 corosync[2304681]:   [QUORUM] Members[1]: 1
Jul 27 11:28:10 hv03 corosync[2304681]:   [MAIN  ] Completed service synchronization, ready to provide service.


root@hv03:/etc/corosync# service pveproxy status
● pveproxy.service - PVE API Proxy Server
     Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
     Active: active (running) since Thu 2023-07-27 11:29:18 CEST; 3s ago
    Process: 2308501 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=1/FAILURE)
    Process: 2308504 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
   Main PID: 2308505 (pveproxy)
      Tasks: 4 (limit: 618671)
     Memory: 201.3M
        CPU: 1.269s
     CGroup: /system.slice/pveproxy.service
             ├─2308505 pveproxy
             ├─2308506 pveproxy worker
             ├─2308507 pveproxy worker
             └─2308508 pveproxy worker

Jul 27 11:29:17 hv03 pvecm[2308501]: mkdir /etc/pve/firewall: Permission denied at /usr/share/perl5/PVE/Cluster.pm lin>
Jul 27 11:29:18 hv03 pveproxy[2308505]: starting server
Jul 27 11:29:18 hv03 pveproxy[2308505]: starting 3 worker(s)
Jul 27 11:29:18 hv03 pveproxy[2308505]: worker 2308506 started
Jul 27 11:29:18 hv03 pveproxy[2308505]: worker 2308507 started
Jul 27 11:29:18 hv03 pveproxy[2308505]: worker 2308508 started
Jul 27 11:29:18 hv03 systemd[1]: Started PVE API Proxy Server.
Jul 27 11:29:18 hv03 pveproxy[2308506]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key)>
Jul 27 11:29:18 hv03 pveproxy[2308507]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key)>
Jul 27 11:29:18 hv03 pveproxy[2308508]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key)>


On another (random) node in the cluster, the logs show me:
Code:
root@hv02:~# systemctl status corosync
● corosync.service - Corosync Cluster Engine
     Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2023-04-02 20:17:52 CEST; 3 months 24 days ago
       Docs: man:corosync
             man:corosync.conf
             man:corosync_overview
   Main PID: 1250342 (corosync)
      Tasks: 9 (limit: 629145)
     Memory: 5.3G
        CPU: 1d 10h 25min 59.102s
     CGroup: /system.slice/corosync.service
             └─1250342 /usr/sbin/corosync -f

Jul 27 11:36:36 hv02 corosync[1250342]:   [KNET  ] rx: Packet rejected from 192.168.1.203:5405
Jul 27 11:36:37 hv02 corosync[1250342]:   [KNET  ] rx: Packet rejected from 192.168.1.203:5405
Jul 27 11:36:38 hv02 corosync[1250342]:   [KNET  ] rx: Packet rejected from 192.168.1.203:5405
Jul 27 11:36:40 hv02 corosync[1250342]:   [KNET  ] rx: Packet rejected from 192.168.1.203:5405
Jul 27 11:36:41 hv02 corosync[1250342]:   [KNET  ] rx: Packet rejected from 192.168.1.203:5405
Jul 27 11:36:42 hv02 corosync[1250342]:   [QUORUM] Sync members[5]: 2 3 4 6 7
Jul 27 11:36:42 hv02 corosync[1250342]:   [TOTEM ] A new membership (2.14d96) was formed. Members
Jul 27 11:36:43 hv02 corosync[1250342]:   [KNET  ] rx: Packet rejected from 192.168.1.203:5405
Jul 27 11:36:44 hv02 corosync[1250342]:   [KNET  ] rx: Packet rejected from 192.168.1.203:5405
Jul 27 11:36:45 hv02 corosync[1250342]:   [KNET  ] rx: Packet rejected from 192.168.1.203:5405

root@hv02:~# systemctl status pve-cluster
● pve-cluster.service - The Proxmox VE cluster filesystem
     Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2023-04-02 20:00:23 CEST; 3 months 24 days ago
   Main PID: 1239491 (pmxcfs)
      Tasks: 10 (limit: 629145)
     Memory: 49.2M
        CPU: 4h 19min 18.729s
     CGroup: /system.slice/pve-cluster.service
             └─1239491 /usr/bin/pmxcfs

Jul 27 11:37:07 hv02 pmxcfs[1239491]: [status] notice: cpg_send_message retry 40
Jul 27 11:37:08 hv02 pmxcfs[1239491]: [status] notice: cpg_send_message retry 50
Jul 27 11:37:09 hv02 pmxcfs[1239491]: [status] notice: cpg_send_message retry 60
Jul 27 11:37:10 hv02 pmxcfs[1239491]: [status] notice: cpg_send_message retry 70
Jul 27 11:37:11 hv02 pmxcfs[1239491]: [status] notice: cpg_send_message retry 80
Jul 27 11:37:12 hv02 pmxcfs[1239491]: [status] notice: cpg_send_message retry 90
Jul 27 11:37:13 hv02 pmxcfs[1239491]: [status] notice: cpg_send_message retry 100
Jul 27 11:37:13 hv02 pmxcfs[1239491]: [status] notice: cpg_send_message retried 100 times
Jul 27 11:37:13 hv02 pmxcfs[1239491]: [status] crit: cpg_send_message failed: 6
Jul 27 11:37:14 hv02 pmxcfs[1239491]: [status] notice: cpg_send_message retry 10

root@hv01:~# ls -lh /etc/pve/
total 5.0K
-rw-r----- 1 root www-data  451 Jul 26 18:10 authkey.pub
-rw-r----- 1 root www-data  451 Jul 26 18:10 authkey.pub.old
-rw-r----- 1 root www-data  817 Jul 25 19:47 corosync.conf
-rw-r----- 1 root www-data   40 Apr 29 11:52 datacenter.cfg
drwxr-xr-x 2 root www-data    0 Aug 20  2019 firewall
-rw-r----- 1 root www-data 2.3K Jun 30 10:52 jobs.cfg
lrwxr-xr-x 1 root www-data    0 Jan  1  1970 local -> nodes/hv01
lrwxr-xr-x 1 root www-data    0 Jan  1  1970 lxc -> nodes/hv01/lxc
drwxr-xr-x 2 root www-data    0 Oct 29  2016 nodes
lrwxr-xr-x 1 root www-data    0 Jan  1  1970 openvz -> nodes/hv01/openvz
drwx------ 2 root www-data    0 Oct 29  2016 priv
-rw-r----- 1 root www-data 2.0K Oct 29  2016 pve-root-ca.pem
-rw-r----- 1 root www-data 1.7K Oct 29  2016 pve-www.key
lrwxr-xr-x 1 root www-data    0 Jan  1  1970 qemu-server -> nodes/hv01/qemu-server
-rw-r----- 1 root www-data    0 Apr 10 21:34 replication.cfg
-rw-r----- 1 root www-data 1.2K Apr 26 09:30 storage.cfg
-rw-r----- 1 root www-data  151 Oct 31  2017 user.cfg
drwxr-xr-x 2 root www-data    0 Jun 21  2020 virtual-guest
-rw-r----- 1 root www-data  639 Apr 28 20:03 vzdump.cron
 
I'm not sure why, but I think I solved the problem by restarting corosync on hv02 (one of the old nodes).

hv02 was the only one that gave the error:
Jul 27 15:19:40 hv02 corosync[1250342]: [KNET ] rx: Packet rejected from 192.168.1.203:5405

I'm not sure why, but after restarting corosync on only that node, it suddenly worked.
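
For completeness, the fix on hv02 was nothing more than the standard service restart:
Code:
# on hv02, the node that kept rejecting packets from 192.168.1.203
systemctl restart corosync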

After that I only had to regenerate the SSL certificates, and that seemed to work.
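
The certificate regeneration was roughly this (pvecm updatecerts is the same command pveproxy runs in its ExecStartPre, as visible in the status output above); restarting pveproxy afterwards may not be strictly necessary, but I did it anyway:
Code:
# on hv03, once it had quorum again
pvecm updatecerts --force
systemctl restart pveproxy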

So, it seems alright now. (A few small things are left, like the console not working when I open a VM on this new node from another hypervisor, but I'll first try to fix that myself.)
 