I have a four-node PVE cluster. All nodes are licensed with a PVE Community Subscription.
Node one has failed and must be replaced.
The cluster and all VMs are working fine on the remaining three nodes.
Following suggestions in other posts, I have reinstalled Proxmox on new hardware to replace the failed node one.
Although the node then shows up in the GUI after I add it to the cluster, the CLI on the node keeps timing out when trying to connect.
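For reference, this is roughly the join procedure I followed on the freshly installed node (192.168.1.10 is a placeholder I am substituting for the IP of one of the working nodes):

# run on the new node; joins it to the existing cluster
pvecm add 192.168.1.10
# then check membership and quorum from the new node
pvecm status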
Here is the last part of the message log (I could not include all of it due to size) -
Sep 27 21:51:05 pmc1 kernel: DLM (built Sep 12 2015 12:55:41) installed
Sep 27 21:51:05 pmc1 corosync[4028]: [MAIN ] Corosync Cluster Engine ('1.4.7'): started and ready to provide service.
Sep 27 21:51:05 pmc1 corosync[4028]: [MAIN ] Corosync built-in features: nss
Sep 27 21:51:05 pmc1 corosync[4028]: [MAIN ] Successfully read config from /etc/cluster/cluster.conf
Sep 27 21:51:05 pmc1 corosync[4028]: [MAIN ] Successfully parsed cman config
Sep 27 21:51:05 pmc1 corosync[4028]: [MAIN ] Successfully configured openais services to load
Sep 27 21:51:05 pmc1 corosync[4028]: [TOTEM ] Initializing transport (UDP/IP Multicast).
Sep 27 21:51:05 pmc1 corosync[4028]: [TOTEM ] Initializing transmit/receive security: libtomcrypt SOBER128/SHA1HMAC (mode 0).
Sep 27 21:51:05 pmc1 corosync[4028]: [TOTEM ] The network interface is down.
Sep 27 21:51:05 pmc1 corosync[4028]: [QUORUM] Using quorum provider quorum_cman
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Sep 27 21:51:05 pmc1 corosync[4028]: [CMAN ] CMAN 1364188437 (built Mar 25 2013 06:14:01) started
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: corosync CMAN membership service 2.90
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: openais cluster membership service B.01.01
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: openais event service B.01.01
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: openais checkpoint service B.01.01
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: openais message service B.03.01
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: openais distributed locking service B.03.01
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: openais timer service A.01.01
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: corosync extended virtual synchrony service
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: corosync configuration service
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: corosync cluster closed process group service v1.01
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: corosync cluster config database access v1.01
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: corosync profile loading service
Sep 27 21:51:05 pmc1 corosync[4028]: [QUORUM] Using quorum provider quorum_cman
Sep 27 21:51:05 pmc1 corosync[4028]: [SERV ] Service engine loaded: corosync cluster quorum service v0.1
Sep 27 21:51:05 pmc1 corosync[4028]: [MAIN ] Compatibility mode set to whitetank. Using V1 and V2 of the synchronization engine.
Sep 27 21:51:05 pmc1 corosync[4028]: [CLM ] CLM CONFIGURATION CHANGE
Sep 27 21:51:05 pmc1 corosync[4028]: [CLM ] New Configuration:
Sep 27 21:51:05 pmc1 corosync[4028]: [CLM ] Members Left:
Sep 27 21:51:05 pmc1 corosync[4028]: [CLM ] Members Joined:
Sep 27 21:51:05 pmc1 corosync[4028]: [CLM ] CLM CONFIGURATION CHANGE
Sep 27 21:51:05 pmc1 corosync[4028]: [CLM ] New Configuration:
Sep 27 21:51:05 pmc1 corosync[4028]: [CLM ] #011r(0) ip(127.0.0.1)
Sep 27 21:51:05 pmc1 corosync[4028]: [CLM ] Members Left:
Sep 27 21:51:05 pmc1 corosync[4028]: [CLM ] Members Joined:
Sep 27 21:51:05 pmc1 corosync[4028]: [CLM ] #011r(0) ip(127.0.0.1)
Sep 27 21:51:05 pmc1 corosync[4028]: [TOTEM ] A processor joined or left the membership and a new membership was formed.
Sep 27 21:51:05 pmc1 corosync[4028]: [QUORUM] Members[1]: 1
Sep 27 21:51:05 pmc1 corosync[4028]: [QUORUM] Members[1]: 1
Sep 27 21:51:05 pmc1 corosync[4028]: [CPG ] chosen downlist: sender r(0) ip(127.0.0.1) ; members(old:0 left:0)
Sep 27 21:51:05 pmc1 corosync[4028]: [MAIN ] Completed service synchronization, ready to provide service.
Sep 27 21:51:55 pmc1 kernel: Netfilter messages via NETLINK v0.30.
Sep 27 21:51:55 pmc1 kernel: kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work properly. Using workaround
Sep 27 21:51:55 pmc1 kernel: ip_tables: (C) 2000-2006 Netfilter Core Team
Sep 27 21:51:55 pmc1 kernel: tun: Universal TUN/TAP device driver, 1.6
Sep 27 21:51:55 pmc1 kernel: tun: (C) 1999-2004 Max Krasnyansky <maxk@qualcomm.com>
Sep 27 21:51:55 pmc1 kernel: ip6_tables: (C) 2000-2006 Netfilter Core Team
Sep 27 21:51:55 pmc1 kernel: Enabling conntracks and NAT for ve0
Sep 27 21:51:55 pmc1 kernel: nf_conntrack version 0.5.0 (16384 buckets, 65536 max)
Sep 27 21:51:55 pmc1 kernel: ploop_dev: module loaded
Sep 27 21:51:56 pmc1 kernel: ip_set: protocol 6
Sep 27 21:51:58 pmc1 pvesh: <root@pam> starting task UPID:pmc1:000010DD:000021E6:56089D3E:startall::root@pam:
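What stands out to me in the log above is the "[TOTEM ] The network interface is down." line and corosync forming a single-member cluster on ip(127.0.0.1). As far as I understand, that usually means the node's hostname resolves to the loopback address instead of the cluster network. If that is the case here, I assume /etc/hosts on the node should look something like this (192.168.1.11 is a placeholder for node one's real IP, and the domain is a guess on my part):

127.0.0.1 localhost.localdomain localhost
# the hostname must resolve to the cluster-facing address, not loopback
192.168.1.11 pmc1.localdomain pmc1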
Here is the part of the syslog that repeats continuously -
Sep 29 07:26:44 pmc1 pveproxy[225574]: worker exit
Sep 29 07:26:44 pmc1 pveproxy[225575]: worker exit
Sep 29 07:26:44 pmc1 pveproxy[4291]: worker 225574 finished
Sep 29 07:26:44 pmc1 pveproxy[4291]: starting 1 worker(s)
Sep 29 07:26:44 pmc1 pveproxy[4291]: worker 225575 finished
Sep 29 07:26:44 pmc1 pveproxy[4291]: worker 225579 started
Sep 29 07:26:44 pmc1 pveproxy[225579]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1634
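My reading of that repeating error is that /etc/pve never fully synchronized from the cluster, so the node's own SSL key was never generated. These are the checks I am planning to run next (just what I intend to try, not confirmed fixes):

# does the key actually exist on the pmxcfs mount?
ls -l /etc/pve/local/
# is the cluster filesystem daemon running?
service pve-cluster status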
I'm looking for some guidance to get node one back into the cluster.