Hi,
we have a three-node Proxmox cluster on Proxmox 6.
We added new network cards and disks to one node.
After installing the hardware, the network was unreachable during the first boot.
The name of the network card had changed from enp3s0 to enp4s0.
We have one bridge device for the VMs and one network device for Ceph.
I changed this in /etc/network/interfaces, and the network is now reachable on all interfaces as it should be.
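For reference, the relevant part of /etc/network/interfaces now looks roughly like this (addresses and the Ceph interface name are placeholders, not our real values):
Code:
# renamed from enp3s0 after the hardware change
iface enp4s0 inet manual

# bridge for the VMs, now using the renamed NIC
auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10/24
        gateway 192.0.2.1
        bridge-ports enp4s0
        bridge-stp off
        bridge-fd 0

# separate interface for Ceph (placeholder name and address)
auto enp5s0
iface enp5s0 inet static
        address 10.10.10.3/24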
But the web interface on the changed node does not work, and the cluster is not working either.
The curious thing is that the other members see the defective node, but the defective node does not see the other nodes:
Output on the defective node:
Code:
pvecm status
ipcc_send_rec[1] failed: Connection refused
ipcc_send_rec[2] failed: Connection refused
ipcc_send_rec[3] failed: Connection refused
Unable to load access control list: Connection refused
Output on the other nodes:
Code:
Cluster information
-------------------
Name: XXXXXXXX
Config Version: 6
Transport: knet
Secure auth: on
Quorum information
------------------
Date: Mon Jul 27 18:17:21 2020
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 1.70
Quorate: Yes
Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 1 XXX.YYY.ZZZ.121 (local)
0x00000002 1 XXX.YYY.ZZZ.119
0x00000004 1 XXX.YYY.ZZZ.211
The Output of "
systemctl status pve-cluster pveproxy pvedaemon
Code:
pve-cluster.service
Loaded: bad-setting (Reason: Unit pve-cluster.service has a bad unit file setting.)
Active: inactive (dead)
Jul 27 17:57:56 vmhost3 systemd[1]: /etc/systemd/system/pve-cluster.service:1: Assignment outside of section. Ignoring.
Jul 27 17:57:56 vmhost3 systemd[1]: pve-cluster.service: Service has no ExecStart=, ExecStop=, or SuccessAction=. Refusing.
Jul 27 17:57:56 vmhost3 systemd[1]: /etc/systemd/system/pve-cluster.service:1: Assignment outside of section. Ignoring.
Jul 27 17:57:56 vmhost3 systemd[1]: pve-cluster.service: Service has no ExecStart=, ExecStop=, or SuccessAction=. Refusing.
Jul 27 17:57:56 vmhost3 systemd[1]: /etc/systemd/system/pve-cluster.service:1: Assignment outside of section. Ignoring.
Jul 27 17:57:56 vmhost3 systemd[1]: pve-cluster.service: Service has no ExecStart=, ExecStop=, or SuccessAction=. Refusing.
Jul 27 18:00:31 vmhost3 systemd[1]: /etc/systemd/system/pve-cluster.service:1: Missing '='.
Jul 27 18:06:10 vmhost3 systemd[1]: pve-cluster.service: Cannot add dependency job, ignoring: Unit pve-cluster.service has a bad unit file setting.
● pveproxy.service - PVE API Proxy Server
Loaded: loaded (/lib/systemd/system/pveproxy.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2020-07-27 17:58:03 CEST; 24min ago
Process: 1408 ExecStartPre=/usr/bin/pvecm updatecerts --silent (code=exited, status=111)
Process: 1413 ExecStart=/usr/bin/pveproxy start (code=exited, status=0/SUCCESS)
Main PID: 1417 (pveproxy)
Tasks: 4 (limit: 4915)
Memory: 132.8M
CGroup: /system.slice/pveproxy.service
├─1417 pveproxy
├─3746 pveproxy worker
├─3747 pveproxy worker
└─3748 pveproxy worker
Jul 27 18:22:17 vmhost3 pveproxy[3746]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1727.
Jul 27 18:22:17 vmhost3 pveproxy[3744]: worker exit
Jul 27 18:22:17 vmhost3 pveproxy[3745]: worker exit
Jul 27 18:22:17 vmhost3 pveproxy[1417]: worker 3744 finished
Jul 27 18:22:17 vmhost3 pveproxy[1417]: worker 3745 finished
Jul 27 18:22:17 vmhost3 pveproxy[1417]: starting 2 worker(s)
Jul 27 18:22:17 vmhost3 pveproxy[1417]: worker 3747 started
Jul 27 18:22:17 vmhost3 pveproxy[1417]: worker 3748 started
Jul 27 18:22:17 vmhost3 pveproxy[3747]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1727.
Jul 27 18:22:17 vmhost3 pveproxy[3748]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1727.
● pvedaemon.service - PVE API Daemon
Loaded: loaded (/lib/systemd/system/pvedaemon.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2020-07-27 17:57:57 CEST; 24min ago
Main PID: 1403 (pvedaemon)
Tasks: 4 (limit: 4915)
Memory: 133.9M
CGroup: /system.slice/pvedaemon.service
├─1403 pvedaemon
├─1404 pvedaemon worker
├─1405 pvedaemon worker
└─1406 pvedaemon worker
Jul 27 17:57:53 vmhost3 systemd[1]: Starting PVE API Daemon...
Jul 27 17:57:56 vmhost3 pvedaemon[1403]: starting server
Jul 27 17:57:56 vmhost3 pvedaemon[1403]: starting 3 worker(s)
Jul 27 17:57:56 vmhost3 pvedaemon[1403]: worker 1404 started
Jul 27 17:57:56 vmhost3 pvedaemon[1403]: worker 1405 started
Jul 27 17:57:56 vmhost3 pvedaemon[1403]: worker 1406 started
Jul 27 17:57:57 vmhost3 systemd[1]: Started PVE API Daemon.
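If I read the log correctly, the "bad-setting" comes from a local override file at /etc/systemd/system/pve-cluster.service ("Assignment outside of section", "Missing '='"), which I do not remember creating. Would it be safe to simply move that file out of the way and let the packaged unit take over again? This is what I would try (just a guess on my side, assuming the override file is only a leftover and not needed):
Code:
# show which unit file(s) systemd actually reads for the service
systemctl cat pve-cluster.service

# move the broken local override aside (assuming it is not needed)
mv /etc/systemd/system/pve-cluster.service /root/pve-cluster.service.bak

# reload systemd and try to start the cluster filesystem again
systemctl daemon-reload
systemctl start pve-cluster.service
systemctl status pve-cluster.service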
It seems that on the defective node the proxy certificates are not present, but "pvecm updatecerts -force" does not work because the cluster is not available for the node.
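If pve-cluster starts again and /etc/pve gets mounted, I would then try to regenerate the certificates and restart the API services, roughly in this order (again, just my assumption about the right order):
Code:
# only after /etc/pve is mounted again by pmxcfs
pvecm updatecerts --force
systemctl restart pveproxy pvedaemon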
The output of "systemctl status corosync" is:
Code:
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Mon 2020-07-27 17:57:53 CEST; 34min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 1019 (corosync)
Tasks: 9 (limit: 4915)
Memory: 131.2M
CGroup: /system.slice/corosync.service
└─1019 /usr/sbin/corosync -f
Jul 27 17:58:09 vmhost3 corosync[1019]: [KNET ] pmtud: PMTUD link change for host: 2 link: 0 from 469 to 1397
Jul 27 17:58:09 vmhost3 corosync[1019]: [KNET ] pmtud: PMTUD link change for host: 2 link: 1 from 469 to 1397
Jul 27 17:58:09 vmhost3 corosync[1019]: [KNET ] pmtud: PMTUD link change for host: 1 link: 0 from 469 to 1397
Jul 27 17:58:09 vmhost3 corosync[1019]: [KNET ] pmtud: PMTUD link change for host: 1 link: 1 from 469 to 1397
Jul 27 17:58:09 vmhost3 corosync[1019]: [KNET ] pmtud: Global data MTU changed to: 1397
Jul 27 18:16:13 vmhost3 corosync[1019]: [KNET ] link: host: 1 link: 0 is down
Jul 27 18:16:13 vmhost3 corosync[1019]: [KNET ] host: host: 1 (passive) best link: 1 (pri: 1)
Jul 27 18:16:13 vmhost3 corosync[1019]: [TOTEM ] Retransmit List: 13f2
Jul 27 18:16:25 vmhost3 corosync[1019]: [KNET ] rx: host: 1 link: 0 is up
Jul 27 18:16:25 vmhost3 corosync[1019]: [KNET ] host: host: 1 (passive) best link: 0 (pri: 1)
Any ideas?