Proxmox cluster repeatedly getting out of sync

redmop

I have a cluster of 3 Proxmox servers that repeatedly seem to lose contact with one another. The switch they are connected to supports multicast, and multicast is indeed enabled on it.

When I attempt to log on, I get this:

upload_2017-1-23_12-13-53.png

Note that it will not let me select a Realm to log on with.

Only one of the three servers lets me log on to the web interface, and it does not show communication to the other two servers.

If I reboot the cluster, they will work fine for a while, but then lose contact with one another after a random time.

The VMs still appear to be running.

What is my next step in troubleshooting this?

# pveversion -v
proxmox-ve: 4.4-78 (running kernel: 4.4.35-2-pve)
pve-manager: 4.4-5 (running version: 4.4-5/c43015a5)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.35-2-pve: 4.4.35-78
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-102
pve-firmware: 1.1-10
libpve-common-perl: 4.0-85
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-71
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.1-1
pve-container: 1.0-90
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.6-5
lxcfs: 2.0.5-pve2
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80
 
They did not have NTP installed (I have just installed it), but their dates and times match.

# for X in proxmox3 proxmox4 proxmox5 ; do echo -n "Date for $X - " ; ssh $X date ; done
Date for proxmox3 - Mon Jan 23 12:46:39 MST 2017
Date for proxmox4 - Mon Jan 23 12:46:39 MST 2017
Date for proxmox5 - Mon Jan 23 12:46:39 MST 2017
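
The `date` output above only has one-second resolution and is easy to misread by eye. As a rough sketch, the same check can be reduced to a single number by comparing epoch timestamps. The helper name `max_skew` is made up for illustration, and because the `ssh` calls run one after another, a result of a second or two may only reflect connection latency rather than real clock drift.

```shell
#!/bin/sh
# max_skew: hypothetical helper, not part of Proxmox.
# Reads one epoch timestamp (seconds) per line on stdin and prints
# the difference between the largest and the smallest value.
max_skew() {
    awk 'NR == 1 { min = $1; max = $1 }
         $1 < min { min = $1 }
         $1 > max { max = $1 }
         END { if (NR > 0) print max - min }'
}

# Example use against the three nodes from this thread (not run here):
#   for X in proxmox3 proxmox4 proxmox5; do ssh "$X" date +%s; done | max_skew
```

A result of 0 or 1 would agree with the matching times shown above; a larger value would point at a clock problem worth fixing before looking further.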
 
More info:

systemctl status pve-cluster.service
● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled)
Active: failed (Result: exit-code) since Mon 2017-01-23 13:14:03 MST; 6s ago
Process: 27382 ExecStart=/usr/bin/pmxcfs $DAEMON_OPTS (code=exited, status=255)
Main PID: 3363 (code=killed, signal=KILL)

Jan 23 13:14:03 proxmox4 pmxcfs[27382]: fuse: failed to access mountpoint /etc/pve: Transport endpoint is not connected
Jan 23 13:14:03 proxmox4 pmxcfs[27382]: [main] crit: fuse_mount error: Transport endpoint is not connected
Jan 23 13:14:03 proxmox4 pmxcfs[27382]: [main] crit: fuse_mount error: Transport endpoint is not connected
Jan 23 13:14:03 proxmox4 pmxcfs[27382]: [main] notice: exit proxmox configuration filesystem (-1)
Jan 23 13:14:03 proxmox4 systemd[1]: pve-cluster.service: control process exited, code=exited status=255
Jan 23 13:14:03 proxmox4 systemd[1]: Failed to start The Proxmox VE cluster filesystem.
Jan 23 13:14:03 proxmox4 systemd[1]: Unit pve-cluster.service entered failed state.
 
Even more info:

# tail -f syslog
Jan 23 13:15:44 proxmox4 pveproxy[27391]: worker 2889 started
Jan 23 13:15:44 proxmox4 pveproxy[2889]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:15:44 proxmox4 pveproxy[27391]: worker 2879 finished
Jan 23 13:15:44 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:15:44 proxmox4 pveproxy[27391]: worker 2890 started
Jan 23 13:15:44 proxmox4 pveproxy[27391]: worker 2880 finished
Jan 23 13:15:44 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:15:44 proxmox4 pveproxy[2890]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:15:44 proxmox4 pveproxy[27391]: worker 2891 started
Jan 23 13:15:44 proxmox4 pveproxy[2891]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:15:45 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:45 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:45 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:45 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:45 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:45 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:49 proxmox4 pveproxy[2889]: worker exit
Jan 23 13:15:49 proxmox4 pveproxy[2890]: worker exit
Jan 23 13:15:49 proxmox4 pveproxy[2891]: worker exit
Jan 23 13:15:49 proxmox4 pveproxy[27391]: worker 2889 finished
Jan 23 13:15:49 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:15:49 proxmox4 pveproxy[27391]: worker 2895 started
Jan 23 13:15:49 proxmox4 pveproxy[2895]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:15:49 proxmox4 pveproxy[27391]: worker 2890 finished
Jan 23 13:15:49 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:15:49 proxmox4 pveproxy[27391]: worker 2896 started
Jan 23 13:15:49 proxmox4 pveproxy[27391]: worker 2891 finished
Jan 23 13:15:49 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:15:49 proxmox4 pveproxy[27391]: worker 2897 started
Jan 23 13:15:49 proxmox4 pveproxy[2896]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:15:49 proxmox4 pveproxy[2897]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:15:50 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:50 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:50 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:50 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:50 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:50 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:50 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:50 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:50 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:50 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:50 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:50 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:54 proxmox4 pveproxy[2895]: worker exit
Jan 23 13:15:54 proxmox4 pveproxy[2896]: worker exit
Jan 23 13:15:54 proxmox4 pveproxy[2897]: worker exit
Jan 23 13:15:54 proxmox4 pveproxy[27391]: worker 2895 finished
Jan 23 13:15:54 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:15:54 proxmox4 pveproxy[27391]: worker 2904 started
Jan 23 13:15:54 proxmox4 pveproxy[2904]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:15:54 proxmox4 pveproxy[27391]: worker 2896 finished
Jan 23 13:15:54 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:15:54 proxmox4 pveproxy[27391]: worker 2905 started
Jan 23 13:15:54 proxmox4 pveproxy[27391]: worker 2897 finished
Jan 23 13:15:54 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:15:54 proxmox4 pveproxy[27391]: worker 2906 started
Jan 23 13:15:54 proxmox4 pveproxy[2905]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:15:54 proxmox4 pveproxy[2906]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:15:55 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:55 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:55 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:55 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:55 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:55 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:15:59 proxmox4 pveproxy[2904]: worker exit
Jan 23 13:15:59 proxmox4 pveproxy[2905]: worker exit
Jan 23 13:15:59 proxmox4 pveproxy[2906]: worker exit
Jan 23 13:15:59 proxmox4 pveproxy[27391]: worker 2904 finished
Jan 23 13:15:59 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:15:59 proxmox4 pveproxy[27391]: worker 2907 started
Jan 23 13:15:59 proxmox4 pveproxy[2907]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:16:00 proxmox4 pveproxy[27391]: worker 2905 finished
Jan 23 13:16:00 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:16:00 proxmox4 pveproxy[27391]: worker 2908 started
Jan 23 13:16:00 proxmox4 pveproxy[2908]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:16:00 proxmox4 pveproxy[27391]: worker 2906 finished
Jan 23 13:16:00 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:16:00 proxmox4 pveproxy[27391]: worker 2909 started
Jan 23 13:16:00 proxmox4 pveproxy[2909]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:16:00 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:00 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:00 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:00 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:00 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:00 proxmox4 pvestatd[3439]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:00 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:00 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:00 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:00 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:00 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:00 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:01 proxmox4 cron[3375]: (*system*vzdump) CAN'T OPEN SYMLINK (/etc/cron.d/vzdump)
Jan 23 13:16:05 proxmox4 pveproxy[2907]: worker exit
Jan 23 13:16:05 proxmox4 pveproxy[2908]: worker exit
Jan 23 13:16:05 proxmox4 pveproxy[27391]: worker 2907 finished
Jan 23 13:16:05 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:16:05 proxmox4 pveproxy[27391]: worker 2916 started
Jan 23 13:16:05 proxmox4 pveproxy[2909]: worker exit
Jan 23 13:16:05 proxmox4 pveproxy[2916]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:16:05 proxmox4 pveproxy[27391]: worker 2908 finished
Jan 23 13:16:05 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:16:05 proxmox4 pveproxy[27391]: worker 2917 started
Jan 23 13:16:05 proxmox4 pveproxy[2917]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:16:05 proxmox4 pveproxy[27391]: worker 2909 finished
Jan 23 13:16:05 proxmox4 pveproxy[27391]: starting 1 worker(s)
Jan 23 13:16:05 proxmox4 pveproxy[27391]: worker 2918 started
Jan 23 13:16:05 proxmox4 pveproxy[2918]: /etc/pve/local/pve-ssl.key: failed to load local private key (key_file or key) at /usr/share/perl5/PVE/HTTPServer.pm line 1646.
Jan 23 13:16:05 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:05 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:05 proxmox4 pve-ha-crm[3463]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:05 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:05 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
Jan 23 13:16:05 proxmox4 pve-ha-lrm[3475]: ipcc_send_rec failed: Connection refused
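
The syslog excerpt above is long and repetitive, so a per-daemon count of the failing lines may be easier to read. This is a minimal sketch, assuming the classic syslog layout shown above, where the fifth field is `daemon[pid]:`. The function name `count_failures` is hypothetical, and matching on the word "failed" is just a convenient filter for this particular log.

```shell
#!/bin/sh
# count_failures: hypothetical helper for condensing a syslog excerpt.
# Reads syslog lines on stdin, keeps those containing "failed", and
# prints "<count> <daemon>" sorted by descending count, then by name.
count_failures() {
    awk '/failed/ {
             name = $5                 # e.g. "pveproxy[2889]:"
             sub(/\[.*$/, "", name)    # drop "[pid]:" if present
             sub(/:$/, "", name)       # drop trailing ":" if no pid
             count[name]++
         }
         END { for (n in count) print count[n], n }' |
        sort -k1,1nr -k2,2
}

# Example use (not run here):
#   count_failures < /var/log/syslog
```

For the excerpt above, this would group the `pve-ssl.key` load failures under `pveproxy` and the `ipcc_send_rec failed` lines under `pve-ha-crm`, `pve-ha-lrm`, and `pvestatd`. All of these daemons depend on `/etc/pve`, which is unavailable while `pve-cluster` is in the failed state shown earlier.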
 
corosync is running at 100% CPU (on one core) on one of the nodes. Another node has been down for about an hour because of a different problem, and on the third node corosync is idle.

Code:
service corosync status
● corosync.service - Corosync Cluster Engine
   Loaded: loaded (/lib/systemd/system/corosync.service; enabled)
   Active: active (running) since Thu 2017-02-02 10:33:41 MST; 3 days ago
  Process: 3550 ExecStart=/usr/share/corosync/corosync start (code=exited, status=0/SUCCESS)
 Main PID: 3566 (corosync)
   CGroup: /system.slice/corosync.service
           └─3566 corosync

Feb 05 17:30:48 proxmox3 corosync[3566]: [TOTEM ] A new membership (192.168.0.133:10548) was formed. Members
Feb 05 17:30:48 proxmox3 corosync[3566]: [QUORUM] Members[1]: 1
Feb 05 17:30:48 proxmox3 corosync[3566]: [MAIN  ] Completed service synchronization, ready to provide service.
Feb 05 17:30:50 proxmox3 corosync[3566]: [TOTEM ] A new membership (192.168.0.133:10552) was formed. Members
Feb 05 17:30:50 proxmox3 corosync[3566]: [QUORUM] Members[1]: 1
Feb 05 17:30:50 proxmox3 corosync[3566]: [MAIN  ] Completed service synchronization, ready to provide service.
Feb 05 17:30:51 proxmox3 corosync[3566]: [TOTEM ] A new membership (192.168.0.133:10556) was formed. Members joined: 2
Feb 05 17:30:51 proxmox3 corosync[3566]: [QUORUM] This node is within the primary component and will provide service.
Feb 05 17:30:51 proxmox3 corosync[3566]: [QUORUM] Members[2]: 1 2
Feb 05 17:30:51 proxmox3 corosync[3566]: [MAIN  ] Completed service synchronization, ready to provide service.
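
The status output above shows three new memberships forming within about three seconds, which suggests the membership is flapping. To put a number on that over a longer period, the following sketch counts `[TOTEM ] A new membership` lines per minute. The function name `membership_changes_per_minute` is made up for illustration, and the input is assumed to use the same syslog layout as the lines above.

```shell
#!/bin/sh
# membership_changes_per_minute: hypothetical helper.
# Reads corosync log lines on stdin and prints "<Mon> <DD> <HH:MM> <count>"
# for every minute in which at least one new membership was formed.
# Output is sorted as text, so ordering is only chronological within a month.
membership_changes_per_minute() {
    awk '/\[TOTEM *\].*A new membership/ {
             key = $1 " " $2 " " substr($3, 1, 5)   # month, day, HH:MM
             count[key]++
         }
         END { for (k in count) print k, count[k] }' |
        sort
}

# Example use (not run here):
#   grep corosync /var/log/syslog | membership_changes_per_minute
```

A healthy cluster should rarely form a new membership at all, so minutes with several changes are the ones to line up against switch logs or network load.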
 
This still isn't solved. I had to turn off multicast just to be able to back up the servers...

I could use some help here.
 
