Cluster problem

Nam Trần

Active Member
Jun 29, 2016
On Friday evening I freshly installed Proxmox 4.4.2 on 3 PCs, with the enterprise repository disabled and the pve-no-subscription repository enabled. After upgrading to the newest version, I created a cluster on one PC, then added the other PCs to that cluster. Everything worked fine for about 20 hours.

Yesterday I logged into one PC via the web-based management tool and found that the two other PCs were marked in red. At first I thought those PCs had suffered a power failure, but I was later able to log in to each of them separately. I am neither a Linux guru nor a Proxmox master, so I feel very bad about what is happening.

Autostart VMs are stuck until I manually run pvecm expected 1.
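
For context, here is a minimal sketch of the quorum arithmetic that would explain why the VMs only start after pvecm expected 1. It assumes the default majority rule with one vote per node, and the helper name quorum_needed is made up for illustration; it is not a Proxmox command.

```shell
#!/bin/sh
# Hypothetical helper (not part of Proxmox): votes needed for quorum
# under a simple majority rule, assuming one vote per node.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

quorum_needed 3   # three-node cluster: prints 2
quorum_needed 1   # expected votes lowered to 1: prints 1
```

Under these assumptions, a node that cannot see its two peers holds 1 vote out of the 2 required, so it stays without quorum until the expected vote count is lowered to 1.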

pvecm nodes and /etc/hosts list only the node I am logged in to.

pveversion -v returns:
Code:
proxmox-ve: 4.4-84 (running kernel: 4.4.44-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.44-1-pve: 4.4.44-84
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-109
pve-firmware: 1.1-10
libpve-common-perl: 4.0-94
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-1
pve-docs: 4.4-3
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-96
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80

systemctl --all returns (normal records are trimmed to save space):
Code:
  UNIT                                                                                                           LOAD      ACTIVE     SUB       JOB   DESCRIPTION
● var-lock.mount                                                                                                 not-found inactive   dead            var-lock.mount
● var-run.mount                                                                                                  not-found inactive   dead            var-run.mount
● auditd.service                                                                                                 not-found inactive   dead            auditd.service
● ceph.service                                                                                                   not-found inactive   dead            ceph.service
● clamav-daemon.service                                                                                          not-found inactive   dead            clamav-daemon.service
● console-screen.service                                                                                         not-found inactive   dead            console-screen.service
● display-manager.service                                                                                        not-found inactive   dead            display-manager.service
● dovecot.service                                                                                                not-found inactive   dead            dovecot.service
● dracut-mount.service        
● glusterd.service                                                                                               not-found inactive   dead            glusterd.service
● keymap.service                                                                                                 not-found inactive   dead            keymap.service
● mountdevsubfs.service                                                                                          masked    inactive   dead            mountdevsubfs.service
● mountkernfs.service                                                                                            masked    inactive   dead            mountkernfs.service
● mountnfs-bootclean.service                                                                                     masked    inactive   dead            mountnfs-bootclean.service
● mysql.service                                                                                                  not-found inactive   dead            mysql.service
● nfs-kernel-server.service                                                                                      not-found inactive   dead            nfs-kernel-server.service
● nfs-server.service                                                                                             not-found inactive   dead            nfs-server.service
● plymouth-quit-wait.service                                                                                     not-found inactive   dead            plymouth-quit-wait.service
● plymouth-start.service                                                                                         not-found inactive   dead            plymouth-start.service
● postgresql.service                                                                                             not-found inactive   dead            postgresql.service
● postgrey.service                                                                                               not-found inactive   dead            postgrey.service
● saslauthd.service                                                                                              not-found inactive   dead            saslauthd.service
● sheepdog.service            
● smb.service                                                                                                    not-found inactive   dead            smb.service
● spamassassin.service                                                                                           not-found inactive   dead            spamassassin.service
● systemd-sysusers.service                                                                                       not-found inactive   dead            systemd-sysusers.service
● systemd-udev-hwdb-update.service                                                                               not-found inactive   dead            systemd-udev-hwdb-update.service
● systemd-vconsole-setup.service                                                                                 not-found inactive   dead            systemd-vconsole-setup.service

What should I do to overcome the problem?
 
Have you verified that multicast is working? If not, there are some good wiki pages that show how to test it.

https://pve.proxmox.com/wiki/Cluster_Manager#_cluster_network

There may be another page, but I could not find it.
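
A rough sketch of what such a multicast test might look like, assuming the omping tool is installed on every node. The hostnames node1, node2 and node3 are placeholders, the helper multicast_test_cmds is hypothetical, and the omping options are quoted from memory of the Proxmox documentation, so please check them against the wiki. The script only prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: prints the omping commands to run for the given nodes.
# It does not execute anything itself.
multicast_test_cmds() {
    # Short burst at a high packet rate
    echo "omping -c 10000 -i 0.001 -F -q $*"
    # About 10 minutes at one packet per second, long enough to
    # expose multicast that stops working after a few minutes
    echo "omping -c 600 -i 1 -q $*"
}

# Placeholder hostnames; replace with the real node names or IPs.
multicast_test_cmds node1 node2 node3
```

Each printed command would need to be started on all nodes at roughly the same time, since omping on one node waits for the others to answer.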

Thanks for the helpful information. I have found that the problem happens when I open the web-based management tool over a VPN connection. Even if I connect to a local PC via RDP, the web page says that quorum is not OK. Only when I physically sit at my PC at my workplace does everything work as expected. Is that weird?
 
After a power failure, the two red nodes came back online like a charm. I cannot figure out how.

In fact, the 3 nodes are connected to two separate, linked switches located in 2 buildings 450 metres apart. The 2 red nodes are both in one building. The two switches are Cisco SG300-20.

Thanks!