Cluster problem

Nam Trần

Active Member
Jun 29, 2016
On Friday evening I freshly installed Proxmox 4.4.2 on 3 PCs, with the enterprise repository disabled and the pve-no-subscription repository enabled. After upgrading to the newest version, I created a cluster on one PC and then added the other PCs to that cluster. Everything worked fine for about 20 hours.

Yesterday I logged into one PC via the web-based management tool and found that the other two PCs were marked in red. At first I thought those PCs had suffered a power failure, but I could still log in to each of them separately. I am neither a Linux guru nor a Proxmox expert, so I feel quite lost about what is happening.

VMs set to autostart are stuck until I manually run pvecm expected 1.
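As a rough illustration of why this happens, here is a sketch assuming the corosync default of one vote per node and no special votequorum options (such as two_node): a partition is quorate only when it sees at least floor(expected_votes / 2) + 1 votes. In a 3-node cluster that threshold is 2, so a node cut off from the other two holds 1 vote and stays non-quorate, and guest start-up waits for quorum. Running pvecm expected 1 lowers the expected votes so the lone node counts as quorate. The helper names quorum_needed and partition_status are hypothetical, not part of any Proxmox tool.

```shell
#!/bin/sh
# Sketch only: assumes every node carries 1 vote (corosync default) and no
# special votequorum options such as two_node or last_man_standing.
# quorum_needed and partition_status are hypothetical helper names.

# Majority threshold: floor(expected_votes / 2) + 1 (shell division truncates)
quorum_needed() {
  echo $(( $1 / 2 + 1 ))
}

# $1 = votes this partition can see, $2 = expected votes in the cluster
partition_status() {
  if [ "$1" -ge "$(quorum_needed "$2")" ]; then
    echo "quorate"
  else
    echo "not quorate"
  fi
}

# 3-node cluster, one node cut off: 1 visible vote, 2 needed
partition_status 1 3   # prints "not quorate"
# The two nodes that still see each other: 2 visible votes, 2 needed
partition_status 2 3   # prints "quorate"
# Lone node after "pvecm expected 1": 1 visible vote, 1 needed
partition_status 1 1   # prints "quorate"
```

Lowering the expected votes is only a temporary workaround: if the other two nodes are still quorate among themselves, both sides can accept changes independently.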

Both pvecm nodes and /etc/hosts list only the node I am logged in on.

pveversion -v returns:
Code:
proxmox-ve: 4.4-84 (running kernel: 4.4.44-1-pve)
pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.44-1-pve: 4.4.44-84
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-109
pve-firmware: 1.1-10
libpve-common-perl: 4.0-94
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-1
pve-docs: 4.4-3
pve-qemu-kvm: 2.7.1-4
pve-container: 1.0-96
pve-firewall: 2.0-33
pve-ha-manager: 1.0-40
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80

systemctl --all returns (normal entries trimmed to save space):
Code:
  UNIT                                                                                                           LOAD      ACTIVE     SUB       JOB   DESCRIPTION
● var-lock.mount                                                                                                 not-found inactive   dead            var-lock.mount
● var-run.mount                                                                                                  not-found inactive   dead            var-run.mount
● auditd.service                                                                                                 not-found inactive   dead            auditd.service
● ceph.service                                                                                                   not-found inactive   dead            ceph.service
● clamav-daemon.service                                                                                          not-found inactive   dead            clamav-daemon.service
● console-screen.service                                                                                         not-found inactive   dead            console-screen.service
● display-manager.service                                                                                        not-found inactive   dead            display-manager.service
● dovecot.service                                                                                                not-found inactive   dead            dovecot.service
● dracut-mount.service        
● glusterd.service                                                                                               not-found inactive   dead            glusterd.service
● keymap.service                                                                                                 not-found inactive   dead            keymap.service
● mountdevsubfs.service                                                                                          masked    inactive   dead            mountdevsubfs.service
● mountkernfs.service                                                                                            masked    inactive   dead            mountkernfs.service
● mountnfs-bootclean.service                                                                                     masked    inactive   dead            mountnfs-bootclean.service
● mysql.service                                                                                                  not-found inactive   dead            mysql.service
● nfs-kernel-server.service                                                                                      not-found inactive   dead            nfs-kernel-server.service
● nfs-server.service                                                                                             not-found inactive   dead            nfs-server.service
● plymouth-quit-wait.service                                                                                     not-found inactive   dead            plymouth-quit-wait.service
● plymouth-start.service                                                                                         not-found inactive   dead            plymouth-start.service
● postgresql.service                                                                                             not-found inactive   dead            postgresql.service
● postgrey.service                                                                                               not-found inactive   dead            postgrey.service
● saslauthd.service                                                                                              not-found inactive   dead            saslauthd.service
● sheepdog.service            
● smb.service                                                                                                    not-found inactive   dead            smb.service
● spamassassin.service                                                                                           not-found inactive   dead            spamassassin.service
● systemd-sysusers.service                                                                                       not-found inactive   dead            systemd-sysusers.service
● systemd-udev-hwdb-update.service                                                                               not-found inactive   dead            systemd-udev-hwdb-update.service
● systemd-vconsole-setup.service                                                                                 not-found inactive   dead            systemd-vconsole-setup.service

What should I do to overcome the problem?
 
Have you verified that multicast is working? If not, there are some good wiki pages that show how to test it.

https://pve.proxmox.com/wiki/Cluster_Manager#_cluster_network

There may be another page as well, but I could not find it.
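One common way to test multicast is omping, started on all cluster nodes at roughly the same time. Below is a sketch, assuming omping is installed on every node; pve1, pve2 and pve3 are hypothetical hostnames to replace with your real node names (which must resolve on every node) or their cluster-network IPs. It runs a short flood test and a roughly 10-minute test, and only prints the commands if omping is not found.

```shell
#!/bin/sh
# Sketch only: start this on ALL cluster nodes at roughly the same time,
# otherwise omping has no peers to answer.
# pve1 pve2 pve3 are hypothetical hostnames; replace them with your own.
NODES="pve1 pve2 pve3"

# Short burst: 10000 packets 1 ms apart in flood mode (-F), quiet output (-q)
BURST="omping -c 10000 -i 0.001 -F -q $NODES"
# Long run: 600 packets 1 s apart (about 10 minutes). This catches switches
# whose IGMP snooping stops forwarding multicast after a few minutes
# when no IGMP querier is active on the network.
LONG="omping -c 600 -i 1 -q $NODES"

run() {
  if command -v omping >/dev/null 2>&1; then
    $1   # left unquoted on purpose so the command string is word-split
  else
    echo "omping not found, would run: $1"
  fi
}

run "$BURST"
run "$LONG"
```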

Thanks for the information. I have found that the problem appears when I open the web-based management tool over a VPN connection. Even when I connect via RDP to a PC on the local network, the web page reports that quorum is not OK. Only when I physically sit at my PC at my workplace does everything work as expected. Is that weird?
 
After a power failure, the two red nodes came back online like a charm. I cannot figure out how.

For context, the 3 nodes are connected to two separate, interlinked switches located in two buildings 450 metres apart. The 2 red nodes are in one building. Both switches are Cisco SG300-20.

Thanks!
 
