Some Cluster Nodes marked red in Web-Interface

adoII

Renowned Member
Jan 28, 2010
Hi there,

I have a Proxmox cluster with 10 machines. All machines run the same software: Proxmox 4.1-2, the newest release from the pve-no-subscription repository.

Three of the 10 nodes are shown in red in the Proxmox web interface instead of green. When I restart pvestatd on these machines, they turn green for a few minutes and then turn red again.

Restarting all services (corosync.service pve-cluster.service pvedaemon.service pveproxy.service pvestatd.service) on all machines also only helps for a few minutes.

Multicast works, and the omping tests are okay.

I replaced a switch last night. The switch I replaced does not connect these 10 machines, but it does connect other machines in the same subnet.

My pveversions look like this:
Code:
proxmox-ve: 4.1-28 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-2 (running version: 4.1-2/78c5f4a2)
pve-kernel-4.2.6-1-pve: 4.2.6-28
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-42
pve-firmware: 1.1-7
libpve-common-perl: 4.0-42
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-18
pve-container: 1.0-35
pve-firewall: 2.0-14
pve-ha-manager: 1.0-16
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: not correctly installed

Any ideas how to solve the problem?
 
Maybe a problem with a storage backend? NFS server offline? Test with

# pvesm status

on all nodes to see if storage backends are online.
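To run that check across the whole cluster from one node, a small loop is enough. This is a minimal sketch, assuming you are root on a PVE node with passwordless root SSH to the other members (a PVE cluster normally sets this up) and that node names can be taken from the per-node directories under /etc/pve/nodes. The function names `list_nodes` and `check_storage_everywhere` and the 30-second limit are hypothetical choices for illustration; only `pvesm status` comes from the advice above.

```sh
# Hypothetical helpers (not part of PVE).

# Print one node name per line, one per directory under /etc/pve/nodes.
list_nodes() {
  local dir="${1:-/etc/pve/nodes}" d
  for d in "$dir"/*/; do
    [ -d "$d" ] || continue        # glob matched nothing
    basename "$d"
  done
}

# Run `pvesm status` on every node. The 30 s limit is arbitrary: a node
# whose storage layer hangs is reported instead of blocking the loop.
check_storage_everywhere() {
  local node rc
  for node in $(list_nodes "$1"); do
    echo "=== $node ==="
    timeout 30 ssh -o BatchMode=yes "root@$node" pvesm status
    rc=$?
    if [ "$rc" -eq 124 ]; then     # 124 = killed by timeout
      echo "$node: no answer within 30 s (hung storage?)"
    fi
  done
}
```

Call `check_storage_everywhere` without arguments on any node; a node that prints the timeout line, or lists a storage as inactive, is the one to look at first.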
 
Yes, it was a stale NFS mount that made some PVE processes hang.

Thanks, Dietmar
 
Hi,

Today I had the same problem on one node (stale NFS mount). Is there a way to avoid this situation? Why does PVE freeze on a stale NFS mount that is not in use? (I confirmed with lsof that no process was accessing the NFS mount.)

Thank you
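One way to find out which mount is stale without hanging your own shell is to probe each mount point in the background and stop waiting after a few seconds. This is a sketch under the assumption that `stat -f` (a statfs call on the mount point) blocks when the NFS server behind a hard mount is gone, which is the usual symptom. `mount_responds` and `scan_nfs_mounts` are hypothetical helper names, not PVE tools, and a probe that never returns is simply left behind rather than waited on.

```sh
# Hypothetical helper: probe a mount point without blocking the caller.
# Returns 0 if statfs answered within SECS seconds (default 5, use >= 1),
# non-zero if the path is unusable or the probe is still stuck.
mount_responds() {
  local path="$1" secs="${2:-5}" pid i=0
  stat -f "$path" >/dev/null 2>&1 &
  pid=$!
  while :; do
    if ! kill -0 "$pid" 2>/dev/null; then
      wait "$pid"                  # probe finished: return stat's own status
      return
    fi
    [ "$i" -ge "$secs" ] && break
    sleep 1
    i=$((i + 1))
  done
  return 1                         # still blocked: leave it, report as stale
}

# Check every NFS mount the kernel lists. Reading /proc/mounts does not
# contact the NFS server. Assumes mount points without spaces (those
# appear escaped as \040 in /proc/mounts).
scan_nfs_mounts() {
  local _dev mnt fstype _rest
  while read -r _dev mnt fstype _rest; do
    case $fstype in
      nfs|nfs4) mount_responds "$mnt" 5 || echo "not responding: $mnt" ;;
    esac
  done < /proc/mounts
}
```

Polling with `kill -0` instead of wrapping the probe in `timeout` is a deliberate choice here: the caller never has to wait for a process that may be stuck in uninterruptible sleep.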
 
By the way, it seems there is a loop when Proxmox tries to remount an NFS share:
# ps aux | grep nfs
root 1033 0.0 0.0 0 0 ? S< 2015 0:00 [nfsiod]
root 1807 0.0 0.0 19868 1640 ? S 11:43 0:00 /bin/mount -t nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o vers=3
root 1808 0.0 0.0 42692 3928 ? D 11:43 0:00 /sbin/mount.nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o rw,vers=3
root 1829 0.0 0.0 19868 1720 ? S 11:43 0:00 /bin/mount -t nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o vers=3
root 1830 0.0 0.0 42692 3908 ? D 11:43 0:00 /sbin/mount.nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o rw,vers=3
root 1963 0.0 0.0 19868 1712 ? S 11:44 0:00 /bin/mount -t nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o vers=3
root 1964 0.0 0.0 42692 3908 ? D 11:44 0:00 /sbin/mount.nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o rw,vers=3
root 2204 0.0 0.0 19868 1724 ? S 11:46 0:00 /bin/mount -t nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o vers=3
root 2205 0.0 0.0 42692 3908 ? D 11:46 0:00 /sbin/mount.nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o rw,vers=3
root 2261 0.0 0.0 12732 2048 pts/2 S+ 11:47 0:00 grep nfs

This creates a lot of processes: while the mount is frozen, PVE keeps retrying it, and every /sbin/mount.nfs helper it starts stays stuck in state D (uninterruptible sleep).
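To confirm that the retries keep piling up, you can count the stuck helpers over time. A minimal sketch, assuming the default `ps aux` column layout seen above (STAT in column 8, COMMAND starting in column 11); `stuck_nfs_mounts` is a hypothetical name and only reads `ps` output from stdin, so it can be tried on the listing pasted above.

```sh
# Hypothetical filter: print the PIDs of mount.nfs/mount.nfs4 helpers whose
# STAT starts with "D" (uninterruptible sleep), reading `ps aux` on stdin.
stuck_nfs_mounts() {
  awk '$8 ~ /^D/ && $11 ~ /mount\.nfs4?$/ { print $2 }'
}
```

On a live node, `ps aux | stuck_nfs_mounts | wc -l` run a few minutes apart should show the count growing while the NFS server is unreachable.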