Some Cluster Nodes marked red in Web-Interface

adoII

Hi there,

I have a Proxmox cluster with 10 machines. All machines run the same software: Proxmox 4.1-2, the newest release from the pve-no-subscription repository.

Three of the 10 nodes are shown in red in the Proxmox web interface instead of green. When I restart pvestatd on these machines, they turn green for a few minutes and then turn red again.

Restarting all services (corosync.service pve-cluster.service pvedaemon.service pveproxy.service pvestatd.service) on all machines also helps for only a few minutes.

Multicast works and the omping test is okay.

I replaced a switch last night. The switch I replaced does not connect these 10 machines, but it connects other machines in the same subnet.

My pveversion -v output looks like this:
Code:
proxmox-ve: 4.1-28 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-2 (running version: 4.1-2/78c5f4a2)
pve-kernel-4.2.6-1-pve: 4.2.6-28
pve-kernel-4.2.2-1-pve: 4.2.2-16
pve-kernel-4.2.3-2-pve: 4.2.3-22
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 0.17.2-1
pve-cluster: 4.0-29
qemu-server: 4.0-42
pve-firmware: 1.1-7
libpve-common-perl: 4.0-42
libpve-access-control: 4.0-10
libpve-storage-perl: 4.0-38
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.4-18
pve-container: 1.0-35
pve-firewall: 2.0-14
pve-ha-manager: 1.0-16
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-5
lxcfs: 0.13-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: not correctly installed

Any ideas on how to solve the problem?
 
Maybe a problem with a storage backend? NFS server offline? Test with

# pvesm status

on all nodes to see if storage backends are online.
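
If a storage backend hangs, the check itself may block the shell. The following is only a sketch of one way to guard against that, assuming GNU coreutils `timeout` is installed on the nodes. `run_with_timeout` is a hypothetical helper name, not a Proxmox tool.

```shell
#!/bin/sh
# Hypothetical helper: run a command with a time limit and report
# whether it finished, failed, or hit the limit.
# Usage: run_with_timeout SECONDS COMMAND [ARGS...]
run_with_timeout() {
    limit=$1
    shift
    timeout "$limit" "$@"
    rc=$?
    if [ "$rc" -eq 124 ]; then
        # coreutils timeout exits with 124 when the time limit was reached
        echo "TIMEOUT after ${limit}s: $*"
    elif [ "$rc" -ne 0 ]; then
        echo "FAILED (exit $rc): $*"
    else
        echo "OK: $*"
    fi
    return "$rc"
}
```

On a node it could be called as `run_with_timeout 10 pvesm status`. A TIMEOUT line would suggest that a storage backend is not answering, which fits the stale NFS case discussed in this thread.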
 
Yes, it was a stale NFS mount which made some pve processes hang.

Thanks Dietmar
 
Hi,

Today I had the same problem on a node (stale NFS mount). Is there a way to avoid this situation? Why does PVE freeze on a stale NFS mount that is not in use? (I confirmed with lsof that no process was accessing the NFS mount.)

Thank you
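
This does not answer how to avoid the freeze, but a stale mount can at least be detected without hanging the shell. The sketch below assumes GNU coreutils (`timeout`, `stat`) and a Linux `/proc/mounts`. The function names are hypothetical, the 5 second limit is an arbitrary choice, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Print the mount points of all NFS file systems listed in a mounts table.
# Defaults to /proc/mounts; the optional argument exists only for testing.
list_nfs_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 }' "${1:-/proc/mounts}"
}

# Probe one mount point. A probe that fails or does not return within
# 5 seconds is reported as stale or unreachable.
probe_mount() {
    if timeout -s KILL 5 stat -t "$1" >/dev/null 2>&1; then
        echo "responsive: $1"
    else
        echo "stale or unreachable: $1"
    fi
}

# Probe every NFS mount point found in the mounts table.
check_nfs_mounts() {
    list_nfs_mounts "$@" | while read -r mp; do
        probe_mount "$mp"
    done
}
```

Running `check_nfs_mounts` on each node prints one line per NFS mount. SIGKILL is used because a process blocked on NFS may not react to other signals.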
 
By the way, it seems there is a loop when Proxmox tries to remount an NFS share:
# ps aux | grep nfs
root 1033 0.0 0.0 0 0 ? S< 2015 0:00 [nfsiod]
root 1807 0.0 0.0 19868 1640 ? S 11:43 0:00 /bin/mount -t nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o vers=3
root 1808 0.0 0.0 42692 3928 ? D 11:43 0:00 /sbin/mount.nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o rw,vers=3
root 1829 0.0 0.0 19868 1720 ? S 11:43 0:00 /bin/mount -t nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o vers=3
root 1830 0.0 0.0 42692 3908 ? D 11:43 0:00 /sbin/mount.nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o rw,vers=3
root 1963 0.0 0.0 19868 1712 ? S 11:44 0:00 /bin/mount -t nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o vers=3
root 1964 0.0 0.0 42692 3908 ? D 11:44 0:00 /sbin/mount.nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o rw,vers=3
root 2204 0.0 0.0 19868 1724 ? S 11:46 0:00 /bin/mount -t nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o vers=3
root 2205 0.0 0.0 42692 3908 ? D 11:46 0:00 /sbin/mount.nfs mynfserver:/exports/pvedata/pvedata1 /mnt/pve/PVE01-PVEDATA1 -o rw,vers=3
root 2261 0.0 0.0 12732 2048 pts/2 S+ 11:47 0:00 grep nfs

This creates a lot of processes: while the mount is frozen, PVE keeps retrying the mount...
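
To see how many of these mount attempts are stuck, the `mount.nfs` processes in state D (uninterruptible sleep) can be filtered out of the process list. This is a rough sketch assuming procps `ps` and `awk`. `stuck_nfs_mounts` is a hypothetical helper name.

```shell
#!/bin/sh
# Read `ps -eo pid,stat,args` style lines on stdin and print the PIDs of
# mount.nfs processes whose state starts with D (uninterruptible sleep).
stuck_nfs_mounts() {
    awk '$2 ~ /^D/ && $3 ~ /mount\.nfs$/ { print $1 }'
}
```

It could be used as `ps -eo pid,stat,args | stuck_nfs_mounts | wc -l`. A count that grows over time would match the retry loop shown in the `ps` output above.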
 
