Nodes greyed out in version 5.4

rickygm · Sep 17, 2019

Hi forum, I recently finished installing a cluster with Proxmox 5.4, and for the past few days a couple of my nodes have been shown as offline (grey). I can ping them without problems and connect over SSH; CPU load is normal, RAM usage is not high, and network latency is under 1 ms.

I have tuned ZFS:

cat /proc/spl/kstat/zfs/arcstats |grep c_
c_min 4 2109876352
c_max 4 8589934592
arc_no_grow 4 0
arc_tempreserve 4 0
arc_loaned_bytes 4 0
arc_prune 4 0
arc_meta_used 4 51221384
arc_meta_limit 4 6442450944
arc_dnode_limit 4 644245094
arc_meta_max 4 56157776
arc_meta_min 4 16777216
sync_wait_for_async 4 14
arc_need_free 4 0
arc_sys_free 4 1054938176

Cluster status:

pvecm status
Quorum information
------------------
Date: Tue Sep 17 14:45:55 2019
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000001
Ring ID: 3/4608
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000003 1 192.168.11.30
0x00000001 1 192.168.11.31 (local)
0x00000002 1 192.168.11.36

Versions in use:

pveversion -v
proxmox-ve: 5.4-2 (running kernel: 4.15.18-20-pve)
pve-manager: 5.4-13 (running version: 5.4-13/aee6f0ec)
pve-kernel-4.15: 5.4-8
pve-kernel-4.15.18-20-pve: 4.15.18-46
pve-kernel-4.15.18-12-pve: 4.15.18-36
corosync: 2.4.4-pve1
criu: 2.11.1-1~bpo90
glusterfs-client: 3.8.8-1
ksm-control-daemon: 1.2-2
libjs-extjs: 6.0.1-2
libpve-access-control: 5.1-12
libpve-apiclient-perl: 2.0-5
libpve-common-perl: 5.0-54
libpve-guest-common-perl: 2.0-20
libpve-http-server-perl: 2.0-14
libpve-storage-perl: 5.0-44
libqb0: 1.0.3-1~bpo9
lvm2: 2.02.168-pve6
lxc-pve: 3.1.0-6
lxcfs: 3.0.3-pve1
novnc-pve: 1.0.0-3
proxmox-widget-toolkit: 1.0-28
pve-cluster: 5.0-38
pve-container: 2.0-40
pve-docs: 5.4-2
pve-edk2-firmware: 1.20190312-1
pve-firewall: 3.0-22
pve-firmware: 2.0-7
pve-ha-manager: 2.0-9
pve-i18n: 1.1-4
pve-libspice-server1: 0.14.1-2
pve-qemu-kvm: 3.0.1-4
pve-xtermjs: 3.12.0-1
qemu-server: 5.0-54
smartmontools: 6.5+svn4324-1
spiceterm: 3.0-5
vncterm: 1.5-3
zfsutils-linux: 0.7.13-pve1~bpo2

Output of pveperf:

pveperf
CPU BOGOMIPS: 100822.56
REGEX/SECOND: 1500991
HD SIZE: 228.61 GB (rpool/ROOT/pve-1)
FSYNCS/SECOND: 1609.30
DNS EXT: 66.74 ms
DNS INT: 42.86 ms (domain.com)

My cluster nodes use iSCSI storage with LVM for the volumes, plus an NFS server for backups.

I have already tried the suggestions in these threads:

https://forum.proxmox.com/threads/node-with-question-mark.41180/
https://forum.proxmox.com/threads/c...-a-strange-state-containers-greyed-out.46650/

Stoiko Ivanov · Sep 18, 2019

* check if pvestatd is still running: `systemctl status -l pvestatd`
* check the journal for hanging nfs-mounts: `journalctl -r` (journal in reversed order)
* restart pvestatd: `systemctl restart pvestatd`
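
For the second check, here is a rough sketch of how one might probe NFS mounts without getting the shell itself stuck. `probe_mount` is a hypothetical helper, not a Proxmox tool, and the 5-second limit is an arbitrary assumption. It relies on `timeout` from coreutils being able to kill the stuck `stat`; on some setups a blocked process cannot be killed, in which case this probe may hang as well.

```shell
#!/bin/sh
# probe_mount: hypothetical helper, not part of Proxmox.
# Runs stat on a path, but gives up after 5 seconds (arbitrary limit)
# so that a dead NFS server does not block the shell indefinitely.
probe_mount() {
    if timeout -s KILL 5 stat "$1" >/dev/null 2>&1; then
        echo "ok: $1"
    else
        # Either stat failed (e.g. path missing) or it was killed after the timeout.
        echo "failed or timed out: $1"
    fi
}

# Probe every NFS mount listed in /proc/mounts
# (field 2 is the mount point, field 3 the filesystem type).
if [ -r /proc/mounts ]; then
    awk '$3 ~ /^nfs/ {print $2}' /proc/mounts | while read -r mp; do
        probe_mount "$mp"
    done
fi
```

A mount that reports "failed or timed out" is a candidate for the hanging NFS mount mentioned above.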

I hope this helps!

rickygm · Sep 19, 2019

The commands didn't work for me; I had to reboot the servers one by one.

rickygm · Sep 19, 2019

What does the `journalctl -r` command do?

Do you think the problem could be the NFS server? I have sometimes had a power failure while the VMs were being backed up.

Stoiko Ivanov · Sep 23, 2019

rickygm said:
What does the `journalctl -r` command do?

`journalctl -r` lists your system's journal (= logs) in reverse order (newest first).
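
As an illustration, a minimal sketch of combining the reversed journal with a filter for messages that commonly accompany a hung NFS mount. `nfs_hang_lines` is a hypothetical helper, and the two patterns (the kernel's "nfs: server ... not responding" and "blocked for more than N seconds" messages) are assumptions about what a hang would log on your system.

```shell
#!/bin/sh
# nfs_hang_lines: hypothetical filter, reads log lines on stdin and keeps
# only those that typically show up when an NFS server stops answering.
nfs_hang_lines() {
    grep -iE 'nfs: server .* not responding|blocked for more than [0-9]+ seconds'
}

# Kernel messages of the current boot, newest first (-r), filtered.
# Skipped on systems without journalctl.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -r -k -b --no-pager | nfs_hang_lines || echo "no matching kernel messages"
fi
```

To look only at the status daemon, `journalctl -r -u pvestatd -n 50` shows its 50 most recent entries, newest first.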

rickygm said:
Do you think the problem could be the NFS server? I have sometimes had a power failure while the VMs were being backed up.

An NFS server that vanishes because of a power failure can certainly cause those problems. Hanging NFS mounts cause system hangs (since there is no timeout for those operations); in that case a reboot is usually the way to go.
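
Before rebooting, it can help to see which processes are actually stuck. The sketch below lists processes in uninterruptible sleep (state `D`), which is where tasks waiting on an unreachable NFS server usually end up. `d_state_only` is a hypothetical helper; whether anything appears, and under which name, depends on your system.

```shell
#!/bin/sh
# d_state_only: hypothetical filter for `ps` output on stdin.
# Keeps the header line plus every row whose STAT column starts with D
# (uninterruptible sleep, typically waiting on I/O).
d_state_only() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# pid, state, kernel wait channel and command name for all processes.
if command -v ps >/dev/null 2>&1; then
    ps -eo pid,stat,wchan:32,comm | d_state_only
fi
```

If the list stays non-empty and the wait channel points at NFS or RPC functions, the processes are most likely blocked on the vanished server.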

