[SOLVED] PVE node and VM marked with a question mark in web UI

k-123

Well-Known Member
Mar 3, 2017
A node (pve-3, 172.16.2.3) is marked with a question mark in the web UI. What I have found so far (rough commands sketched below):
1. SSH to the node is ok, and the VM is not dead; it can be live-migrated to another node with [qm migrate --online].
2. Restarting the pveproxy service did not fix the problem.
3. Access to the node's own UI (https://172.16.2.3:8006) is ok, but the node itself still shows a question mark.
4. After a system reboot, everything goes back to normal. (That was last time; it happened again today and I have not rebooted yet.)
(screenshot: node pve-3 shown with a question mark in the web UI)
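For reference, roughly what I ran for steps 1 and 2 above (the VMID 100 and target node pve-1 are shown as placeholders):

Code:
# step 1: live-migrate a guest away from the affected node
# (VMID and target node are example values)
qm migrate 100 pve-1 --online

# step 2: restart the web proxy on the affected node
systemctl restart pveproxy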

Further info:
1. A backup job (to PBS) was scheduled at 05:00; /var/log/vzdump/192.log (#192 is the last VM in the queue) shows the job finished at 05:24 with no errors.
2. The job status in the web UI shows it as finished, but the spinner never stops rolling (and "Duration" is still counting up).
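To check whether the backup worker is actually still running, this is my guess at useful diagnostics:

Code:
# is a vzdump worker process still alive on this node?
ps faxww | grep -v grep | grep vzdump

# list recent tasks and their state on this node
pvenode task list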

system info
Code:
root@pve-3:~# pveversion 
pve-manager/7.2-7/d0dd0e85 (running kernel: 5.15.39-1-pve)
root@pve-3:~# uptime
 10:25:32 up 5 days, 19:18,  1 user,  load average: 0.18, 0.35, 0.35
The physical host is an Intel NUC8i5 with 32 GB RAM.

Code:
root@pve-3:~# pvecm status
Cluster information
-------------------
Name:             cluster01
Config Version:   5
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Fri Jul 15 10:42:44 2022
Quorum provider:  corosync_votequorum
Nodes:            5
Node ID:          0x00000003
Ring ID:          1.de
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   5
Highest expected: 5
Total votes:      5
Quorum:           3 
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 172.16.2.1
0x00000002          1 172.16.2.2
0x00000003          1 172.16.2.3 (local)
0x00000004          1 172.16.2.4
0x00000005          1 172.16.2.5

I have no idea what is stuck.
 
The other 4 nodes do not have this problem, just node 3.
Code:
root@pve-3:~# pveversion -v
proxmox-ve: 7.2-1 (running kernel: 5.15.39-1-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-6
pve-kernel-helper: 7.2-6
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.15.35-3-pve: 5.15.35-6
pve-kernel-5.15.30-2-pve: 5.15.30-3
ceph-fuse: 15.2.16-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-3
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-5
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.3-1
proxmox-backup-file-restore: 2.2.3-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-1
pve-container: 4.2-1
pve-docs: 7.2-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.4-2
pve-ha-manager: 3.3-4
pve-i18n: 2.7-2
pve-qemu-kvm: 6.2.0-11
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.4-pve1
 
This happens when the pvestatd daemon is not able to update the status in time.
Normally the reason for this is some storage that blocks/hangs.
 
You can try to restart the pvestatd daemon, but to really fix it you have to find the underlying cause (as I said, most likely a hanging storage) and fix that.
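Roughly, a minimal sketch of those checks (which storage hangs, if any, is something you have to verify on your own system):

Code:
# restart the status daemon; safe, does not touch running guests
systemctl restart pvestatd

# query all configured storages -- this command itself blocks on an
# unresponsive storage (e.g. an unreachable NFS or PBS target),
# which points to the culprit
pvesm status

# look for processes stuck in uninterruptible sleep (D state),
# the classic symptom of hung storage I/O
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'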