[SOLVED] Upgrade to 8.2.4 - broke UI - VMs & Storage status (?) Unknown

Diff

Just updated one of my PVE hosts to the most recent patches, and both VMs and storages now show a "?" Unknown status.

[Screenshot: VMs and storages displayed with a "?" Unknown status in the web UI]

I found in the forums that restarting pvestatd should fix the VM status.

Running the following does recover the VM statuses, but not the storages:

Bash:
systemctl restart pvestatd

But after a reboot it's broken again...

Does anybody know a more permanent fix?

Here are the package versions from that host:

Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.8-1-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-1
proxmox-kernel-6.8.8-1-pve-signed: 6.8.8-1
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.5.13-5-pve-signed: 6.5.13-5
proxmox-kernel-6.5: 6.5.13-5
proxmox-kernel-6.5.11-4-pve-signed: 6.5.11-4
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.2
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.4-1
proxmox-backup-file-restore: 3.2.4-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1
 
There is a good chance that your storage is slow to respond, but there could be other reasons. Recently, forum members reported that an external statistics service could cause similar results.
In short, the state you are seeing is a generic symptom. You need to look at the logs and run a few diagnostic commands to zoom in on the underlying cause:
Check your journal: journalctl -f
Check storage status (often the culprit): pvesm status
Check your cluster status: pvecm status
Check that you have stable communication between the nodes

Generally, there should be something in the logs that will help.
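A minimal sketch that strings the checks above together (the peer IP is a placeholder; wrapping pvesm in `timeout` keeps a hung storage backend from blocking the shell):

```shell
# Quick triage for "?" status in the PVE GUI.
# Run on the affected node; none of these commands change state.

# 1. Recent pvestatd messages (pvestatd feeds the GUI status)
journalctl -u pvestatd --since "10 minutes ago" --no-pager

# 2. Storage status is the usual culprit; use a timeout so a hung
#    NFS/CIFS backend doesn't block the shell indefinitely.
timeout 10 pvesm status || echo "pvesm status hung or failed -> suspect a storage backend"

# 3. Cluster health (skip on standalone nodes)
pvecm status

# 4. Basic node-to-node reachability (192.0.2.10 is a placeholder peer IP)
ping -c 3 192.0.2.10
```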


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hmm..

Code:
journalctl -f

does not give me any leads, unfortunately.

Code:
pvesm status

just hangs

I started investigating storage, and apparently none of the NFS shares are actually connected.

Since all VMs run off local ZFS, they are still running. I only have NFS disks for ISOs and backups.

Code:
mount -l

does not list any NFS shares mounted correctly.

Code:
tree /mnt/pve/

/mnt/pve/
├── backup
├── cloud-init-images
└── iso

lists the mount folders, but no files.

Code:
showmount -e X.X.X.X

shows all shares exposed correctly, and ping to the TrueNAS server hosting these NFS shares also passes fine, so there is normal network connectivity.

This is all on a single-node Proxmox server I upgraded to 8.2.4.

At the same time, I have another PVE cluster with 3 other nodes, still running 8.2.2 and using the same TrueNAS server and the same NFS shares, and that cluster does not show any problems.

I guess the next step is to keep digging into the NFS client on that standalone PVE server I recently upgraded to 8.2.4 :(
 
What NIC(s) are being used in this server? This sounds like a speed issue.

The new kernel (6.8) has brought a ton of issues to the table. (I'm personally busy investigating USB 3 speeds, which appear to be broken on some devices on this new kernel.)

You may have to pin this server to an older working kernel.
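Pinning can be done with `proxmox-boot-tool` on recent PVE releases. The version below is the previous 6.8.4-3 kernel from the package list above; adjust it to whatever `proxmox-boot-tool kernel list` reports on your host:

```shell
# List the kernels known to the boot loader
proxmox-boot-tool kernel list

# Pin the previous (known-good) kernel so it stays the default across reboots
proxmox-boot-tool kernel pin 6.8.4-3-pve

# Later, once a fixed kernel lands, remove the pin again
proxmox-boot-tool kernel unpin
```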
 
What NIC(s) are being used in this server? This sounds like a speed issue.
Here it is:
Code:
lspci |grep -i net
02:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-V (rev 04)
03:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-V (rev 04)
04:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-V (rev 04)
05:00.0 Ethernet controller: Intel Corporation Ethernet Controller I226-V (rev 04)

The RPC tests also seem successful:

Code:
rpcinfo -t X.X.X.X nfs 4
program 100003 version 4 ready and waiting

rpcinfo -t X.X.X.X nfs 3
program 100003 version 3 ready and waiting

Also, pvesm scan nfs X.X.X.X returns all the expected shares.
 
just hangs

I started investigating storage, and apparently none of the NFS shares are actually connected.
This is likely the main cause of "?"

ICMP pings and RPC queries are only part of making sure the connectivity is healthy. The absence of established NFS mounts is the main investigation vector. I'd start by trying to mount the same exports as those in your /etc/pve/storage.cfg manually. I expect that to fail. Continue by figuring out why; that is likely the root cause of the PVE issues you are seeing.
Keep in mind that the NFS stack used by PVE is the same kernel/userland that is used by Ubuntu/Debian. It's possible that something got broken, but it's also possible something changed in your environment.
If my suspicions are correct and you are unable to mount the exports manually, then continue with generic NFS troubleshooting, including tcpdump if necessary.
Check your MTUs as well.
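A rough sketch of that manual test (the server IP and export path below are placeholders; take the real ones from your /etc/pve/storage.cfg):

```shell
# Pull the NFS storage definitions PVE is using
grep -A4 '^nfs:' /etc/pve/storage.cfg

# Try the same mount by hand, with a timeout so a hang is obvious quickly.
# X.X.X.X and /mnt/tank/iso are placeholders for your server and export.
mkdir -p /tmp/nfs-test
timeout 30 mount -t nfs -o vers=3,soft,timeo=50 X.X.X.X:/mnt/tank/iso /tmp/nfs-test \
  && echo "mount OK" || echo "mount hung or failed"

# If it fails, capture NFS traffic (port 2049) while retrying the mount
tcpdump -ni any port 2049 -c 50
```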

Good luck


 
Same issue here on 2 servers; one of them is an EPYC 7443P with ZFS on local NVMe storage.
Looks more like a bug than a performance issue.
 
Same issue here on 2 servers; one of them is an EPYC 7443P with ZFS on local flash-only NVMe storage.
Looks more like a bug than a performance issue.
Hi @thorakel,

At a glance, it seems that nobody in the thread suggested that OP's problem was a performance issue. "Performance" is an extremely broad topic on its own.

A GUI displaying a "?" is often caused by the "pvestatd" daemon not responding, but this isn't always the case. The "pvestatd" daemon can be bogged down by various issues, including misconfigured storage, misconfigured statistics hosts, bad firewall settings, or wrong routing. Some of these issues might appear as "performance" problems at a high level. However, storage that doesn't respond within 5 seconds, or at all, is not a performance issue but rather a misconfiguration issue.

Based on the limited information you provided, it seems you have nothing in common with the original poster (OP), so your issue is unlikely to be the same.

There could be a PVE bug causing the issues for either you or the OP, but it's unlikely to be the same bug for both of you. I recommend opening a new thread, reviewing comment #2 above, and providing more detailed information than you have so far.

Good luck.


 
I've already found a solution for my case.
On both machines there were 3 ESXi storages configured.
After removing all three, everything was displayed correctly.
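For anyone hitting the same thing: the storage can also be disabled or removed from the CLI. The storage ID `esxi1` below is a placeholder; use whatever IDs `pvesm status` shows for your ESXi entries:

```shell
# List configured storages and their IDs
pvesm status

# Disable the entry first if you want to keep the definition around
pvesm set esxi1 --disable 1

# Or remove the storage definition entirely (this does not delete any data)
pvesm remove esxi1
```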
 
I'd start by trying to mount the same exports as those in your /etc/pve/storage.cfg manually. I expect that to fail. Continue by figuring out why; that is likely the root cause of the PVE issues you are seeing.

I tried to mount manually; it just hangs for hours and nothing happens.
 
Thank you for confirming. Sounds like a network problem. I'd start with checking the MTU.
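A quick way to sanity-check the MTU end to end (assuming a standard 1500-byte MTU; the IP header takes 20 bytes and the ICMP header 8, so the largest unfragmented payload is 1472):

```shell
# Show the MTU configured on each local interface
ip link show

# Probe the path to the NAS with fragmentation disallowed (-M do).
# 1472 = 1500 - 20 (IP header) - 8 (ICMP header); X.X.X.X as before.
ping -c 3 -M do -s 1472 X.X.X.X

# If jumbo frames (MTU 9000) are expected on the storage path, test 8972 too
ping -c 3 -M do -s 8972 X.X.X.X
```

If the large ping fails while a default-size ping succeeds, something along the path (a switch port, a bond, a VLAN interface) is running a smaller MTU.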

I did not have enough time to sit down and dig more, due to work keeping me occupied. I will spend more time on it over the weekend.

I am leaning towards this not being related to MTU or the network, but rather to the actual behavior of this kernel on this specific hardware.
Why?
1. Because it was working for months on this hardware before the upgrade, and no changes had been made except the upgrade to 8.2.4, which brought kernel 6.8.8-1.
2. I have another 3 hosts (running as a PVE cluster), still on 8.2.2 and kernel 6.8.4-3, working fine with the exact same TrueNAS and the same shares.
 
Same issue. Removed ESXi storage and UI immediately recovered.
Sounds like there were changes made that made pvestatd more sensitive to probing ESXi storage.
I am curious if the ESXi pools that you removed were still online/live or offline?
But @Diff (OP) is probably not in the same boat, based on the facts so far.


 
Sounds like there were changes made that made pvestatd more sensitive to probing ESXi storage.
I am curious if the ESXi pools that you removed were still online/live or offline?
But @Diff (OP) is probably not in the same boat, based on the facts so far.


The ESXi storages attached were on production servers, so all were online.
 
I have had the exact same issue, with the same resolution as above: remove the ESXi storage and all is well. After a year of progressing towards a production environment and migrating away from VMware, this was alarming!
 
Just wanted to report back: with the latest update released (kernel 6.8.8-2), the problem is fully fixed.

Code:
uname -r
6.8.8-2-pve
 
I got a similar problem with kernel 6.8.8-2 on a cluster using Ceph.
The error in the interface is "can't open '/sys/fs/cgroup/blkio//lxc/..../blkio.throttle.io_service_bytes_recursive'", where .... is the LXC container ID.

The old kernel 6.8.4-3 works flawlessly.

Ceph runs on NVMe drives, and the servers are interconnected over a 10 Gb/s copper LAN.
In the same cluster I also have some nodes that use Ceph but do not provide OSDs to the cluster: on those nodes kernel 6.8.8-2 works without issues.
 
