Storage status unknown.

bongani

New Member
May 22, 2024
I am new to Proxmox and have a 4-node cluster with Ceph and NAS shared storage. One of the nodes became unresponsive, so I turned it off; HA successfully moved the VMs to the other nodes. After restarting, all the storage devices show a question mark in the GUI (see screenshot) and report status unknown on the command line. What would cause this, and how can I resolve it? I have tried restarting the node a few times and removing it from the cluster and re-joining it, with no luck.

[Screenshot: node124.jpg]

[Screenshot: output of pvesh get /cluster/resources (node124_2.jpg)]

Code:
root@node124:~# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-2-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-2
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.2
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.3-1
proxmox-backup-file-restore: 3.2.3-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.1.10
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1
 
Hello,

Did you check the system logs of the Proxmox VE host? There is probably an error somewhere with more info.
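For example, something along these lines should show recent errors from the storage-related daemons (adjust the time window to when the node came back up):
Code:
journalctl -b -u pvestatd -u pvedaemon -u pveproxy --since "1 hour ago"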
 
Please show output for the following:
Code:
cat /etc/pve/storage.cfg

df -h
(Please post the output in CODE tags.)
 
Hi

Please see below

Code:
root@node124:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

esxi: vm-node-8
        server XXX.XXX.XXX.XXX
        username root
        content import
        skip-cert-verification 1

rbd: JB1-CEPH
        content images,rootdir
        krbd 0
        pool JB1-CEPH

nfs: JB1-NAS3
        export /mnt/Storage/nas3-proxmox
        path /mnt/pve/JB1-NAS3
        server XXX.XXX.XXX.XXX
        content iso,images
        options vers=4.2
        preallocation off
        prune-backups keep-all=1


Code:
root@node124:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                  189G     0  189G   0% /dev
tmpfs                  38G  1.8M   38G   1% /run
/dev/mapper/pve-root   94G  6.0G   84G   7% /
tmpfs                 189G   66M  189G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
/dev/fuse             128M   52K  128M   1% /etc/pve
tmpfs                 189G   28K  189G   1% /var/lib/ceph/osd/ceph-4
tmpfs                 189G   28K  189G   1% /var/lib/ceph/osd/ceph-11
tmpfs                 189G   28K  189G   1% /var/lib/ceph/osd/ceph-3
tmpfs                  38G     0   38G   0% /run/user/0
 
Can you try disabling the ESXi storage on the node? There have been bugs reported recently with the ESXi storage.
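If you prefer the shell, something like this should disable it (using the storage ID from the config you posted; you can re-enable it later with --disable 0):
Code:
pvesm set vm-node-8 --disable 1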
 
Hi Gfngfn256

I have removed the ESXi storage, but that hasn't resolved it. Thanks for assisting, by the way.
I upgraded another node to kernel version 6.8.8-2-pve, and after a reboot it started showing the same behaviour. I tried reverting to the old kernel with proxmox-boot-tool kernel pin 6.8.4-3-pve, but I am still getting unknown states for both the node and the storage devices. I don't know whether a package in the upgrade is causing this.
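For reference, checking which kernel is pinned and which one is actually running after the reboot looks roughly like this:
Code:
proxmox-boot-tool kernel list
uname -r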


Code:
root@node124:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content backup,vztmpl,iso

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

rbd: JB1-CEPH
        content images,rootdir
        krbd 0
        pool JB1-CEPH

nfs: JB1-NAS3
        export /mnt/Storage/nas3-proxmox
        path /mnt/pve/JB1-NAS3
        server 192.168.127.17
        content images,iso
        options vers=4.2
        preallocation off
        prune-backups keep-all=1

[Screenshot: node124_3.jpg]
 
If you restrict the Ceph and NAS storages so that they are only available on the other nodes (Cluster -> Storage -> Edit -> Nodes -> select all but 124) and reboot, do local and local-lvm then become green?
If they are green and you then re-enable NAS3, does that one go/stay green? And what if you swap to having just Ceph and local enabled?
I am trying to see whether it is a problem with pvestatd and storage in general, or with pvestatd and one of the two storages/storage types specifically.
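If the GUI is awkward to use with everything in an unknown state, the CLI equivalent would be something like this (the node names other than 124 are placeholders here, substitute your actual ones):
Code:
# restrict the shared storages to the other nodes (placeholder node names)
pvesm set JB1-CEPH --nodes node121,node122,node123
pvesm set JB1-NAS3 --nodes node121,node122,node123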
 
Thank you @sw-omit. After removing NAS3 and restarting the services, the node seems to be fine.
I have also found an orphaned VM disk on NAS3 that I cannot destroy; I get a TASK ERROR: 'storage-JB1-NAS3'-locked command timed out - aborting. I have started moving all VMs from NAS3 to Ceph; once that is done I will remove NAS3, add it back, and see whether that resolves it.
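For anyone hitting the same lock timeout, a quick way to check whether the NFS mount is actually responding on the node is roughly this (the timeout just stops ls from blocking forever on a hung mount):
Code:
timeout 10 ls /mnt/pve/JB1-NAS3 && echo "mount responsive" || echo "mount hung or unreachable"
pvesm status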
 
Given the added message, it looks like node 124 for some reason can't fully connect to NAS3, which in turn causes pvestatd to wait and eventually time out while collecting the statistics for that storage.
Good to know we at least have a possible cause that can be looked into further, and indeed removing and re-adding would have been one of the next steps (if the storage allowed it).
When re-adding, I would suggest re-adding it to the cluster while logged in as root on node 124, to trigger any errors/warnings more clearly.
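For the remove/re-add itself, from a root shell on node 124 it would look roughly like this (parameters taken from the storage.cfg posted above; removing the storage definition does not delete the data on the NAS):
Code:
pvesm remove JB1-NAS3
pvesm add nfs JB1-NAS3 --server 192.168.127.17 --export /mnt/Storage/nas3-proxmox --content images,iso --options vers=4.2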
 
