Storage status unknown.

bongani

New Member
May 22, 2024
I am new to Proxmox and have a 4-node cluster with Ceph and NAS shared storage. One of the nodes became unresponsive, so I turned it off; HA successfully moved the VMs to the other nodes. After restarting, all the storage devices show a question mark in the GUI (see screenshot) and report status unknown on the command line. What would cause this, and how can I resolve it? I have tried restarting the node a few times and removing it from the cluster and re-joining it, with no luck.

[Screenshot: node124.jpg]

[Screenshot: output of pvesh get /cluster/resources (node124_2.jpg)]

Code:
root@node124:~# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-2-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-2
proxmox-kernel-6.8.8-2-pve-signed: 6.8.8-2
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.2
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.3-1
proxmox-backup-file-restore: 3.2.3-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.1.10
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.12-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1
 
Hello,

Did you check the system logs of the Proxmox VE host? There is probably an error somewhere with more info.
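For example, something along these lines should show recent errors from the storage-related daemons (adjust the time window to when the node came back up):
Code:
journalctl -b -u pvestatd -u pvedaemon -u pveproxy --since "1 hour ago"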
 
Please show output for the following:
Code:
cat /etc/pve/storage.cfg

df -h
(Please post the output in CODE tags.)
 
Hi

Please see below

Code:
root@node124:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content iso,vztmpl,backup

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

esxi: vm-node-8
        server XXX.XXX.XXX.XXX
        username root
        content import
        skip-cert-verification 1

rbd: JB1-CEPH
        content images,rootdir
        krbd 0
        pool JB1-CEPH

nfs: JB1-NAS3
        export /mnt/Storage/nas3-proxmox
        path /mnt/pve/JB1-NAS3
        server XXX.XXX.XXX.XXX
        content iso,images
        options vers=4.2
        preallocation off
        prune-backups keep-all=1


Code:
root@node124:~# df -h
Filesystem            Size  Used Avail Use% Mounted on
udev                  189G     0  189G   0% /dev
tmpfs                  38G  1.8M   38G   1% /run
/dev/mapper/pve-root   94G  6.0G   84G   7% /
tmpfs                 189G   66M  189G   1% /dev/shm
tmpfs                 5.0M     0  5.0M   0% /run/lock
/dev/fuse             128M   52K  128M   1% /etc/pve
tmpfs                 189G   28K  189G   1% /var/lib/ceph/osd/ceph-4
tmpfs                 189G   28K  189G   1% /var/lib/ceph/osd/ceph-11
tmpfs                 189G   28K  189G   1% /var/lib/ceph/osd/ceph-3
tmpfs                  38G     0   38G   0% /run/user/0
 
Can you try disabling the ESXi storage on the node? There have been bugs reported recently with the ESXi storage.
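If you prefer the shell, something like this should disable it (using the storage ID from the config you posted; you can re-enable it later with --disable 0):
Code:
pvesm set vm-node-8 --disable 1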
 
Hi Gfngfn256

I have removed the ESXi storage, but that hasn't resolved it. Thanks for assisting, by the way.
I upgraded another node to kernel version 6.8.8-2-pve, and after a reboot it started showing the same behaviour. I tried reverting to the old kernel with proxmox-boot-tool kernel pin 6.8.4-3-pve, but I am still getting unknown states for both the node and the storage devices. I don't know whether a package in the upgrade is causing this.
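For reference, checking which kernel is pinned and which one is actually running after the reboot looks roughly like this:
Code:
proxmox-boot-tool kernel list
uname -r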


Code:
root@node124:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content backup,vztmpl,iso

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

rbd: JB1-CEPH
        content images,rootdir
        krbd 0
        pool JB1-CEPH

nfs: JB1-NAS3
        export /mnt/Storage/nas3-proxmox
        path /mnt/pve/JB1-NAS3
        server 192.168.127.17
        content images,iso
        options vers=4.2
        preallocation off
        prune-backups keep-all=1

[Screenshot: node124_3.jpg]
 
If you restrict the Ceph and NAS storages so that they are only available on the other nodes (Cluster -> Storage -> Edit -> Nodes -> select all but 124) and reboot, do local and local-lvm then become green?
If they are green and you then re-enable NAS3, does that one go/stay green? And what if you swap to having just Ceph and local enabled?
I am trying to see whether it is a problem with pvestatd and storage in general, or with pvestatd and one of the two storages/storage types specifically.
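If the GUI is awkward to use with everything in an unknown state, the CLI equivalent would be something like this (the node names other than 124 are placeholders here, substitute your actual ones):
Code:
# restrict the shared storages to the other nodes (placeholder node names)
pvesm set JB1-CEPH --nodes node121,node122,node123
pvesm set JB1-NAS3 --nodes node121,node122,node123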
 
Thank you @sw-omit. After removing NAS3 and restarting the services, the node seems to be fine.
I have also found an orphaned VM disk on NAS3 that I cannot destroy; I get a TASK ERROR: 'storage-JB1-NAS3'-locked command timed out - aborting. I have started moving all VMs from NAS3 to Ceph; once that is done I will remove NAS3, add it back, and see whether that resolves it.
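For anyone hitting the same lock timeout, a quick way to check whether the NFS mount is actually responding on the node is roughly this (the timeout just stops ls from blocking forever on a hung mount):
Code:
timeout 10 ls /mnt/pve/JB1-NAS3 && echo "mount responsive" || echo "mount hung or unreachable"
pvesm status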
 
Given the added message, it looks like node 124 for some reason can't fully connect to NAS3, which in turn causes pvestatd to wait and eventually time out while collecting the statistics for that storage.
Good to know we at least have a possible cause that can be looked into further, and indeed removing and re-adding would have been one of the next steps (if the storage allowed it).
When re-adding, I would suggest re-adding it to the cluster while logged in as root on node 124, to trigger any errors/warnings more clearly.
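For the remove/re-add itself, from a root shell on node 124 it would look roughly like this (parameters taken from the storage.cfg posted above; removing the storage definition does not delete the data on the NAS):
Code:
pvesm remove JB1-NAS3
pvesm add nfs JB1-NAS3 --server 192.168.127.17 --export /mnt/Storage/nas3-proxmox --content images,iso --options vers=4.2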
 
