Storage offline in web interface

check-ict

Hello,

I have 4 nodes in a cluster with an NFS storage for all VM disks.

It has worked fine for over a year, but suddenly I wasn't able to create a new VM because the storage was reported as offline:
TASK ERROR: create failed - storage 'zfs01' is not online

I checked the storage in the web interface and it shows the storage as empty (0 disk space etc.).

When I SSH into the Proxmox nodes, I see the share is mounted and working normally. I can read/write and all VMs are still running.

I tried to migrate VMs away from one node, but that was not possible because Proxmox thinks the storage is offline. So I decided to shut down all VMs on that node. I updated the node and restarted it, but that didn't help. The node doesn't even mount the storage on boot; I had to mount it manually from the command line (which works). I'm not able to start the VMs that were running on this node, again because the storage seems to be offline. I also restarted pvedaemon/pve-cluster/pve-manager etc. on all nodes.

How can I recover from this error without a big impact? I don't want to restart all nodes and find out I can't start any VM anymore. I also don't want to restart the ZFS01 storage server; that would be too risky and could cause a long downtime if the VMs won't start after the reboot.

The updated node:
root@proxmox-cluster-03:~# pveversion -v
proxmox-ve-2.6.32: 3.2-126 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

Other nodes:
root@proxmox-cluster-01:~# pveversion -v
proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-26-pve)
pve-manager: 3.1-21 (running version: 3.1-21/93bf03d4)
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-15
pve-firmware: 1.1-2
libpve-common-perl: 3.0-14
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-6
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

Storage test:
root@proxmox-cluster-01:~# pveperf /mnt/pve/zfs01
CPU BOGOMIPS: 72529.68
REGEX/SECOND: 823399
HD SIZE: 11225.11 GB (10.1.1.91:/data/proxmox-cluster01)
FSYNCS/SECOND: 1101.79
DNS EXT: 67.63 ms

root@proxmox-cluster-01:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content vztmpl,rootdir
maxfiles 0

nfs: zfs01
path /mnt/pve/zfs01
server 10.1.1.91
export /data/proxmox-cluster01
options vers=3
content images,iso,vztmpl,rootdir,backup
maxfiles 1

root@proxmox-cluster-01:~# pvecm status
Version: 6.2.0
Config Version: 4
Cluster Name: cluster01
Cluster Id: 53601
Cluster Member: Yes
Cluster Generation: 26168
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Node votes: 1
Quorum: 3
Active subsystems: 1
Flags:
Ports Bound: 0
Node name: proxmox-cluster-01
Node ID: 1
Multicast addresses: 239.192.209.51
Node addresses: 10.1.1.101


May 5 12:38:57 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online
May 5 12:39:08 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online
May 5 12:39:17 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online
May 5 12:39:27 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online
May 5 12:39:37 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online
May 5 12:39:47 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online

The storage/ZFS server has not been changed; everything looks OK in its syslogs and the VMs are still running on it.
 
The problem is solved.

I figured out that "showmount -e 10.1.1.91" gives an error, while my second fileserver showed all its NFS exports. ZFS01 had a problem with rpcbind. I tried to restart rpcbind on ZFS01, but that caused all NFS traffic to halt. I had to reboot ZFS01 and now everything is working again.
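
For anyone hitting the same symptom (share mounted and usable, but Proxmox reports the storage as offline), here is a minimal sketch of the check I would run first. As far as I can tell, Proxmox decides whether an NFS storage is online by querying the server over RPC rather than by looking at the mount, which would explain why the existing mount kept working. The function name check_nfs_server, the 5 second timeout and the SHOWMOUNT/RPCINFO override variables are my own assumptions for illustration; only showmount -e and rpcinfo -p are the real tools.

```shell
#!/bin/sh
# Hypothetical helper: probe the RPC services an NFS server has to answer on.
# SHOWMOUNT and RPCINFO may be overridden (e.g. for a dry run); the defaults
# are the real tools. The 5 second timeout is an arbitrary choice.
check_nfs_server() {
    host="$1"
    status=0

    # rpcinfo -p lists the programs registered with rpcbind on the host.
    if timeout 5 "${RPCINFO:-rpcinfo}" -p "$host" >/dev/null 2>&1; then
        echo "rpcbind: ok"
    else
        echo "rpcbind: FAILED"
        status=1
    fi

    # showmount -e asks the server's mount daemon for its export list.
    if timeout 5 "${SHOWMOUNT:-showmount}" -e "$host" >/dev/null 2>&1; then
        echo "exports: ok"
    else
        echo "exports: FAILED"
        status=1
    fi

    return "$status"
}
```

Running it as "check_nfs_server 10.1.1.91" from a Proxmox node returns non-zero if either query fails, which points at the fileserver rather than at the cluster.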

I think the issue was that ZFS on Linux (which I use) consumed too much memory, even though I had set a limit (arc_max_size). I had noticed earlier that only 200 to 300 MB of the 24 GB in the fileserver was free, but it had been running like this for a long time, so I thought this was normal ZFS behaviour. I guess I was wrong and something triggered the OOM (out-of-memory) killer on ZFS01.
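
To keep an eye on this in future, a rough sketch for comparing the current ARC size with its configured limit on the fileserver is below. It assumes ZFS on Linux exposes its counters in /proc/spl/kstat/zfs/arcstats with "size" and "c_max" rows; the function name arc_usage is made up for this example. Note that the ARC limit does not cover every bit of kernel memory ZFS uses, so treat the numbers as a hint, not as proof of what happened here.

```shell
#!/bin/sh
# Hypothetical helper: print the current ARC size and its limit in MiB.
# Reads the kstat file that ZFS on Linux exposes; a different path can be
# passed as the first argument.
arc_usage() {
    stats="${1:-/proc/spl/kstat/zfs/arcstats}"
    # Each data row is "name type value"; pick out the two rows we need.
    awk '$1 == "size"  { size = $3 }
         $1 == "c_max" { max  = $3 }
         END { printf "arc size: %d MiB, limit: %d MiB\n", size / 1048576, max / 1048576 }' "$stats"
}
```

Calling "arc_usage" without arguments on ZFS01 prints one line; if the size sits well above the limit for long periods, the limit is probably not being applied the way I expected.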
 
