Hello,
I have a 4-node cluster with an NFS storage for all VM disks.
It has worked fine for over a year, but suddenly I was unable to create a new VM because the storage was reported as offline:
TASK ERROR: create failed - storage 'zfs01' is not online
I checked the storage in the web interface, and it shows the storage as empty (0 disk space, etc.).
When I SSH into the Proxmox nodes, I see that the share is mounted and working normally. I can read and write, and all VMs are still running.
I tried to migrate VMs away from one node, but that was not possible because Proxmox thinks the storage is offline. So I decided to shut down all VMs on that node. I updated the node and restarted it, but that did not help. The node doesn't even mount the storage on boot; I had to mount it manually from the command line (which works). I'm not able to start the VMs that were running on this node, again because the storage is reported as offline. I also restarted pvedaemon, pve-cluster, pve-manager, etc. on all nodes.
How can I recover from this error without a big impact? I don't want to restart all nodes and then find out that I can't start any VM anymore. I also don't want to restart the ZFS01 storage server; that would be too risky and could cause a long downtime if the VMs won't start after the reboot.
The updated node:
root@proxmox-cluster-03:~# pveversion -v
proxmox-ve-2.6.32: 3.2-126 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1
Other nodes:
root@proxmox-cluster-01:~# pveversion -v
proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-26-pve)
pve-manager: 3.1-21 (running version: 3.1-21/93bf03d4)
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-15
pve-firmware: 1.1-2
libpve-common-perl: 3.0-14
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-6
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1
Storage test:
root@proxmox-cluster-01:~# pveperf /mnt/pve/zfs01
CPU BOGOMIPS: 72529.68
REGEX/SECOND: 823399
HD SIZE: 11225.11 GB (10.1.1.91:/data/proxmox-cluster01)
FSYNCS/SECOND: 1101.79
DNS EXT: 67.63 ms
root@proxmox-cluster-01:~# cat /etc/pve/storage.cfg
dir: local
path /var/lib/vz
content vztmpl,rootdir
maxfiles 0
nfs: zfs01
path /mnt/pve/zfs01
server 10.1.1.91
export /data/proxmox-cluster01
options vers=3
content images,iso,vztmpl,rootdir,backup
maxfiles 1
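Since the mount itself works, the "online" check and the mount are worth testing separately. A minimal sketch of how I would do that, under one assumption: as far as I know, the NFS plugin of this Proxmox generation decides "online" by probing the server (with showmount) rather than by looking at the local mount. That is not confirmed by anything in the output here. `nfs_target` is a hypothetical helper, not a Proxmox command; it only pulls the server and export for one NFS storage out of a storage.cfg-style file.

```shell
#!/bin/sh
# nfs_target FILE ID
# Print "<server> <export>" for the NFS storage ID defined in FILE.
# Hypothetical helper for illustration; returns non-zero if ID is not found.
nfs_target() {
    cfg="$1"
    id="$2"
    awk -v id="$id" '
        # A section header looks like "nfs: zfs01"; every header starts a new section.
        /^[a-z]+:[ \t]/ { in_section = ($1 == "nfs:" && $2 == id); next }
        in_section && $1 == "server" { server = $2 }
        in_section && $1 == "export" { export_path = $2 }
        END { if (server != "") print server, export_path; else exit 1 }
    ' "$cfg"
}

# Example use on a node (left commented out, these need the real cluster):
#   set -- $(nfs_target /etc/pve/storage.cfg zfs01)
#   showmount -e "$1"     # does the server answer the export query?
#   rpcinfo -p "$1"       # are portmapper/mountd/nfs registered and reachable?
#   mount | grep "$2"     # is the export actually mounted locally?
#   pvesm status          # what Proxmox itself reports for each storage
```

If showmount or rpcinfo hangs or fails from the nodes while the existing mount keeps working, that would point at the RPC services or a firewall on the storage side rather than at NFS itself.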
root@proxmox-cluster-01:~# pvecm status
Version: 6.2.0
Config Version: 4
Cluster Name: cluster01
Cluster Id: 53601
Cluster Member: Yes
Cluster Generation: 26168
Membership state: Cluster-Member
Nodes: 4
Expected votes: 4
Total votes: 4
Node votes: 1
Quorum: 3
Active subsystems: 1
Flags:
Ports Bound: 0
Node name: proxmox-cluster-01
Node ID: 1
Multicast addresses: 239.192.209.51
Node addresses: 10.1.1.101
May 5 12:38:57 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online
May 5 12:39:08 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online
May 5 12:39:17 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online
May 5 12:39:27 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online
May 5 12:39:37 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online
May 5 12:39:47 proxmox-cluster-01 pvestatd[225507]: WARNING: storage 'zfs01' is not online
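To see quickly which storages pvestatd is complaining about and how often, the syslog lines above can be summarised. This is only a sketch: `offline_counts` is a hypothetical helper name, and it assumes the warning format is exactly the one shown in these log lines.

```shell
#!/bin/sh
# offline_counts: read syslog lines on stdin and print "<storage-id> <count>"
# for every storage that pvestatd reported as not online.
offline_counts() {
    awk '
        /pvestatd\[[0-9]+\]: WARNING: storage .* is not online/ {
            # \047 is a single quote; the storage ID is the quoted token after "storage".
            if (match($0, /storage \047[^\047]+\047/)) {
                id = substr($0, RSTART + 9, RLENGTH - 10)
                count[id]++
            }
        }
        END { for (id in count) print id, count[id] }
    '
}

# Example use on a node (commented out, needs the real log file):
#   offline_counts < /var/log/syslog
```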
The storage/ZFS server has not been changed. Everything looks OK in its syslogs, and the VMs are still running on it.