Good morning, forum
I have two separate nodes at home, both running 8.3.2. All my VMs store their disks on a local NAS via NFS shares attached through the Proxmox UI. This setup has been working for close to two years with no hiccups. After upgrading to 8.3.2, the NFS shares dropped dead on me yesterday morning, sending the nodes into a reboot frenzy. The NAS is working fine, nothing changed in LAN connectivity, and I can mount the shares over NFS from other machines without any issues.
I rebuilt the nodes, since I suspected issues with the update itself: I re-installed version 8.2.2 (the ISO I had locally), re-added the NFS shares, and then ran the upgrade again.
This morning, one of the nodes (node 2) again lost NFS share access; node 1 is fine for now. Before losing access, I noticed the status of all VMs, the node itself, storage, etc. on node 2 change to unknown, i.e., individual elements showed a gray question mark instead of the expected green check.
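From what I understand, those gray question marks usually mean pvestatd has stopped reporting, presumably because it is blocked on the hung NFS mount, so that is the first thing I plan to check the next time it happens (standard systemd/journal commands, nothing exotic):
Code:
# Is the status daemon still running, or is it stuck?
systemctl status pvestatd

# Anything it has logged since the last boot
journalctl -u pvestatd -b --no-pager | tail -n 50
Here is the full pveversion output from node 2: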
Code:
root@mox2:~# pveversion -v
proxmox-ve: 8.3.0 (running kernel: 6.8.12-6-pve)
pve-manager: 8.3.2 (running version: 8.3.2/3e76eec21c4a14a7)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.12-6
proxmox-kernel-6.8.12-6-pve-signed: 6.8.12-6
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph-fuse: 17.2.7-pve3
corosync: 3.1.7-pve3
criu: 3.17.1-2+deb12u1
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx11
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-5
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.4
libpve-access-control: 8.2.0
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.10
libpve-cluster-perl: 8.0.10
libpve-common-perl: 8.2.9
libpve-guest-common-perl: 5.1.6
libpve-http-server-perl: 5.1.2
libpve-network-perl: 0.10.0
libpve-rs-perl: 0.9.1
libpve-storage-perl: 8.3.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.5.0-1
proxmox-backup-client: 3.3.2-1
proxmox-backup-file-restore: 3.3.2-2
proxmox-firewall: 0.6.0
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.3.1
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.7
proxmox-widget-toolkit: 4.3.3
pve-cluster: 8.0.10
pve-container: 5.2.3
pve-docs: 8.3.1
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.2
pve-firewall: 5.1.0
pve-firmware: 3.14-2
pve-ha-manager: 4.0.6
pve-i18n: 3.3.2
pve-qemu-kvm: 9.0.2-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.3.3
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.6-pve1
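For what it's worth, I also want to rule out the two nodes having drifted apart package-wise after the rebuild; a quick way to compare them (hostnames are mine, and this assumes root SSH between the nodes):
Code:
# Diff the package list on node 2 against node 1
diff <(pveversion -v) <(ssh root@mox1 pveversion -v)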
Here is the storage configuration:
Code:
root@mox2:~# cat /etc/pve/storage.cfg
dir: local
        path /var/lib/vz
        content vztmpl,backup,iso

lvmthin: local-lvm
        thinpool data
        vgname pve
        content images,rootdir

dir: vm-qcow2
        path /mnt/data/vm-qcow2
        content images
        preallocation off
        prune-backups keep-all=1
        shared 0

dir: trunas-iso
        path /mnt/nas-iso
        content iso
        prune-backups keep-all=1
        shared 0

dir: trunas-vm
        path /mnt/nas-vm
        content images
        preallocation off
        prune-backups keep-all=1
        shared 0

dir: trunas-snapshot
        path /mnt/nas-snapshot
        content snippets
        prune-backups keep-all=1
        shared 0
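One detail that may matter: the NAS shares are defined as plain dir storages with shared 0, so Proxmox itself is not mounting them; on my setup they come from /etc/fstab on each node. To confirm whether the mounts are actually present and with what options, the usual tools should be enough (paths match the config above):
Code:
# NFS mounts currently active on this node
findmnt -t nfs,nfs4

# fstab entries for the NAS shares
grep -i nfs /etc/fstab

# Per-mount NFS options as the kernel sees them
nfsstat -m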
Listing the VM images on node 1 works fine; the same listing returns nothing on node 2:
Code:
root@mox1:~# ls -lah /mnt/nas-vm/images/
total 37K
drwxr-xr-x 25 root root 25 Jan 11 15:02 .
drwxr-xr-x 3 root root 3 May 4 2024 ..
drwxr----- 2 root root 4 May 4 2024 100
drwxr----- 2 root root 5 Nov 28 11:49 1000
drwxr----- 2 root root 4 May 31 2024 1005
drwxr----- 2 root root 5 May 5 2024 101
drwxr----- 2 root root 4 May 4 2024 102
drwxr----- 2 root root 4 May 4 2024 103
drwxr----- 2 root root 4 May 4 2024 104
drwxr----- 2 root root 4 May 4 2024 105
drwxr----- 2 root root 2 Nov 2 14:10 106
drwxr----- 2 root root 4 May 4 2024 107
drwxr----- 2 root root 4 May 4 2024 108
drwxr----- 2 root root 4 May 30 2024 109
drwxr----- 2 root root 4 May 31 2024 110
drwxr----- 2 root root 5 Nov 20 11:37 111
drwxr----- 2 root root 4 May 4 2024 1111
drwxr----- 2 root root 4 May 30 2024 113
drwxr----- 2 root root 4 May 29 2024 114
drwxr----- 2 root root 4 Nov 6 19:58 117
drwxr----- 2 root root 4 Sep 1 08:32 118
drwxr----- 2 root root 3 Nov 2 09:17 119
drwxr----- 2 root root 2 Nov 2 17:32 120
drwxr----- 2 root root 3 Jan 14 15:09 127
drwxr----- 2 root root 2 Sep 14 17:18 201
Code:
root@mox2:~# ls -lah /mnt/nas-vm/images/
total 8.0K
drwxr-xr-x 2 root root 4.0K Jan 17 06:50 .
drwxr-xr-x 3 root root 4.0K Jan 17 06:50 ..
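The empty listing on node 2 looks like I am just seeing the bare local directory rather than the NFS share, so the obvious thing to verify is whether the path is a mount point at all (standard util-linux commands):
Code:
# Reports whether the path is a mount point or just a plain local directory
mountpoint /mnt/nas-vm

# Same check with details about the source and mount options
findmnt /mnt/nas-vm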
Any attempt to mount the NFS shares on node 2 just hangs:
Code:
root@mox2:~# mount -a
The command produces no output and no error code, and I found the same kind of mount failure in the boot logs as well.
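To narrow down where the hang happens, these are the checks I can run from node 2; the NAS IP and export path below are placeholders for my TrueNAS box, and the commands themselves are standard nfs-common / util-linux tools:
Code:
# Is the NFS server reachable and exporting anything? (replace the IP with the NAS address)
showmount -e 192.168.1.10
rpcinfo -p 192.168.1.10

# Kernel-side NFS/RPC messages around the time of the hang
dmesg -T | grep -iE 'nfs|rpc' | tail -n 30

# Manual mount with verbose output, wrapped in a timeout so it cannot hang forever
timeout 30 mount -v -t nfs 192.168.1.10:/mnt/tank/vm /mnt/nas-vm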
At this point I am not sure what to do next. Re-installing again is a pain in the neck, with zero guarantee it will actually help, and on the other hand staying on an old version (8.2.x) long term is not something I fancy either. Any ideas on what to check next would be much appreciated.