Hello experts,
Background: After receiving generous help, I now have 2 templates successfully built on local storage of a cluster. VM cloning with my cloud-init customizations works great on it as well. I've set up backup tasks on local disks, and those work fine too.
Issue: I'm trying to perform template/VM/container backups to an NFS share. The share is detected and mounted fine, but during the backup itself NFS stops responding.
Homework: I've combed through forum threads, Reddit, Google, etc. for similar setups and issues and tried various recipes, to no avail. Please note: the NFS share sits on a ZFS filesystem and is exposed as an NFS export. All ACL permissions have been turned off; it can be read and written by any user from anywhere on the trusted LAN addresses. It is also not a firewall issue. Below is the output from an Ubuntu VM (on a different host) mounting the same NFS server.
Code:
showmount -e <fqdn>
Export list for <fqdn>:
/mnt/z_store/nfs_pve (everyone)
mount | grep nfs
<fqdn>:/mnt/z_store/home on /mnt/home type nfs4 (rw,noatime,vers=4.2,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.100.9,local_lock=none,addr=192.168.121.7)
<fqdn>:/mnt/z_store/logs on /mnt/logs type nfs4 (rw,noatime,vers=4.2,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.100.9,local_lock=none,addr=192.168.121.7)
<fqdn>:/mnt/z_store/images on /mnt/conf_files type nfs4 (rw,noatime,vers=4.2,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,soft,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=192.168.100.9,local_lock=none,addr=192.168.121.7)
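If server-side details would help, these are the commands I'd run on the NFS server to double-check the export. A sketch only: the dataset name z_store/nfs_pve is inferred from the export path, and zfs get sharenfs assumes the export is managed through ZFS itself.
Code:
# On the NFS server (dataset name inferred from the export path)
zfs get sharenfs z_store/nfs_pve   # how ZFS shares the dataset, if sharenfs is used
exportfs -v                        # what the kernel NFS server actually exports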
System: pve-manager/8.2.4/faa83925c9641325 (running kernel: 6.8.8-3-pve), cluster, no CEPH, local + local-lvm on SSDs running on Intel i11/i14 NUCs
journalctl log:
Code:
Jul 30 18:49:32 i11-nuc pvedaemon[7306]: INFO: Starting Backup of VM 9001 (qemu)
Jul 30 18:50:19 i11-nuc pveproxy[1091]: worker exit
Jul 30 18:50:19 i11-nuc pveproxy[1088]: worker 1091 finished
Jul 30 18:50:19 i11-nuc pveproxy[1088]: starting 1 worker(s)
Jul 30 18:50:19 i11-nuc pveproxy[1088]: worker 7446 started
Jul 30 18:52:41 i11-nuc kernel: nfs: server <fqdn> not responding, still trying
Jul 30 18:52:41 i11-nuc kernel: nfs: server <fqdn> not responding, still trying
Jul 30 18:52:41 i11-nuc kernel: nfs: server <fqdn> not responding, still trying
Jul 30 18:52:41 i11-nuc kernel: nfs: server <fqdn> not responding, still trying
Jul 30 18:53:13 i11-nuc kernel: INFO: task zstd:7330 blocked for more than 122 seconds.
Jul 30 18:53:13 i11-nuc kernel: Tainted: P O 6.8.8-4-pve #1
Jul 30 18:53:13 i11-nuc kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 30 18:53:13 i11-nuc kernel: task:zstd state:D stack:0 pid:7330 tgid:7330 ppid:7326 flags:0x00004002
Jul 30 18:53:13 i11-nuc kernel: Call Trace:
Jul 30 18:53:13 i11-nuc kernel: <TASK>
Jul 30 18:53:13 i11-nuc kernel: __schedule+0x401/0x15e0
Jul 30 18:53:13 i11-nuc kernel: ? nfs_pageio_complete+0xee/0x140 [nfs]
Jul 30 18:53:13 i11-nuc kernel: schedule+0x33/0x110
Jul 30 18:53:13 i11-nuc kernel: io_schedule+0x46/0x80
Jul 30 18:53:13 i11-nuc kernel: folio_wait_bit_common+0x136/0x330
Jul 30 18:53:13 i11-nuc kernel: ? __pfx_wake_page_function+0x10/0x10
Jul 30 18:53:13 i11-nuc kernel: folio_wait_bit+0x18/0x30
Jul 30 18:53:13 i11-nuc kernel: folio_wait_writeback+0x2b/0xa0
Jul 30 18:53:13 i11-nuc kernel: __filemap_fdatawait_range+0x90/0x100
Jul 30 18:53:13 i11-nuc kernel: filemap_write_and_wait_range+0x94/0xc0
Jul 30 18:53:13 i11-nuc kernel: nfs_wb_all+0x27/0x130 [nfs]
Jul 30 18:53:13 i11-nuc kernel: nfs4_file_flush+0x7e/0xe0 [nfsv4]
Jul 30 18:53:13 i11-nuc kernel: filp_flush+0x35/0x90
Jul 30 18:53:13 i11-nuc kernel: __x64_sys_close+0x34/0x90
Jul 30 18:53:13 i11-nuc kernel: x64_sys_call+0x1a20/0x24b0
Jul 30 18:53:13 i11-nuc kernel: do_syscall_64+0x81/0x170
Jul 30 18:53:13 i11-nuc kernel: ? clear_bhb_loop+0x15/0x70
Jul 30 18:53:13 i11-nuc kernel: ? clear_bhb_loop+0x15/0x70
Jul 30 18:53:13 i11-nuc kernel: ? clear_bhb_loop+0x15/0x70
Jul 30 18:53:13 i11-nuc kernel: entry_SYSCALL_64_after_hwframe+0x78/0x80
Jul 30 18:53:13 i11-nuc kernel: RIP: 0033:0x7a2f9fc25d57
Jul 30 18:53:13 i11-nuc kernel: RSP: 002b:00007ffe4159cd68 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
Jul 30 18:53:13 i11-nuc kernel: RAX: ffffffffffffffda RBX: 00007a2f9fcfc760 RCX: 00007a2f9fc25d57
Jul 30 18:53:13 i11-nuc kernel: RDX: 00007a2f9fcf79e0 RSI: 00007a2f88000b70 RDI: 0000000000000001
Jul 30 18:53:13 i11-nuc kernel: RBP: 00007a2f9fcf85e0 R08: 0000000000000000 R09: 0000000000000000
Jul 30 18:53:13 i11-nuc kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Jul 30 18:53:13 i11-nuc kernel: R13: 0000000000000002 R14: 00005a6a414b7018 R15: 000000007935e800
Jul 30 18:53:13 i11-nuc kernel: </TASK>
Jul 30 18:53:39 i11-nuc kernel: nfs: server <fqdn> not responding, still trying
Jul 30 18:53:39 i11-nuc kernel: nfs: server <fqdn> not responding, still trying
Jul 30 18:53:39 i11-nuc kernel: nfs: server <fqdn> not responding, still trying
Jul 30 18:53:39 i11-nuc kernel: nfs: server <fqdn> not responding, still trying
Jul 30 18:53:39 i11-nuc kernel: nfs: server <fqdn> not responding, still trying
Jul 30 18:54:45 i11-nuc pvedaemon[973]: got timeout
Jul 30 18:54:45 i11-nuc pvedaemon[973]: unable to activate storage 'zfs_nfs_pve' - directory '/mnt/pve/zfs_nfs_pve' does not exist or is unreachable
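For what it's worth, next time it hangs I can capture client-side NFS/RPC state from the PVE node. A sketch of standard introspection commands (nothing PVE-specific; <server-ip> is a placeholder):
Code:
nfsstat -rc                    # RPC call and retransmission counters
cat /proc/fs/nfsfs/volumes     # per-mount NFS client state
ss -tn dst <server-ip>:2049    # TCP session to the NFS server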
storage.cfg:
Code:
nfs: zfs_nfs_pve
export /mnt/z_store/nfs_pve
path /mnt/pve/zfs_nfs_pve
server <fqdn>
content iso,images,rootdir,vztmpl,backup,snippets
preallocation metadata
prune-backups keep-daily=1,keep-monthly=6,keep-weekly=4,keep-yearly=1
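Note there is no options line in this storage definition, so the node mounts with the NFS client defaults, which is why the findmnt output below shows hard rather than the soft used on the Ubuntu VM. Roughly the equivalent manual mount (a sketch, for reference only):
Code:
# Approximately what the node does for this storage (no -o string -> client defaults)
mount -t nfs <fqdn>:/mnt/z_store/nfs_pve /mnt/pve/zfs_nfs_pve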
nfs mount info:
Code:
findmnt /mnt/pve/zfs_nfs_pve
TARGET SOURCE FSTYPE OPTIONS
/mnt/pve/zfs_nfs_pve <fqdn>:/mnt/z_store/nfs_pve nfs4 rw,relatime,vers=4.2,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=<ip1>,local_lock=none,addr=<ip2>
Making the NFS mount options the same as on the Ubuntu VM (where the NFS mount works; see the Homework section above) results in the same hung-task dumps. I changed them roughly as sketched below.
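A sketch of how I matched the options, assuming the NFS storage's options property accepts standard mount options and the share is remounted on the next activation:
Code:
pvesm set zfs_nfs_pve --options vers=4.2,soft,noatime   # mirror the Ubuntu VM's options
umount /mnt/pve/zfs_nfs_pve                             # drop the current mount
pvesm status                                            # re-activates (remounts) the storage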
pvesm status:
Code:
pvesm status
Name Type Status Total Used Available %
local dir active 98497780 5522676 87925556 5.61%
local-lvm lvmthin active 492216320 0 492216320 0.00%
zfs_nfs_pve nfs active 33830317824 384 33830317440 0.00%
The cluster nodes seemed to hang on this inaccessible NFS share. To prevent sluggish behavior of the nodes, I had to issue:
Code:
pvesm set zfs_nfs_pve --disable
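A lazy/forced unmount also clears the stale mountpoint on an affected node, if needed:
Code:
umount -f -l /mnt/pve/zfs_nfs_pve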
Help please.