TrueNAS Scale for shared storage issue

quartzeye

Member
Nov 29, 2019
This has been giving me headaches for a few days. I have TrueNAS Scale running in a VM with a dedicated NIC and drives. My Proxmox host has a separate dedicated NIC and drives. I am up to date on everything. I have set up datasets and shares in TrueNAS and can access the SMB shares without issue from my Windows clients. The NFS shares are another issue.

When in a VM on the same host or on another host, I can mount the NFS shares and have RW access. This was tested with vanilla Debian 11.3 VMs with nothing but the NFS client added. A simple "mount -vvv xxx.xxx.xxx.xxx:/mnt/data-01/ds-proxmox /opt/truenas" and no issues copying, editing, or deleting files/directories.
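For anyone wanting to reproduce that test, the sequence in the Debian VM was roughly this (the server IP is masked here, and the mount point is just the one I picked):

# install the client and create a mount point
apt install -y nfs-common
mkdir -p /opt/truenas

# mount the export and exercise read/write
mount -vvv xxx.xxx.xxx.xxx:/mnt/data-01/ds-proxmox /opt/truenas
touch /opt/truenas/write-test
cp /etc/hosts /opt/truenas/write-test
rm /opt/truenas/write-test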

When I am on the Proxmox host, or any other Proxmox host, and try the same thing, I can mount the share and list the files, but I cannot edit them, and attempting to edit anything locks up the system to the point where I have to reboot the host to clear it. Probably an NFS busy state where the session is dead.

I have tried many different permissions and both NFSv3 and NFSv4 on the TrueNAS Scale side, but nothing works from any Proxmox host. This prevents me from creating NFS storage, shared or not, on any Proxmox host where the NFS share is served by TrueNAS Scale.
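For reference, switching between NFSv3 and NFSv4 on the client side can be forced with standard mount options, e.g. (IP masked as above, mount point just an example):

# force NFSv3
mount -t nfs -o vers=3 xxx.xxx.xxx.xxx:/mnt/data-01/ds-proxmox /mnt/test
# or force NFSv4.x instead
mount -t nfs -o vers=4.1 xxx.xxx.xxx.xxx:/mnt/data-01/ds-proxmox /mnt/test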

At this point I have to believe there is something going on with Proxmox related to accessing TrueNAS Scale NFS shares, as it is the only common denominator that does not work properly. Any help would be appreciated.
 
Wow, 60 views and not a single reply. I guess I am out in the wilderness then.

So here is my setup

I have a single host running Proxmox VE 7.2-4. On it I have a single VM running TrueNAS Scale v22.02.2.

The VM has a dedicated NIC passed in via PCI passthrough and all drives passed in via iSCSI, with no config on the host for that NIC or its ports. The host has a separate NIC configured with a bridge. The TrueNAS VM does not use the host's NIC or drives.

TrueNAS is set up properly: pools, datasets, users, and shares are all good. I have a single share to a dataset, and the exports file on TrueNAS shows rw and no_root_squash.
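In exports(5) terms, the generated line for that share corresponds to something roughly like this (network masked, and the exact flag set TrueNAS writes may differ slightly):

/mnt/data-01/ds-proxmox  xxx.xxx.xxx.xxx/24(rw,no_root_squash)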

I created a vanilla Debian 11 VM on the same host and installed the nfs-common package. As root, I was able to mount the share for the ds_proxmox dataset. In that VM, running on the same host and using the only bridge on the host, I have full access to the share and can cp, write, and delete to my heart's content.

On the Proxmox host, from the command line or the GUI, I can mount the same ds_proxmox dataset/share from the TrueNAS VM. When using the GUI, it creates all the directories and sub-directories. At the command line, I can create a file on the share using touch. If I try to vi the file, it hangs up the entire mount point. In the GUI, if I try to upload an ISO, it uploads to the tmp directory and hangs when attempting to cp the file to the directory on the share. It does create a 0-byte file, but that is it.
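For completeness, adding the storage from the command line instead of the GUI mounts the same way; something along these lines (the storage ID and content types are just what I chose, IP masked):

pvesm add nfs truenas-nfs --server xxx.xxx.xxx.xxx \
    --export /mnt/data-01/ds-proxmox --content iso,vztmpl,backup
pvesm status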

Why do I have full access to the share between VMs on the same host as the TrueNAS VM, but cannot fully access the share from the host itself? It shouldn't be a routing issue, as the NIC in the TrueNAS VM is removed from the host OS via IOMMU and PCI passthrough. The client VMs on the host, except for the TrueNAS VM, share the same bridge as the host.

I could understand NFS getting lost if everything were on the same bridge, but not with PCI passthrough of the TrueNAS NIC. So how does someone mount an NFS share from inside a VM back to the Proxmox host file system? I cannot mount it locally and then add it to Proxmox as a Directory, and I cannot mount it as NFS storage.
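To be clear about the Directory variant: what I mean is mounting the export on the host first and then pointing a Directory storage at it, roughly like this (storage ID is just an example, and the is_mountpoint flag only tells PVE not to use the path unless something is actually mounted there):

# mount the export somewhere on the host
mkdir -p /mnt/truenas
mount -t nfs xxx.xxx.xxx.xxx:/mnt/data-01/ds-proxmox /mnt/truenas

# add it as Directory storage that is only used while the mount is present
pvesm add dir truenas-dir --path /mnt/truenas --is_mountpoint yes --content iso,backup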

The only thing I know works is mapping the directory into a container running an NFS server. Then I can mount that share from the container back to the host as shared storage. I have that working, but it is no substitute for something like TrueNAS.
 
Some things I'm not clear on:

"all drives passed via iscsi" - what does this mean? Are the drives on a different host?

Is the share mounted on the Proxmox host in the storage config?
Is the TrueNAS VM in the same subnet as a) the Proxmox host, b) the Debian VM?
Is there any mapping between users/groups on the different systems?
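(The first two are quick to check on the Proxmox host itself:)

# is the share defined in the storage config, and does PVE see it as active?
cat /etc/pve/storage.cfg
pvesm status

# which addresses/subnets the host interfaces are actually on
ip -br addr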
 
First, the storage is irrelevant to this problem, as this is a networking issue. However, I have each local drive pushed into the VM via iSCSI in the hardware config. I know I could PCI-passthrough the HBA, but for now this is working and isn't the problem.

Selection_036.png

I have only one subnet for the host and VMs. No other subnets, bridges, or VLANs. I am using the root group and UID on the host, and in the VM uid=0, gid=0.

Selection_038.png

The host is connecting as root. The test VM is connecting as root. TrueNAS has root as the dataset owner, and the share is set up for root access and is RW with no_root_squash. Both VMs are on the same host. The test VM can mount the TrueNAS share and access it without issue. The host can mount the TrueNAS share and create directories and 0-byte files on the same share. The host cannot edit files or cp data onto the TrueNAS share.
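For anyone checking the same thing, the export and squash behaviour can be confirmed from either client with something like this (NFSv3 shown, IP masked, mount point just an example):

# list what the server is exporting and to whom
showmount -e xxx.xxx.xxx.xxx

# after mounting, verify root stays uid 0 on the share instead of being squashed to nobody
touch /mnt/test/rootcheck
ls -ln /mnt/test/rootcheck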

Selection_034.png

TrueNAS is configured correctly. There is something wrong on the host that is preventing it from writing to files on the share even though it is RW. I don't think it is a routing issue; is it possible there is an AppArmor issue blocking this?
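In case it is AppArmor, the way I'd check is to see whether it is enforcing anything relevant and whether it logged denials around the time of the hang (I have not confirmed this is the cause):

# which profiles are loaded/enforced
aa-status

# any AppArmor denials in the kernel log
journalctl -k | grep -i apparmor
dmesg | grep -i 'apparmor.*denied'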

Selection_037.png
 

When attempting to vi a file on the share, I see the following in the journalctl logs:

Jul 04 12:05:35 r820-01 systemd[1]: mnt-pve-nfs820\x2d01\x2dproxmox.mount: Succeeded.
░░ Subject: Unit succeeded
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░
░░ The unit mnt-pve-nfs820\x2d01\x2dproxmox.mount has successfully entered the 'dead' state.
Jul 04 12:08:08 r820-01 pvedaemon[2323]: <root@pam> successful auth for user 'root@pam'
Jul 04 12:09:56 r820-01 pvestatd[2303]: got timeout

then later

root@r820-01:~# journalctl -xe
Jul 04 12:16:24 r820-01 kernel: nfs_wb_all+0x27/0xf0 [nfs]
Jul 04 12:16:24 r820-01 kernel: nfs_setattr+0x1d7/0x1f0 [nfs]
Jul 04 12:16:24 r820-01 kernel: notify_change+0x347/0x4d0
Jul 04 12:16:24 r820-01 kernel: chmod_common+0xc4/0x180
Jul 04 12:16:24 r820-01 kernel: ? chmod_common+0xc4/0x180
Jul 04 12:16:24 r820-01 kernel: do_fchmodat+0x62/0xb0
Jul 04 12:16:24 r820-01 kernel: __x64_sys_chmod+0x1b/0x20
Jul 04 12:16:24 r820-01 kernel: do_syscall_64+0x5c/0xc0
Jul 04 12:16:24 r820-01 kernel: ? kern_select+0xf1/0x180
Jul 04 12:16:24 r820-01 kernel: ? handle_mm_fault+0xd8/0x2c0
Jul 04 12:16:24 r820-01 kernel: ? exit_to_user_mode_prepare+0x37/0x1b0
Jul 04 12:16:24 r820-01 kernel: ? syscall_exit_to_user_mode+0x27/0x50
Jul 04 12:16:24 r820-01 kernel: ? __x64_sys_select+0x25/0x30
Jul 04 12:16:24 r820-01 kernel: ? do_syscall_64+0x69/0xc0
Jul 04 12:16:24 r820-01 kernel: ? irqentry_exit+0x19/0x30
Jul 04 12:16:24 r820-01 kernel: ? exc_page_fault+0x89/0x160
Jul 04 12:16:24 r820-01 kernel: ? asm_exc_page_fault+0x8/0x30
Jul 04 12:16:24 r820-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae
Jul 04 12:16:24 r820-01 kernel: RIP: 0033:0x7f6aa7452a17
Jul 04 12:16:24 r820-01 kernel: RSP: 002b:00007fff68d26d68 EFLAGS: 00000202 ORIG_RAX: 000000000000005a
Jul 04 12:16:24 r820-01 kernel: RAX: ffffffffffffffda RBX: 00000000000001a4 RCX: 00007f6aa7452a17
Jul 04 12:16:24 r820-01 kernel: RDX: 000055fcbdd87e60 RSI: 00000000000001a4 RDI: 000055fcbe4ef1c0
Jul 04 12:16:24 r820-01 kernel: RBP: 0000000000000001 R08: 00007fff68d26c70 R09: 00007f6aa7522be0
Jul 04 12:16:24 r820-01 kernel: R10: 000055fcbdd87de0 R11: 0000000000000202 R12: 0000000000000001
Jul 04 12:16:24 r820-01 kernel: R13: 000055fcbe4be460 R14: 000055fcbe4c3880 R15: 000055fcbe4ef1c0
Jul 04 12:16:24 r820-01 kernel: </TASK>
Jul 04 12:17:01 r820-01 CRON[506847]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
Jul 04 12:17:01 r820-01 CRON[506848]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Jul 04 12:17:01 r820-01 CRON[506847]: pam_unix(cron:session): session closed for user root
Jul 04 12:18:25 r820-01 kernel: INFO: task vi:505749 blocked for more than 483 seconds.
Jul 04 12:18:25 r820-01 kernel: Tainted: P O 5.15.35-2-pve #1
Jul 04 12:18:25 r820-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Jul 04 12:18:25 r820-01 kernel: task:vi state:D stack: 0 pid:505749 ppid:504811 flags:0x00000000
Jul 04 12:18:25 r820-01 kernel: Call Trace:
Jul 04 12:18:25 r820-01 kernel: <TASK>
Jul 04 12:18:25 r820-01 kernel: __schedule+0x33d/0x1750
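
For anyone hitting the same hang, these are the diagnostics I'd gather at that point (none of them fix anything, they just show where the write is stuck; the mount point is decoded from the systemd unit name above):

# dump all blocked (D-state) tasks into the kernel log (needs sysrq enabled)
echo w > /proc/sysrq-trigger
dmesg | tail -n 100

# NFS client RPC counters - a climbing 'retrans' value points at requests going unanswered
nfsstat -rc

# per-mount RPC timing/error detail
mountstats /mnt/pve/nfs820-01-proxmox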


There is something going on in the kernel that isn't quite right. I did a Debian 11 install per the wiki and converted it to Proxmox. This was necessary because I am running the OS from a custom partition layout, with a USB boot drive and the rest of the install on an NVMe drive on a PCI adapter; the Proxmox installer doesn't support that layout, but it is a supported configuration when installing on top of Debian. So let's not pick apart the infrastructure and keep this focused on the problem at hand in the OS.

Also, I created a nested Proxmox install on the same host using the Proxmox installer, again per the wiki, and it did not have any issues mounting and accessing the TrueNAS share.
 
Solved

Well, not to my satisfaction, but solved nonetheless.

Since I have a full 10G network, I had taken my MTU up to 9000 in order to use jumbo frames. This worked for pretty much everything except NFS. And it only hangs up NFS when it actually tries to move data, such as editing a file with vi or copying a file across the mount point. Why, I don't know, but I suspect someone with deep networking knowledge could figure it out. For me, I guess I will live with the performance hit and throttle the MTU back to 1500 in order to at least have NFS work.

I certainly would welcome an explanation as to why NFS can't move the jumbo packets, or how to figure out where in the network the choke point actually resides.
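For reference, the usual way to find where jumbo frames die is to send don't-fragment pings of jumbo size from each end (8972 bytes of ICMP payload plus 28 bytes of headers = 9000) and to confirm every device in the path, bridge ports included, really carries MTU 9000 (IP masked, interface names will vary):

# from the Proxmox host towards the TrueNAS IP - must succeed if the path is truly clean at 9000
ping -M do -s 8972 -c 3 xxx.xxx.xxx.xxx

# confirm the MTU actually applied to the bridge and its physical ports
ip link show | grep mtu

If a frame that size is silently dropped somewhere, small NFS RPCs (mount, readdir, touch) still fit in one frame and work, while the large WRITE RPCs from vi or cp never get through and the mount just hangs, which would match exactly what I was seeing.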
 
