Connect and access NFS and iSCSI shares

listhor

Nov 14, 2023
I'm not able to configure Proxmox to access any iSCSI share (TrueNAS and Synology); it connects to the server but doesn't see the share. ESXi (same server, same IP, just booted into ESXi) connects to TrueNAS without any problem.

On top of that, it's the same with NFS shares on TrueNAS, yet it accesses NFS shares on Synology correctly. All NFS shares are set to use version 4 or 4.1.
How do I troubleshoot this?
 
I'm not able to configure Proxmox to access any iSCSI share (TrueNAS and Synology); it connects to the server but doesn't see the share. ESXi (same server, same IP, just booted into ESXi) connects to TrueNAS without any problem.

On top of that, it's the same with NFS shares on TrueNAS, yet it accesses NFS shares on Synology correctly. All NFS shares are set to use version 4 or 4.1.
How do I troubleshoot this?
Did you properly update the ACLs for each protocol to allow the new server/initiator/client to connect? This would be done on your respective NAS.
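On the iSCSI side, the initiator name that has to be allowed on the NAS can be read off the PVE node, and discovery can be tested with the standard open-iscsi tools. A rough sketch; the portal IP is a placeholder:
Code:
# the name this PVE node presents as its iSCSI initiator (add it to the target's allowed initiators)
cat /etc/iscsi/initiatorname.iscsi

# ask the portal which targets it offers to this initiator
iscsiadm -m discovery -t sendtargets -p <NAS-IP>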


 
Did you properly update the ACLs for each protocol to allow the new server/initiator/client to connect? This would be done on your respective NAS.
When it comes to NFS, the server uses the same IP address (whether booted into ESXi or PVE). I finally managed to get NFS working, but only using version 4.
For iSCSI the only access control was the initiator name. I removed that restriction and it's still the same:
(Screenshot attached: 2023-11-16 09:33:30)

EDIT:
And the TrueNAS log is full of the following:
Code:
Nov 16 09:35:41 freenas 1 2023-11-16T09:35:41.438893+01:00 xxx.com ctld 1049 - - child process 15177 terminated with exit status 1
Nov 16 09:35:42 freenas 1 2023-11-16T09:35:42.535343+01:00 xxx.com ctld 15178 - - 10.55.0.1: read: connection lost
Nov 16 09:35:42 freenas 1 2023-11-16T09:35:42.535636+01:00 xxx.com ctld 1049 - - child process 15178 terminated with exit status 1
Nov 16 09:35:44 freenas 1 2023-11-16T09:35:44.053828+01:00 xxx.com ctld 15179 - - 10.55.0.1: read: connection lost
Nov 16 09:35:44 freenas 1 2023-11-16T09:35:44.054162+01:00 xxx.com ctld 1049 - - child process 15179 terminated with exit status 1
But I think it is a known issue???
 
But I think it is a known issue???
Maybe? There is certainly lots of chatter about it on the FreeNAS forums. The messages could be an artifact of health-check probes, or a genuine network issue - impossible to say without proper troubleshooting. However, the FreeNAS community is better equipped to debug log messages from FreeNAS.


 
I managed (no_root_squash was missing) to connect Proxmox to the NFS share over the internal bridge (TrueNAS' boot-pool is virtualized). Seemingly it also works without any mapping - I just checked that.
But if I reboot TrueNAS or interrupt communication, PVE is not able to re-establish the storage (it displays error 500) - the mount directory seems to be broken (all its attributes are displayed as "?"). I need to disable the storage, unmount the share manually and re-enable the storage.
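For reference, the manual recovery amounts to something like the following - a sketch, with "truenas-nfs" standing in for the storage ID:
Code:
# stop PVE from touching the broken storage
pvesm set truenas-nfs --disable 1

# lazy/force-unmount the stale NFS mountpoint
umount -f -l /mnt/pve/truenas-nfs

# re-enable; PVE remounts it on the next activation
pvesm set truenas-nfs --disable 0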

Over the regular LAN I can connect only briefly, for a split second: all the storage information is displayed and then disappears - and again error 500. It's not a network issue, as I'm able to connect to the Synology share - which is strange, as there's no user mapping there at all. If I remove the mapping in TrueNAS, the connection lasts 5 seconds until the information is gone.
rpcinfo -p <IP> displays the same on both ends; showmount -e displays the correct exports.

What are the detailed conditions/requirements for PVE to connect to NFS and iSCSI shares? I can't find them in the docs...
 
connect Proxmox to the NFS share over the internal bridge (TrueNAS' boot-pool is virtualized).
Can you explain what this means in the context of your setup?
But if I reboot TrueNAS or interrupt communication, PVE is not able to re-establish the storage (it displays error 500) - the mount directory seems to be broken (all its attributes are displayed as "?"). I need to disable the storage, unmount the share manually and re-enable the storage.
Is TrueNAS an isolated external appliance?
Over the regular LAN I can connect only briefly, for a split second: all the storage information is displayed and then disappears - and again error 500.
Can you use standard Linux tools to mount your NFS share, i.e. "mount truenas:/export /mnt/test" - does this work and is it stable?
It's not a network issue, as I'm able to connect to the Synology share - which is strange, as there's no user mapping there at all. If I remove the mapping in TrueNAS, the connection lasts 5 seconds until the information is gone.
What does this mean? As far as PVE NFS access is concerned, the NFS mount is performed by the "root" user on PVE; there is no user impersonation or other users involved.

What are the detailed conditions/requirements for PVE to connect to NFS and iSCSI shares? I can't find them in the docs...
PVE uses standard Linux tools to connect to NFS and/or iSCSI. Technically, all you need is an industry-standard implementation of an NFS server and/or an iSCSI target. And, of course, a stable network.

Here is a sample of the NFS mount options for the most basic, default NFS storage defined in PVE:
Code:
bbnas:/mnt/data/testing on /mnt/pve/bbnas type nfs (rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.100.20,mountvers=3,mountport=911,mountproto=udp,local_lock=none,addr=172.16.100.20)
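For reference, an NFS storage like the one above is typically defined along these lines (a sketch; the values mirror the sample mount, and the content types are just an example):
Code:
# define an NFS storage; PVE mounts it under /mnt/pve/<storage-id>
pvesm add nfs bbnas --server 172.16.100.20 --export /mnt/data/testing --content images,iso

# optional extra mount options, e.g. pinning the NFS version
pvesm set bbnas --options vers=4.1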


 
Can you explain what this means in the context of your setup?
PVE --- vmbr0 (mtu 9000) ------------------------- net0-----------------------TrueNAS (boot-pool virtualized)
|___ vmbr1 (mtu 1500, trunk)------switch---vlan11 on lagg4095 (igb0:igb1 (passthrough)) ___|

Is TrueNAS an isolated external appliance?
I'm not sure what you mean by "isolated", but I think the "drawing" above explains it.

Can you use standard Linux tools to mount your NFS share, i.e. "mount truenas:/export /mnt/test" - does this work and is it stable?
Yes, I did that and it works hassle-free.

What does this mean? As far as PVE NFS access is concerned, the NFS mount is performed by the "root" user on PVE; there is no user impersonation or other users involved.
I meant the maproot or mapall settings on the server side. I've also read on this forum that PVE requires no_root_squash, but from my experience that doesn't seem to be the case...

PVE uses standard Linux tools to connect to NFS and/or iSCSI. Technically, all you need is an industry-standard implementation of an NFS server and/or an iSCSI target. And, of course, a stable network.
Good to hear. But the example given doesn't explain why PVE can't reconnect the storage (error 500 due to an already occupied/busy mount directory), or why the same thing happens over the LAN connection right after the connection is established.


Does a similar industry standard apply to iSCSI connections?
 
I'm not sure what you mean by "isolated", but I think the "drawing" above explains it.
The "drawing" is not as self-explanatory as it may seem to someone who is intimately involved with the config on a daily basis.
What it tells me is that you have a mix of MTU sizes, which, if not implemented properly, _will_ introduce unpredictable and random failures.
In fact, it's a good match for your symptoms - the initial negotiation works, then larger packets/checks fail.

Does a similar industry standard apply to iSCSI connections?
Yes.
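For reference, attaching an iSCSI target in PVE is equally standard. A sketch; the portal and target IQN are placeholders (the IQN is whatever discovery reports):
Code:
# attach an iSCSI target; its LUNs then appear as block devices under that storage
pvesm add iscsi truenas-iscsi --portal <NAS-IP> --target <target-IQN>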

My recommendation is to reduce your network complexity.
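One quick sanity check along those lines: confirm that jumbo-sized frames actually pass end to end on the path that is supposed to carry them. A sketch, run from the PVE host towards the TrueNAS address on the jumbo segment (the IP is a placeholder):
Code:
# 8972-byte payload + 28 bytes of IP/ICMP headers = a 9000-byte packet; -M do forbids fragmentation
ping -M do -s 8972 -c 3 <truenas-ip-on-vmbr0>

# a standard-MTU payload (1472 + 28 = 1500) should always go through
ping -M do -s 1472 -c 3 <truenas-ip-on-vmbr0>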


 
The "drawing" is not as self-explanatory as it may seem to a person who is intimately involved with the config on a daily basis.
What it tells me is that you have a mix of MTU sizes, that if not implemented properly _will_ introduce unpredictable and random failures.
In fact, its a good match to your symptoms - initial negotiation works, then larger packets/checks fail.
Jumbo frames are only within "internal" vmbr0 bridge (between pve and virtualized truenas). Regular, physical LAN works on MTU 1500. Exactly same setup worked flawlessly with esxi (vswitch connected to hypervisor and truenas).
As Synology NFS share works ok and previously Truenas has been working fine with esxi (plus manual mounting in pve works ok) - it looks like there's something not right in pve storage management layer (?)
 
Jumbo frames are used only within the "internal" vmbr0 bridge (between PVE and the virtualized TrueNAS).
I would need to see a comprehensive network diagram, including all IPs, subnets, networks and routes. Is it possible that the traffic is being routed differently than you think it is?
The regular, physical LAN runs at MTU 1500.
That's a good supporting argument for moving everything to the regular MTU and starting from there.
Exactly the same setup worked flawlessly with ESXi (a vSwitch connecting the hypervisor and TrueNAS).
That's an "apples and oranges" comparison. Although from 10,000 feet the concepts are similar, the internal implementations of the network layers are completely different.
manual mounting in PVE works OK
Collect a network trace of the full NFS negotiation for both the working and the non-working case. Compare them side by side - are there any differences?
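For example, something along these lines on the PVE node captures everything to and from the NAS into a file that can be opened in Wireshark (a sketch; the IP is a placeholder and the bridge/interface should be adjusted to your setup):
Code:
# capture all traffic to/from the TrueNAS address (NFS, mountd, portmapper) for offline analysis
tcpdump -i vmbr1 -s 0 -w /tmp/nfs-trace.pcap host <truenas-ip>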
it looks like there's something not right in the PVE storage management layer (?)
PVE is a set of packages that wraps an API/GUI/CLI around several open-source technologies (Linux, QEMU, Corosync, etc.). PVE does not re-implement anything RFC-protocol related. Hundreds of millions of PCs are running Linux/QEMU/Corosync right now without trouble.


 
So, the following is the general layout of my network:

(Network diagram image attached.)



Output of: cat /proc/mounts | grep nfs

PVE
Code:
172.16.1.10:/volume3/NFS/pvess /mnt/pve/mmds-nfs nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.0.8,local_lock=none,addr=172.16.1.10 0 0
10.55.1.2:/mnt/wszystko/PVE/pvess/nfs-pvess /mnt/test nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.55.1.2,mountvers=3,mountport=43296,mountproto=udp,local_lock=none,addr=10.55.1.2 0 0
172.16.1.62:/mnt/wszystko/PVE/pvess/nfs-pvess /mnt/test nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.1.62,mountvers=3,mountport=57230,mountproto=udp,local_lock=none,addr=172.16.1.62 0 0
10.55.1.2:/mnt/wszystko/PVE/pvess/nfs-pvess /mnt/test nfs rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.55.1.2,mountvers=3,mountport=43296,mountproto=udp,local_lock=none,addr=10.55.1.2 0 0
10.55.1.2:/mnt/wszystko/PVE/pvess/nfs-pvess /mnt/pve/truenas-nfs nfs4 rw,relatime,vers=4.2,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.55.0.1,local_lock=none,addr=10.55.1.2 0 0
PVE2
Code:
172.16.1.10:/volume3/NFS/pvett /mnt/pve/mmds_nfs nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.0.11,local_lock=none,addr=172.16.1.10 0 0
172.16.1.62:/mnt/wszystko/PVE/pvett/nfs-pvett /mnt/pve/truenas_nfs nfs rw,relatime,vers=3,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=172.16.1.62,mountvers=3,mountport=822,mountproto=udp,local_lock=none,addr=172.16.1.62 0 0
And same on Ubuntu:
Code:
10.55.1.2:/mnt/wszystko/Pliki/e-book /mnt/nfs/truenas/ebooki nfs rw,noatime,vers=3,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,nolock,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.55.1.2,mountvers=3,mountport=822,mountproto=udp,local_lock=all,addr=10.55.1.2 0 0
10.55.1.2:/mnt/wszystko/Multimedia /mnt/nfs/truenas/media nfs4 rw,noatime,vers=4.2,rsize=131072,wsize=131072,namlen=255,acregmin=1800,acregmax=1800,acdirmin=1800,acdirmax=1800,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.55.1.3,local_lock=none,addr=10.55.1.2 0 0
172.16.1.10:/volume3/NFS/Subiekt /mnt/nfs/mmds/Subiekt nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.1.20,local_lock=none,addr=172.16.1.10 0 0
172.16.1.10:/volume3/NFS/mailcow /mnt/nfs/mmds/mailcow nfs4 rw,relatime,vers=4.1,rsize=131072,wsize=131072,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=172.16.1.20,local_lock=none,addr=172.16.1.10 0 0

The second PVE shows exactly the same symptoms as the main PVE (when it comes to the LAN / hardware network connection).
The internal TrueNAS NFS share (over vmbr0, MTU 9000) in PVE is fine until the connection is broken; once it is healthy again, the storage can't be restored without manual intervention.
The Ubuntu server has rock-solid NFS mounts of both the TrueNAS and Synology shares. That's why I think the shares and the network are OK.
 
So, the following is the general layout of my network:
Nothing jumps out as immediately suspect. Except, of course, that this is quite complex for volunteer forum troubleshooting.

My advice - start getting network traces, make sure you get both sides of the communication, and compare and contrast them. Try to reduce complexity and add it back gradually.

Good luck.

PS: you can also try to switch the NFS in PVE from being handled by PVE to a direct mount, i.e. like on your Ubuntu box, and see if that makes a difference.
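For example, something along these lines takes the mounting out of PVE's hands and only gives PVE the resulting directory (a sketch; the mount point and storage ID are placeholders, and the export path is the one from the mount listing above):
Code:
# /etc/fstab - let the host mount the share itself, like the Ubuntu box does
10.55.1.2:/mnt/wszystko/PVE/pvess/nfs-pvess  /mnt/truenas  nfs4  vers=4.2,hard  0  0

# then hand PVE the directory instead of an NFS storage definition
pvesm add dir truenas-dir --path /mnt/truenas --content images,iso --is_mountpoint yes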


 
