Recurrent network storage unavailability/poor performance - TrueNAS

danb35 · Oct 8, 2021

I'm having a recurring problem where NFS mounts from my TrueNAS server go "offline" in my PVE cluster, and/or remain online but show very poor performance. I'm trying to track down why it's happening and what I can do to address it.

My TrueNAS server is running TrueNAS CORE 12.0-U5.1. It has 2x Xeon E5-2670s, 128 GB of RAM, and two storage pools. The first storage pool consists of 4x 6-disk RAIDZ2 vdevs (24 disks total) of varying sizes, and is a little over half full. This pool contains my jails, a few SMB shares, and a couple of NFS exports. The second pool consists of 4x 2TB disks in mirrored pairs. It has one NFS export and one iSCSI target. Other client systems, primarily via SMB, don't seem to have any performance issues with the server.

My PVE cluster is running the latest update of PVE 7. It consists of three nodes of a Dell PowerEdge C6220, each with 2x Xeon E5-2680v2 and ~80 GB of RAM. They're connected to each other, and to the TrueNAS box, via 10 GbE. NFS exports from both pools are mounted to the cluster as storage--the first for ISOs, container templates, and backups (the latter not being used much since I started using PBS), and the second pool for a few low-activity VM disk images (most virtual disks are stored on a Ceph pool, about which I have no complaints).

Frequently, though not constantly, the cluster reports both storages to be unavailable. ls /mnt/pve hangs, and any tasks that involve either of those mounts fail. But there's nothing obviously wrong on the TrueNAS box--there isn't a great deal of I/O latency, there's plenty of CPU capacity, ARC hit ratio is fine. A little stumped about how to track this down--any ideas?
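
One way to narrow down when the mounts stall is to probe them with a timeout and log the result, so the check itself can't hang the way ls does. The following is a minimal Python sketch, not a tested diagnostic tool: the function name probe_mount and the 5-second default are assumptions made for illustration, and a directory listing served from the client's cache could still report "ok" while the server is unreachable.

```python
import os
import threading
import time


def probe_mount(path, timeout_s=5.0):
    """Try to list `path`, giving up after `timeout_s` seconds.

    Returns (status, elapsed_seconds), where status is "ok", "timeout",
    or "error: <ExceptionName>". The listing runs in a daemon thread so
    that a stalled NFS call blocks only that thread, not the caller.
    """
    outcome = {}

    def worker():
        try:
            os.listdir(path)
            outcome["status"] = "ok"
        except OSError as exc:
            outcome["status"] = "error: " + type(exc).__name__

    thread = threading.Thread(target=worker, daemon=True)
    start = time.monotonic()
    thread.start()
    thread.join(timeout_s)  # returns early if the listing completes
    elapsed = time.monotonic() - start
    return outcome.get("status", "timeout"), elapsed


def format_probe(path, status, elapsed):
    """Build one timestamped log line for a probe result."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    return f"{stamp} {path} {status} {elapsed:.3f}s"
```

Calling probe_mount on each storage directory under /mnt/pve once a minute (from cron or a loop) and appending format_probe's output to a file would give timestamps to compare against the TrueNAS-side graphs and logs.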

Dunuin · Oct 9, 2021

There is a TrueNAS CORE 12.0-U6 update that fixed a bug "NFSv4 mount does not recover after failover".
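
Since that fix is specific to NFSv4, it may help to check which NFS version each PVE node actually negotiated. On Linux, /proc/mounts lists the filesystem type (nfs or nfs4) and a vers= option for each NFS mount. A minimal Python sketch that pulls those out; the function name nfs_mounts is made up for illustration, and it takes the file contents as a string so it can be tried on sample text:

```python
def nfs_mounts(mounts_text):
    """Extract NFS mounts from text in /proc/mounts format.

    Each line there reads: device mountpoint fstype options dump pass.
    Returns a list of (mountpoint, device, fstype, version) tuples;
    version is None when no vers=/nfsvers= option is present.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype, options = fields[:4]
        if fstype not in ("nfs", "nfs4"):
            continue
        version = None
        for opt in options.split(","):
            if opt.startswith(("vers=", "nfsvers=")):
                version = opt.split("=", 1)[1]
        found.append((mountpoint, device, fstype, version))
    return found
```

On a PVE node this could be run as nfs_mounts(open("/proc/mounts").read()). A version of 3 for the affected mounts would suggest the NFSv4 bug is not the whole story.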

danb35 · Oct 9, 2021

Hmmm. I know it's happened with previous versions as well, but it's worth a try. Updated the TrueNAS server to 12.0-U6, and rebooted each of the cluster members. Let's see what happens.

rgbiernat · Nov 13, 2021

danb35 said:
show very poor performance.

I can confirm that since a few versions back the Proxmox VMs are horribly slow. Storage is not even half full.

Proxmox 7.0.13
TrueNAS 12.0-U6

No hang on /mnt/pve, but I did get "storage not online". It wasn't persistent; it happened once while I was trying to start a VM. I never had that before.

Another culprit could be my UniFi Switch equipment. Do you by chance also have Ubiquiti switches?

danb35 · Nov 13, 2021

I should have updated this thread earlier, but was reluctant to call it "solved" without a good bit of experience. Since updating TrueNAS to -U6, I haven't seen the NAS be marked "offline" for any of my PVE hosts--I don't watch them constantly, of course, but I haven't seen it. Performance has been acceptable since then, but the only things I use NFS on the TrueNAS box for are ISOs, a backup of my PBS VM (I guess it makes sense that I can't back it up to itself), and a very few low-utilization VM disks. I use iSCSI to provide the storage disk for the PBS VM, and that works just fine. Other than that, my VMs live on Ceph. So as far as I can see, the problem appears solved--even if I'm a little uneasy calling it that, as it had been intermittently present with earlier TrueNAS/FreeNAS releases as well.

I have a small Ubiquiti switch, but it isn't in the path from the TrueNAS box to the PVE hosts.
