I'm seeing a very odd and difficult-to-troubleshoot disk share performance issue; posting in case anyone has thoughts.
I have a 4-node PVE cluster, served by a few FreeNAS boxes with NFS shares. One of the FreeNAS boxes is new, with SSD storage, and as part of bringing it online I rearranged the cluster's network connections on my UniFi XG switch. Each node has a 10G connection on VLAN100 (192.168.100.0/24) for data and a 10G connection on VLAN101 (192.168.101.0/24) for SAN/FreeNAS traffic. All nodes use the same NFS share for image files, but one node is seeing extremely slow disk access to the share, even though iperf3 tests to the FreeNAS box come back at 10G speeds.
I've tried using different ports on the switch and checking that the interfaces are really connected to the right networks, and I'm at a complete loss as to what to do next. It has to be related to the change, and I can't just 'put it back': I moved enough ports that I don't know which ones this node was on previously. The other three nodes are fine, though, and there are no new ports in the configuration; I simply arranged things so the nodes all sit in one column on the switch, with VLAN100 on top and VLAN101 on the bottom for every node. I've also tried moving the node from SFP+ to native copper ports on the switch with no apparent effect, and I've rebooted the node multiple times.
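For reference, the per-node checks so far have looked roughly like this (ens1f1 below is a placeholder for whatever the SAN-facing NIC is actually called on the node):

Code:
# which NIC carries which subnet
ip -br addr show

# link state and MTU on the SAN-facing NIC
ip link show ens1f1

# confirm the NIC actually negotiated 10G full duplex
ethtool ens1f1 | grep -E 'Speed|Duplex'

# options the NFS share is actually mounted with (vers, proto, rsize/wsize)
nfsstat -m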
Here is some data.
iperf3 results from Node 1 (which is having the slow reads) to the FreeNAS server:
root@freenas2[~]# iperf3 -s
-----------------------------------------------------------
Server listening on 5201
-----------------------------------------------------------
Accepted connection from 192.168.101.11, port 42486
[ 5] local 192.168.101.102 port 5201 connected to 192.168.101.11 port 42488
[ ID] Interval Transfer Bitrate
[ 5] 0.00-1.00 sec 1.05 GBytes 9.00 Gbits/sec
[ 5] 1.00-2.00 sec 1.04 GBytes 8.92 Gbits/sec
[ 5] 2.00-3.00 sec 972 MBytes 8.15 Gbits/sec
[ 5] 3.00-4.00 sec 1.08 GBytes 9.29 Gbits/sec
[ 5] 4.00-5.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 5] 5.00-6.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 5] 6.00-7.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 5] 7.00-8.00 sec 1.10 GBytes 9.41 Gbits/sec
[ 5] 8.00-9.00 sec 1.07 GBytes 9.21 Gbits/sec
[ 5] 9.00-10.00 sec 1.03 GBytes 8.82 Gbits/sec
[ 5] 10.00-10.00 sec 1.32 MBytes 9.38 Gbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate
[ 5] 0.00-10.00 sec 10.6 GBytes 9.10 Gbits/sec receiver
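One caveat about that run: with a plain client-to-server test the node is the sender, so it mostly proves the node-to-FreeNAS direction, while slow reads depend on FreeNAS-to-node traffic. iperf3 can exercise that path with its reverse flag, run from the affected node:

Code:
# -R reverses the test: the FreeNAS box sends and the node receives,
# which is the direction NFS reads actually use
iperf3 -c 192.168.101.102 -R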
Disk performance test from a VM running on Node 1:
Code:
bferrell@ntp:~$ sudo hdparm -Tt /dev/sda
[sudo] password for bferrell:
/dev/sda:
Timing cached reads: 2 MB in 32.17 seconds = 63.66 kB/sec
Timing buffered disk reads: 2 MB in 28.34 seconds = 72.26 kB/sec
bferrell@ntp:~$
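To separate the VM/virtio layer from the host's NFS mount, a direct read on the PVE host itself should tell me something; roughly the sketch below, where the path is a placeholder for wherever the share is mounted under /mnt/pve:

Code:
# read 1 GiB straight off the NFS mount, bypassing the host page cache
# (path is a placeholder; substitute the real storage name and disk image)
dd if=/mnt/pve/STORAGE/images/VMID/vm-VMID-disk-0.qcow2 of=/dev/null bs=1M count=1024 iflag=direct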
The same test from a VM running on Node 2:
bferrell@pihole:~$ sudo hdparm -Tt /dev/sda
[sudo] password for bferrell:
/dev/sda:
Timing cached reads: 15382 MB in 1.99 seconds = 7744.40 MB/sec
Timing buffered disk reads: 396 MB in 3.00 seconds = 131.98 MB/sec
bferrell@pihole:~$
Some network stats from Node 1:
root@svr-01:~# traceroute 192.168.101.102
traceroute to 192.168.101.102 (192.168.101.102), 30 hops max, 60 byte packets
1 192.168.101.102 (192.168.101.102) 0.449 ms 0.418 ms 0.396 ms
root@svr-01:~# ping -c 4 192.168.101.102
PING 192.168.101.102 (192.168.101.102) 56(84) bytes of data.
64 bytes from 192.168.101.102: icmp_seq=1 ttl=64 time=0.233 ms
64 bytes from 192.168.101.102: icmp_seq=2 ttl=64 time=0.117 ms
64 bytes from 192.168.101.102: icmp_seq=3 ttl=64 time=0.218 ms
64 bytes from 192.168.101.102: icmp_seq=4 ttl=64 time=0.214 ms
--- 192.168.101.102 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 55ms
rtt min/avg/max/mdev = 0.117/0.195/0.233/0.047 ms
root@svr-01:~#
The same stats from Node 2:
root@svr-02:~# traceroute 192.168.101.102
traceroute to 192.168.101.102 (192.168.101.102), 30 hops max, 60 byte packets
1 192.168.101.102 (192.168.101.102) 0.224 ms 0.189 ms 0.176 ms
root@svr-02:~# ping -c 4 192.168.101.102
PING 192.168.101.102 (192.168.101.102) 56(84) bytes of data.
64 bytes from 192.168.101.102: icmp_seq=1 ttl=64 time=0.332 ms
64 bytes from 192.168.101.102: icmp_seq=2 ttl=64 time=0.147 ms
64 bytes from 192.168.101.102: icmp_seq=3 ttl=64 time=0.171 ms
64 bytes from 192.168.101.102: icmp_seq=4 ttl=64 time=0.245 ms
--- 192.168.101.102 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 56ms
rtt min/avg/max/mdev = 0.147/0.223/0.332/0.074 ms
root@svr-02:~#
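One more thing worth ruling out, since the pings above only use 56-byte payloads: if jumbo frames ended up half-configured during the port shuffle (say, MTU 9000 on the FreeNAS side but not passed by the new switch ports), small packets and a client-side iperf3 could still look fine while NFS reads fall apart. A don't-fragment ping at full frame size from the affected node should expose that:

Code:
# jumbo-frame path check: 8972-byte payload + 28-byte header = 9000
ping -M do -s 8972 -c 4 192.168.101.102
# standard-frame baseline: 1472 + 28 = 1500
ping -M do -s 1472 -c 4 192.168.101.102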