Hey guys, first time posting here, so forgive me if I skip important details. Also, English is not my first language, so apologies in advance.
Until now I had 2 PVE nodes and 1 NAS running Ubuntu 22.04 that hosted 2 NFS shares (one on a RAIDZ pool and one on a 500 GB NVMe drive). Both PVE nodes were interconnected via 10 GbE cards, while the NAS still had a 1 GbE NIC. Most of my VMs were stored on the NVMe NFS share and ran fine even over 1 GbE.
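In case it matters, both shares are added on the PVE nodes as NFS storages, roughly like the following entry in /etc/pve/storage.cfg (the storage ID, export path and mount options here are illustrative, not my exact ones):
Code:
nfs: nvme-nfs
        server 192.168.5.3
        export /mnt/nvme/vmstore
        path /mnt/pve/nvme-nfs
        content images,rootdir
        options vers=4.2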
Yesterday I upgraded the NAS to a new platform, and I switched it from Ubuntu to PVE to take advantage of the new hardware (the old NAS was ancient). I also added a 10 GbE card to the NAS, trying to improve the speeds even further.
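The new 10 GbE interface on the NAS is set up along these lines in /etc/network/interfaces (the interface name here is a placeholder; the MSS of 8948 in the iperf output below suggests jumbo frames are in play on this link):
Code:
auto enp1s0
iface enp1s0 inet static
        address 192.168.5.3/24
        mtu 9000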
Since then, I've been getting horrible VM speeds and general unresponsiveness. I launched a W10 VM and it took about 7 minutes to start, when it normally takes 1 minute at most.
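I'm not sure yet whether the latency is on the network or the storage side, so the first thing to compare is probably the negotiated NFS mount options and the per-operation timings on a PVE node, something like this (the mount point is illustrative):
Code:
# show what each NFS mount actually negotiated (version, proto, rsize/wsize)
root@pveprod01:~# nfsstat -m
# sample per-mount NFS latency/throughput every 5 seconds
root@pveprod01:~# nfsiostat 5 /mnt/pve/nvme-nfs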
I've checked connectivity between nodes with iperf and it all looks fine to me:
Code:
root@pveprod01:~# iperf -e -t 30 -i 3 -c 192.168.5.3
------------------------------------------------------------
Client connecting to 192.168.5.3, TCP port 5001 with pid 131785 (1 flows)
Write buffer size: 131072 Byte
TOS set to 0x0 (Nagle on)
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.5.1%cluster1 port 38158 connected with 192.168.5.3 port 5001 (sock=3) (icwnd/mss/irtt=87/8948/189) (ct=0.22 ms) on 2024-11-10 00:17:17 (CET)
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT(var) NetPwr
[ 1] 0.0000-3.0000 sec 2.78 GBytes 7.95 Gbits/sec 22746/0 91 795K/806(256) us 1232987
[ 1] 3.0000-6.0000 sec 2.85 GBytes 8.16 Gbits/sec 23337/0 45 1371K/1257(313) us 811145
[ 1] 6.0000-9.0000 sec 1.93 GBytes 5.51 Gbits/sec 15771/0 16 1494K/2190(373) us 314633
[ 1] 9.0000-12.0000 sec 1.60 GBytes 4.58 Gbits/sec 13112/0 0 1607K/1614(243) us 354939
[ 1] 12.0000-15.0000 sec 1.65 GBytes 4.72 Gbits/sec 13498/0 0 1669K/2252(894) us 261872
^C[ 1] 15.0000-16.3698 sec 1.13 GBytes 7.08 Gbits/sec 9248/0 5 1328K/1047(108) us 845196
[ 1] 0.0000-16.3698 sec 11.9 GBytes 6.26 Gbits/sec 97712/0 157 1328K/1047(108) us 747254
Code:
root@pveprod02:~# iperf -e -t 30 -i 1 -c 192.168.5.3
------------------------------------------------------------
Client connecting to 192.168.5.3, TCP port 5001 with pid 75395 (1 flows)
Write buffer size: 131072 Byte
TOS set to 0x0 (Nagle on)
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.5.2%cluster1 port 43322 connected with 192.168.5.3 port 5001 (sock=3) (icwnd/mss/irtt=87/8948/1446) (ct=1.47 ms) on 2024-11-10 00:17:24 (CET)
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT(var) NetPwr
[ 1] 0.0000-1.0000 sec 510 MBytes 4.28 Gbits/sec 4078/0 0 3469K/6645(282) us 80438
[ 1] 1.0000-2.0000 sec 511 MBytes 4.28 Gbits/sec 4085/0 0 3469K/6417(236) us 83439
[ 1] 2.0000-3.0000 sec 500 MBytes 4.20 Gbits/sec 4001/0 0 3469K/6373(272) us 82288
[ 1] 3.0000-4.0000 sec 494 MBytes 4.15 Gbits/sec 3956/0 0 3469K/5661(104) us 91595
[ 1] 4.0000-5.0000 sec 486 MBytes 4.08 Gbits/sec 3889/0 2 2656K/5189(150) us 98235
[ 1] 5.0000-6.0000 sec 477 MBytes 4.00 Gbits/sec 3817/0 0 3303K/5386(169) us 92889
[ 1] 6.0000-7.0000 sec 478 MBytes 4.01 Gbits/sec 3821/0 0 3320K/5295(226) us 94585
[ 1] 7.0000-8.0000 sec 477 MBytes 4.00 Gbits/sec 3816/0 0 3320K/5563(120) us 89910
^C[ 1] 8.0000-8.6933 sec 324 MBytes 3.92 Gbits/sec 2589/0 0 3320K/6914(322) us 70798
[ 1] 0.0000-8.6933 sec 4.16 GBytes 4.11 Gbits/sec 34052/0 2 3320K/6914(322) us 74258
root@pveprod02:~#
Note: these measurements were taken with both nodes hitting the NAS at the same time.
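Since raw bandwidth looks reasonable, my next idea is to benchmark the storage through NFS directly with fio, along these lines (the target path is a placeholder for wherever the NVMe share is mounted):
Code:
root@pveprod01:~# fio --name=nfs-randwrite --filename=/mnt/pve/nvme-nfs/fio.test \
    --size=1G --rw=randwrite --bs=4k --ioengine=libaio --direct=1 \
    --iodepth=16 --runtime=30 --time_based --group_reporting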
What am I missing? I'm at a loss here.
Thanks,
Iyán
Edit: I also tried to hit the network and the NVMe at the same time, to check that there weren't any weird shenanigans with the PCIe lanes; the results are attached.
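Concretely, that combined test looks something like this (hostnames and paths are placeholders; one shell saturates the NIC while the other reads from the NVMe):
Code:
# shell 1: saturate the 10 GbE link from a PVE node
root@pveprod01:~# iperf -c 192.168.5.3 -t 60
# shell 2: stress the NVMe locally on the NAS at the same time
root@nas:~# fio --name=nvme-stress --filename=/mnt/nvme/fio.test --size=2G \
    --rw=read --bs=1M --ioengine=libaio --direct=1 --iodepth=32 \
    --runtime=60 --time_based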