Extremely bad network speed between PVE nodes

ksl28 (Aug 31, 2023)
Hi,

I have a test setup consisting of two PVE nodes (dk1prx01 & dk1prx02) that are connected to the same switch and share the same uplink to the rest of the network.
If I copy data between VMs on the same node, I reach speeds of 850-980 Mbit/s, even when the traffic is routed between VLANs.
If I copy data from the PVE nodes to an external Hyper-V server, I also get the full speed of about 900 Mbit/s.
Both PVE nodes are on the same switch and reach the Hyper-V host over the same uplink.

But if I copy from a VM on dk1prx01 to a VM on dk1prx02, I am throttled to about 150 Mbit/s, and the same happens in the other direction.
[Screenshot: iperf results between the two nodes, ~150 Mbit/s]

I have read other forum posts about this, but in all of them the link had negotiated at 100 Mbit. That is not the case here, since I reach near-full gigabit across different VLANs, as long as the VMs are on the same PVE node.

Code:
root@dk1prx02:~# ethtool enp3s0
Settings for enp3s0:
        Supported ports: [ TP    MII ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                2500baseT/Full
        Supported pause frame use: Symmetric Receive-only
        Supports auto-negotiation: Yes
        Supported FEC modes: Not reported
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
                                2500baseT/Full
        Advertised pause frame use: Symmetric Receive-only
        Advertised auto-negotiation: Yes
        Advertised FEC modes: Not reported
        Link partner advertised link modes:  10baseT/Half 10baseT/Full
                                             100baseT/Half 100baseT/Full
                                             1000baseT/Full
        Link partner advertised pause frame use: No
        Link partner advertised auto-negotiation: Yes
        Link partner advertised FEC modes: Not reported
        Speed: 1000Mb/s
        Duplex: Full
        Auto-negotiation: on
        master-slave cfg: preferred slave
        master-slave status: slave
        Port: Twisted Pair
        PHYAD: 0
        Transceiver: external
        MDI-X: Unknown
        Supports Wake-on: pumbg
        Wake-on: d
        Link detected: yes


Code:
root@dk1prx02:~# dmesg |grep -i "enp3s0"
[    1.690058] r8169 0000:03:00.0 enp3s0: renamed from eth0
[    5.640832] vmbr0: port 1(enp3s0) entered blocking state
[    5.640835] vmbr0: port 1(enp3s0) entered disabled state
[    5.640879] device enp3s0 entered promiscuous mode
[    5.873197] r8169 0000:03:00.0 enp3s0: Link is Down
[    5.876481] vmbr0: port 1(enp3s0) entered blocking state
[    5.876484] vmbr0: port 1(enp3s0) entered forwarding state
[    6.656824] vmbr0: port 1(enp3s0) entered disabled state
[    8.687919] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control off
[    8.687938] vmbr0: port 1(enp3s0) entered blocking state
[    8.687942] vmbr0: port 1(enp3s0) entered forwarding state
[  194.443319] vmbr0v3: port 1(enp3s0.3) entered blocking state
[  194.443322] vmbr0v3: port 1(enp3s0.3) entered disabled state
[  194.443804] device enp3s0.3 entered promiscuous mode
[  194.446788] vmbr0v3: port 1(enp3s0.3) entered blocking state
[  194.446790] vmbr0v3: port 1(enp3s0.3) entered forwarding state
[  611.415599] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control off
[  611.415833] r8169 0000:03:00.0 enp3s0: Link is Down
[  611.415975] vmbr0: port 1(enp3s0) entered disabled state
[  611.416506] vmbr0v3: port 1(enp3s0.3) entered disabled state
[  614.852713] r8169 0000:03:00.0 enp3s0: Link is Up - 1Gbps/Full - flow control off
[  614.852749] vmbr0: port 1(enp3s0) entered blocking state
[  614.852757] vmbr0: port 1(enp3s0) entered forwarding state
[  614.853293] vmbr0v3: port 1(enp3s0.3) entered blocking state
[  614.853297] vmbr0v3: port 1(enp3s0.3) entered forwarding state


What am I doing wrong here?
 
Limited by storage write speed, maybe?
The numbers above are from iperf, but the same happens if I copy files over SMB.
I can easily pull files from the Hyper-V host over SMB and achieve near-full gigabit.

Both hosts are equipped with 1 x 2.5" SSD for PVE and 2 x NVMe disks for VMs & containers, so I doubt it's the storage :)
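(For reference, a sustained sequential write on the VM storage can be sanity-checked with fio, assuming it is installed; the path below is just an example and should point at the NVMe-backed storage:)

Code:
# Rough sequential write test; writes a 4 GiB file directly (no page cache)
fio --name=seqwrite --filename=/var/lib/vz/fio-test.bin --size=4G \
    --bs=1M --rw=write --direct=1 --ioengine=libaio --iodepth=8
# Clean up the test file afterwards
rm /var/lib/vz/fio-test.bin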
 
Could you please post the specific models of the NVMe and SSD drives?
 
SSD = Samsung 850 Pro 512GB
NVMe / M.2 = Samsung 980 Pro 2TB

I just want to highlight that if I copy between two VMs on the same PVE node, I get full performance.
I did a CrystalDiskMark run, and this is the result:
[Screenshot: CrystalDiskMark results]
 
Most consumer-grade disks are notorious for becoming excruciatingly slow once they fill their cache; with a 2 GB benchmark you are not going to hit that limit. On the other hand, day-to-day operations on VMs will probably fill the cache of a consumer SSD/NVMe rather quickly. This is especially true if you have some kind of write amplification, e.g. due to the redundancy added by ZFS.

Let's suppose the disks are not related to the issue.

Could you please share with us:

- The VM configs (`/etc/pve/qemu-server/{ID}.conf`)
- The network interfaces (`/etc/network/interfaces`)
- Network routing (`ip route show`)

of the affected nodes? Where is the Hyper-V server? In which network? How did you invoke iperf (what was the exact command)? A sketch of commands for gathering all of this is shown below.
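(Something along these lines should collect everything in one go; VM ID, interface names and the iperf3 server address are placeholders to adapt:)

Code:
# Run on each affected node; 102 and 192.168.2.44 are example values
cat /etc/pve/qemu-server/102.conf
cat /etc/network/interfaces
ip route show

# Example iperf3 invocation: server in one VM, client in the other
iperf3 -s                      # on the receiving VM
iperf3 -c 192.168.2.44 -t 30   # on the sending VM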
 
Hi,

I just discovered a rather interesting fact: the issue only appears once a Windows VM is involved.
If I run iperf3 between Ubuntu VMs that reside on different PVE nodes, I get a full gigabit link, but with Windows VMs I get limited.
And yes, the VirtIO driver is installed :)

[Screenshot: iperf3 results from the Windows VM]
192.168.2.44 is the Ubuntu VM on the other PVE node.

So I guess we can rule out the storage now. Any suggestions?
 
Could you please send us the config of the windows VM?
 
Additionally, do you have the Windows firewall on?
 
Here you go :)

Code:
root@dk1prx01:/etc/pve/qemu-server# cat 102.conf
agent: 1
balloon: 8192
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 4
cpu: x86-64-v2-AES
efidisk0: dir01:102/vm-102-disk-0.raw,efitype=4m,format=raw,pre-enrolled-keys=1,size=528K
ide2: none,media=cdrom
machine: pc-q35-8.0
memory: 16134
meta: creation-qemu=8.0.2,ctime=1693481109
name: dk1term01
net0: virtio=FA:4B:87:9E:27:84,bridge=vmbr0
numa: 0
ostype: win11
scsi0: dir01:102/vm-102-disk-3.qcow2,size=60G
scsi1: dir01:102/vm-102-disk-1.qcow2,size=80G
scsihw: virtio-scsi-pci
smbios1: uuid=da6599ab-56ab-4f8f-8b27-1705ecb6669a
sockets: 1
tpmstate0: dir01:102/vm-102-disk-2.raw,size=4M,version=v2.0
vmgenid: d2eae411-ffd1-4c8e-9223-3666ce47040d
 
The VM config looks OK; it uses `scsi` drives and `virtio` for the network, which is good. I would suggest enabling Discard on the disks [1], but that's unrelated to the issue at hand.

If I had to guess, I would say that the Windows firewall is involved.

[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_trim_discard
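(For reference, Discard can be toggled per disk in the GUI under Hardware, or with `qm set`; the full drive spec has to be repeated, so treat this as a sketch against the config posted above and verify the volume names first:)

Code:
# Sketch: enable Discard on the existing disks of VM 102
qm set 102 --scsi0 dir01:102/vm-102-disk-3.qcow2,discard=on,size=60G
qm set 102 --scsi1 dir01:102/vm-102-disk-1.qcow2,discard=on,size=80G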
I have just disabled the Windows firewall and tried once again, and the performance is no better :(

I found this post regarding the same issue (https://forum.proxmox.com/threads/b...etween-two-guest-machines-on-same-host.62204/), where downloading the latest driver instead of the stable one helped.

But it looks to me like the stable and the latest drivers are both the same version, 0.1.229: https://pve.proxmox.com/wiki/Window...,Installation,Using the ISO,-You can download

So that's not an option for me.
 
@Maximiliano - Any suggestions? :)
I just tested with iperf once more. The first attempt is with the VMs on separate nodes, and the second one is with them on the same PVE node.
[Screenshot: iperf results, VMs on different nodes vs. the same node]

It is important to highlight that the performance between the PVE nodes themselves is perfectly fine; they achieve 1 Gbit when running the test. It is only an issue with Windows VMs.
 
Try setting the vCPU type to host.
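(On the CLI that would be something like the following, with 102 as an example VM ID; the VM needs a full stop/start afterwards for the new CPU type to take effect:)

Code:
# Example: switch VM 102 to the host CPU type, then restart it
qm set 102 --cpu host
qm stop 102 && qm start 102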
Hi,

I just tried that, and sadly it didn't work.
But I just learned something new: it is only the download (incoming) speed that is the issue for Windows servers running on Proxmox.
[Screenshot: speed test result from the Windows VM]

The same test from other physical machines / Hyper-V VMs, using the same switch / routers to reach the internet, shows 920/940 Mbit.
So it is not isolated to connections between Windows VMs on the Proxmox servers; it affects all incoming sessions to Windows VMs.
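(If anyone wants to reproduce the direction-dependence without a speed test site, iperf3's reverse mode makes the server send and the client receive; the address here is just the one from this thread:)

Code:
# Normal run: the Windows VM sends (upload direction)
iperf3 -c 192.168.2.44 -t 30

# Reverse run: the Windows VM receives (download direction, the slow one here)
iperf3 -c 192.168.2.44 -t 30 -R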

It seems that it is a common issue out there:
https://www.reddit.com/r/Proxmox/comments/137y1fv/slow_network_download_speeds_on_proxmox_ve/

I really appreciate the help and the time spent on this!
 
I finally found a solution for this :)
I installed Ubuntu on the physical hardware, spawned a Windows VM on the box, and performed a new iperf test, only to realize that I got full speed both up and down.

I then googled "Proxmox 2.5Gbit NIC Realtek" and realized that several other people had the exact same issue with this NIC on Proxmox.
So I switched to a USB 1 Gbit NIC instead, and I now get the full link speed :)
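(For anyone who wants to keep using the onboard Realtek 2.5G NIC: a workaround reported in some of those threads is turning off hardware offloading on the interface with ethtool. Whether it helps seems to vary, so treat it as something to try rather than a fix:)

Code:
# Possible workaround reported by other Realtek/r8169 users: disable offloads
# on the bridge port (enp3s0 in this setup) and re-test; not guaranteed to help
ethtool -K enp3s0 tso off gso off gro off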

Thanks for the time spent on this!
 
@Maximiliano - It seems that it is an issue related to Proxmox. So if you want to debug this from a Proxmox / vendor perspective, please let me know. I would be happy to isolate one of the nodes, perform various tests, and gather logs. :)
 
