Connection errors/timeouts between 2 2012R2 guests on same bridge

sapphiron

Hi All

I have two Windows Server 2012 R2 guests on the same host, connected to an internal bridge network. They get internet connectivity via an Untangle firewall VM with one interface bridged to an external adapter and another on the internal bridge. The internal bridge has no physical adapter assigned to it.
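For context, the bridges are defined in /etc/network/interfaces roughly like this (addresses here are placeholders, and eth0 stands in for the external NIC):

auto vmbr0
iface vmbr0 inet static
        address 192.0.2.10
        netmask 255.255.255.0
        gateway 192.0.2.1
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0

auto vmbr1
iface vmbr1 inet manual
        bridge_ports none
        bridge_stp off
        bridge_fd 0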

It's running on Proxmox VE 4.4 with the latest patches:
proxmox-ve: 4.4-78 (running kernel: 4.4.35-2-pve)
pve-manager: 4.4-5 (running version: 4.4-5/c43015a5)
pve-kernel-4.4.35-1-pve: 4.4.35-77
pve-kernel-4.4.35-2-pve: 4.4.35-78
pve-kernel-4.4.21-1-pve: 4.4.21-71
pve-kernel-4.4.19-1-pve: 4.4.19-66
lvm2: 2.02.116-pve3
corosync-pve: 2.4.0-1
libqb0: 1.0-1
pve-cluster: 4.0-48
qemu-server: 4.0-102
pve-firmware: 1.1-10
libpve-common-perl: 4.0-85
libpve-access-control: 4.0-19
libpve-storage-perl: 4.0-71
pve-libspice-server1: 0.12.8-1
vncterm: 1.2-1
pve-docs: 4.4-1
pve-qemu-kvm: 2.7.0-10
pve-container: 1.0-90
pve-firewall: 2.0-33
pve-ha-manager: 1.0-38
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.6-5
lxcfs: 2.0.5-pve2
criu: 1.6.0-1
novnc-pve: 0.5-8
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.8-pve13~bpo80


One is an Active Directory domain controller and file server; the other is an RDS server with about 50-60 users logging in. Both guests have the full set of VirtIO drivers installed (version 2017-01-15, 63.74.104.13000).

We are having odd connectivity issues between the two Windows guests once concurrent users go over 40 or so, ranging from stutters and "not responding" applications to outright timeouts.

When benchmarked, traffic throughput is normal, e.g. a file copy between the guests runs at 2-3 Gbps. Storage on the server is PCIe NVMe based, with a secondary mechanical disk array. We have had to set disk IOPS limits to prevent the storage system from over-saturating the system bus; this does seem to have reduced the timeouts somewhat. The network timeouts also occur when accessing data on the mechanical storage drives. Non-network guest operations are completely normal.
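The per-disk caps were applied with qm set, e.g. for one of the NVMe disks (values as in the VM configs further down):

qm set 210 --virtio0 VG_NVME:vm-210-disk-1,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=70G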

Initially I had the network multi-path setting at its default (blank). I have since set it to 8, which has made no difference.
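Concretely, that was set with qm set, e.g. for the RDS guest (MAC as in the VM 220 config below):

qm set 220 --net0 virtio=32:33:39:32:34:36,bridge=vmbr1,queues=8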

Looking at atop on the host, I noticed that there are periods where the tap interfaces show very high load, while disk and CPU load remain low.
Normal load is as follows:
TAP220i 21% | pcki 2169 | pcko 2131 | si 2.1 Mbps | so 6.2 Mbps
TAP210i 16% | pcki 1581 | pcko 1646 | si 1.6 Mbps | so 1.1 Mbps
TAP200i 10% | pcki 561 | pcko 642 | si 1 Mbps | so 0.3 Mbps

Spikes look as follows and last for the duration of the network load:
TAP220i 2857% | pcki 10415 | pcko 9709 | si 94 Mbps | so 98 Mbps
TAP210i 2850% | pcki 8382 | pcko 9044 | si 98 Mbps | so 92 Mbps
TAP200i 12% | pcki 7213 | pcko 6620 | si 1.6 Mbps | so 0.3 Mbps
Without the disk IO limits, this can peak as high as 50,000%.
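For anyone wanting to watch the same per-tap counters without atop, something like this works (tap220i0 being Proxmox's tap device name for VM 220's net0):

watch -n 1 'ip -s link show tap220i0'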

Has anyone seen something similar and managed to solve it?

VM config files are as follows:

VM200
bootdisk: virtio0
cores: 1
cpu: host
cpuunits: 2000
memory: 2048
name: UntangleFirewall
net0: virtio=3A:32:33:32:64:31,bridge=vmbr0
net1: virtio=66:64:65:64:63:30,bridge=vmbr1,queues=8
numa: 1
ostype: l26
smbios1: uuid=13a36d07-90d1-4f86-ad99-45fba791979b
sockets: 1
virtio0: local-lvm:vm-200-disk-1,cache=writethrough,size=200G


VM210
balloon: 0
bootdisk: virtio0
cores: 4
cpu: host
cpuunits: 4000
memory: 20480
name: GeneralServer
net0: virtio=66:30:61:32:36:35,bridge=vmbr1,queues=8
numa: 1
ostype: win8
smbios1: uuid=19aefb3d-efaf-4822-908b-7535fa3a3730
sockets: 1
virtio0: VG_NVME:vm-210-disk-1,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=70G
virtio1: VG_NVME:vm-210-disk-2,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=100G
virtio10: local-lvm:vm-210-disk-2,backup=0,size=3000G
virtio2: local-lvm:vm-210-disk-5,size=650G
virtio3: local-lvm:vm-210-disk-6,size=701G
virtio4: VG_NVME:vm-210-disk-5,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=150G
virtio5: VG_NVME:vm-210-disk-6,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=300G
virtio6: VG_NVME:vm-210-disk-3,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=10G
virtio7: local-lvm:vm-210-disk-1,backup=0,size=502G
virtio8: VG_NVME:vm-210-disk-8,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=42G
virtio9: VG_NVME:vm-210-disk-9,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=41G


VM220
balloon: 0
bootdisk: virtio0
cores: 6
cpu: host
cpuunits: 12000
memory: 40960
name: RDS1
net0: virtio=32:33:39:32:34:36,bridge=vmbr1,queues=8
numa: 1
ostype: win8
smbios1: uuid=57084703-4b05-440a-a25d-a96edac5fc46
sockets: 2
virtio0: VG_NVME:vm-220-disk-1,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=100G
virtio1: VG_NVME:vm-220-disk-6,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=19G
virtio2: VG_NVME:vm-220-disk-3,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=20G
virtio3: VG_NVME:vm-220-disk-4,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=21G
virtio4: VG_NVME:vm-220-disk-5,iops_rd=1500,iops_wr=1500,mbps_rd=150,mbps_wr=150,size=22G
 
When you're talking about multi-path in the network, what exactly do you mean?
Is that the Multiqueue option? (http://pve.proxmox.com/pve-docs/chapter-qm.html#qm_network_device)

IIRC, in bridge mode the kernel has to inspect every Ethernet packet arriving at the bridge in order to send it to the proper tap device. So yes, this can be CPU intensive if you have a lot of traffic (which you seem to have; plus you have two bridges here, each of which has to inspect all incoming packets).
If your server has multiple physical NICs, you could consider PCI passthrough of a NIC directly to the guest to get rid of the bridge, and move the firewall outside the PVE box.
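Roughly, that means enabling the IOMMU on the host and handing the NIC's PCI address to the guest, something like this (the 01:00.0 address is just an example, check lspci for the real one):

# in /etc/default/grub, then run update-grub and reboot:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"

# then attach the NIC directly to e.g. VM 210:
qm set 210 -hostpci0 01:00.0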
 
manu, thank you for the input.

Yes, I was referring to Multiqueue.

There are three unused NICs in the server. I will look into the passthrough option, but it does not leave much flexibility, so I will treat it as a last resort.

Would using OVS be more or less efficient?
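If I read the Proxmox OVS docs correctly, vmbr1 would become something like this in /etc/network/interfaces (assuming the openvswitch-switch package is installed):

allow-ovs vmbr1
iface vmbr1 inet manual
        ovs_type OVSBridge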
 
