Proxmox Network Drops

evlute

New Member
Oct 5, 2015
19
0
1
Hello,

On our Proxmox server we are seeing strange network behavior. The network rate drops on our Windows Server 2012 guest, and every client notices an interruption. The guest runs the software for a doctor's office. Every time the network throughput swings from 100 megabyte/s down to 0, the software freezes. I have seen this behavior on older Proxmox versions as well as on the new one. Kernel: 4.2.2-1-pve.

I don't know why the network speed keeps breaking down. tcpdump showed me "1468SMB-over-TCP packet:(raw data or continuation)". Could this be related to the problem? If so, how can I fix it?

Edit: It does not matter whether I use virtio or e1000. (Our solution for now is to go back to the old bare-metal server, so it must be Proxmox related.)


h_1450169138_1266115_0d12c5c1d0.png



Dec 14 23:58:32 diavital kernel: [178287.422068] kvm: zapping shadow pages for mmio generation wraparound
Dec 14 23:58:32 diavital kernel: [178287.426264] kvm: zapping shadow pages for mmio generation wraparound
Dec 15 00:03:12 diavital kernel: [178567.584827] kvm: zapping shadow pages for mmio generation wraparound
Dec 15 00:03:12 diavital kernel: [178567.586571] kvm: zapping shadow pages for mmio generation wraparound
Dec 15 00:03:15 diavital kernel: [178570.396991] kvm: zapping shadow pages for mmio generation wraparound
Dec 15 00:03:15 diavital kernel: [178570.399774] kvm: zapping shadow pages for mmio generation wraparound
Dec 15 00:28:34 diavital rrdcached[1350]: flushing old values
Dec 15 00:28:34 diavital rrdcached[1350]: rotating journals
Dec 15 00:28:34 diavital rrdcached[1350]: started new journal /var/lib/rrdcached/journal/rrd.journal.1450135714.055780
Dec 15 00:28:34 diavital rrdcached[1350]: removing old journal /var/lib/rrdcached/journal/rrd.journal.1450128514.055769
Dec 15 00:41:32 diavital kernel: [180867.909823] kvm: zapping shadow pages for mmio generation wraparound
Dec 15 00:41:32 diavital kernel: [180867.913451] kvm: zapping shadow pages for mmio generation wraparound
Dec 15 00:41:34 diavital kernel: [180869.753447] kvm: zapping shadow pages for mmio generation wraparound
Dec 15 00:41:34 diavital kernel: [180869.755503] kvm: zapping shadow pages for mmio generation wraparound
 

snowman66

Active Member
Dec 1, 2010
254
1
38
-What hardware do you use (cpu, nic, switch, storage)?
-What happens with CPU usage/IO delays when speed drops on the host?
-Did you install PVE from official ISO or on Debian?
-Can you test the speed with a Linux VM (Ubuntu, Debian), or with iperf directly from the host to the virtual machine and from the host to a physical machine (in both directions)?
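
If iperf is not available on one of the machines, a rough stand-in can be put together with plain TCP sockets. The following is only a sketch under stated assumptions, not iperf itself: the function names `run_sink` and `measure` and port 50007 are made up for illustration, and the sender counts bytes handed to the kernel, so short intervals can overstate the real wire rate. It reports one MB/s value per interval, which makes a drop toward zero visible.

```python
import socket
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice)


def run_sink(host="0.0.0.0", port=50007, ready=None):
    """Accept one TCP connection, discard all data, return bytes received."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((host, port))
        srv.listen(1)
        if ready is not None:
            ready.set()  # lets a caller know the listener is up
        conn, _ = srv.accept()
        total = 0
        with conn:
            while True:
                data = conn.recv(CHUNK)
                if not data:  # peer closed the connection
                    break
                total += len(data)
        return total


def measure(host, port=50007, seconds=10.0, interval=1.0):
    """Send zero bytes for `seconds`; return a list of MB/s per interval."""
    payload = b"\0" * CHUNK
    rates = []
    with socket.create_connection((host, port)) as sock:
        end = time.monotonic() + seconds
        mark, sent = time.monotonic(), 0
        while time.monotonic() < end:
            sock.sendall(payload)  # blocks while the connection is stalled
            sent += len(payload)
            now = time.monotonic()
            if now - mark >= interval:
                rates.append(sent / (now - mark) / 1e6)
                mark, sent = now, 0
    return rates
```

Run `run_sink()` on the receiving machine and `measure("<receiver ip>")` on the sending one, then swap roles to test the other direction.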
 

Q-wulf

Well-Known Member
Mar 3, 2013
613
38
48
my test location
-What hardware do you use (cpu, nic, switch, storage)?
-What happens with CPU usage/IO delays when speed drops on the host?
-Did you install PVE from official ISO or on Debian?
-Can you test the speed with a Linux VM (Ubuntu, Debian), or with iperf directly from the host to the virtual machine and from the host to a physical machine (in both directions)?

Mind posting the output of "cat /etc/network/interfaces" as well?

If I am not mistaken, you are using 2 NICs connected to a KVM machine used for VPN. What OS/VPN solution is that machine running?

On which VM/host/client do you notice the bandwidth drops? Is it VM to VM, host to VM, or client to VM? Or even remote to VM?
 

evlute

  • Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
  • Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
  • Samsung 850 Pro 512 GB (RAID)
  • TP-Link gigabit switch
Code:
auto lo
iface lo inet loopback

iface eth0 inet manual

iface eth1 inet manual

iface eth2 inet manual

iface eth3 inet manual

#lan
auto vmbr0
iface vmbr0 inet static
    address  192.168.100.254
    netmask  255.255.255.0
    bridge_ports eth0
    bridge_stp off
    bridge_fd 0

#wan
auto vmbr1
iface vmbr1 inet static
    address  192.168.2.254
    netmask  255.255.255.0
    gateway  192.168.2.1
    bridge_ports eth1
    bridge_stp off
    bridge_fd 0

Linux VMs are running fine. Only the Windows Server 2012 guest starts lagging and dropping bandwidth. Please help :)
 

Q-wulf

On that Windows VM, do you use 2 separate vNICs that use vmbr0 and vmbr1 respectively?

When you have the issue you describe, what happens on your Proxmox node AND your Windows VM with regard to "Load average", "CPU usage" and "IO delay"?

Is there any difference in those 3 (6) values when using virtio or E1000?

Edit: How much RAM does the Proxmox node have, and how much RAM is assigned to the Windows VM in question?
Edit2: Pretty sure you already checked this, but let's make sure: your eth0 and eth1 NICs are NOT saturated when this issue occurs, right?
 

evlute

I tried different setups on the Windows VM: two vNICs, one vNIC, E1000 or virtio. No change.
I can't say anything about CPU usage and IO delay, only one thing: every time the network speed went down as you see in the screenshot, the machine itself was fully usable without any problems.

The Proxmox node has 64 GB RAM. The Windows VM now has 32 GB; I tried with and without the balloon driver. No difference.
 

Q-wulf

Okay, RAM is checked off the list.

Can you please do the following:
  1. Reproduce the issue.
  2. While it happens, on Proxmox go to "Datacenter > "NodeName" > Summary" and note down "Load average", "CPU usage" and "IO delay".
  3. While it happens, on your Windows VM follow these guide(s) to get the disk latency: https://technet.microsoft.com/en-us/library/cc749115.aspx and http://blogs.technet.com/b/askcore/...with-windows-performance-monitor-perfmon.aspx (you're looking for "What counters in Windows Performance Monitor show the physical disk latency?").
And then report those results.
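
The host-side values from step 2 could also be logged from a shell on the node instead of being read off the GUI while the drop happens. The following is a minimal Python sketch, not a Proxmox tool. It assumes a Linux host with /proc/stat, treats the kernel's iowait share as a rough stand-in for the "IO delay" figure on the summary page (an assumption, the two may not match exactly), and the names `read_cpu_times`, `cpu_percentages` and `sample_loop` are made up for illustration.

```python
import os
import time


def read_cpu_times(path="/proc/stat"):
    """Return the aggregate CPU counters (first 'cpu' line) as ints."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                # user nice system idle iowait irq softirq steal;
                # guest fields are skipped because user already includes them
                return [int(v) for v in line.split()[1:9]]
    raise RuntimeError("no aggregate cpu line found")


def cpu_percentages(before, after):
    """Return (busy %, iowait %) between two counter samples."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0, 0.0
    idle, iowait = deltas[3], deltas[4]
    busy = total - idle - iowait
    return 100.0 * busy / total, 100.0 * iowait / total


def sample_loop(count=12, interval=5.0):
    """Print load average, CPU usage and iowait `count` times."""
    prev = read_cpu_times()
    for _ in range(count):
        time.sleep(interval)
        cur = read_cpu_times()
        busy, iowait = cpu_percentages(prev, cur)
        load1 = os.getloadavg()[0]  # 1-minute load average
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} load={load1:.2f} cpu={busy:.1f}% iowait={iowait:.1f}%")
        prev = cur
```

Calling `sample_loop()` on the node while reproducing the issue gives timestamped lines that can be pasted into the thread next to the screenshots.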
 

evlute

Okay, I will reproduce it, but I can't do it right now. Here are the screenshots from my last test:
h_1450386009_9500940_d01565d44f.png
 

Q-wulf

Without the other values (CPU usage + load average + IO wait on the node / CPU usage on the VM) it's hard to put that into context, other than that you are generating 9 MB/s (72 Mbit/s) of traffic, which causes 2 MB/s of writes and 1.2 MB/s of reads on that vDisk for that VM.

Just let us know once you have the values.
 

evlute

I don't understand the Proxmox stats. I was sending 100mb/s between the Windows guest server and a physical machine, but the stats do not show it. Do you have any other ideas why this network drop could happen?
 

Q-wulf

You mean Mb, right? As in Mbit/s, megabits per second.
Proxmox graphs show MB, as in MB/s, megabytes per second.

One MB is 8 Mbit.

So if the Proxmox graph for your VM shows 9.0 M netout and 0.5 M netin, then you are talking about 9.5 MB/s = 76 Mbit/s for that VM in total traffic over its network (aggregate traffic for all vNICs).

If you look at the traffic stats on the actual Proxmox node, you're looking, afaik, at the aggregate traffic for all physical NICs.
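
The unit conversion above can be written out as a small Python sketch. The function names are made up for illustration, and the netin/netout figures are the example values from this post, not new measurements.

```python
BITS_PER_BYTE = 8


def mbyte_to_mbit(mbyte_per_s):
    """Convert megabytes per second to megabits per second."""
    return mbyte_per_s * BITS_PER_BYTE


def vm_total_mbit(netin_mbyte, netout_mbyte):
    """Aggregate VM traffic (netin + netout, in MB/s) expressed in Mbit/s."""
    return mbyte_to_mbit(netin_mbyte + netout_mbyte)


# Example values from the graph discussed above: 0.5 M netin, 9.0 M netout
print(vm_total_mbit(0.5, 9.0))  # 76.0
```

By the same arithmetic, a sustained 100 MB/s would be 800 Mbit/s, and a fully used 1 Gbit/s link corresponds to 125 MB/s before protocol overhead.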


Edit regarding ideas: once I have the load average + CPU + IO wait for the Proxmox node and the Windows VM as indicated above, I can tell you more.
 

evlute

I mean 100 megabytes per second, roughly 1 Gbit/s. But the network graph shows very low values.

It falls from 100 MB/s (about 1 Gbit/s) to zero... and I don't have any explanation.
 

evlute

Today I started one last test, because I have to move the server to bare metal.

Test: copy a huge file from the old server (Win 2008, physical) to the new server (Win 2012, Proxmox) and vice versa. It worked, and the graph was correct too.

h_1450782734_9116283_5e73c7f30f.png

I'm a bit clueless as to why we always struggle when I switch from the old server to the new one. But as soon as I give the new server the old server's WINS name and IP address and shut down the old one...
 

evlute

I will try something new today before I finally move over to bare metal. Because the clients use WINS to address the server, we will change that to the server IP. Maybe WINS causes problems with the network stack for a virtual machine on Proxmox, and maybe this is a workaround. I don't know.
 

evlute

Yesterday was a hard day. I worked 15 hours. First I made a Clonezilla image of my virtual server and installed it on bare metal. It worked, but then I noticed a possible loop, so I changed the cabling of the server. I saw a chance that this could be the reason why the Proxmox guest (WSrv2012) had problems. So I said to myself, fuck it, I'll try it again with the newest version. I installed the new Proxmox 4.1 ISO from scratch, and was pissed that rootdelay isn't available from the start in the GRUB string or in the advanced options.

I restored my server VM from my NFS drive and copied the newest version of our database from the old server to the guest server on Proxmox at 100 MB/s (1 Gbit). Everything was fine, so I decided to switch over. As a last test I copied data from the new server back to the old server. (The old server got a new IP and a new WINS name.) And the network drops came back as soon as I left the browser window of my VM.

It's really pissing me off. With all the time I invested in this bullshit, I could have saved my time and invested the money in another product... No offense at all, I'm still a fan, that's why I'm putting so much time into this, but it kinda sucks. Now bare metal and Hyper-V...
h_1450860054_9815088_a6f61e4cc3.png
h_1450860297_6268102_9e472c608e.png

h_1450860092_2094786_bc4b73efdb.png

h_1450860519_7930083_e4bc085d9c.png
 

Q-wulf

Let me be quite blunt.

You are spending time (15 hours) on "tests" that do not accomplish anything besides verifying that the issue persists, instead of listening to the (limited) advice for resolving the issue that you are receiving on this forum.

Let me point this out again:
[...]
Can you please do the following:
  1. Reproduce the issue
  2. While it happens, on Proxmox go to "Datacenter > "NodeName" > Summary" and note down "Load average", "CPU usage" and "IO delay".
  3. While it happens, on your Windows VM follow these guide(s) to get the disk latency: https://technet.microsoft.com/en-us/library/cc749115.aspx and http://blogs.technet.com/b/askcore/...with-windows-performance-monitor-perfmon.aspx (you're looking for "What counters in Windows Performance Monitor show the physical disk latency?").
And then report those results.

In case it is not obvious from my request for information: I am going after resource saturation as the source of your issues, same as snowman66 did.

PS: limited advice because snowman66 and I are the only ones bothering :p
 

evlute

As soon as possible I will try the VM directly on the server, and if it works... the whole Proxmox thing is done for me on this machine...
 
