Network throughput of the virtual machines collapses after a few days

BertrandBB

Member
Oct 13, 2019
Hi

I use Proxmox on a Soyoustart server (OVH). The installation dates from a few weeks before the release of Proxmox VE 6. I migrated to v6 hoping it would resolve the problem I will describe here, but it did not.

The problem is that the network throughput of the virtual machines collapses after a few days. There are 4 VMs on the server and all of them are affected. The throughput ends up capping at about 1 MB/s, which is very low compared to the server's real capacity.

I have been installing servers for my personal use for 20 years to host websites. That has not made me a network expert, and this is the first time I am confronted with this problem.

The only way I have found to restore normal VM throughput is to reboot the server. This morning I first tried this:

Code:
systemctl restart networking
Job for networking.service failed because the control process exited with error code.
See "systemctl status networking.service" and "journalctl -xe" for details.
root@ns31****:/etc/init.d# systemctl start networking
Job for networking.service failed because the control process exited with error code.
See "systemctl status networking.service" and "journalctl -xe" for details.

# systemctl status networking.service
● networking.service - Raise network interfaces
   Loaded: loaded (/lib/systemd/system/networking.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Sun 2019-10-13 11:54:42 CEST; 16s ago
     Docs: man:interfaces(5)
  Process: 15283 ExecStart=/sbin/ifup -a --read-environment (code=exited, status=1/FAILURE)
Main PID: 15283 (code=exited, status=1/FAILURE)

oct. 13 11:54:42 ns31**** systemd[1]: Starting Raise network interfaces...
oct. 13 11:54:42 ns31**** ifup[15283]: Waiting for vmbr0 to get ready (MAXWAIT is 2 seconds).
oct. 13 11:54:42 ns31**** ifup[15283]: RTNETLINK answers: File exists
oct. 13 11:54:42 ns31**** ifup[15283]: ifup: failed to bring up vmbr0
oct. 13 11:54:42 ns31**** systemd[1]: networking.service: Main process exited, code=exited, status=1/FAILURE
oct. 13 11:54:42 ns31**** systemd[1]: networking.service: Failed with result 'exit-code'.
oct. 13 11:54:42 ns31**** systemd[1]: Failed to start Raise network interfaces.

I ended up rebooting the machine because I could not afford to leave my users without service...

Of course I contacted customer support, but they are asking me to run my tests in rescue mode... Obviously the results there will be fine, and the conclusion will probably be, as always, that the problem lies between the screen and the chair, right where the configuration was made...

Thank you to anyone who can help me!

Bertrand B.
 
You can't restart the networking service, because the VM tap interfaces are not defined in /etc/network/interfaces (so at best, vmbr0 would be restarted without any VM plugged into it).
Anyway, that shouldn't fix your problem.
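For what it's worth, the "RTNETLINK answers: File exists" error usually means vmbr0 still holds its address, so ifup refuses to configure it again. A minimal recovery sketch, assuming the stale address is the only problem (untested on your box); ifreload is only available if the optional ifupdown2 package is installed:

Code:
# flush the stale address so ifup can reconfigure the bridge
ip addr flush dev vmbr0
ifup vmbr0
# or, with ifupdown2 installed, reload config without tearing interfaces down:
ifreload -a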

Can you post your /etc/network/interfaces file?
Do you have any logs in /var/log/kern.log or /var/log/messages?
What does pveversion -v report?
Have you tried running a network benchmark between two VMs on the same host?
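Something like this, for example (10.0.0.x is a placeholder for the first VM's address):

Code:
# on VM A (server side):
iperf -s
# on VM B (client side), replace 10.0.0.x with VM A's address:
iperf -c 10.0.0.x -i 2 -t 20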
 
Thanks for your answer! Here are the elements:

Bash:
cat  /etc/network/interfaces
# This file describes the network interfaces available on your system
# and how to activate them. For more information, see interfaces(5).

# The loopback network interface
auto lo
iface lo inet loopback


# vmbr0: Bridging. Make sure to use only MAC adresses that were assigned to you.
auto vmbr0
iface vmbr0 inet static
    address 188.165.203.***/24
    gateway 188.165.203.254
    bridge_ports eno1
    bridge_stp off
    bridge_fd 0

Lines in kern.log after the reboot (there are a lot of lines in the first startup seconds, then only these). At this time the network is OK:
Bash:
Oct 13 12:12:44 ns31**** kernel: [  913.132254] sctp: Hash tables configured (bind 512/512)
Oct 13 13:44:25 ns31**** kernel: [ 6414.884788] hrtimer: interrupt took 31224 ns
Oct 13 16:49:04 ns31**** kernel: [17494.149919] perf: interrupt took too long (2576 > 2500), lowering kernel.perf_event_max_sample_rate to 77500

The same lines in /var/log/messages:
Bash:
Oct 13 12:12:44 ns3***** kernel: [  913.132254] sctp: Hash tables configured (bind 512/512)
Oct 13 13:44:25 ns31***** kernel: [ 6414.884788] hrtimer: interrupt took 31224 ns
Oct 13 16:49:04 ns31**** kernel: [17494.149919] perf: interrupt took too long (2576 > 2500), lowering kernel.perf_event_max_sample_rate to 77500

pveversion -v:
Bash:
pveversion -v
proxmox-ve: 6.0-2 (running kernel: 5.0.21-2-pve)
pve-manager: 6.0-7 (running version: 6.0-7/28984024)
pve-kernel-5.0: 6.0-8
pve-kernel-helper: 6.0-8
pve-kernel-4.15: 5.4-9
pve-kernel-5.0.21-2-pve: 5.0.21-6
pve-kernel-4.15.18-21-pve: 4.15.18-48
ceph-fuse: 12.2.12-pve1
corosync: 3.0.2-pve2
criu: 3.11-3
glusterfs-client: 5.5-3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.12-pve1
libpve-access-control: 6.0-2
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-5
libpve-guest-common-perl: 3.0-1
libpve-http-server-perl: 3.0-2
libpve-storage-perl: 6.0-9
libqb0: 1.0.5-1
lvm2: 2.03.02-pve3
lxc-pve: 3.1.0-65
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.0-7
pve-cluster: 6.0-7
pve-container: 3.0-7
pve-docs: 6.0-4
pve-edk2-firmware: 2.20190614-1
pve-firewall: 4.0-7
pve-firmware: 3.0-2
pve-ha-manager: 3.0-2
pve-i18n: 2.0-3
pve-qemu-kvm: 4.0.0-5
pve-xtermjs: 3.13.2-1
qemu-server: 6.0-9
smartmontools: 7.0-pve2
spiceterm: 3.1-1
vncterm: 1.6-1

The iperf command requested by OVH support, run while the network problem was present:

Code:
iperf -c iperf.ovh.net -i 2 -t 20 -P 5 -f m -o 5
------------------------------------------------------------
Client connecting to iperf.ovh.net, TCP port 5001
TCP window size: 0.08 MByte (default)
------------------------------------------------------------
[  7] local 188.165.203.*** port 49106 connected with 188.165.12.136 port 5001
[  5] local 188.165.203.*** port 49102 connected with 188.165.12.136 port 5001
[  3] local 188.165.203.*** port 49104 connected with 188.165.12.136 port 5001
[  6] local 188.165.203.*** port 49100 connected with 188.165.12.136 port 5001
[  4] local 188.165.203.*** port 49098 connected with 188.165.12.136 port 5001
[ ID] Interval       Transfer     Bandwidth
[  7]  0.0- 2.0 sec  0.62 MBytes  2.62 Mbits/sec
[  5]  0.0- 2.0 sec  0.62 MBytes  2.62 Mbits/sec
[  6]  0.0- 2.0 sec  0.62 MBytes  2.62 Mbits/sec
[  4]  0.0- 2.0 sec  0.62 MBytes  2.62 Mbits/sec
[  3]  0.0- 2.0 sec  0.75 MBytes  3.15 Mbits/sec
[SUM]  0.0- 2.0 sec  3.25 MBytes  13.6 Mbits/sec
[  4]  2.0- 4.0 sec  0.38 MBytes  1.57 Mbits/sec
[  6]  2.0- 4.0 sec  0.38 MBytes  1.57 Mbits/sec
[  3]  2.0- 4.0 sec  0.38 MBytes  1.57 Mbits/sec
[  7]  2.0- 4.0 sec  0.50 MBytes  2.10 Mbits/sec
[  5]  2.0- 4.0 sec  0.50 MBytes  2.10 Mbits/sec
[SUM]  2.0- 4.0 sec  2.12 MBytes  8.91 Mbits/sec
[  4]  4.0- 6.0 sec  0.50 MBytes  2.10 Mbits/sec
[  6]  4.0- 6.0 sec  0.50 MBytes  2.10 Mbits/sec
[  3]  4.0- 6.0 sec  0.37 MBytes  1.57 Mbits/sec
[  7]  4.0- 6.0 sec  0.50 MBytes  2.10 Mbits/sec
[  5]  4.0- 6.0 sec  0.50 MBytes  2.10 Mbits/sec
[SUM]  4.0- 6.0 sec  2.37 MBytes  9.96 Mbits/sec
[  3]  6.0- 8.0 sec  0.38 MBytes  1.57 Mbits/sec
[  7]  6.0- 8.0 sec  0.38 MBytes  1.57 Mbits/sec
[  5]  6.0- 8.0 sec  0.38 MBytes  1.57 Mbits/sec
[  4]  6.0- 8.0 sec  0.50 MBytes  2.10 Mbits/sec
[  6]  6.0- 8.0 sec  0.50 MBytes  2.10 Mbits/sec
[SUM]  6.0- 8.0 sec  2.12 MBytes  8.91 Mbits/sec
[  7]  8.0-10.0 sec  0.38 MBytes  1.57 Mbits/sec
[  5]  8.0-10.0 sec  0.38 MBytes  1.57 Mbits/sec
[  6]  8.0-10.0 sec  0.38 MBytes  1.57 Mbits/sec
[  4]  8.0-10.0 sec  0.38 MBytes  1.57 Mbits/sec
[  3]  8.0-10.0 sec  0.50 MBytes  2.10 Mbits/sec
[SUM]  8.0-10.0 sec  2.00 MBytes  8.39 Mbits/sec
[  4] 10.0-12.0 sec  0.38 MBytes  1.57 Mbits/sec
[  6] 10.0-12.0 sec  0.38 MBytes  1.57 Mbits/sec
[  3] 10.0-12.0 sec  0.38 MBytes  1.57 Mbits/sec
[  7] 10.0-12.0 sec  0.50 MBytes  2.10 Mbits/sec
[  5] 10.0-12.0 sec  0.50 MBytes  2.10 Mbits/sec
[SUM] 10.0-12.0 sec  2.12 MBytes  8.91 Mbits/sec
[  5] 12.0-14.0 sec  0.38 MBytes  1.57 Mbits/sec
[  6] 12.0-14.0 sec  0.50 MBytes  2.10 Mbits/sec
[  4] 12.0-14.0 sec  0.50 MBytes  2.10 Mbits/sec
[  3] 12.0-14.0 sec  0.50 MBytes  2.10 Mbits/sec
[  7] 12.0-14.0 sec  0.50 MBytes  2.10 Mbits/sec
[SUM] 12.0-14.0 sec  2.38 MBytes  9.96 Mbits/sec
[  6] 14.0-16.0 sec  0.38 MBytes  1.57 Mbits/sec
[  4] 14.0-16.0 sec  0.38 MBytes  1.57 Mbits/sec
[  3] 14.0-16.0 sec  0.38 MBytes  1.57 Mbits/sec
[  7] 14.0-16.0 sec  0.38 MBytes  1.57 Mbits/sec
[  5] 14.0-16.0 sec  0.50 MBytes  2.10 Mbits/sec
[SUM] 14.0-16.0 sec  2.00 MBytes  8.39 Mbits/sec
[  5] 16.0-18.0 sec  0.38 MBytes  1.57 Mbits/sec
[  6] 16.0-18.0 sec  0.50 MBytes  2.10 Mbits/sec
[  4] 16.0-18.0 sec  0.50 MBytes  2.10 Mbits/sec
[  3] 16.0-18.0 sec  0.50 MBytes  2.10 Mbits/sec
[  7] 16.0-18.0 sec  0.50 MBytes  2.10 Mbits/sec
[SUM] 16.0-18.0 sec  2.38 MBytes  9.96 Mbits/sec
[  3] 18.0-20.0 sec  0.38 MBytes  1.57 Mbits/sec
[  3]  0.0-20.1 sec  4.50 MBytes  1.88 Mbits/sec
[  7] 18.0-20.0 sec  0.38 MBytes  1.57 Mbits/sec
[  7]  0.0-20.2 sec  4.62 MBytes  1.92 Mbits/sec
[  5] 18.0-20.0 sec  0.50 MBytes  2.10 Mbits/sec
[  5]  0.0-20.3 sec  4.62 MBytes  1.91 Mbits/sec
[  6] 18.0-20.0 sec  0.50 MBytes  2.10 Mbits/sec
[  6]  0.0-20.6 sec  4.62 MBytes  1.89 Mbits/sec
[  4] 18.0-20.0 sec  0.50 MBytes  2.10 Mbits/sec
[SUM] 18.0-20.0 sec  2.25 MBytes  9.44 Mbits/sec
[  4]  0.0-20.6 sec  4.62 MBytes  1.89 Mbits/sec
[SUM]  0.0-20.6 sec  23.0 MBytes  9.38 Mbits/sec


Thanks!

B.
 
As expected, after receiving the technical test report from a freshly rebooted server in rescue mode, the OVH technical service concluded in a single sentence that the problem is an internal configuration issue.
In 20 years, I have never seen a Debian system randomly throttle its Ethernet traffic like this, for no reason, after a week of normal operation. Either it works or it doesn't. In short, if anyone has an idea before this all ends up at Amazon... Thank you!

B.
 
When the problem occurs, do you have global network usage stats for the NIC?

(Just an idea: maybe something is flooding you at that moment (DDoS, ...)?)

Is the iperf run from the Proxmox host directly?
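A quick way to look at the NIC counters when it happens, for example (eno1 as in your config; iftop only if it is installed):

Code:
# per-interface byte/packet counters plus errors and drops
ip -s link show eno1
# live per-connection view of what is crossing the bridge
iftop -i vmbr0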
 
Thank you for this answer and this idea! I had not thought about flooding! I will try to watch what happens (although it would be strange for a server reboot to stop a flood, no?).
The iperf command is indeed run on the host. Curiously, the VMs are also limited to about 1 MB/s at that moment (wget of a file on a VM, from the host or from my home computer). In short, all of this is curious.
Thank you!
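To check for unsolicited traffic the next time the throughput collapses, a small packet sample might help; a sketch, assuming tcpdump is installed:

Code:
# capture 1000 packets on the physical NIC and eyeball the sources
tcpdump -ni eno1 -c 1000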
 
