VM network freeze

Kiril

Sep 25, 2017
Hello,

I'm using Proxmox 5.0. At random intervals my monitoring (Nagios) notifies me that one of my VMs is not reachable. I cannot ping this machine from any workstation, nor can I ping anything from inside the VM. The machine itself is running fine; it just seems like the network is not working. The workaround is resetting the interface from the UI, after which the machine is fine. What could be the cause of the problem?


Thanks
 
What OS and config (qm config <vmid>) are you running?

> The machine is running fine just seems like network is not working. Workaround is resetting the interface from the UI and then the machine is fine.
What do you mean by that, a reset of the VM or do you reset something inside the VM?
 
I have a similar problem. I have a CentOS 7 VM on Proxmox 5.0 and the network stops working completely (can't ping anything) under heavy traffic. When it happens, the only way I found to restore connectivity besides rebooting the VM is to rmmod and modprobe the virtio-net driver on the VM. There doesn't seem to be any useful information in the system log. I tried upgrading the VM's kernel to 4.13 from the ELRepo kernel-ml repo but it doesn't fix the problem.
 
Hello martinb,

I have had the same issue for the last two weeks: suddenly ARP packets are not received by the virtual machine. Can you give more information about your hardware, e.g. the NIC?

I didn't find any logs or reported issues. After upgrading the NIC firmware and NIC driver, the issue seems to be gone.
 
Hi helloworld, I had the problem with both QLogic / Broadcom NetXtreme II (bnx2x) and Intel X710 (i40e) hardware. In both cases my host interfaces were using vlan tagging.
 
Hello Alwin,

Here is the output of the config
Code:
root@compute9:~# qm config 175
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = (unset),
    LC_ALL = (unset),
    LC_CTYPE = "UTF-8",
    LANG = "en_US.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to a fallback locale ("en_US.UTF-8").
agent: 1
balloon: 0
boot: cdn
bootdisk: virtio0
cores: 2
ide2: none,media=cdrom
memory: 12288
name: VMNAME
net0: virtio=FE:03:1E:0F:44:92,bridge=vmbr405
numa: 1
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=44709ae7-4f4f-4f76-9fae-a059415c0d48
sockets: 1
virtio0: ssdpool2tb:vm-175-disk-1,iothread=1,size=50G

Machine is running CentOS Linux release 7.3.1611 (Core).
I'm resetting the network device from the Proxmox web portal, using the disconnect option.
I have experienced the same issue again since.

Thanks,
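
The disconnect/reconnect workaround described above can probably also be scripted on the host. The following is a minimal sketch, assuming the GUI disconnect option corresponds to the link_down flag of the netN option in qm; the helper names with_link_state and bounce_net0 are made up for illustration.

```shell
#!/bin/sh
# Return a net device option string with link_down set to $2 (0 or 1).
# Any existing link_down=... entry is dropped first.
with_link_state() {
    printf '%s\n' "$1" | sed -e 's/,link_down=[01]//' -e "s/\$/,link_down=$2/"
}

# Disconnect and reconnect net0 of a VM (run as root on the host).
# Assumption: link_down=1 behaves like the GUI "disconnect" option.
bounce_net0() {
    vmid="$1"
    # Current value, e.g. virtio=FE:03:1E:0F:44:92,bridge=vmbr405
    net0=$(qm config "$vmid" | sed -n 's/^net0: //p')
    qm set "$vmid" --net0 "$(with_link_state "$net0" 1)"
    sleep 2
    qm set "$vmid" --net0 "$(with_link_state "$net0" 0)"
}

# Example (not run automatically): bounce_net0 175
```

Keeping the string rewrite in its own function makes it possible to check the resulting net0 value before handing it to qm set.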
 
> Hi helloworld, I had the problem with both QLogic / Broadcom NetXtreme II (bnx2x) and Intel X710 (i40e) hardware. In both cases my host interfaces were using vlan tagging.

I'm using VLAN tagging as well and see exactly the same behavior: when I click disconnect, the VM is reachable again.

I'm using VLAN tagging with Open vSwitch:
Code:
root@:~# qm config asd
agent: 1
balloon: 0
bootdisk: virtio0
cores: 16
cpu: host
memory: 65536
name: asd
net0: virtio=8A:02:B6:42:69:EB,bridge=vmbr0,tag=asd
numa: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=8417b003-ef01-4b9f-b886-1aa7e1ceb6a1
sockets: 2
virtio0: asd-asd-disk-1,size=128G

With the latest driver and firmware:
Code:
root@:~# ethtool -i ens2f2
driver: i40e
version: 2.1.26
firmware-version: 6.01 0x80003493 0.0.0
expansion-rom-version:
bus-info: 0000:02:00.2
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes
 
I get this issue too. I have switched my machines to the vmxnet3 network driver instead of virtio for the time being. e1000 accumulates RX errors for some reason, while vmxnet3 doesn't. Unfortunately, some performance is lost.

I would really like to figure out why virtio_net keeps falling over when under load. Like the other poster, I can only fix my network by running modprobe -r virtio_net and then modprobe virtio_net. After doing that, it pops right back up as if nothing happened.
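
Until the cause is found, the reload workaround could be automated inside the guest. This is only a sketch under assumptions: the gateway address is a placeholder, and the function names probe, reload_virtio_net and watchdog_once are hypothetical.

```shell
#!/bin/sh
# Placeholder address to probe; set GATEWAY to the guest's real gateway.
GATEWAY="${GATEWAY:-192.0.2.1}"

# Succeeds if the gateway answers at least one of three pings.
probe() {
    ping -c 3 -W 2 "$GATEWAY" >/dev/null 2>&1
}

# Reload the virtio_net driver, the workaround described in this thread.
reload_virtio_net() {
    modprobe -r virtio_net && modprobe virtio_net
}

# Run one check: call the probe command ($1) and, only if it fails,
# call the recovery command ($2).
watchdog_once() {
    if "$1"; then
        echo "network ok"
    else
        echo "network unreachable, running recovery"
        "$2"
    fi
}

# Example (as root inside the guest, e.g. from cron):
#   watchdog_once probe reload_virtio_net
```

Passing the probe and recovery steps as arguments keeps the check itself testable without touching the driver. Depending on the guest, the interface may also need to be brought up again after the reload.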
 
@helloworld:
> I had the same issue last two weeks, suddenly arp packets are not received by the virtual machine.

* Can you see the ARP packets when running tcpdump against the tap device of the VM?
* Also, which kind of load makes the problem reproducible? Does the problem happen with Linux bridges too?
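
For the tcpdump check, a minimal sketch for watching ARP traffic on a VM's tap device from the host. It assumes the usual Proxmox naming of tap devices, tap<vmid>i<index> (so net0 of VM 175 would be tap175i0); the helper names tap_name and watch_arp are made up for illustration.

```shell
#!/bin/sh
# Build the host-side tap device name for a VM NIC.
# Assumption: Proxmox names tap devices tap<vmid>i<net index>.
tap_name() {
    vmid="$1"
    index="${2:-0}"
    printf 'tap%si%s\n' "$vmid" "$index"
}

# Print ARP frames seen on the tap device (run as root on the host).
# -n: no name resolution, -e: show link-level headers (MACs, VLAN tags).
watch_arp() {
    tcpdump -n -e -i "$(tap_name "$1" "$2")" arp
}

# Example (not run automatically): watch_arp 175 0
```

Running the same capture inside the guest on its own interface shows whether requests that reach the tap device also arrive in the VM.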
 
@manu

I tested this with both Open vSwitch and a Linux bridge; the issue persists with both. I see the ARP packets on the tap interface as well as inside the VM (without VLAN tag, which is correct). The ARP packets are not answered until I rmmod the virtio driver or disconnect/reconnect via the GUI.

I guess the main focus should be virtio.
 
Hello,
we are facing similar issues with 2 VMs under heavy network load after upgrading from Proxmox 4.x to 5.0.

The first VM runs Debian 8 and Zen LB.
The second VM runs Debian 9 and Icecast 2.

Both VMs lost their network connectivity within 24 hours after the upgrade.
The VMs could not ping any other VMs or even the gateway.
The VMs could not be pinged from other VMs or the host system.
The VMs were only accessible by the novnc console.
A reboot of the VM restored the network connectivity.
An ifdown/ifup cycle or a network restart did not.

We will try reloading virtio_net the next time.

The VMs are running in bridged mode with virtio.
The bridge interfaces are controlled by openvswitch:
Code:
ovs-vsctl (Open vSwitch) 2.7.0
DB Schema 7.14.0

The pve-firewall is disabled.
 
Oh yes, actually this documentation is available on each installed PVE host, so you can replace the intern.lab address with your PVE hostname; that way you get the documentation that matches the installed version of PVE.
 
Right now it happened again on our Icecast VM.

This alone didn't solve it:
Code:
rmmod virtio_net
modprobe virtio_net

Had to restart the network. ens18 was missing. ifup didn't work (already configured):
Code:
/etc/init.d/networking restart

dmesg is showing some info:
[123299.606929] ------------[ cut here ]------------
[123299.606933] WARNING: CPU: 1 PID: 764 at /build/linux-EAZfyE/linux-4.9.51/kernel/cpu.c:1705 __cpuhp_remove_state+0x120/0x130
[123299.606934] Error: Removing state 11 which has instances left.
[123299.606934] Modules linked in: hid_generic usbhid hid joydev evdev serio_raw ppdev sg parport_pc shpchp parport virtio_balloon button ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache sr_mod cdrom ata_generic dm_mod virtio_net(-) virtio_blk floppy uhci_hcd ehci_hcd usbcore usb_common i2c_piix4 psmouse virtio_pci virtio_ring virtio ata_piix libata scsi_mod
[123299.606952] CPU: 1 PID: 764 Comm: rmmod Not tainted 4.9.0-4-amd64 #1 Debian 4.9.51-1
[123299.606953] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[123299.606954] 0000000000000000 ffffffffae129974 ffffb9120062be58 0000000000000000
[123299.606955] ffffffffade76eae 0000000000000000 ffffb9120062beb0 ffffffffaea34318
[123299.606957] 0000000000000800 0000000000000000 0000558b468081f0 ffffffffade76f2f
[123299.606958] Call Trace:
[123299.606962] [<ffffffffae129974>] ? dump_stack+0x5c/0x78
[123299.606963] [<ffffffffade76eae>] ? __warn+0xbe/0xe0
[123299.606965] [<ffffffffade76f2f>] ? warn_slowpath_fmt+0x5f/0x80
[123299.606966] [<ffffffffade78dc0>] ? __cpuhp_remove_state+0x120/0x130
[123299.606968] [<ffffffffc037ffd4>] ? virtio_net_driver_exit+0xc/0x38 [virtio_net]
[123299.606970] [<ffffffffadefd53c>] ? SyS_delete_module+0x18c/0x260
[123299.606972] [<ffffffffade0326c>] ? exit_to_usermode_loop+0x8c/0xb0
[123299.606974] [<ffffffffae4085bb>] ? system_call_fast_compare_end+0xc/0x9b
[123299.606975] ---[ end trace 5a8eedf0f5bf00c6 ]---
[123299.606976] ------------[ cut here ]------------
[123299.606977] WARNING: CPU: 1 PID: 764 at /build/linux-EAZfyE/linux-4.9.51/kernel/cpu.c:1705 __cpuhp_remove_state+0x120/0x130
[123299.606977] Error: Removing state 123 which has instances left.
[123299.606977] Modules linked in: hid_generic usbhid hid joydev evdev serio_raw ppdev sg parport_pc shpchp parport virtio_balloon button ip_tables x_tables autofs4 ext4 crc16 jbd2 crc32c_generic fscrypto ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache sr_mod cdrom ata_generic dm_mod virtio_net(-) virtio_blk floppy uhci_hcd ehci_hcd usbcore usb_common i2c_piix4 psmouse virtio_pci virtio_ring virtio ata_piix libata scsi_mod
[123299.606988] CPU: 1 PID: 764 Comm: rmmod Tainted: G W 4.9.0-4-amd64 #1 Debian 4.9.51-1
[123299.606989] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[123299.606989] 0000000000000000 ffffffffae129974 ffffb9120062be58 0000000000000000
[123299.606990] ffffffffade76eae 0000000000000000 ffffb9120062beb0 ffffffffaea33cf8
[123299.606992] 0000000000000800 0000000000000000 0000558b468081f0 ffffffffade76f2f
[123299.606993] Call Trace:
[123299.606994] [<ffffffffae129974>] ? dump_stack+0x5c/0x78
[123299.606995] [<ffffffffade76eae>] ? __warn+0xbe/0xe0
[123299.606997] [<ffffffffade76f2f>] ? warn_slowpath_fmt+0x5f/0x80
[123299.606998] [<ffffffffade78dc0>] ? __cpuhp_remove_state+0x120/0x130
[123299.607000] [<ffffffffc037ffe1>] ? virtio_net_driver_exit+0x19/0x38 [virtio_net]
[123299.607001] [<ffffffffadefd53c>] ? SyS_delete_module+0x18c/0x260
[123299.607002] [<ffffffffade0326c>] ? exit_to_usermode_loop+0x8c/0xb0
[123299.607004] [<ffffffffae4085bb>] ? system_call_fast_compare_end+0xc/0x9b
[123299.607005] ---[ end trace 5a8eedf0f5bf00c7 ]---
[123315.035439] virtio_net virtio2 ens18: renamed from eth0
[123349.197772] TCP: request_sock_TCP: Possible SYN flooding on port 80. Sending cookies. Check SNMP counters.

syslog isn't helping:
Oct 10 20:17:01 icecast-01 CRON[715]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 10 20:17:01 icecast-01 CRON[715]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 10 20:19:02 icecast-01 systemd-timesyncd[338]: Timed out waiting for reply from 148.251.68.100:123 (0.debian.pool.ntp.org).
 
Code:
rmmod virtio_net
modprobe virtio_net

Had to restart the network. ens18 was missing. ifup didn't work (already configured):
Code:
/etc/init.d/networking restart

Yes, you have to ifdown, then ifup in that case.

@helloworld @nowrap

Can you test whether the kernel pve-kernel-4.10.17-4-pve from the pvetest repository fixes the virtio connectivity issue for you?
This kernel fixes a bug which occurs with virtio guests that have to process a large number of connections.
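
For anyone wanting to try the test kernel, a rough sketch of the steps on the host. The repository line is an assumption based on the usual pvetest entry for a Debian stretch based PVE 5.x install, and running_target_kernel is a hypothetical helper for confirming the reboot picked up the new kernel.

```shell
#!/bin/sh
# Kernel release named in the post above.
TARGET_KERNEL="4.10.17-4-pve"

# Succeeds if the given kernel release ($1, default: uname -r) is the target.
running_target_kernel() {
    [ "${1:-$(uname -r)}" = "$TARGET_KERNEL" ]
}

# Steps on the host, as root (not run automatically).
# Assumption: this is the pvetest repository line for PVE 5.x.
#   echo "deb http://download.proxmox.com/debian/pve stretch pvetest" \
#       > /etc/apt/sources.list.d/pvetest.list
#   apt update && apt install pve-kernel-4.10.17-4-pve
#   reboot
#   running_target_kernel && echo "test kernel is active"
```

The pvetest repository carries packages that are still being tested, so removing the list file again after installing the kernel may be preferable on production hosts.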
 
Hi,

We have also had this kind of issue for the last two weeks. It sometimes happens under average/high load (KVM, Debian 8 64-bit).
We have to reboot the VM or disconnect/reconnect the virtio interface via the GUI.

Do you suggest switching the NIC from virtio to e1000 if the latest kernel doesn't fix this issue?

Code:
proxmox-ve: 5.0-20 (running kernel: 4.10.17-2-pve)
pve-manager: 5.0-30 (running version: 5.0-30/5ab26bc)
pve-kernel-4.10.17-2-pve: 4.10.17-20
pve-kernel-4.10.15-1-pve: 4.10.15-15
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-14
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-4
pve-container: 2.0-15
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.11-pve17~bpo90
ceph: 12.1.2-pve1
 
Hi, we have the issue too. When you disable GSO and TSO on the virtual servers, the issue seems to be gone.
It looks like a bug; once segmentation offload is disabled, the issue no longer occurs.
ethtool -K eth0 gso off
ethtool -K eth0 tso off
ethtool -k eth0 | grep segment

It happens on machines with a heavy traffic load.
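
To verify that the offloads are really off after running the ethtool commands above, the output of ethtool -k can be filtered. A small sketch; the function name enabled_segmentation_offloads is made up, and eth0 is simply the interface name used in this post.

```shell
#!/bin/sh
# Read `ethtool -k <iface>` output on stdin and print the names of
# segmentation offloads that are still enabled.
enabled_segmentation_offloads() {
    awk -F': *' '/segmentation/ && $2 ~ /^on/ {
        gsub(/^[ \t]+/, "", $1)   # strip indentation of sub-features
        print $1
    }'
}

# Example (inside the guest, as root):
#   ethtool -K eth0 gso off tso off
#   ethtool -k eth0 | enabled_segmentation_offloads
```

Settings made with ethtool -K do not survive a reboot. On a Debian guest using ifupdown, a line such as "post-up ethtool -K eth0 gso off tso off" in the interface stanza of /etc/network/interfaces is one way to reapply them; other distributions need their own mechanism.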
 
It would be interesting to get feedback on whether the pvetest kernel I mentioned fixes the issue; if it does, the workaround mentioned above would not be needed.
 
