PVE 4.4 - VM crashes after live migration if virt-net is highly loaded

stefws

The last two live migrations of a VM handling relatively heavy network traffic appear to have crashed the VM on the target host at resume, in the virtio-net driver. See the attached screen dump from the target VM console.

~# pveversion -v
proxmox-ve: 4.4-101 (running kernel: 4.4.98-2-pve)
pve-manager: 4.4-20 (running version: 4.4-20/2650b7b5)
pve-kernel-4.4.98-2-pve: 4.4.98-101
pve-kernel-4.4.83-1-pve: 4.4.83-96
lvm2: 2.02.116-pve3
corosync-pve: 2.4.2-2~pve4+1
libqb0: 1.0.1-1
pve-cluster: 4.0-54
qemu-server: 4.0-113
pve-firmware: 1.1-11
libpve-common-perl: 4.0-96
libpve-access-control: 4.0-23
libpve-storage-perl: 4.0-76
pve-libspice-server1: 0.12.8-2
vncterm: 1.3-2
pve-docs: 4.4-4
pve-qemu-kvm: 2.9.1-5~pve4
pve-container: 1.0-104
pve-firewall: 2.0-33
pve-ha-manager: 1.0-41
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u3
lxc-pve: 2.0.7-4
lxcfs: 2.0.6-pve1
criu: 1.6.0-1
novnc-pve: 0.5-9
smartmontools: 6.5+svn4324-1~pve80
zfsutils: 0.6.5.9-pve15~bpo80
openvswitch-switch: 2.6.0-2
 

Attachments

  • HapA crashed during live VM migration.png
Could you get the full stack trace (e.g., via a serial console)?
 
Weirdly enough, other VMs with even higher network traffic, but running nginx load balancers instead of HAProxy, don't seem to crash during live migration. The HAProxy VMs didn't crash in the past either; maybe it's due to a newer HAProxy version (1.7.9) than before. The VMs are otherwise similar, with the same virtio NICs and multiple queues, though the HAProxy VM has multiple queues on all its NICs, while the nginx VMs have them only on the single NIC carrying heavy traffic, not on the management NIC etc.
 
If you can reproduce it, it might still be worth collecting a full stack trace and posting it together with more details about the guest config, host package versions, guest OS and package versions, etc.
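For anyone following along, here is a rough sketch of one way to capture such a trace over a serial console. It assumes the VM config has a `serial0: socket` entry and that the guest kernel is booted with something like `console=tty0 console=ttyS0,115200`. The function names `has_serial_console` and `capture_serial` are made up for illustration; `qm terminal` and `script -c` are the real tools doing the work.

```shell
#!/bin/sh
# Guest side: succeed if the kernel command line sends console output to the
# first serial port. Reads /proc/cmdline unless another file is given.
has_serial_console() {
    grep -q 'console=ttyS0' "${1:-/proc/cmdline}"
}

# Host side: attach to the VM's serial socket and log everything to a file.
# $1: VM ID, $2: log file. Assumes "serial0: socket" is set in the VM config.
capture_serial() {
    script -c "qm terminal $1" "$2"
}
```

Usage would be along the lines of `capture_serial 400 hapA-serial.log`. Since the serial socket belongs to the node running the QEMU process, the capture probably has to be attached on the migration target to see a crash at resume.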
 
Managed to capture it on the serial console; see the attached text file.

The guest is CentOS 6.9, fully patched as of today.


CentOS release 6.9 (Final)
Kernel 4.14.12-1.el6.elrepo.x86_64 on an x86_64

hapA login:

root@n1:~# cat /etc/pve/qemu-server/400.conf
#HA proxy load balancer node A
bootdisk: virtio0
cores: 4
memory: 4096
name: hapA
net0: bridge=vmbr1,virtio=62:32:66:31:35:63,tag=40
net1: bridge=vmbr1,firewall=1,virtio=02:73:A0:7A:68:2A,queues=8,tag=41
net2: bridge=vmbr1,virtio=3A:62:33:66:61:31,queues=8,tag=42
net3: bridge=vmbr1,virtio=36:39:62:30:31:36,queues=4,tag=43
net4: bridge=vmbr1,virtio=32:36:61:35:63:63,queues=4,tag=44
net5: bridge=vmbr1,virtio=62:63:37:61:31:37,queues=4,tag=45
numa: 0
onboot: 1
ostype: l26
serial0: socket
smbios1: uuid=e92e51b6-f95a-49e4-9afd-ad730a3bdfb7
sockets: 2
tablet: 0
virtio0: vgA:vm-400-disk-1,cache=writeback,size=50G
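To compare the multiqueue setup between VMs (e.g. the HAProxy and nginx guests), a small helper like the one below can summarize the `netN` lines of a qemu-server config. This is only a sketch: the function name `nic_queues` is hypothetical, and reporting `queues=1` for entries without a `queues=` option is an assumption that such NICs run without multiqueue.

```shell
#!/bin/sh
# Print one line per netN entry with its VLAN tag and virtio queue count.
# $1: path to the VM config, e.g. /etc/pve/qemu-server/400.conf
nic_queues() {
    awk -F': ' '/^net[0-9]+: / {
        queues = 1; tag = "-"          # assumed defaults when option is absent
        n = split($2, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] ~ /^queues=/) { sub(/^queues=/, "", opts[i]); queues = opts[i] }
            else if (opts[i] ~ /^tag=/) { sub(/^tag=/, "", opts[i]); tag = opts[i] }
        }
        printf "%s tag=%s queues=%s\n", $1, tag, queues
    }' "$1"
}
```

Run as `nic_queues /etc/pve/qemu-server/400.conf` on the node holding the VM. For the config above it would list net0 without a `queues=` setting, net1 and net2 with 8 queues, and net3 to net5 with 4 queues each.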

Just let me know if you need more info...
 


Is this with KPTI enabled on both host and guest? If yes, you could check whether it also occurs with KPTI disabled in the guest (obviously only for testing!).
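A quick way to check the guest's KPTI state is to look for the kernel's "page tables isolation" message in the boot log. The sketch below assumes the log has been saved to a file first (e.g. `dmesg > /tmp/dmesg.txt`); the function name `kpti_state` is made up for illustration.

```shell
#!/bin/sh
# Report the KPTI state found in a saved kernel log.
# $1: file containing kernel log text (e.g. saved dmesg output)
kpti_state() {
    if grep -q 'page tables isolation: enabled' "$1"; then
        echo enabled
    elif grep -q 'page tables isolation: disabled' "$1"; then
        echo disabled
    else
        echo unknown   # message not present, e.g. kernel without KPTI support
    fi
}
```

For a test run with KPTI off, booting the guest with `nopti` (or `pti=off`) on the kernel command line should be enough on kernels that carry the KPTI patches.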
 
It happened both with the host on proxmox-ve: 4.4-101 (no KPTI) and now on proxmox-ve: 4.4-102 (KPTI).
KPTI is not enabled in the guest. (Testing is a bit hard, as we're talking about a full 24x7 production site :)
 

If you can reproduce it in a test environment (e.g., a similar VM setup with artificially generated network load from the outside), it would probably be worthwhile to report it upstream. It could be a bug in 4.14.12 (or CentOS' version thereof), a bug in our 4.13 kernel or QEMU, or a combination.
 
Sorry, I haven't got the luxury of a similar test lab, only an older PVE 3.4 lab :/

The guests are running ELRepo kernel-ml 4.x (CentOS 6 itself ships kernel 2.6). It also happened at least on the previous kernel-ml, 4.13.4-1.
 
