Problem when live migrating from Intel to AMD server

RomainM

Mar 18, 2018
Hi everyone.

I have 3 servers: 2 are based on Intel CPUs, and 1 is based on an AMD CPU.
They are all running the latest Proxmox 5 version:
Kernel Version
Linux 4.13.13-6-pve #1 SMP PVE 4.13.13-42 (Fri, 9 Mar 2018 11:55:18 +0100
PVE Manager Version
pve-manager/5.1-46/ae8241d4


My VMs kernel panic when I do something like this (it happens when the
VM goes from an Intel server to the AMD one, or from AMD to Intel; I have no
problem between the two Intel servers): qm migrate 110 hv2 -online
-with-local-disks -migration_type insecure

The VMs run Debian 9, fully up to date.

I don't have many logs, but here is the trace:

[ 278.881292] BUG: unable to handle kernel paging request at ffffffff9f6573f3
[ 278.883451] IP: [<ffffffff9f6573f3>] kvm_kick_cpu+0x23/0x30
[ 278.885035] PGD 19e0d067
[ 278.885771] PUD 19e0e063 PMD 192000e1
[ 278.886974]
[ 278.887473] Oops: 0003 [#1] SMP
[ 278.888407] Modules linked in: binfmt_misc ppdev bochs_drm joydev evdev pcspkr serio_raw ttm drm_kms_helper drm shpchp parport_pc virtio_console parport virtio_balloon button sunrpc ip_tables x_tables autofs4 hid_generic usbhid hid ext4 crc16 jbd2 crc32c_generic fscrypto ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache ata_generic virtio_net virtio_blk psmouse ata_piix floppy libata scsi_mod virtio_pci virtio_ring virtio i2c_piix4 uhci_hcd ehci_hcd usbcore usb_common
[ 278.902629] CPU: 1 PID: 404 Comm: snmpd Not tainted 4.9.0-5-amd64 #1 Debian 4.9.65-3+deb9u2
[ 278.905019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[ 278.908399] task: ffff91f69c988100 task.stack: ffffa6d840460000
[ 278.910101] RIP: 0010:[<ffffffff9f6573f3>] [<ffffffff9f6573f3>] kvm_kick_cpu+0x23/0x30
[ 278.912456] RSP: 0018:ffffa6d840463ba0 EFLAGS: 00010046
[ 278.913929] RAX: 0000000000000005 RBX: 0000000000000000 RCX: 0000000000000000
[ 278.915915] RDX: ffff91f69fc00000 RSI: 0000000000000100 RDI: 0000000000000000
[ 278.917930] RBP: 0000000000000000 R08: 0000000000000100 R09: ffff91f69ffcb6c0
[ 278.919885] R10: 00000000000000c0 R11: 000000000001e005 R12: ffff91f69f00ac1c
[ 278.921845] R13: 0000000000000000 R14: 0000000000000046 R15: ffff91f69fc18940
[ 278.923777] FS: 00007fe78b549f80(0000) GS:ffff91f69fd00000(0000) knlGS:0000000000000000
[ 278.926084] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 278.927725] CR2: ffffffff9f6573f3 CR3: 000000001a9e8000 CR4: 0000000000000670
[ 278.929787] Stack:
[ 278.930407] ffff91f69f00a500 ffffffff9f6c0c90 ffffffff9f6c06f1 000000000001e005
[ 278.932698] 0000000000017225 0000000000000001 ffffffff9fe12ec0 ffff91f69fc18940
[ 278.934996] 0000000000000003 0000000000000000 00000040ec7198b4 ffffffff9f6c071b
[ 278.937280] Call Trace:
[ 278.937998] [<ffffffff9f6c0c90>] ? __pv_queued_spin_unlock_slowpath+0xa0/0xd0
[ 278.940225] [<ffffffff9f6c06f1>] ? __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
[ 278.942737] [<ffffffff9f6c071b>] ? .slowpath+0x9/0xe
[ 278.944205] [<ffffffff9f6a0e3e>] ? try_to_wake_up+0x18e/0x3a0
[ 278.945883] [<ffffffff9f6b8109>] ? __wake_up_common+0x49/0x80
[ 278.947551] [<ffffffff9f84b5c9>] ? ep_poll_callback+0xa9/0x230
[ 278.949281] [<ffffffff9f6b8109>] ? __wake_up_common+0x49/0x80
[ 278.951980] [<ffffffff9f6b844d>] ? __wake_up_sync_key+0x3d/0x60
[ 278.953731] [<ffffffff9fae7489>] ? sock_def_readable+0x39/0x60
[ 278.955607] [<ffffffff9fba8de6>] ? unix_dgram_sendmsg+0x396/0x720
[ 278.957357] [<ffffffff9fae3880>] ? sock_sendmsg+0x30/0x40
[ 278.958861] [<ffffffff9fae3e03>] ? SYSC_sendto+0xd3/0x150
[ 278.960432] [<ffffffff9f8029a4>] ? vfs_write+0x144/0x190
[ 278.961934] [<ffffffff9f803d52>] ? SyS_write+0x52/0xc0
[ 278.963406] [<ffffffff9f60326c>] ? exit_to_usermode_loop+0x8c/0xb0
[ 278.965187] [<ffffffff9fc0761e>] ? system_call_fast_compare_end+0xc/0xb7
[ 278.967129] Code: f3 c3 66 0f 1f 44 00 00 0f 1f 44 00 00 48 63 ff 53 48 c7 c0 fc d1 00 00 48 8b 14 fd c0 93 06 a0 31 db 0f b7 0c 02 b8 05 00 00 00 <0f> 01 c1 5b c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89
[ 279.048279] RIP [<ffffffff9f6573f3>] kvm_kick_cpu+0x23/0x30
[ 279.087012] RSP <ffffa6d840463ba0>
[ 279.126480] CR2: ffffffff9f6573f3
[ 279.164879] ---[ end trace 2d3bf8ed49aa5ab1 ]---

Do you know if it's a bug? A misconfiguration? Any advice on how to
troubleshoot this problem?

If you need more information, let me know. Thanks for the help.
 
Please post your VM config:

> qm config VMID
 
Dear all, I have read about this on the KVM mailing list. If I read it correctly, you can't do live migration between Intel and AMD. It only works Intel to Intel or AMD to AMD.
 
Thanks for your replies.

That's bad news :(

But here is my qm config VMID:
:~# qm config 110
agent: 1
balloon: 0
bootdisk: virtio0
cores: 2
cpu: kvm64
keyboard: fr
memory: 512
name: vm-test
net0: virtio=FA:5A:0E:58:25:E6,bridge=vmbr0,tag=4
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=4455415f-6266-4c03-8532-b8999ce98258
sockets: 1
virtio0: lvm-vms:vm-110-disk-1,format=raw,size=16G

EDIT: Oh sorry, I missed this bug when I did some research before posting: https://bugzilla.proxmox.com/show_bug.cgi?id=1660

Maybe we can just continue on the bug tracker?
 
Thought I'd jump into this conversation.

So my little setup had a node change, from an all-Intel setup to a mixed AMD and Intel one.

I thought it a bit strange that the kernel was dumping when migrating between nodes.

From further investigation, the KVM project says it should work, and in theory it does, but I believe this is hitting a bug or a CPU feature set mismatch that makes it bomb out.

After doing many kdumps and a lot of KVM debugging, I concluded that a CPU feature set difference was causing this issue.

Have a look at the 'Tuning_KVM' page (google it).

Setting it, as an example for Proxmox: qm set *VMID* --args "-cpu 'kvm64,+ssse3,+sse4.1,+sse4.2,+x2apic'"

This seems the most compatible, and so far no kernel dumps.
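After setting the args and restarting the VM, you can check from inside the guest whether the requested flags are actually exposed. A minimal sketch (the `guest_flags` sample string below is a placeholder; in a real guest you would read /proc/cpuinfo instead, where the flags are spelled sse4_1 and sse4_2 rather than sse4.1/sse4.2):

```shell
# Sketch: check whether the flags we asked for are visible to the guest.
# In a real guest: guest_flags=$(grep -m1 '^flags' /proc/cpuinfo)
guest_flags="fpu vme sse sse2 ssse3 sse4_1 sse4_2 x2apic lm"   # placeholder sample

for f in ssse3 sse4_1 sse4_2 x2apic; do
  # pad with spaces so we only match whole flag names, not substrings
  case " $guest_flags " in
    *" $f "*) echo "$f: present" ;;
    *)        echo "$f: missing" ;;
  esac
done
```

If any of the four come back "missing", the args line was not picked up (the VM needs a full stop/start, not a reboot, for --args changes to apply).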

The following CPU features seem to be the same for Intel and AMD:

fpu, vme, de, pse, tsc, msr, pae, mce, cx8, apic, sep, mtrr, pge, mca, cmov, pat, pse36, clflush, mmx, fxsr, sse, sse2, ht, syscall, nx, pdpe1gb, rdtscp, lm, constant_tsc, rep_good, nopl, nonstop_tsc, cpuid, aperfmperf, pni, pclmulqdq, monitor, ssse3, fma, cx16, sse4_1, sse4_2, popcnt, aes, xsave, avx, f16c, lahf_lm, abm, id, fsgsbase, bmi1, xsaveopt, arat

Probably worth checking to see what works.
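One way to check this yourself rather than trusting a list: take the flags line from /proc/cpuinfo on each host and intersect the two sets. A rough sketch (the two files and their contents are placeholder sample data; on real hosts you would save the output of `grep -m1 '^flags' /proc/cpuinfo` from each machine into them):

```shell
# Placeholder sample data standing in for each host's real /proc/cpuinfo line.
printf 'flags\t\t: fpu vme sse sse2 ssse3 sse4_1 avx\n' > intel_flags.txt
printf 'flags\t\t: fpu vme sse sse2 ssse3 sse4_1 abm\n' > amd_flags.txt

flag_set() {
  # keep only the flag list after the colon, one flag per line, sorted
  cut -d: -f2 "$1" | tr ' ' '\n' | sed '/^$/d' | sort -u
}

# comm -12 prints only the lines common to both sorted inputs,
# i.e. the flags supported on both hosts.
comm -12 <(flag_set intel_flags.txt) <(flag_set amd_flags.txt)
```

The process substitution `<(...)` needs bash; with plain sh, write each flag set to a temporary file first. The resulting common set is a reasonable starting point for the +flag list in the --args workaround above.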

Hope this helps?
 
Hi,

In my case, live migration from AMD -> Intel works without any problem (the CPU model is "host" type). The same in reverse (Intel -> AMD). I tested this as soon as I saw this post!
 
Interesting; I had assumed using 'host' would cause even more issues for live migration!

Anyway, I think an upstream bug report to KVM might help understand the issue, if it is indeed a bug.
 