Problem when live migrating from Intel to AMD server

RomainM

Mar 18, 2018
Hi everyone.

I have 3 servers: two have Intel CPUs and one has an AMD CPU.
All of them run the latest Proxmox 5 version:
Kernel Version
Linux 4.13.13-6-pve #1 SMP PVE 4.13.13-42 (Fri, 9 Mar 2018 11:55:18 +0100)
PVE Manager Version
pve-manager/5.1-46/ae8241d4


My VMs kernel panic when I live migrate them between an Intel server and
the AMD server, in either direction. Migrating between the two Intel
servers works without any problem. The command I use looks like this:

qm migrate 110 hv2 -online -with-local-disks -migration_type insecure

The VMs run Debian 9 and are up to date.

I don't have many logs, but here is the trace :

[ 278.881292] BUG: unable to handle kernel paging request at
ffffffff9f6573f3
[ 278.883451] IP: [<ffffffff9f6573f3>] kvm_kick_cpu+0x23/0x30
[ 278.885035] PGD 19e0d067 [ 278.885771] PUD 19e0e063
PMD 192000e1 [ 278.886974]
[ 278.887473] Oops: 0003 [#1] SMP
[ 278.888407] Modules linked in: binfmt_misc ppdev bochs_drm joydev
evdev pcspkr serio_raw ttm drm_kms_helper drm shpchp parport_pc
virtio_console parport virtio_balloon button sunrpc ip_tables x_tables
autofs4 hid_generic usbhid hid ext4 crc16 jbd2 crc32c_generic fscrypto
ecb glue_helper lrw gf128mul ablk_helper cryptd aes_x86_64 mbcache
ata_generic virtio_net virtio_blk psmouse ata_piix floppy libata
scsi_mod virtio_pci virtio_ring virtio i2c_piix4 uhci_hcd ehci_hcd
usbcore usb_common
[ 278.902629] CPU: 1 PID: 404 Comm: snmpd Not tainted 4.9.0-5-amd64 #1
Debian 4.9.65-3+deb9u2
[ 278.905019] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
[ 278.908399] task: ffff91f69c988100 task.stack: ffffa6d840460000
[ 278.910101] RIP: 0010:[<ffffffff9f6573f3>] [<ffffffff9f6573f3>]
kvm_kick_cpu+0x23/0x30
[ 278.912456] RSP: 0018:ffffa6d840463ba0 EFLAGS: 00010046
[ 278.913929] RAX: 0000000000000005 RBX: 0000000000000000 RCX:
0000000000000000
[ 278.915915] RDX: ffff91f69fc00000 RSI: 0000000000000100 RDI:
0000000000000000
[ 278.917930] RBP: 0000000000000000 R08: 0000000000000100 R09:
ffff91f69ffcb6c0
[ 278.919885] R10: 00000000000000c0 R11: 000000000001e005 R12:
ffff91f69f00ac1c
[ 278.921845] R13: 0000000000000000 R14: 0000000000000046 R15:
ffff91f69fc18940
[ 278.923777] FS: 00007fe78b549f80(0000) GS:ffff91f69fd00000(0000)
knlGS:0000000000000000
[ 278.926084] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 278.927725] CR2: ffffffff9f6573f3 CR3: 000000001a9e8000 CR4:
0000000000000670
[ 278.929787] Stack:
[ 278.930407] ffff91f69f00a500 ffffffff9f6c0c90 ffffffff9f6c06f1
000000000001e005
[ 278.932698] 0000000000017225 0000000000000001 ffffffff9fe12ec0
ffff91f69fc18940
[ 278.934996] 0000000000000003 0000000000000000 00000040ec7198b4
ffffffff9f6c071b
[ 278.937280] Call Trace:
[ 278.937998] [<ffffffff9f6c0c90>] ?
__pv_queued_spin_unlock_slowpath+0xa0/0xd0
[ 278.940225] [<ffffffff9f6c06f1>] ?
__raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20
[ 278.942737] [<ffffffff9f6c071b>] ? .slowpath+0x9/0xe
[ 278.944205] [<ffffffff9f6a0e3e>] ? try_to_wake_up+0x18e/0x3a0
[ 278.945883] [<ffffffff9f6b8109>] ? __wake_up_common+0x49/0x80
[ 278.947551] [<ffffffff9f84b5c9>] ? ep_poll_callback+0xa9/0x230
[ 278.949281] [<ffffffff9f6b8109>] ? __wake_up_common+0x49/0x80
[ 278.951980] [<ffffffff9f6b844d>] ? __wake_up_sync_key+0x3d/0x60
[ 278.953731] [<ffffffff9fae7489>] ? sock_def_readable+0x39/0x60
[ 278.955607] [<ffffffff9fba8de6>] ? unix_dgram_sendmsg+0x396/0x720
[ 278.957357] [<ffffffff9fae3880>] ? sock_sendmsg+0x30/0x40
[ 278.958861] [<ffffffff9fae3e03>] ? SYSC_sendto+0xd3/0x150
[ 278.960432] [<ffffffff9f8029a4>] ? vfs_write+0x144/0x190
[ 278.961934] [<ffffffff9f803d52>] ? SyS_write+0x52/0xc0
[ 278.963406] [<ffffffff9f60326c>] ? exit_to_usermode_loop+0x8c/0xb0
[ 278.965187] [<ffffffff9fc0761e>] ? system_call_fast_compare_end+0xc/0xb7
[ 278.967129] Code: f3 c3 66 0f 1f 44 00 00 0f 1f 44 00 00 48 63 ff 53
48 c7 c0 fc d1 00 00 48 8b 14 fd c0 93 06 a0 31 db 0f b7 0c 02 b8 05 00
00 00 <0f> 01 c1 5b c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89
[ 279.048279] RIP [<ffffffff9f6573f3>] kvm_kick_cpu+0x23/0x30
[ 279.087012] RSP <ffffa6d840463ba0>
[ 279.126480] CR2: ffffffff9f6573f3
[ 279.164879] ---[ end trace 2d3bf8ed49aa5ab1 ]---
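A note on reading the trace above: the byte marked with <..> in the "Code:" line is where RIP pointed when the fault hit. The small sketch below pulls out the three bytes starting there and names them. The function names faulting_opcode and opcode_name are made up for this example, and the opcode table is an assumption to verify against the Intel and AMD manuals (as far as I know, 0f 01 c1 is VMCALL and 0f 01 d9 is VMMCALL). If that holds, the guest was still issuing the Intel hypercall instruction after landing on the AMD host, which would fit the symptoms, but that is an interpretation, not a confirmed diagnosis.

```shell
# faulting_opcode: read a kernel "Code:" line (possibly wrapped over several
# lines) on stdin and print the three bytes starting at the one marked <..>.
faulting_opcode() {
    tr '\n' ' ' |
        sed -n 's/.*<\([0-9a-f][0-9a-f]\)> *\([0-9a-f][0-9a-f]\) *\([0-9a-f][0-9a-f]\).*/\1 \2 \3/p'
}

# opcode_name: map the three bytes to a mnemonic.
# Assumption: this two-entry table is illustrative only, not a disassembler.
opcode_name() {
    case "$1" in
        "0f 01 c1") echo "vmcall (Intel VMX hypercall)" ;;
        "0f 01 d9") echo "vmmcall (AMD SVM hypercall)" ;;
        *) echo "unknown" ;;
    esac
}

# Usage, with the Code: line saved to a hypothetical file code.txt:
#   opcode_name "$(faulting_opcode < code.txt)"
```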

Do you know if this is a bug or a misconfiguration? Any advice on how to
troubleshoot this problem?

If you need more information, let me know. Thanks for the help.
 
Please post your VM config.

> qm config VMID
 
Dear all, I read about this on the KVM mailing list. If I understood correctly, you can't do live migration between Intel and AMD. It only works from Intel to Intel or from AMD to AMD.
 
Thanks for your replies.

That's bad news :(

Anyway, here is the output of qm config for the VM:
:~# qm config 110
agent: 1
balloon: 0
bootdisk: virtio0
cores: 2
cpu: kvm64
keyboard: fr
memory: 512
name: vm-test
net0: virtio=FA:5A:0E:58:25:E6,bridge=vmbr0,tag=4
numa: 0
onboot: 1
ostype: l26
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=4455415f-6266-4c03-8532-b8999ce98258
sockets: 1
virtio0: lvm-vms:vm-110-disk-1,format=raw,size=16G

EDIT: Oh sorry, I missed this bug when I did some research before posting: https://bugzilla.proxmox.com/show_bug.cgi?id=1660

Maybe we can just continue on the bug tracker?
 
Thought I'd jump into this conversation.

My little setup had a node change, from an all-Intel cluster to a mix of AMD and Intel.

I thought it a bit strange that the guest kernel crashed when migrating between nodes.

On further investigation, the KVM project says this should work, and in theory it does, but I believe we are hitting a bug or a CPU feature that makes the guest bomb out.

After a lot of kdump and KVM debugging, I concluded that a CPU feature set was causing the issue.

See the 'Tuning_KVM' page (search for it).

As an example for Proxmox: qm set *VMID* --args "-cpu 'kvm64,+ssse3,+sse4.1,+sse4.2,+x2apic'"

This seems to be the most compatible setting, and so far I have had no kernel dumps.
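If you want to try different feature combinations, typing the -cpu string by hand gets error-prone. Here is a minimal sketch of a helper that builds it from a base model and a list of features. The function name cpu_args is made up for this example; only the output format is taken from the command above. As far as I know, a changed args setting only takes effect after the VM has been fully stopped and started again, so treat that as something to verify on your own setup.

```shell
# cpu_args MODEL FEATURE...
# Prints a QEMU "-cpu" argument that enables each FEATURE on top of MODEL,
# in the same form as the qm set example above.
cpu_args() {
    model=$1
    shift
    spec=$model
    for feature in "$@"; do
        spec="$spec,+$feature"
    done
    printf '%s\n' "-cpu '$spec'"
}

# Usage (110 is the VMID from earlier in this thread):
#   qm set 110 --args "$(cpu_args kvm64 ssse3 sse4.1 sse4.2 x2apic)"
```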

The following CPU features seem to be the same for Intel and AMD:

fpu,vme,de,pse,tsc,msr,pae,mce,cx8,apic,sep,mtrr,pge,mca,cmov,pat,pse36,clflush,mmx,
fxsr,sse,sse2,ht,syscall,nx,pdpe1gb,rdtscp,lm,constant_tsc,rep_good,nopl,nonstop_tsc,
cpuid,aperfmperf,pni,pclmulqdq,monitor,ssse3,fma,cx16,sse4_1,sse4_2,popcnt,aes,xsave,avx,
f16c,lahf_lm,abm,id,fsgsbase,bmi1,xsaveopt,arat

It is probably worth testing which of these work on your hardware.
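To get the list of shared flags for your own nodes rather than relying on mine, you can intersect the flags lines from /proc/cpuinfo. A minimal sketch, assuming you saved the output of grep -m1 '^flags' /proc/cpuinfo from each node into a file; the function name common_flags and the file names are made up for this example. Note that these are the kernel's flag names, which do not always match QEMU's feature names (for example sse4_1 here versus sse4.1 in the -cpu string).

```shell
# common_flags FILE_A FILE_B
# Each file holds one "flags : ..." line copied from /proc/cpuinfo on a node.
# Prints the flags present in both files, one per line, sorted.
common_flags() {
    tmp_a=$(mktemp) || return 1
    tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }

    # Take the text after "flags :", split it on whitespace, sort uniquely.
    awk -F: '/^flags/ { n = split($2, f, " "); for (i = 1; i <= n; i++) print f[i] }' "$1" |
        LC_ALL=C sort -u > "$tmp_a"
    awk -F: '/^flags/ { n = split($2, f, " "); for (i = 1; i <= n; i++) print f[i] }' "$2" |
        LC_ALL=C sort -u > "$tmp_b"

    # comm -12 keeps only the lines common to both sorted inputs.
    LC_ALL=C comm -12 "$tmp_a" "$tmp_b"
    rm -f "$tmp_a" "$tmp_b"
}

# Usage, with hypothetical file names:
#   common_flags flags-intel.txt flags-amd.txt | paste -sd, -
```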

Hope this helps.
 
Hi,

In my case, live migration from AMD to Intel works without any problem (the CPU model is set to "host"). The same goes for the reverse direction (Intel to AMD). I tested this as soon as I saw this post!
 
Interesting. I had assumed that using 'host' would cause even more issues for live migration!

Anyway, I think an upstream bug report to KVM might help to understand the issue, if it is indeed a bug.