Online migration stuck VM

tech-medi

New Member
Aug 31, 2023
Hi all,
This does not seem to be a new problem, but we have a cluster with Ceph and we want to add a new server. All servers are on the latest 7.4 version.
1. 48 x Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz (2 Sockets)
2. 48 x Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz (2 Sockets)
3. 48 x Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz (2 Sockets)
4. 48 x Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz (2 Sockets)
5. 40 x Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (2 Sockets) (aka. pve9)

We tested migrating a VM with 2 CPUs (an Ubuntu guest) with default settings.
When we move the VM from host 2 to host 5, the VM goes to 100% CPU and the syslog on host 5 shows a kernel warning:

Code:
Aug 31 16:28:44 pve9 QEMU[3627]: kvm: warning: TSC frequency mismatch between VM (2095078 kHz) and host (2297338 kHz), and TSC scaling unavailable
Aug 31 16:28:44 pve9 QEMU[3627]: kvm: warning: TSC frequency mismatch between VM (2095078 kHz) and host (2297338 kHz), and TSC scaling unavailable
Aug 31 16:28:45 pve9 kernel: [   57.675015] ------------[ cut here ]------------
Aug 31 16:28:45 pve9 kernel: [   57.675056] WARNING: CPU: 26 PID: 3791 at arch/x86/kvm/x86.c:10883 kvm_arch_vcpu_ioctl_run+0x16b3/0x17f0 [kvm]
Aug 31 16:28:45 pve9 kernel: [   57.675271] Modules linked in: veth 8021q garp mrp ebtable_filter ebtables ip_set ip6table_raw iptable_raw ip6table_filter ip6_tables iptable_filter bpfilter sctp ip6_udp_tunnel udp_tunnel nf_tables softdog bonding tls nfnetlink_log nfnetlink ipmi_ssif intel_rapl_msr intel_rapl_common sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 aesni_intel crypto_simd cryptd rapl mgag200 drm_shmem_helper acpi_ipmi drm_kms_helper intel_cstate pcspkr efi_pstore i2c_algo_bit ioatdma syscopyarea ipmi_si input_leds joydev sysfillrect ipmi_devintf sysimgblt hpilo dca mac_hid acpi_tad ipmi_msghandler acpi_power_meter vhost_net vhost vhost_iotlb tap ib_iser rdma_cm iw_cm ib_cm ib_core iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi drm sunrpc ip_tables x_tables autofs4 zfs(PO) zunicode(PO) zzstd(O) zlua(O) zavl(PO) icp(PO) zcommon(PO) znvpair(PO) spl(O) usbkbd btrfs blake2b_generic xor
Aug 31 16:28:45 pve9 kernel: [   57.675363]  raid6_pq uas usb_storage hid_generic usbmouse usbhid hid simplefb xhci_pci i2c_i801 xhci_pci_renesas crc32_pclmul i2c_smbus uhci_hcd lpc_ich ehci_pci bnx2x xhci_hcd ehci_hcd hpsa mdio tg3 libcrc32c scsi_transport_sas wmi
Aug 31 16:28:45 pve9 kernel: [   57.675588] CPU: 26 PID: 3791 Comm: CPU 1/KVM Tainted: P           O       6.2.16-4-bpo11-pve #1
Aug 31 16:28:45 pve9 kernel: [   57.675615] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 01/12/2023
Aug 31 16:28:45 pve9 kernel: [   57.675638] RIP: 0010:kvm_arch_vcpu_ioctl_run+0x16b3/0x17f0 [kvm]
Aug 31 16:28:45 pve9 kernel: [   57.675753] Code: 00 08 0f 84 d7 f8 ff ff e9 4b fe ff ff 49 8b 84 24 08 02 00 00 48 85 c0 0f 84 b1 f8 ff ff e9 8a fe ff ff 0f 0b e9 af fa ff ff <0f> 0b e9 8c fa ff ff 49 8b 44 24 38 a9 00 00 00 40 74 0f f0 41 80
Aug 31 16:28:45 pve9 kernel: [   57.675797] RSP: 0018:ffffb4b0a33e3cc0 EFLAGS: 00010202
Aug 31 16:28:45 pve9 kernel: [   57.675818] RAX: 0000000000000001 RBX: ffff9eb09670d000 RCX: 0000000000000000
Aug 31 16:28:45 pve9 kernel: [   57.675838] RDX: 000035c0ffa14e00 RSI: 00000000fffffe01 RDI: ffff9eb08b988000
Aug 31 16:28:45 pve9 kernel: [   57.675858] RBP: ffffb4b0a33e3d58 R08: 0000000000000001 R09: 0000000000000000
Aug 31 16:28:45 pve9 kernel: [   57.675878] R10: 0000000000000000 R11: 0000000000000000 R12: ffff9eb08b988000
Aug 31 16:28:45 pve9 kernel: [   57.675897] R13: ffff9eb08b988000 R14: ffff9eb08b988048 R15: ffff9eb09eaf4c00
Aug 31 16:28:45 pve9 kernel: [   57.675917] FS:  00007fb1237fe700(0000) GS:ffff9ecf7fc00000(0000) knlGS:0000000000000000
Aug 31 16:28:45 pve9 kernel: [   57.675939] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 31 16:28:45 pve9 kernel: [   57.676817] CR2: 000055d3155aa0a8 CR3: 00000001406a8005 CR4: 00000000001726e0
Aug 31 16:28:45 pve9 kernel: [   57.677510] Call Trace:
Aug 31 16:28:45 pve9 kernel: [   57.678213]  <TASK>
Aug 31 16:28:45 pve9 kernel: [   57.678848]  ? __smp_call_single_queue+0x59/0x90
Aug 31 16:28:45 pve9 kernel: [   57.679483]  ? ttwu_queue_wakelist+0x12f/0x1c0
Aug 31 16:28:45 pve9 kernel: [   57.680080]  kvm_vcpu_ioctl+0x24f/0x6d0 [kvm]
Aug 31 16:28:45 pve9 kernel: [   57.680735]  ? kvm_vcpu_ioctl+0x2b8/0x6d0 [kvm]
Aug 31 16:28:45 pve9 kernel: [   57.681387]  ? futex_wake+0x7c/0x190
Aug 31 16:28:45 pve9 kernel: [   57.682002]  ? __fget_light.part.0+0x8c/0xd0
Aug 31 16:28:45 pve9 kernel: [   57.682634]  __x64_sys_ioctl+0x95/0xd0
Aug 31 16:28:45 pve9 kernel: [   57.683201]  do_syscall_64+0x5c/0x90
Aug 31 16:28:45 pve9 kernel: [   57.683765]  ? do_futex+0xbd/0x1d0
Aug 31 16:28:45 pve9 kernel: [   57.684283]  ? exit_to_user_mode_prepare+0x37/0x180
Aug 31 16:28:45 pve9 kernel: [   57.684804]  ? syscall_exit_to_user_mode+0x26/0x50
Aug 31 16:28:45 pve9 kernel: [   57.685361]  ? do_syscall_64+0x69/0x90
Aug 31 16:28:45 pve9 kernel: [   57.685904]  ? syscall_exit_to_user_mode+0x26/0x50
Aug 31 16:28:45 pve9 kernel: [   57.686462]  ? do_syscall_64+0x69/0x90
Aug 31 16:28:45 pve9 kernel: [   57.687003]  ? irqentry_exit_to_user_mode+0x9/0x20
Aug 31 16:28:45 pve9 kernel: [   57.687517]  ? irqentry_exit+0x3b/0x50
Aug 31 16:28:45 pve9 kernel: [   57.688030]  ? sysvec_call_function+0x4e/0x90
Aug 31 16:28:45 pve9 kernel: [   57.688542]  entry_SYSCALL_64_after_hwframe+0x72/0xdc
Aug 31 16:28:45 pve9 kernel: [   57.689057] RIP: 0033:0x7fb14cff8237
Aug 31 16:28:45 pve9 kernel: [   57.689567] Code: 00 00 00 48 8b 05 59 cc 0d 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 29 cc 0d 00 f7 d8 64 89 01 48
Aug 31 16:28:45 pve9 kernel: [   57.690721] RSP: 002b:00007fb1237f90c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 31 16:28:45 pve9 kernel: [   57.691356] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007fb14cff8237
Aug 31 16:28:45 pve9 kernel: [   57.691933] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 0000000000000028
Aug 31 16:28:45 pve9 kernel: [   57.692527] RBP: 0000564024c6f950 R08: 000056402291e240 R09: 0000564023022940
Aug 31 16:28:45 pve9 kernel: [   57.693138] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Aug 31 16:28:45 pve9 kernel: [   57.693880] R13: 0000564023029020 R14: 00007fb1237f9380 R15: 0000000000802000
Aug 31 16:28:45 pve9 systemd[1]: session-5.scope: Succeeded.
Aug 31 16:28:45 pve9 systemd[1]: session-5.scope: Consumed 2.806s CPU time.
Aug 31 16:28:45 pve9 kernel: [   57.694488]  </TASK>
Aug 31 16:28:45 pve9 kernel: [   57.695088] ---[ end trace 0000000000000000 ]---
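The two QEMU warnings at the top of the log report the TSC frequency the VM was started with and the one the destination host provides, and state that TSC scaling is unavailable on the destination. As a rough sketch (the helper name `tsc_mismatch` and the regular expression are illustrative assumptions, not part of any Proxmox or QEMU tooling), the gap can be pulled out of such a line and expressed as a relative difference:

```python
import re

# Pattern matching the QEMU warning text exactly as it appears in the log above.
TSC_RE = re.compile(
    r"TSC frequency mismatch between VM \((\d+) kHz\) and host \((\d+) kHz\)"
)


def tsc_mismatch(log_line):
    """Return (vm_khz, host_khz, relative_difference), or None if no match.

    Hypothetical helper for illustration only.
    """
    m = TSC_RE.search(log_line)
    if m is None:
        return None
    vm_khz, host_khz = int(m.group(1)), int(m.group(2))
    # Relative difference of the host frequency with respect to the VM's.
    return vm_khz, host_khz, (host_khz - vm_khz) / vm_khz


line = (
    "Aug 31 16:28:44 pve9 QEMU[3627]: kvm: warning: TSC frequency mismatch "
    "between VM (2095078 kHz) and host (2297338 kHz), and TSC scaling unavailable"
)
result = tsc_mismatch(line)
if result is not None:
    vm_khz, host_khz, rel = result
    print(f"VM {vm_khz} kHz, host {host_khz} kHz, relative difference {rel:.1%}")
```

The two values are close to the nominal clocks of the source (2.10 GHz) and destination (2.30 GHz) CPUs listed above, which suggests the mismatch comes from the different base frequencies of the hosts.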

We tried upgrading the kernel to 6.2, but the issue and the error are the same.
We tried upgrading the server BIOS, but the issue and the error are the same.

We don't know what else to try.

This surprises me: since the VM is emulated, the underlying layer should not affect the hardware or the instructions the VM sees, but it seems it does.

If we start the VM directly on host 5, it works normally. Any idea where to look? A kernel problem? A hardware problem?
 
What CPU type does the VM have? If it's "host" the problem is to be expected.
You need the lowest common denominator of the mixed physical CPUs for the VMs.
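One way to see what that lowest common denominator looks like is to intersect the CPU flag sets of all nodes. The following is only a sketch: it assumes you have collected the contents of `/proc/cpuinfo` from each host yourself, and the function names and the sample flag strings are made up for illustration (they are not the real flag lists of these CPUs).

```python
def parse_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def common_flags(per_host_cpuinfo):
    """Intersect the flag sets of all hosts (dict: host name -> cpuinfo text)."""
    flag_sets = [parse_flags(text) for text in per_host_cpuinfo.values()]
    return set.intersection(*flag_sets) if flag_sets else set()


# Shortened, invented samples; real input would be the full /proc/cpuinfo text.
sample = {
    "node-a": "model name\t: example CPU A\nflags\t\t: fpu sse2 avx avx2 avx512f\n",
    "node-b": "model name\t: example CPU B\nflags\t\t: fpu sse2 avx avx2\n",
}
print(sorted(common_flags(sample)))
```

A VM CPU type that only exposes features in the resulting set should be usable on every node, whereas "host" exposes whatever the node the VM was started on has.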
The VM uses the default CPU type, kvm64, and the error persists even if we change it.
 
