VM crash / kernel panic after live migration from one host to another

bushev

New Member
Jan 25, 2024
Hi guys,

I've recently incorporated a second host into my environment and have been testing the live migration process. Surprisingly, it doesn't seem to work as expected. About 75% of the time, I've had to forcefully reboot the VM after migration due to crashes.

I noticed a few posts about this issue here, but they seem outdated and reference a previous major version of the Proxmox kernel.

How are things holding up in 2024?

I'm running "Linux pve2 6.5.13-5-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-5 (2024-04-05T11:03Z) x86_64" on both nodes.

### 1st node

H/W path                        Device           Class          Description
===========================================================================
                                                 system         System Product Name (SKU)
/0                                               bus            Pro WS WRX90E-SAGE SE
/0/0                                             memory         64KiB BIOS
/0/13                                            memory         2MiB L1 cache
/0/14                                            memory         32MiB L2 cache
/0/15                                            memory         128MiB L3 cache
/0/16                                            processor      AMD Ryzen Threadripper PRO 7975WX 32-Cores
/0/19                                            memory         192GiB System Memory
/0/19/0                                          memory         96GiB DIMM Synchronous Registered (Buffered) 5600 MHz (0.2 ns)
/0/19/1                                          memory         [empty]
/0/19/2                                          memory         [empty]
/0/19/3                                          memory         [empty]
/0/19/4                                          memory         96GiB DIMM Synchronous Registered (Buffered) 5600 MHz (0.2 ns)
/0/19/5                                          memory         [empty]
/0/19/6                                          memory         [empty]
/0/19/7                                          memory         [empty]
/0/100/5.1/0                    eno1             network        Ethernet Controller X710 for 10GBASE-T
/0/100/5.1/0.1                  eno2             network        Ethernet Controller X710 for 10GBASE-T
/0/100/7.1/0.4/0                usb3             bus            xHCI Host Controller
/0/100/7.1/0.4/1                usb4             bus            xHCI Host Controller
/0/100/7.1/0.7/0                input10          input          HD-Audio Generic Rear Mic
/0/100/7.1/0.7/1                input11          input          HD-Audio Generic Front Mic
/0/100/7.1/0.7/2                input12          input          HD-Audio Generic Line Out
/0/100/7.1/0.7/3                input13          input          HD-Audio Generic Front Headphone
/0/100/14                                        bus            FCH SMBus Controller
/0/100/14.3                                      bridge         FCH LPC Bridge


### 2nd node

H/W path           Device           Class          Description
==============================================================
                                    system         MS-7C77 (Default string)
/0                                  bus            MEG Z490I UNIFY (MS-7C77)
/0/0                                memory         64KiB BIOS
/0/3a                               memory         64GiB System Memory
/0/3a/0                             memory         32GiB DIMM DDR4 Synchronous 3600 MHz (0.3 ns)
/0/3a/1                             memory         [empty]
/0/3a/2                             memory         32GiB DIMM DDR4 Synchronous 3600 MHz (0.3 ns)
/0/3a/3                             memory         [empty]
/0/47                               memory         640KiB L1 cache
/0/48                               memory         2560KiB L2 cache
/0/49                               memory         20MiB L3 cache
/0/4a                               processor      Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz
/0/100                              bridge         Comet Lake-S 6c Host Bridge/DRAM Controller
/0/100/1                            bridge         6th-10th Gen Core Processor PCIe Controller (x16)

Could the crashes be caused by the hardware of the two hosts being too different (an AMD Threadripper PRO 7975WX on the first node, an Intel Core i9-10900K on the second)? I've also observed that a crash seems more likely when I move VMs with large disks over the 2.5G network, a migration that tends to take around 20 minutes.
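To make the "too different" question more concrete, here is a rough sketch of how the CPU feature flags of the two nodes could be compared. It assumes the contents of /proc/cpuinfo from each host have been saved and read in as text; the function names `parse_flags` and `compare_flags` are made up for illustration and are not part of any Proxmox tooling. My working assumption, which may well be wrong, is that a guest using features present on only one host is a plausible crash candidate after migration.

```python
def parse_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    # No 'flags' line found (e.g. empty or truncated input).
    return set()


def compare_flags(flags_a: set[str], flags_b: set[str]) -> dict[str, list[str]]:
    """Split two flag sets into shared flags and flags unique to each host."""
    return {
        "common": sorted(flags_a & flags_b),
        "only_a": sorted(flags_a - flags_b),
        "only_b": sorted(flags_b - flags_a),
    }
```

Calling `compare_flags(parse_flags(text_node1), parse_flags(text_node2))` gives three sorted lists. Anything under `only_a` or `only_b` is a feature the other host cannot offer, which would be the first place I'd look.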

Looking forward to your thoughts!
 

Attachments

  • photo_2024-04-24 16.55.57.jpeg
