VM crash / kernel panic after live migration from one host to another

bushev

New Member
Jan 25, 2024
Hi guys,

I've recently incorporated a second host into my environment and have been testing the live migration process. Surprisingly, it doesn't seem to work as expected. About 75% of the time, I've had to forcefully reboot the VM after migration due to crashes.

I noticed a few posts about this issue here, but they seem outdated and reference a previous major version of the Proxmox VE kernel.

How are things holding up in 2024?

I'm running "Linux pve2 6.5.13-5-pve #1 SMP PREEMPT_DYNAMIC PMX 6.5.13-5 (2024-04-05T11:03Z) x86_64" on both nodes.

### 1st node

H/W path                        Device           Class          Description
===========================================================================
                                                 system         System Product Name (SKU)
/0                                               bus            Pro WS WRX90E-SAGE SE
/0/0                                             memory         64KiB BIOS
/0/13                                            memory         2MiB L1 cache
/0/14                                            memory         32MiB L2 cache
/0/15                                            memory         128MiB L3 cache
/0/16                                            processor      AMD Ryzen Threadripper PRO 7975WX 32-Cores
/0/19                                            memory         192GiB System Memory
/0/19/0                                          memory         96GiB DIMM Synchronous Registered (Buffered) 5600 MHz (0.2 ns)
/0/19/1                                          memory         [empty]
/0/19/2                                          memory         [empty]
/0/19/3                                          memory         [empty]
/0/19/4                                          memory         96GiB DIMM Synchronous Registered (Buffered) 5600 MHz (0.2 ns)
/0/19/5                                          memory         [empty]
/0/19/6                                          memory         [empty]
/0/19/7                                          memory         [empty]
/0/100/5.1/0                    eno1             network        Ethernet Controller X710 for 10GBASE-T
/0/100/5.1/0.1                  eno2             network        Ethernet Controller X710 for 10GBASE-T
/0/100/7.1/0.4/0                usb3             bus            xHCI Host Controller
/0/100/7.1/0.4/1                usb4             bus            xHCI Host Controller
/0/100/7.1/0.7/0                input10          input          HD-Audio Generic Rear Mic
/0/100/7.1/0.7/1                input11          input          HD-Audio Generic Front Mic
/0/100/7.1/0.7/2                input12          input          HD-Audio Generic Line Out
/0/100/7.1/0.7/3                input13          input          HD-Audio Generic Front Headphone
/0/100/14                                        bus            FCH SMBus Controller
/0/100/14.3                                      bridge         FCH LPC Bridge


### 2nd node

H/W path           Device           Class          Description
==============================================================
                                    system         MS-7C77 (Default string)
/0                                  bus            MEG Z490I UNIFY (MS-7C77)
/0/0                                memory         64KiB BIOS
/0/3a                               memory         64GiB System Memory
/0/3a/0                             memory         32GiB DIMM DDR4 Synchronous 3600 MHz (0.3 ns)
/0/3a/1                             memory         [empty]
/0/3a/2                             memory         32GiB DIMM DDR4 Synchronous 3600 MHz (0.3 ns)
/0/3a/3                             memory         [empty]
/0/47                               memory         640KiB L1 cache
/0/48                               memory         2560KiB L2 cache
/0/49                               memory         20MiB L3 cache
/0/4a                               processor      Intel(R) Core(TM) i9-10900K CPU @ 3.70GHz
/0/100                              bridge         Comet Lake-S 6c Host Bridge/DRAM Controller
/0/100/1                            bridge         6th-10th Gen Core Processor PCIe Controller (x16)

Is it possible that the crashes occur because the hardware in the two hosts is too different (AMD Threadripper PRO on one, Intel Core i9 on the other)? Additionally, the chance of a crash appears to be higher when I move VMs with large disks over a 2.5G network, where the migration tends to take around 20 minutes.
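
One way to test the hardware-difference theory is to compare the CPU feature flags each node reports. If a VM is configured with a CPU type that exposes host-specific features, flags present on only one node are the usual suspects. Below is a minimal sketch, not a definitive diagnostic. It assumes the contents of `/proc/cpuinfo` from each node have been saved to two text files; the function names and file names are hypothetical and chosen only for illustration.

```python
import sys


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags of the first CPU found in /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    # No "flags" line found (e.g. empty or unrelated input).
    return set()


def flag_diff(node_a: str, node_b: str) -> tuple[set[str], set[str]]:
    """Return (flags only on node A, flags only on node B)."""
    flags_a, flags_b = cpu_flags(node_a), cpu_flags(node_b)
    return flags_a - flags_b, flags_b - flags_a


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python flagdiff.py pve1-cpuinfo.txt pve2-cpuinfo.txt
    with open(sys.argv[1]) as f1, open(sys.argv[2]) as f2:
        only_a, only_b = flag_diff(f1.read(), f2.read())
    print("only on first node: ", " ".join(sorted(only_a)))
    print("only on second node:", " ".join(sorted(only_b)))
```

A non-empty difference does not prove this is the cause of the crashes, but it shows which features a guest could lose when it moves from one node to the other.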

Looking forward to your thoughts!
 

Attachments

  • photo_2024-04-24 16.55.57.jpeg