Memory hotplug: System is deadlocked on memory

leesteken

I recently had good experiences with Ubuntu 20.04 VMs and CPU & memory hotplug (on pve-kernel-5.15), even with PCIe passthrough. Since then I have switched to Linux Mint 21 (based on Ubuntu, with kernel 5.15) and LMDE 5 (very similar, but based on Debian with kernel 5.10) and started using pve-kernel-5.19. I know that I have changed at least two things between the working state and my current issue, so I will do some testing with the latest 5.15 kernel (although that somewhat interferes with the PCIe passthrough).

Both variants of Mint (with and without PCIe passthrough) regularly fail to start properly and show the following error (or a very similar one) on the virtual serial console of the VM:
Code:
[    1.009629] Kernel panic - not syncing: System is deadlocked on memory
[    1.010183] CPU: 3 PID: 8 Comm: kworker/u32:0 Not tainted 5.15.0-47-generic #51-Ubuntu
[    1.010720] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
[    1.011249] Workqueue: events_unbound async_run_entry_fn
[    1.011608] Call Trace:
[    1.011775]  <TASK>
[    1.011922]  show_stack+0x52/0x5c
[    1.012150]  dump_stack_lvl+0x4a/0x63
[    1.012400]  dump_stack+0x10/0x16
[    1.012627]  panic+0x149/0x321
[    1.012837]  out_of_memory.cold+0x75/0x97
[    1.013113]  __alloc_pages_slowpath.constprop.0+0x97e/0xa40
[    1.013476]  __alloc_pages+0x311/0x330
[    1.013476]  alloc_pages+0x9e/0x1e0
[    1.013476]  __page_cache_alloc+0x7e/0x90
[    1.013476]  pagecache_get_page+0x152/0x590
[    1.013476]  grab_cache_page_write_begin+0x21/0x40
[    1.013476]  simple_write_begin+0x29/0xa0
[    1.013476]  generic_perform_write+0xc1/0x1f0
[    1.013476]  __generic_file_write_iter+0x10f/0x1e0
[    1.013476]  generic_file_write_iter+0x68/0xc0
[    1.013476]  __kernel_write+0x14a/0x2e0
[    1.013476]  kernel_write+0x78/0x170
[    1.013476]  xwrite.constprop.0+0x36/0x6d
[    1.013476]  do_copy+0xbd/0x107
[    1.013476]  write_buffer+0x43/0x5a
[    1.013476]  flush_buffer+0x30/0x8e
[    1.013476]  ? initrd_load+0x48/0x48
[    1.013476]  __unzstd.constprop.0+0x353/0x431
[    1.013476]  ? write_buffer+0x5a/0x5a
[    1.013476]  ? __unzstd.constprop.0+0x431/0x431
[    1.013476]  unzstd+0xc/0x12
[    1.013476]  ? initrd_load+0x48/0x48
[    1.013476]  unpack_to_rootfs+0x17e/0x2c5
[    1.013476]  ? initrd_load+0x48/0x48
[    1.013476]  do_populate_rootfs+0x5e/0x112
[    1.013476]  async_run_entry_fn+0x33/0x120
[    1.013476]  process_one_work+0x22b/0x3d0
[    1.013476]  worker_thread+0x53/0x420
[    1.013476]  ? process_one_work+0x3d0/0x3d0
[    1.013476]  kthread+0x12a/0x150
[    1.013476]  ? set_kthread_struct+0x50/0x50
[    1.013476]  ret_from_fork+0x22/0x30
[    1.013476]  </TASK>

CPU hotplug and unplug work fine, even though Mint does not provide a pci_hotplug module. I just use the udev rule SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1" as a work-around.
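
For a one-off check without udev, the same idea as the rule above can be sketched in Python: find every CPU whose online attribute reads 0 and write 1 to it. This is only a rough equivalent, not what udev does internally. The function name online_offline_cpus is made up for illustration, and writing to the real sysfs path needs root.

```python
from pathlib import Path


def online_offline_cpus(sysfs_root="/sys/devices/system/cpu"):
    """Write 1 to the 'online' attribute of every CPU that currently reads 0.

    Returns the names of the CPUs that were switched. CPUs without an
    'online' attribute (often cpu0) are skipped, like TEST=="online" does.
    """
    changed = []
    # cpu[0-9]* skips sibling directories such as cpufreq and cpuidle.
    for online_file in sorted(Path(sysfs_root).glob("cpu[0-9]*/online")):
        if online_file.read_text().strip() == "0":
            online_file.write_text("1")
            changed.append(online_file.parent.name)
    return changed


if __name__ == "__main__":
    print(online_offline_cpus())
```

Passing a different sysfs_root makes it possible to try the function against a scratch directory first.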

Memory hotplug and unplug appear to work fine if the VM starts properly. I did not use SUBSYSTEM=="memory", ACTION=="add", TEST=="state", ATTR{state}=="offline", ATTR{state}="online" (although I have tried it to see if it would make a difference) because movable_node and memhp_default_state=online appeared to work fine. I also added memory_hotplug.memmap_on_memory=1. But all of this only applies when the VM does not panic and freeze (with one CPU at 100%) on the memory deadlock during boot.
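
To double-check that the guest really booted with those kernel parameters, a small sketch like the one below can pick them out of a kernel command line. The helper name hotplug_params is hypothetical; only the three parameter names come from this post.

```python
from pathlib import Path

# Parameter names as mentioned in the post above.
WANTED = ("movable_node", "memhp_default_state", "memory_hotplug.memmap_on_memory")


def hotplug_params(cmdline):
    """Return the memory-hotplug related parameters found in a kernel command line.

    Flags without a value (like movable_node) are reported as True.
    """
    found = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key in WANTED:
            found[key] = value if value else True
    return found


if __name__ == "__main__":
    # /proc/cmdline holds the command line of the running kernel.
    print(hotplug_params(Path("/proc/cmdline").read_text()))
```

A parameter that is missing from the result was not passed to the running kernel, which would point at the bootloader configuration rather than at hotplug itself.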

Am I doing something wrong? Is there a known issue with Linux Mint 21 and LMDE 5? Or might this be related to the new 5.19 kernel of Proxmox? Any suggestions of what to try and/or stories about similar experiences are appreciated.

PS: What does enabling USB hotplug do for a VM? Adding USB Spice ports or USB port/device passthrough still appears to require a VM shutdown.
 
I have more trouble reproducing this issue today than I had yesterday. Maybe the issue depends on memory usage and/or fragmentation over time?
I cannot easily test the only VM that currently shows this issue systematically with 5.15, due to the passthrough thingy. However, I did find out that it boots fine with 21GB of memory or less but not with 22GB or more (also tried 23, 24 and 32)!
The "System is deadlocked on memory" error appears to come from the VM not adding/enabling/using new memory (quickly enough?), as I do sometimes see OOM-kills when memhp_default_state=online is not present and the udev rule is also missing.
Is there a way to make Proxmox or QEMU/KVM start the VM with more memory pre-allocated (and maybe not removable)?
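
One way to see whether hotplugged memory was actually onlined inside the guest is to count the states of the memory blocks in sysfs. This is a minimal sketch, assuming the usual /sys/devices/system/memory/memoryN/state layout; the function name memory_block_states is made up here.

```python
from collections import Counter
from pathlib import Path


def memory_block_states(sysfs_root="/sys/devices/system/memory"):
    """Count memory blocks per state (for example 'online' or 'offline')."""
    states = Counter()
    for state_file in Path(sysfs_root).glob("memory[0-9]*/state"):
        states[state_file.read_text().strip()] += 1
    return dict(states)


if __name__ == "__main__":
    print(memory_block_states())
```

If many blocks are still reported as offline after hotplugging, the memory was added but never brought online, which would fit the OOM-kills described above.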

EDIT: It might be something like Red Hat Bug 1866360 but I don't know how to change the starting amount of memory. Also I noticed that repeatedly hot plugging and unplugging 2GB of memory causes errors about virtual dimm 35 or 36 already (or still) existing. Maybe there's a concurrency or failure handling issue there?

EDIT2: The Ubuntu 20.04 VM that worked before with 32GB hot plugged still works with kernel 5.19 and 22GB. Maybe it's a Linux Mint kernel configuration issue (even though it's based on Ubuntu)?
 
Hi!

The problem is still there in Proxmox 7.2-11.
If memory hotplug is activated, I get a kernel panic after a reboot.
The problem does not occur with Ubuntu VMs, only with AlmaLinux and Rocky Linux.
I tried several kernel versions, but there was a kernel panic with all of them.

I also often encounter such errors:
TASK ERROR: memory size (65536) must be aligned to 2048 for hotplugging
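
A possible explanation for this error, sketched in Python. Everything in it is an assumption based on my understanding of how Proxmox lays out hotplug memory, and none of it is confirmed in this thread: 1024 MiB of static memory, then virtual DIMMs of 512 MiB whose size doubles after every 32 DIMMs. The function names dimm_layout and is_reachable are made up for illustration.

```python
def dimm_layout(target_mib, static_mib=1024, first_dimm_mib=512,
                dimms_per_step=32, steps=8):
    """Plug DIMMs until the total reaches (or overshoots) target_mib.

    ASSUMPTION: static memory and the DIMM size progression are guesses
    about the Proxmox layout, not taken from its documentation.
    Returns (list of DIMM sizes in MiB, resulting total in MiB).
    """
    total = static_mib
    size = first_dimm_mib
    sizes = []
    if total >= target_mib:
        return sizes, total
    for _ in range(steps):
        for _ in range(dimms_per_step):
            total += size
            sizes.append(size)
            if total >= target_mib:
                return sizes, total
        size *= 2  # the next group of DIMMs is twice as large
    return sizes, total


def is_reachable(target_mib):
    """True if the layout ends exactly on target_mib."""
    return dimm_layout(target_mib)[1] == target_mib


if __name__ == "__main__":
    for mib in (64512, 65536, 66560):
        sizes, total = dimm_layout(mib)
        print(mib, is_reachable(mib), "last DIMM:", sizes[-1], "total:", total)
```

Under these assumptions 65536 MiB falls between two 2048 MiB DIMM boundaries (the layout overshoots to 66560), while 64512 and 66560 would line up, so the message may be about the DIMM layout rather than about 65536 not being a multiple of 2048.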

Thanks!
 

