Removing a VM hangs on the lvremove command

hoss
Member · Mar 31, 2021
Proxmox hangs (CPU load climbs until the server needs a reboot) when deleting a VM via the GUI if the VM's hard disk is an LVM volume. I have tried several methods of deleting the VM; all roads lead to a hang in the 'lvremove' command, which then stalls udev, after which ALL disks on the server become slow and the box needs a reboot (and/or the CPU load runs away). In the attached images I remove the VM config file by hand, confirm with qm that the VM is not running, and run "lvremove -f" manually (hoping -f would force it through... it did not). I have 3 servers in a cluster and this happens on at least 2 of them (I have not tested the third).
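For reference, the manual sequence was roughly the following (VMID 136 and VG "vg-os" as in the screenshots; /etc/pve/qemu-server/<vmid>.conf is the standard Proxmox config location):

qm list                               # confirm VM 136 is no longer running
rm /etc/pve/qemu-server/136.conf      # drop the VM config by hand
lvremove -f /dev/vg-os/vm-136-disk-0  # remove its disk -- this is the command that hangs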


(Attached screenshots: qm1.PNG, pmox1.PNG, pmox2.PNG, pmox3.PNG, t2.PNG, t3.PNG)

After running "qm list" and confirming that VM 136 is no longer running, I do:
"lvremove -f /dev/vg-os/vm-136-disk-0"
This hangs and NEVER returns. In other terminals I can still run "lvdisplay", but it also hangs and takes minutes to finish. Running top:
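While lvremove sits there, a few standard commands can help show where things are stuck (a diagnostic sketch; nothing here is specific to this bug):

ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'   # tasks in uninterruptible sleep and what they wait on
dmsetup info -c                                   # device-mapper state of all LVs
udevadm settle --timeout=10; echo $?              # non-zero exit suggests the udev event queue never drained
cat /proc/$(pidof lvremove)/stack                 # kernel stack of the hung lvremove (root only)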

(Attached screenshot: cpuLoad.PNG)

Notice the kvm processes: they all eventually shoot past 200%. (Attached screenshot: 1617291515351.png)

All VMs on the box become unresponsive.
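The load numbers look more like tasks piling up in uninterruptible I/O wait than real compute; something like this can confirm it (vmstat ships with procps; iostat comes from sysstat, if installed):

vmstat 1        # 'b' = processes blocked on I/O, 'wa' = CPU time spent waiting for I/O
iostat -x 1     # per-device utilisation and await times while the hang is ongoing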

Dmesg (excerpts, so this post does not get too long):
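(A quick way to get all of the blocked-task traces on demand, assuming sysrq is enabled, is the sysrq 'w' trigger:)

echo w > /proc/sysrq-trigger                    # log every blocked (D-state) task to the kernel ring buffer
dmesg -T | grep -A 30 'blocked for more than'   # then pull the hung-task reports back out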



[31180.696577] INFO: task kvm:18794 blocked for more than 120 seconds.
[31180.696606] Tainted: P O 5.4.106-1-pve #1
[31180.696633] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[31180.696668] kvm D 0 18794 1 0x00000000
[31180.696669] Call Trace:
[31180.696672] __schedule+0x2e6/0x700
[31180.696673] ? wbt_exit+0x30/0x30
[31180.696675] ? __wbt_done+0x40/0x40
[31180.696676] schedule+0x33/0xa0
[31180.696678] io_schedule+0x16/0x40
[31180.696680] rq_qos_wait+0xf7/0x170
[31180.696682] ? sysv68_partition+0x2d0/0x2d0
[31180.696684] ? wbt_exit+0x30/0x30
[31180.696685] wbt_wait+0x9f/0xe0
[31180.696687] __rq_qos_throttle+0x28/0x40
[31180.696690] blk_mq_make_request+0x11f/0x5b0
[31180.696693] generic_make_request+0xcf/0x310
[31180.696696] flush_bio_list+0x4e/0xf0 [raid1]
[31180.696698] raid1_unplug+0xc8/0xf0 [raid1]
[31180.696700] blk_flush_plug_list+0xb3/0x110
[31180.696703] blk_finish_plug+0x26/0x40
[31180.696705] blkdev_write_iter+0xbe/0x140
[31180.696708] aio_write+0xf1/0x190
[31180.696710] ? poll_select_finish+0x210/0x210
[31180.696712] ? poll_select_finish+0x210/0x210
[31180.696714] ? _cond_resched+0x19/0x30
[31180.696716] io_submit_one+0x77c/0xb20
[31180.696719] __x64_sys_io_submit+0xa9/0x190
[31180.696721] ? __x64_sys_io_submit+0xa9/0x190
[31180.696724] ? fput+0x13/0x20
[31180.696726] ? ksys_read+0xa3/0xe0
[31180.696729] do_syscall_64+0x57/0x190
[31180.696731] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[31180.696732] RIP: 0033:0x7fcdccc0ff59
[31180.696735] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 07 6f 0c 00 f7 d8 64 89 01 48
[31180.696736] RSP: 002b:00007fffcd884878 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[31180.696737] RAX: ffffffffffffffda RBX: 00007fcdc207ce20 RCX: 00007fcdccc0ff59
[31180.696738] RDX: 00007fffcd8848c0 RSI: 0000000000000001 RDI: 00007fcdc201c000
[31180.696739] RBP: 00007fcdc201c000 R08: 00007fcdc207f310 R09: 0000000000000001
[31180.696739] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[31180.696740] R13: 0000000000000005 R14: 00007fffcd8848c0 R15: 0000000000000001

[31180.696744] INFO: task kvm:19000 blocked for more than 120 seconds.
[31180.696774] Tainted: P O 5.4.106-1-pve #1
[31180.696800] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[31180.696833] kvm D 0 19000 1 0x00004000
[31180.696834] Call Trace:
[31180.696835] __schedule+0x2e6/0x700
[31180.696836] schedule+0x33/0xa0
[31180.696837] md_super_wait+0x74/0xb0
[31180.696838] ? wait_woken+0x80/0x80
[31180.696839] md_bitmap_wait_writes+0x9b/0xb0
[31180.696840] md_bitmap_unplug+0xfd/0x120
[31180.696841] flush_bio_list+0x23/0xf0 [raid1]
[31180.696842] raid1_unplug+0xc8/0xf0 [raid1]
[31180.696843] blk_flush_plug_list+0xb3/0x110
[31180.696844] blk_finish_plug+0x26/0x40
[31180.696845] __x64_sys_io_submit+0x122/0x190
[31180.696846] do_syscall_64+0x57/0x190
[31180.696847] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[31180.696847] RIP: 0033:0x7f8618d19f59
[31180.696849] Code: Bad RIP value.
[31180.696849] RSP: 002b:00007ffde2acb988 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[31180.696850] RAX: ffffffffffffffda RBX: 00007f860e186e20 RCX: 00007f8618d19f59
[31180.696850] RDX: 00007ffde2acb9d0 RSI: 000000000000001e RDI: 00007f860e126000
[31180.696850] RBP: 00007f860e126000 R08: 00007f860e189310 R09: 000000000000001e
[31180.696851] R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000001e
[31180.696851] R13: 0000000000000005 R14: 00007ffde2acb9d0 R15: 0000000000000001

[31180.696858] INFO: task kvm:19099 blocked for more than 120 seconds.
[31180.696872] Tainted: P O 5.4.106-1-pve #1
[31180.696885] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[31180.696902] kvm D 0 19099 1 0x00000000
.......


[31180.696953] INFO: task lvremove:188475 blocked for more than 120 seconds.   <<<<<<<<<<<<<<< the hung lvremove
[31180.696968] Tainted: P O 5.4.106-1-pve #1
[31180.696982] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[31180.696999] lvremove D 0 188475 21657 0x00000000
[31180.697000] Call Trace:
[31180.697002] __schedule+0x2e6/0x700
[31180.697003] schedule+0x33/0xa0
[31180.697004] schedule_timeout+0x205/0x330
[31180.697005] ? blk_flush_plug_list+0xb3/0x110
[31180.697006] io_schedule_timeout+0x1e/0x50
[31180.697006] wait_for_completion_io+0xb7/0x140
[31180.697009] ? wake_up_q+0x80/0x80
[31180.697009] submit_bio_wait+0x61/0x90
[31180.697010] blkdev_issue_discard+0x7a/0xd0
[31180.697012] blk_ioctl_discard+0xc4/0x110
[31180.697013] blkdev_ioctl+0x4e7/0x9e0
[31180.697014] block_ioctl+0x3d/0x50
[31180.697015] do_vfs_ioctl+0xa9/0x640
[31180.697016] ? __do_sys_newfstat+0x5f/0x70
[31180.697017] ksys_ioctl+0x67/0x90
[31180.697017] __x64_sys_ioctl+0x1a/0x20
[31180.697018] do_syscall_64+0x57/0x190
[31180.697019] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[31180.697020] RIP: 0033:0x7fbeae0f7427
[31180.697021] Code: Bad RIP value.
[31180.697021] RSP: 002b:00007ffd4fb311d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[31180.697022] RAX: ffffffffffffffda RBX: 0000564696f132e3 RCX: 00007fbeae0f7427
[31180.697022] RDX: 00007ffd4fb31200 RSI: 0000000000001277 RDI: 0000000000000007
[31180.697022] RBP: 00007ffd4fb31230 R08: 0000564696f134f8 R09: 00007ffd4fb310e0
[31180.697023] R10: 0000000000000000 R11: 0000000000000246 R12: 0000564696d53ec0
[31180.697023] R13: 00007ffd4fb31c60 R14: 0000000000000000 R15: 0000000000000000
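The lvremove trace is the telling one: it is blocked inside blkdev_issue_discard(), i.e. lvremove is waiting on a discard (TRIM) of the freed extents to complete down in the md-raid1/controller stack. As far as I understand, lvremove only issues that BLKDISCARD ioctl when issue_discards is enabled in /etc/lvm/lvm.conf (the upstream default is 0), so one test (my assumption, not verified on this cluster) would be to turn it off:

# /etc/lvm/lvm.conf -- possible test, not verified here:
devices {
    # 1 = send BLKDISCARD for extents freed by lvremove/lvreduce;
    # that ioctl is exactly where the trace above is stuck.
    # 0 skips the discard (at the cost of never trimming the SSD).
    issue_discards = 0
}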

[31229.860265] sd 0:2:14:0: [sdd] tag#3306 Abort request is for SMID: 3307
[31229.860269] sd 0:2:14:0: attempting task abort! scmd(0x00000000d4c9c50b) tm_dev_handle 0xe
[31229.926556] sd 0:2:14:0: [sdd] tag#3171 BRCM Debug mfi stat 0x2d, data len requested/completed 0x20000/0x0
[31229.926600] sd 0:2:14:0: [sdd] tag#3200 BRCM Debug mfi stat 0x2d, data len requested/completed 0x6000/0x0
[31229.926623] sd 0:2:14:0: [sdd] tag#759 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
[31229.926725] sd 0:2:14:0: [sdd] tag#761 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
[31229.926747] sd 0:2:14:0: [sdd] tag#3262 BRCM Debug mfi stat 0x2d, data len requested/completed 0x6000/0x0
[31229.926768] sd 0:2:14:0: [sdd] tag#757 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
[31229.926924] sd 0:2:14:0: [sdd] tag#755 BRCM Debug mfi stat 0x2d, data len requested/completed 0x1000/0x0
[31229.926976] sd 0:2:14:0: [sdd] tag#3258 BRCM Debug mfi stat 0x2d, data len requested/completed 0xa00/0x0

.....
More available upon request.
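One more data point from those last lines: the "attempting task abort!" and "mfi stat 0x2d ... completed 0x0" messages mean the MegaRAID controller behind sdd is aborting commands while everything is stuck, so it may be worth ruling out a sick disk or controller firmware problem. smartmontools is already installed per pveversion below; note that disks behind a MegaRAID usually need the "-d megaraid,N" device type:

smartctl -a /dev/sdd                  # may only show the controller's virtual drive
smartctl -d megaraid,0 -a /dev/sdd    # physical disk 0 behind the controller (adjust N per slot)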
 
root@pmox1:~# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.106-1-pve)
pve-manager: 6.3-6 (running version: 6.3-6/2184247e)
pve-kernel-5.4: 6.3-8
pve-kernel-helper: 6.3-8
pve-kernel-5.4.106-1-pve: 5.4.106-1
pve-kernel-5.4.103-1-pve: 5.4.103-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph: 15.2.10-pve1
ceph-fuse: 15.2.10-pve1
corosync: 3.1.0-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.20-pve1
libproxmox-acme-perl: 1.0.8
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.1-3
libpve-common-perl: 6.3-5
libpve-guest-common-perl: 3.1-5
libpve-http-server-perl: 3.1-1
libpve-network-perl: 0.4-6
libpve-storage-perl: 6.3-7
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.6-2
lxcfs: 4.0.6-pve1
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.11-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-9
pve-cluster: 6.2-1
pve-container: 3.3-4
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.2-2
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.2.0-4
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-9
smartmontools: 7.2-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 2.0.4-pve1
 
