lvm errors in 5.4.44-1+

sahostking

Renowned Member
This happens when I do a resize on LVM partitions, or when anything else runs against LVM. If I don't touch LVM it's fine, but as soon as I try to make a change to an LVM partition it dies. Tried it on a few servers; same issue on pve-kernel-5.4.44-2-pve.

I see these stuck:
root 5938 0.0 0.0 15848 8724 ? D 08:40 0:00 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
root 7077 0.0 0.0 15848 8572 ? D 08:45 0:00 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
root 7985 0.0 0.0 6072 892 pts/0 S+ 08:49 0:00 grep vgs
root 15259 0.0 0.0 15848 8640 ? D 02:36 0:00 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
root 19983 0.0 0.0 15848 8724 ? D 02:56 0:00 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
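For anyone wanting to check for the same thing: these are uninterruptible (D-state) processes, so kill -9 won't touch them until the kernel lock they're waiting on is released. A quick way to list them without grepping for vgs specifically, roughly:

# list all D-state (uninterruptible sleep) processes and the kernel
# function each one is blocked in (wchan)
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'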


pve-kernel-5.4.44-2-pve at least stops the VMs going down randomly, so that's good, but it still caused the above for me.

Moved back to the 5.3 PVE kernel, which has no issues, and I can resize etc. fine.
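If anyone wants the 5.3 kernel to stick across reboots, a rough sketch via GRUB (the menu entry name below is an example, yours will differ -- check grub.cfg for the exact string):

# list the boot entries grub knows about
grep "menuentry '" /boot/grub/grub.cfg

# then in /etc/default/grub pin the 5.3 entry, e.g.:
#   GRUB_DEFAULT="Advanced options for Proxmox VE GNU/Linux>Proxmox VE GNU/Linux, with Linux 5.3.18-3-pve"

# and regenerate the config before rebooting
update-grub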



I see that when a backup runs, it also gets stuck on that kernel:


INFO: Backup started at 2020-07-06 02:56:32
INFO: status = running
INFO: VM Name:
INFO: include disk 'scsi0' 'local-lvm:vm-138-disk-0' 50G
command '/sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count' failed: interrupted by signal
/dev/sdb: open failed: No medium found
TASK ERROR: got unexpected control message:
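Side note: the "/dev/sdb: open failed: No medium found" line is usually just an empty card reader or similar being scanned. If that's what it is here, LVM can be told to skip the device entirely; a minimal lvm.conf sketch (the /dev/sdb path is my assumption, match it to your hardware):

# /etc/lvm/lvm.conf -- reject the medium-less device, accept everything else
devices {
    global_filter = [ "r|^/dev/sdb$|", "a|.*|" ]
}

That only quiets the warning, though; it doesn't touch the actual hang.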


Rebooted one server into the 5.3 kernel and all is good. Going to do the same with this server tonight.
 
I do see this when running dmesg, though:

[Mon Jul 6 02:42:32 2020] vgs D 0 15259 1668 0x00000000
[Mon Jul 6 02:42:32 2020] Call Trace:
[Mon Jul 6 02:42:32 2020] __schedule+0x2e6/0x6f0
[Mon Jul 6 02:42:32 2020] schedule+0x33/0xa0
[Mon Jul 6 02:42:32 2020] schedule_preempt_disabled+0xe/0x10
[Mon Jul 6 02:42:32 2020] __mutex_lock.isra.10+0x2c9/0x4c0
[Mon Jul 6 02:42:32 2020] __mutex_lock_slowpath+0x13/0x20
[Mon Jul 6 02:42:32 2020] mutex_lock+0x2c/0x30
[Mon Jul 6 02:42:32 2020] disk_block_events+0x31/0x80
[Mon Jul 6 02:42:32 2020] __blkdev_get+0x72/0x560
[Mon Jul 6 02:42:32 2020] blkdev_get+0xe0/0x140
[Mon Jul 6 02:42:32 2020] ? blkdev_get_by_dev+0x50/0x50
[Mon Jul 6 02:42:32 2020] blkdev_open+0x87/0xa0
[Mon Jul 6 02:42:32 2020] do_dentry_open+0x143/0x3a0
[Mon Jul 6 02:42:32 2020] vfs_open+0x2d/0x30
[Mon Jul 6 02:42:32 2020] path_openat+0x2e9/0x16f0
[Mon Jul 6 02:42:32 2020] ? filename_lookup.part.60+0xe0/0x170
[Mon Jul 6 02:42:32 2020] do_filp_open+0x93/0x100
[Mon Jul 6 02:42:32 2020] ? __alloc_fd+0x46/0x150
[Mon Jul 6 02:42:32 2020] do_sys_open+0x177/0x280
[Mon Jul 6 02:42:32 2020] __x64_sys_openat+0x20/0x30
[Mon Jul 6 02:42:32 2020] do_syscall_64+0x57/0x190
[Mon Jul 6 02:42:32 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Mon Jul 6 02:42:32 2020] RIP: 0033:0x7f6c7d5251ae
[Mon Jul 6 02:42:32 2020] Code: Bad RIP value.
[Mon Jul 6 02:42:32 2020] RSP: 002b:00007ffc846f8f20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[Mon Jul 6 02:42:32 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6c7d5251ae
[Mon Jul 6 02:42:32 2020] RDX: 0000000000044000 RSI: 000055a0a9377e48 RDI: 00000000ffffff9c
[Mon Jul 6 02:42:32 2020] RBP: 00007ffc846f9080 R08: 00007f6c7d5f6ea0 R09: 00007f6c7d5f6cf0
[Mon Jul 6 02:42:32 2020] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc846f9edf
[Mon Jul 6 02:42:32 2020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Mon Jul 6 02:44:33 2020] INFO: task snmpd:1303 blocked for more than 362 seconds.
[Mon Jul 6 02:44:33 2020] Tainted: P IO 5.4.44-2-pve #1


But I am assuming that is just our monitoring trying to get the disk space value off the LVM partition; it cannot, so it freezes.
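That would fit: snmpd opens the block devices to read stats and blocks on the same kernel mutex as vgs. To confirm where a stuck PID is sitting, roughly (the PID is taken from my ps output above; needs root):

# kernel stack of one of the stuck vgs processes
cat /proc/15259/stack

# or dump every blocked (D-state) task into dmesg at once
echo w > /proc/sysrq-trigger
dmesg | tail -n 50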
 
Same issue here. I have set my server to restart if a kernel thread hangs for a certain timeout, and I get greeted with a restarted server every morning. Reverting to pve-kernel-5.4.41-1-pve fixes the issue. Something is very wrong with the pve-kernel-5.4.44-1-pve and later kernels.
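For reference, the restart-on-hang setup is just the kernel's standard hung-task detection, something like this (the 120-second timeout is my choice, tune to taste):

# /etc/sysctl.d/99-hung-task.conf -- panic (and thus reboot) when a kernel
# thread is stuck in D state for longer than the timeout
kernel.hung_task_timeout_secs = 120
kernel.hung_task_panic = 1
# auto-reboot 10 seconds after the panic
kernel.panic = 10

It's a blunt instrument, but it beats finding the box wedged in the morning.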
 
