lvm errors in 5.4.44-1+

sahostking

Renowned Member
When I do a resize on LVM partitions, or anything else runs against LVM, it hangs.
If I don't touch LVM then it's fine, but as soon as I try to make a change to an LVM partition it dies. Tried it on a few servers; same issue on pve-kernel-5.4.44-2-pve.

I see these stuck:
root 5938 0.0 0.0 15848 8724 ? D 08:40 0:00 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
root 7077 0.0 0.0 15848 8572 ? D 08:45 0:00 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
root 7985 0.0 0.0 6072 892 pts/0 S+ 08:49 0:00 grep vgs
root 15259 0.0 0.0 15848 8640 ? D 02:36 0:00 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
root 19983 0.0 0.0 15848 8724 ? D 02:56 0:00 /sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count
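
In case it helps anyone compare, this is roughly how I'm finding them; the grep is just what produced the list above, and the second command is a generic way to list everything sitting in uninterruptible (D) sleep:

ps aux | grep vgs
# list every process currently in uninterruptible sleep and what it is waiting on
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'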


pve-kernel-5.4.44-2-pve at least stops the VMs from going down randomly, so that's good, but it still causes the above for me.

Moved back to the 5.3 PVE kernel, which has no issues, and I can resize etc. fine.
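
For anyone who wants to do the same without reinstalling anything, this is roughly how I pick the older kernel at boot; the entry names differ per machine, so use whatever the grep actually prints on yours (this assumes the standard GRUB setup, not systemd-boot):

# list the kernel menu entries GRUB knows about
grep -o "menuentry '[^']*'" /boot/grub/grub.cfg
# set GRUB_DEFAULT in /etc/default/grub to the 5.3 entry
# (either as "submenu title>entry title" or by numeric index), then regenerate the config:
update-grub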



I see that when a backup runs it also gets stuck on that kernel:


INFO: Backup started at 2020-07-06 02:56:32
INFO: status = running
INFO: VM Name:
INFO: include disk 'scsi0' 'local-lvm:vm-138-disk-0' 50G
command '/sbin/vgs --separator : --noheadings --units b --unbuffered --nosuffix --options vg_name,vg_size,vg_free,lv_count' failed: interrupted by signal
/dev/sdb: open failed: No medium found
TASK ERROR: got unexpected control message:
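
The "No medium found" line is just LVM poking an empty card-reader device (/dev/sdb here), not the actual hang, but if anyone wants to silence it, something along these lines in /etc/lvm/lvm.conf should work (keep whatever filters are already there; the device path is obviously specific to my box):

devices {
    # skip the empty card-reader device so vgs/pvs stop warning about it
    global_filter = [ "r|/dev/sdb|" ]
}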


Rebooted one server into the 5.3 kernel and all is good. Going to do the same with this server tonight.
 
I do see this when running dmesg though:

[Mon Jul 6 02:42:32 2020] vgs D 0 15259 1668 0x00000000
[Mon Jul 6 02:42:32 2020] Call Trace:
[Mon Jul 6 02:42:32 2020] __schedule+0x2e6/0x6f0
[Mon Jul 6 02:42:32 2020] schedule+0x33/0xa0
[Mon Jul 6 02:42:32 2020] schedule_preempt_disabled+0xe/0x10
[Mon Jul 6 02:42:32 2020] __mutex_lock.isra.10+0x2c9/0x4c0
[Mon Jul 6 02:42:32 2020] __mutex_lock_slowpath+0x13/0x20
[Mon Jul 6 02:42:32 2020] mutex_lock+0x2c/0x30
[Mon Jul 6 02:42:32 2020] disk_block_events+0x31/0x80
[Mon Jul 6 02:42:32 2020] __blkdev_get+0x72/0x560
[Mon Jul 6 02:42:32 2020] blkdev_get+0xe0/0x140
[Mon Jul 6 02:42:32 2020] ? blkdev_get_by_dev+0x50/0x50
[Mon Jul 6 02:42:32 2020] blkdev_open+0x87/0xa0
[Mon Jul 6 02:42:32 2020] do_dentry_open+0x143/0x3a0
[Mon Jul 6 02:42:32 2020] vfs_open+0x2d/0x30
[Mon Jul 6 02:42:32 2020] path_openat+0x2e9/0x16f0
[Mon Jul 6 02:42:32 2020] ? filename_lookup.part.60+0xe0/0x170
[Mon Jul 6 02:42:32 2020] do_filp_open+0x93/0x100
[Mon Jul 6 02:42:32 2020] ? __alloc_fd+0x46/0x150
[Mon Jul 6 02:42:32 2020] do_sys_open+0x177/0x280
[Mon Jul 6 02:42:32 2020] __x64_sys_openat+0x20/0x30
[Mon Jul 6 02:42:32 2020] do_syscall_64+0x57/0x190
[Mon Jul 6 02:42:32 2020] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Mon Jul 6 02:42:32 2020] RIP: 0033:0x7f6c7d5251ae
[Mon Jul 6 02:42:32 2020] Code: Bad RIP value.
[Mon Jul 6 02:42:32 2020] RSP: 002b:00007ffc846f8f20 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[Mon Jul 6 02:42:32 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f6c7d5251ae
[Mon Jul 6 02:42:32 2020] RDX: 0000000000044000 RSI: 000055a0a9377e48 RDI: 00000000ffffff9c
[Mon Jul 6 02:42:32 2020] RBP: 00007ffc846f9080 R08: 00007f6c7d5f6ea0 R09: 00007f6c7d5f6cf0
[Mon Jul 6 02:42:32 2020] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc846f9edf
[Mon Jul 6 02:42:32 2020] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
[Mon Jul 6 02:44:33 2020] INFO: task snmpd:1303 blocked for more than 362 seconds.
[Mon Jul 6 02:44:33 2020] Tainted: P IO 5.4.44-2-pve #1


But I am assuming that is just our monitoring (snmpd) trying to get a disk-space value off the LVM partition; it can't, so it freezes.
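
If anyone wants to check what theirs is actually waiting on, the PIDs from the ps output and the dmesg message above can be inspected directly (needs root):

cat /proc/15259/stack    # kernel stack of one of the stuck vgs processes
cat /proc/1303/wchan; echo    # which kernel function the blocked snmpd is sitting in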
 
Same issue here. I have set my server to restart if a kernel thread hangs for a certain timeout, and I get greeted with a restarted server every morning. Reverting to pve-kernel-5.4.41-1-pve fixes the issue. Something is very wrong with the pve-kernel-5.4.44-1-pve and later kernels.
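
For reference, the auto-restart is just the kernel's hung-task sysctls, nothing Proxmox-specific; the values below are simply what I use, adjust to taste:

# /etc/sysctl.d/90-hung-task.conf
kernel.hung_task_timeout_secs = 300   # report a task stuck in D state for this long
kernel.hung_task_panic = 1            # turn that report into a panic
kernel.panic = 10                     # reboot 10 seconds after a panic

# apply without rebooting:
sysctl --system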