thin-lvm problem

jacky0815

Renowned Member
Aug 26, 2010
Hello,

after the upgrade from Proxmox 5 to Proxmox 6 the LVM storage was changed to thin-LVM, so I started migrating from qcow2 to thin-LVM. But there is a big problem: the server hangs when importing backups or migrating VMs from other machines.

Code:
qmrestore vzdump-qemu-103-2020_06_13-07_41_18.vma.zst 102 --force true --storage local-thin
restore vma archive: zstd -q -d -c /mnt/vz/vzdump-qemu-103-2020_06_13-07_41_18.vma.zst | vma extract -v -r /var/tmp/vzdumptmp3722.fifo - /var/tmp/vzdumptmp3722
CFG: size: 313 name: qemu-server.conf
DEV: dev_id=1 size: 1099511627776 devname: drive-virtio0
CTIME: Sat Jun 13 07:41:23 2020
  Logical volume "vm-102-disk-1" created.
It hangs for days; nothing happens.

The log says:
Code:
[ 1813.519851] INFO: task systemd-udevd:3005 blocked for more than 1087 seconds.
[ 1813.519902]       Tainted: P           OE     5.4.41-1-pve #1
[ 1813.519935] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1813.519963] systemd-udevd   D    0  3005    565 0x00004324
[ 1813.519965] Call Trace:
[ 1813.519974]  __schedule+0x2e6/0x6f0
[ 1813.519976]  schedule+0x33/0xa0
[ 1813.519978]  schedule_preempt_disabled+0xe/0x10
[ 1813.519981]  __mutex_lock.isra.10+0x2c9/0x4c0
[ 1813.519984]  ? exact_lock+0x11/0x20
[ 1813.519986]  ? disk_map_sector_rcu+0x70/0x70
[ 1813.519988]  __mutex_lock_slowpath+0x13/0x20
[ 1813.519990]  mutex_lock+0x2c/0x30
[ 1813.519992]  __blkdev_get+0x7a/0x560
[ 1813.519994]  blkdev_get+0xe0/0x140
[ 1813.519996]  ? blkdev_get_by_dev+0x50/0x50
[ 1813.519997]  blkdev_open+0x87/0xa0
[ 1813.520000]  do_dentry_open+0x143/0x3a0
[ 1813.520001]  vfs_open+0x2d/0x30
[ 1813.520004]  path_openat+0x2e9/0x16f0
[ 1813.520007]  ? unlock_page_memcg+0x12/0x20
[ 1813.520010]  ? page_add_file_rmap+0x131/0x190
[ 1813.520012]  ? wp_page_copy+0x37b/0x750
[ 1813.520014]  do_filp_open+0x93/0x100
[ 1813.520017]  ? __alloc_fd+0x46/0x150
[ 1813.520019]  do_sys_open+0x177/0x280
[ 1813.520020]  __x64_sys_openat+0x20/0x30
[ 1813.520024]  do_syscall_64+0x57/0x190
[ 1813.520027]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1813.520029] RIP: 0033:0x7f299ba2a1ae
[ 1813.520033] Code: Bad RIP value.
[ 1813.520034] RSP: 002b:00007ffd577d3b10 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 1813.520036] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f299ba2a1ae
[ 1813.520036] RDX: 0000000000080000 RSI: 000055c3ba8707b0 RDI: 00000000ffffff9c
[ 1813.520037] RBP: 00007f299b249c60 R08: 000055c3b918c270 R09: 000000000000000f
[ 1813.520038] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
[ 1813.520039] R13: 0000000000000000 R14: 0000000000000000 R15: 000055c3ba86f050
When I look at the console screen, the following message is printed:
Code:
INFO: task systemd-udev:3005 blocked for more than xxx seconds.

lvs reports a timeout.
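
To see what exactly is stuck in such a situation, the tasks in uninterruptible sleep can be listed with standard tools (a quick sketch; the exact processes will of course differ):

Code:
# list processes stuck in uninterruptible sleep (state D), e.g. lvs, vma, systemd-udevd
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'
# show the hung-task messages collected by the kernel
dmesg | grep -i 'blocked for more than'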

To fix the problem, I completely reinstalled Proxmox and applied all firmware and BIOS updates, but the problem is still there.
The server is an IBM x3650 M3 with ServeRAID M5014 SAS/SATA Controller (46M0916).

I have also read a lot of posts in the forum. Unfortunately, "udevadm trigger", for example, only helps for a short time, and none of the other suggested solutions works.
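
For reference, the workaround mentioned above is typically run roughly like this (the exact invocation varies between the forum posts):

Code:
# re-run udev rules for block devices and wait for the event queue to drain
udevadm trigger --subsystem-match=block
udevadm settle --timeout=30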
Does anyone else have an idea?
 
Hi,

please send the output of the following commands.

Code:
pveversion -v
dpkg -l | grep -e 'intel-microcode'
lsblk --ascii
 
The LVM has been changed to a normal (non-thin) LVM.

I updated another server over the weekend and found a similar problem: when restoring backups with qmrestore while simultaneously migrating another VM, the "lvs" error came back and the system hung, but without a kernel panic. The other server is newer.

Bash:
root@kvm2:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.41-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-2
pve-kernel-helper: 6.2-2
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-6
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
root@kvm2:~# dpkg -l | grep -e 'intel-microcode'
ii  intel-microcode                      3.20200609.2~deb10u1         amd64        Processor microcode firmware for Intel CPUs
root@kvm2:~# lsblk --ascii
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda            8:0    0  3.2T  0 disk
|-sda1         8:1    0 1007K  0 part
|-sda2         8:2    0  512M  0 part
`-sda3         8:3    0  3.2T  0 part
  |-pve-swap 253:0    0    8G  0 lvm  [SWAP]
  |-pve-root 253:1    0   96G  0 lvm  /
  `-pve-data 253:2    0  3.1T  0 lvm  /var/lib/vz
sdb            8:16   0  7.3T  0 disk
|-sdb1         8:17   0  128M  0 part
`-sdb2         8:18   0  7.3T  0 part /mnt
sr0           11:0    1 1024M  0 rom
 
I guess it is kernel-related.
Try the pve-kernel-5.4.44-1-pve kernel, which has been available since yesterday.
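
Assuming the standard Proxmox package repositories are configured, installing and booting the new kernel would look roughly like this:

Code:
apt update
apt install pve-kernel-5.4.44-1-pve
reboot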
 
Hello,

I tested it with the latest packages yesterday. Unfortunately, the errors still occur.

Code:
Jul  2 19:26:12 kvm5 pvedaemon[1236]: <root@pam> starting task UPID:kvm5:000075A3:0309FF6D:5EFE18B4:imgdel:103@ssd-thin:root@pam:
Jul  2 19:26:18 kvm5 pvedaemon[30115]: lvremove 'pve-ssd/vm-103-disk-0' error:   Logical volume pve-ssd/vm-103-disk-0 in use.
Jul  2 19:26:18 kvm5 pvedaemon[1236]: <root@pam> end task UPID:kvm5:000075A3:0309FF6D:5EFE18B4:imgdel:103@ssd-thin:root@pam: lvremove 'pve-ssd/vm-103-disk-0' error:   Logical volume pve-ssd/vm-103-disk-0 in use.
Jul  2 19:26:54 kvm5 systemd-udevd[785]: dm-7: Worker [29673] processing SEQNUM=403630 killed
Jul  2 19:26:57 kvm5 kernel: [509916.361712] INFO: task vma:28050 blocked for more than 120 seconds.
Jul  2 19:26:57 kvm5 kernel: [509916.361768]       Tainted: P          IOE     5.4.44-1-pve #1
Jul  2 19:26:57 kvm5 kernel: [509916.361808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  2 19:26:57 kvm5 kernel: [509916.361853] vma             D    0 28050      1 0x80004002
Jul  2 19:26:57 kvm5 kernel: [509916.361855] Call Trace:
Jul  2 19:26:57 kvm5 kernel: [509916.361864]  __schedule+0x2e6/0x6f0
Jul  2 19:26:57 kvm5 kernel: [509916.361866]  schedule+0x33/0xa0
Jul  2 19:26:57 kvm5 kernel: [509916.361868]  io_schedule+0x16/0x40
Jul  2 19:26:57 kvm5 kernel: [509916.361873]  wait_on_page_bit+0x141/0x210
Jul  2 19:26:57 kvm5 kernel: [509916.361875]  ? file_fdatawait_range+0x30/0x30
Jul  2 19:26:57 kvm5 kernel: [509916.361879]  wait_on_page_writeback+0x43/0x90
Jul  2 19:26:57 kvm5 kernel: [509916.361880]  __filemap_fdatawait_range+0xae/0x120
Jul  2 19:26:57 kvm5 kernel: [509916.361883]  filemap_write_and_wait+0x5e/0xa0
Jul  2 19:26:57 kvm5 kernel: [509916.361887]  __blkdev_put+0x72/0x1e0
Jul  2 19:26:57 kvm5 kernel: [509916.361889]  blkdev_put+0x4c/0xd0
Jul  2 19:26:57 kvm5 kernel: [509916.361890]  blkdev_close+0x25/0x30
Jul  2 19:26:57 kvm5 kernel: [509916.361894]  __fput+0xc6/0x260
Jul  2 19:26:57 kvm5 kernel: [509916.361895]  ____fput+0xe/0x10
Jul  2 19:26:57 kvm5 kernel: [509916.361900]  task_work_run+0x9d/0xc0
Jul  2 19:26:57 kvm5 kernel: [509916.361904]  do_exit+0x367/0xab0
Jul  2 19:26:57 kvm5 kernel: [509916.361906]  do_group_exit+0x47/0xb0
Jul  2 19:26:57 kvm5 kernel: [509916.361909]  get_signal+0x140/0x850
Jul  2 19:26:57 kvm5 kernel: [509916.361911]  ? fsnotify+0x309/0x3d0
Jul  2 19:26:57 kvm5 kernel: [509916.361915]  do_signal+0x34/0x6e0
Jul  2 19:26:57 kvm5 kernel: [509916.361917]  ? vfs_write+0x184/0x1b0
Jul  2 19:26:57 kvm5 kernel: [509916.361922]  exit_to_usermode_loop+0x90/0x130
Jul  2 19:26:57 kvm5 kernel: [509916.361923]  do_syscall_64+0x160/0x190
Jul  2 19:26:57 kvm5 kernel: [509916.361926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul  2 19:26:57 kvm5 kernel: [509916.361928] RIP: 0033:0x7f041a7cbedf
Jul  2 19:26:57 kvm5 kernel: [509916.361932] Code: Bad RIP value.
Jul  2 19:26:57 kvm5 kernel: [509916.361933] RSP: 002b:00007f040d0789c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Jul  2 19:26:57 kvm5 kernel: [509916.361935] RAX: 0000000000007000 RBX: 000000000000000b RCX: 00007f041a7cbedf
Jul  2 19:26:57 kvm5 kernel: [509916.361935] RDX: 0000000000010000 RSI: 00007ffed9f39440 RDI: 000000000000000b
Jul  2 19:26:57 kvm5 kernel: [509916.361936] RBP: 00007ffed9f39440 R08: 0000000000000000 R09: 00000000ffffffff
Jul  2 19:26:57 kvm5 kernel: [509916.361937] R10: 00000005459d0000 R11: 0000000000000293 R12: 0000000000010000
Jul  2 19:26:57 kvm5 kernel: [509916.361937] R13: 00000005459d0000 R14: 00007f040e021e70 R15: 000055d7e90a7b82
Jul  2 19:26:57 kvm5 kernel: [509916.361944] INFO: task systemd-udevd:29673 blocked for more than 120 seconds.
Jul  2 19:26:57 kvm5 kernel: [509916.361965]       Tainted: P          IOE     5.4.44-1-pve #1
Jul  2 19:26:57 kvm5 kernel: [509916.361982] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  2 19:26:57 kvm5 kernel: [509916.362003] systemd-udevd   D    0 29673    785 0x00000324
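
As a side note: for the lvremove "Logical volume ... in use" error above, the holder of the volume can usually be tracked down with something like the following (the device path is just the usual mapping for this LV and may differ):

Code:
# show how the device-mapper devices are stacked on top of each other
dmsetup ls --tree
# show processes that still have the block device open
fuser -vam /dev/pve-ssd/vm-103-disk-0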
 