thin-lvm problem

jacky0815

Hello,

after the upgrade from Proxmox 5 to Proxmox 6 the LVM storage is thin-LVM, so I started migrating my disks from qcow2 to thin-LVM. But there is a big problem: the server hangs when restoring backups or migrating VMs from other machines.
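For the qcow2-to-thin-LVM migration itself I use the built-in disk move, for example (VMID, disk and storage names here are just examples from my setup):

Code:
# move a qcow2 disk onto the thin pool; LVM-thin only supports raw
qm move_disk 103 virtio0 local-thin --format raw --delete 1

The restore that hangs looks like this: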

Code:
qmrestore vzdump-qemu-103-2020_06_13-07_41_18.vma.zst 102 --force true --storage local-thin
restore vma archive: zstd -q -d -c /mnt/vz/vzdump-qemu-103-2020_06_13-07_41_18.vma.zst | vma extract -v -r /var/tmp/vzdumptmp3722.fifo - /var/tmp/vzdumptmp3722
CFG: size: 313 name: qemu-server.conf
DEV: dev_id=1 size: 1099511627776 devname: drive-virtio0
CTIME: Sat Jun 13 07:41:23 2020
  Logical volume "vm-102-disk-1" created.
It hangs like this for days; nothing more happens.

The log says:
Code:
[ 1813.519851] INFO: task systemd-udevd:3005 blocked for more than 1087 seconds.
[ 1813.519902]       Tainted: P           OE     5.4.41-1-pve #1
[ 1813.519935] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1813.519963] systemd-udevd   D    0  3005    565 0x00004324
[ 1813.519965] Call Trace:
[ 1813.519974]  __schedule+0x2e6/0x6f0
[ 1813.519976]  schedule+0x33/0xa0
[ 1813.519978]  schedule_preempt_disabled+0xe/0x10
[ 1813.519981]  __mutex_lock.isra.10+0x2c9/0x4c0
[ 1813.519984]  ? exact_lock+0x11/0x20
[ 1813.519986]  ? disk_map_sector_rcu+0x70/0x70
[ 1813.519988]  __mutex_lock_slowpath+0x13/0x20
[ 1813.519990]  mutex_lock+0x2c/0x30
[ 1813.519992]  __blkdev_get+0x7a/0x560
[ 1813.519994]  blkdev_get+0xe0/0x140
[ 1813.519996]  ? blkdev_get_by_dev+0x50/0x50
[ 1813.519997]  blkdev_open+0x87/0xa0
[ 1813.520000]  do_dentry_open+0x143/0x3a0
[ 1813.520001]  vfs_open+0x2d/0x30
[ 1813.520004]  path_openat+0x2e9/0x16f0
[ 1813.520007]  ? unlock_page_memcg+0x12/0x20
[ 1813.520010]  ? page_add_file_rmap+0x131/0x190
[ 1813.520012]  ? wp_page_copy+0x37b/0x750
[ 1813.520014]  do_filp_open+0x93/0x100
[ 1813.520017]  ? __alloc_fd+0x46/0x150
[ 1813.520019]  do_sys_open+0x177/0x280
[ 1813.520020]  __x64_sys_openat+0x20/0x30
[ 1813.520024]  do_syscall_64+0x57/0x190
[ 1813.520027]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 1813.520029] RIP: 0033:0x7f299ba2a1ae
[ 1813.520033] Code: Bad RIP value.
[ 1813.520034] RSP: 002b:00007ffd577d3b10 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[ 1813.520036] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f299ba2a1ae
[ 1813.520036] RDX: 0000000000080000 RSI: 000055c3ba8707b0 RDI: 00000000ffffff9c
[ 1813.520037] RBP: 00007f299b249c60 R08: 000055c3b918c270 R09: 000000000000000f
[ 1813.520038] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
[ 1813.520039] R13: 0000000000000000 R14: 0000000000000000 R15: 000055c3ba86f050
When I look at the screen, the following message is printed:
Code:
INFO: task systemd-udev:3005 blocked for more than xxx seconds.

lvs also just runs into a timeout.
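One way to see all the blocked tasks at once (assuming sysrq is enabled; diagnostic only):

Code:
# write the stacks of all D-state (blocked) tasks to the kernel log
echo w > /proc/sysrq-trigger
dmesg | tail -n 50
# returns non-zero if the udev event queue does not empty in time
udevadm settle --timeout=30; echo $?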

To fix the problem, I completely reinstalled Proxmox and applied all firmware and BIOS updates. But the problem is still there.
The server is an IBM x3650 M3 with ServeRAID M5014 SAS/SATA Controller (46M0916).

I have also read a lot of posts in this forum. Unfortunately, "udevadm trigger", for example, only helps for a short time, and none of the other suggested solutions work either.
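For reference, the workaround that only helps for a short time ("udevadm settle" added here as the usual companion step):

Code:
# re-trigger udev events and wait for the queue to drain
udevadm trigger
udevadm settle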
Does anyone else have an idea?
 
Hi,

please send the output of the following commands.

Code:
pveversion -v
dpkg -l | grep -e 'intel-microcode'
lsblk --ascii
 
In the meantime I have changed the storage to a normal (non-thin) LVM.

I updated another server over the weekend and ran into a similar problem. When restoring backups with qmrestore while simultaneously migrating another VM, the "lvs" error came back and the system hung, but without a kernel panic. The other server is newer hardware.
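Roughly what was running at the same time when it hung again (VMIDs, archive name and target node are only placeholders):

Code:
# terminal 1: restore a backup onto the thin storage
qmrestore vzdump-qemu-104.vma.zst 104 --storage pve-ssd
# terminal 2: migrate another VM at the same time
qm migrate 105 kvm5 --online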

Bash:
root@kvm2:~# pveversion -v
proxmox-ve: 6.2-1 (running kernel: 5.4.41-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-2
pve-kernel-helper: 6.2-2
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-6
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
root@kvm2:~# dpkg -l | grep -e 'intel-microcode'
ii  intel-microcode                      3.20200609.2~deb10u1         amd64        Processor microcode firmware for Intel CPUs
root@kvm2:~# lsblk --ascii
NAME         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda            8:0    0  3.2T  0 disk
|-sda1         8:1    0 1007K  0 part
|-sda2         8:2    0  512M  0 part
`-sda3         8:3    0  3.2T  0 part
  |-pve-swap 253:0    0    8G  0 lvm  [SWAP]
  |-pve-root 253:1    0   96G  0 lvm  /
  `-pve-data 253:2    0  3.1T  0 lvm  /var/lib/vz
sdb            8:16   0  7.3T  0 disk
|-sdb1         8:17   0  128M  0 part
`-sdb2         8:18   0  7.3T  0 part /mnt
sr0           11:0    1 1024M  0 rom
 
I guess it is kernel-related.
Try the pve-kernel-5.4.44-1-pve kernel, which has been available since yesterday.
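If it is not pulled in by a regular upgrade, the package can be installed directly:

Code:
apt update
apt install pve-kernel-5.4.44-1-pve
reboot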
 
Hello,

I tested it with the latest packages yesterday. Unfortunately, the errors are still there.

Code:
Jul  2 19:26:12 kvm5 pvedaemon[1236]: <root@pam> starting task UPID:kvm5:000075A3:0309FF6D:5EFE18B4:imgdel:103@ssd-thin:root@pam:
Jul  2 19:26:18 kvm5 pvedaemon[30115]: lvremove 'pve-ssd/vm-103-disk-0' error:   Logical volume pve-ssd/vm-103-disk-0 in use.
Jul  2 19:26:18 kvm5 pvedaemon[1236]: <root@pam> end task UPID:kvm5:000075A3:0309FF6D:5EFE18B4:imgdel:103@ssd-thin:root@pam: lvremove 'pve-ssd/vm-103-disk-0' error:   Logical volume pve-ssd/vm-103-disk-0 in use.
Jul  2 19:26:54 kvm5 systemd-udevd[785]: dm-7: Worker [29673] processing SEQNUM=403630 killed
Jul  2 19:26:57 kvm5 kernel: [509916.361712] INFO: task vma:28050 blocked for more than 120 seconds.
Jul  2 19:26:57 kvm5 kernel: [509916.361768]       Tainted: P          IOE     5.4.44-1-pve #1
Jul  2 19:26:57 kvm5 kernel: [509916.361808] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  2 19:26:57 kvm5 kernel: [509916.361853] vma             D    0 28050      1 0x80004002
Jul  2 19:26:57 kvm5 kernel: [509916.361855] Call Trace:
Jul  2 19:26:57 kvm5 kernel: [509916.361864]  __schedule+0x2e6/0x6f0
Jul  2 19:26:57 kvm5 kernel: [509916.361866]  schedule+0x33/0xa0
Jul  2 19:26:57 kvm5 kernel: [509916.361868]  io_schedule+0x16/0x40
Jul  2 19:26:57 kvm5 kernel: [509916.361873]  wait_on_page_bit+0x141/0x210
Jul  2 19:26:57 kvm5 kernel: [509916.361875]  ? file_fdatawait_range+0x30/0x30
Jul  2 19:26:57 kvm5 kernel: [509916.361879]  wait_on_page_writeback+0x43/0x90
Jul  2 19:26:57 kvm5 kernel: [509916.361880]  __filemap_fdatawait_range+0xae/0x120
Jul  2 19:26:57 kvm5 kernel: [509916.361883]  filemap_write_and_wait+0x5e/0xa0
Jul  2 19:26:57 kvm5 kernel: [509916.361887]  __blkdev_put+0x72/0x1e0
Jul  2 19:26:57 kvm5 kernel: [509916.361889]  blkdev_put+0x4c/0xd0
Jul  2 19:26:57 kvm5 kernel: [509916.361890]  blkdev_close+0x25/0x30
Jul  2 19:26:57 kvm5 kernel: [509916.361894]  __fput+0xc6/0x260
Jul  2 19:26:57 kvm5 kernel: [509916.361895]  ____fput+0xe/0x10
Jul  2 19:26:57 kvm5 kernel: [509916.361900]  task_work_run+0x9d/0xc0
Jul  2 19:26:57 kvm5 kernel: [509916.361904]  do_exit+0x367/0xab0
Jul  2 19:26:57 kvm5 kernel: [509916.361906]  do_group_exit+0x47/0xb0
Jul  2 19:26:57 kvm5 kernel: [509916.361909]  get_signal+0x140/0x850
Jul  2 19:26:57 kvm5 kernel: [509916.361911]  ? fsnotify+0x309/0x3d0
Jul  2 19:26:57 kvm5 kernel: [509916.361915]  do_signal+0x34/0x6e0
Jul  2 19:26:57 kvm5 kernel: [509916.361917]  ? vfs_write+0x184/0x1b0
Jul  2 19:26:57 kvm5 kernel: [509916.361922]  exit_to_usermode_loop+0x90/0x130
Jul  2 19:26:57 kvm5 kernel: [509916.361923]  do_syscall_64+0x160/0x190
Jul  2 19:26:57 kvm5 kernel: [509916.361926]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
Jul  2 19:26:57 kvm5 kernel: [509916.361928] RIP: 0033:0x7f041a7cbedf
Jul  2 19:26:57 kvm5 kernel: [509916.361932] Code: Bad RIP value.
Jul  2 19:26:57 kvm5 kernel: [509916.361933] RSP: 002b:00007f040d0789c0 EFLAGS: 00000293 ORIG_RAX: 0000000000000012
Jul  2 19:26:57 kvm5 kernel: [509916.361935] RAX: 0000000000007000 RBX: 000000000000000b RCX: 00007f041a7cbedf
Jul  2 19:26:57 kvm5 kernel: [509916.361935] RDX: 0000000000010000 RSI: 00007ffed9f39440 RDI: 000000000000000b
Jul  2 19:26:57 kvm5 kernel: [509916.361936] RBP: 00007ffed9f39440 R08: 0000000000000000 R09: 00000000ffffffff
Jul  2 19:26:57 kvm5 kernel: [509916.361937] R10: 00000005459d0000 R11: 0000000000000293 R12: 0000000000010000
Jul  2 19:26:57 kvm5 kernel: [509916.361937] R13: 00000005459d0000 R14: 00007f040e021e70 R15: 000055d7e90a7b82
Jul  2 19:26:57 kvm5 kernel: [509916.361944] INFO: task systemd-udevd:29673 blocked for more than 120 seconds.
Jul  2 19:26:57 kvm5 kernel: [509916.361965]       Tainted: P          IOE     5.4.44-1-pve #1
Jul  2 19:26:57 kvm5 kernel: [509916.361982] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul  2 19:26:57 kvm5 kernel: [509916.362003] systemd-udevd   D    0 29673    785 0x00000324
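When lvremove reports the volume as "in use", it can help to check what still holds it open before retrying (the dm-7 and vm-103-disk-0 names are taken from the log above):

Code:
# find the device-mapper entry backing the LV (hyphens in LV names are doubled in dm names)
dmsetup info -c | grep vm--103--disk--0
# list processes that still have the LV open, e.g. a leftover kvm or vma process
fuser -v /dev/pve-ssd/vm-103-disk-0
lsof /dev/dm-7 2>/dev/null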