Failed to migrate disk (Device /dev/dm-xx not initialized in udev database even after waiting 10000000 microseconds)

dumdum

pveversion:
pve-manager/6.0-4/2a719255 (running kernel: 5.0.15-1-pve)

Migrate disk fails every time with the message below.
I suspect a kernel-related issue with LVM thin pools on Proxmox 6.0; see further below for kern.log.
I also see the same udev timeout error when creating a new VM on LVM thin pool storage.


Task Log
Code:
...
Transferred: 21474836480 bytes remaining: 0 bytes total: 21474836480 bytes progression: 100.00 %
  WARNING: Device /dev/dm-17 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/dm-17 not initialized in udev database even after waiting 10000000 microseconds.
  Logical volume "vm-100-disk-0" successfully removed
  WARNING: Device /dev/dm-17 not initialized in udev database even after waiting 10000000 microseconds.
TASK ERROR: storage migration failed: command '/sbin/lvs --separator : --noheadings --units b --unbuffered --nosuffix --options lv_size /dev/vg01/vm-100-disk-0' failed: got timeout

kern.log
Code:
kernel: [ 1453.178236] INFO: task systemd-udevd:4712 blocked for more than 120 seconds.
kernel: [ 1453.178266]       Tainted: P           O      5.0.15-1-pve #1
kernel: [ 1453.178284] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: [ 1453.178307] systemd-udevd   D    0  4712    934 0x00000100
kernel: [ 1453.178310] Call Trace:
kernel: [ 1453.178322]  __schedule+0x2d4/0x870
kernel: [ 1453.178325]  schedule+0x2c/0x70
kernel: [ 1453.178327]  schedule_preempt_disabled+0xe/0x10
kernel: [ 1453.178329]  __mutex_lock.isra.10+0x2e4/0x4c0
kernel: [ 1453.178338]  ? exact_lock+0x11/0x20
kernel: [ 1453.178339]  ? disk_map_sector_rcu+0x70/0x70
kernel: [ 1453.178341]  __mutex_lock_slowpath+0x13/0x20
kernel: [ 1453.178342]  mutex_lock+0x2c/0x30
kernel: [ 1453.178347]  __blkdev_get+0x7b/0x550
kernel: [ 1453.178348]  ? bd_acquire+0xd0/0xd0
kernel: [ 1453.178350]  blkdev_get+0x10c/0x330
kernel: [ 1453.178351]  ? bd_acquire+0xd0/0xd0
kernel: [ 1453.178352]  blkdev_open+0x92/0x100
kernel: [ 1453.178356]  do_dentry_open+0x143/0x3a0
kernel: [ 1453.178359]  vfs_open+0x2d/0x30
kernel: [ 1453.178361]  path_openat+0x2d4/0x16d0
kernel: [ 1453.178366]  ? page_add_file_rmap+0x5f/0x220
kernel: [ 1453.178370]  ? alloc_set_pte+0x104/0x5b0
kernel: [ 1453.178373]  do_filp_open+0x93/0x100
kernel: [ 1453.178381]  ? strncpy_from_user+0x56/0x1b0
kernel: [ 1453.178397]  ? __alloc_fd+0x46/0x150
kernel: [ 1453.178399]  do_sys_open+0x177/0x280
kernel: [ 1453.178400]  __x64_sys_openat+0x20/0x30
kernel: [ 1453.178407]  do_syscall_64+0x5a/0x110
kernel: [ 1453.178410]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: [ 1453.178413] RIP: 0033:0x7fe8ae3cc1ae
kernel: [ 1453.178418] Code: Bad RIP value.
kernel: [ 1453.178419] RSP: 002b:00007ffe83be4780 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
kernel: [ 1453.178420] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fe8ae3cc1ae
kernel: [ 1453.178421] RDX: 0000000000080000 RSI: 00005631364785c0 RDI: 00000000ffffff9c
kernel: [ 1453.178421] RBP: 00007fe8adbebc60 R08: 0000563135465270 R09: 000000000000000f
kernel: [ 1453.178422] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
kernel: [ 1453.178423] R13: 0000000000000000 R14: 0000000000000000 R15: 0000563136454dc0
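
If you hit this, it can help to check by hand whether udev actually knows about the device-mapper node named in the warning, and to re-run the lvs call that the task timed out on. This is only a rough sketch; the device and LV names are the ones from the log above, so adjust them for your system:

Bash:
# Does udev have a record for the dm node named in the warning?
udevadm info --query=all --name=/dev/dm-17

# What does device-mapper itself say about the same device?
dmsetup info /dev/dm-17

# Re-run the lvs call that the migration task timed out on
/sbin/lvs --separator : --noheadings --units b --unbuffered --nosuffix --options lv_size /dev/vg01/vm-100-disk-0

# If lvs hangs here as well, see the workarounds further down in the thread
udevadm trigger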
 
I have the same problem, but the command "udevadm trigger" doesn't work for me.
I have two LVM-thin storages: "local-lvm" and "my-add-lvm". The problem does not occur when cloning a VM from local-lvm to local-lvm or when creating a new VM on my-add-lvm. But whenever I clone a VM from my-add-lvm to my-add-lvm, or from local-lvm to my-add-lvm, the task fails. The error message in kern.log is the same as yours.
Code:
... 
WARNING: Device /dev/dm-23 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/dm-23 not initialized in udev database even after waiting 10000000 microseconds.
  Logical volume "vm-123-disk-0" successfully removed
  WARNING: Device /dev/dm-23 not initialized in udev database even after waiting 10000000 microseconds.
TASK ERROR: clone failed: command '/sbin/lvs --separator : --noheadings --units b --unbuffered --nosuffix --options lv_size /dev/vgdata/vm-123-disk-0' failed: got timeout
kern.log
Code:
... 
kernel: [2330700.135191] INFO: task qemu-img:13646 blocked for more than 120 seconds.
kernel: [2330700.135203]       Tainted: P        W  O      5.0.15-1-pve #1
kernel: [2330700.135207] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: [2330700.135213] qemu-img        D    0 13646  13619 0x00000000
kernel: [2330700.135218] Call Trace:
kernel: [2330700.135232]  __schedule+0x2d4/0x870
kernel: [2330700.135237]  schedule+0x2c/0x70
kernel: [2330700.135244]  io_schedule+0x16/0x40
kernel: [2330700.135251]  wait_on_page_bit+0x141/0x210
kernel: [2330700.135256]  ? file_check_and_advance_wb_err+0xe0/0xe0
kernel: [2330700.135262]  write_cache_pages+0x381/0x4d0
kernel: [2330700.135266]  ? __wb_calc_thresh+0x130/0x130
kernel: [2330700.135272]  generic_writepages+0x56/0x90
kernel: [2330700.135278]  blkdev_writepages+0xe/0x10
kernel: [2330700.135281]  do_writepages+0x41/0xd0
kernel: [2330700.135287]  ? __wake_up_common_lock+0x8e/0xc0
kernel: [2330700.135292]  __filemap_fdatawrite_range+0xc5/0x100
kernel: [2330700.135297]  filemap_write_and_wait+0x31/0x90
kernel: [2330700.135301]  __blkdev_put+0x72/0x1e0
kernel: [2330700.135304]  ? fsnotify+0x28b/0x3c0
kernel: [2330700.135307]  ? fsnotify+0x2ef/0x3c0
kernel: [2330700.135311]  blkdev_put+0x4c/0xd0
kernel: [2330700.135314]  blkdev_close+0x34/0x70
kernel: [2330700.135320]  __fput+0xbc/0x230
kernel: [2330700.135325]  ____fput+0xe/0x10
kernel: [2330700.135331]  task_work_run+0x9d/0xc0
kernel: [2330700.135338]  exit_to_usermode_loop+0xf2/0x100
kernel: [2330700.135342]  do_syscall_64+0xf0/0x110
kernel: [2330700.135348]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: [2330700.135351] RIP: 0033:0x7f46b99ee5d7
kernel: [2330700.135359] Code: Bad RIP value.
kernel: [2330700.135361] RSP: 002b:00007ffdc0680100 EFLAGS: 00000293 ORIG_RAX: 0000000000000003
kernel: [2330700.135364] RAX: 0000000000000000 RBX: 000000000000000a RCX: 00007f46b99ee5d7
kernel: [2330700.135366] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000000a
kernel: [2330700.135367] RBP: 0000000000000000 R08: 0000000000000001 R09: 00007f46ad6d2960
kernel: [2330700.135368] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
kernel: [2330700.135370] R13: 0000000000000000 R14: ffffffffffffffff R15: 0000000000000000
 

Same problem here.

Any news?
 
I was having the same problem: not able to clone/migrate storage to my second lvm-thin storage (a 2.18 TiB RAID10 logical volume composed of 4 SAS HDDs). The primary lvm-thin (a RAID1 logical volume composed of two SSDs) was working fine. The problem does not appear if I configure the second storage as a directory, but it is desirable to make lvm-thin work on the second storage.

However, I had to order a second server with exactly the same specifications (HPE ProLiant ML350), and lvm-thin on its second storage worked like a charm!

After comparing the hw/sw specifications of the two servers one by one, the only difference I found is the RAID controller: the first server has an HPE Smart Array E208e-p SR Gen10 instead of the HPE Smart Array P408i-a SR Gen10 that is in the second server.

So, what is your RAID controller? Is it possible that the controller is incompatible with lvm-thin on SAS HDDs?
 
What I did to "fix" the problem is to modify /etc/lvm/lvm.conf

Find the line "obtain_device_list_from_udev" and change it from 1 to 0.

No reboot required. All LVM operations seem to work again after this change (short test so far).

Not entirely sure how safe this is, but it sure beats running "udevadm trigger" manually every time an LVM operation freezes (even simple things like lvscan or vgscan hang).
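
For reference, a rough sketch of that change from the shell, assuming the stock Debian/PVE lvm.conf where the setting is written as "obtain_device_list_from_udev = 1" in the devices section (check the file before and after):

Bash:
# Back up the config first
cp /etc/lvm/lvm.conf /etc/lvm/lvm.conf.bak

# Switch obtain_device_list_from_udev from 1 to 0
sed -i 's/obtain_device_list_from_udev = 1/obtain_device_list_from_udev = 0/' /etc/lvm/lvm.conf

# Verify the result
grep obtain_device_list_from_udev /etc/lvm/lvm.conf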
 
Hello,

I noticed that the problem only occurs when using LVM on SSDs in Proxmox 6. When I use LVM in Proxmox 6 on a normal hard disk, there is no problem (in Proxmox 5 there is no problem with either SSDs or hard disks). I switched to ZFS, which also gives me no problems when taking snapshots. I did not test the option from snakeoilos above, so I do not know whether it works.

Kind regards.
 
Can confirm the above. I'm also running Proxmox VE 6.1-8. I only have problems using Move disk to my SSD LVM pool. Both of my mechanical disk pools have no issues using Move disk.

Running udevadm trigger when the task gets to about 98% worked for me.

When it did fail, I got a few different call traces in dmesg:

Bash:
[Mon Mar 23 21:13:32 2020] INFO: task kvm:20935 blocked for more than 120 seconds.
[Mon Mar 23 21:13:32 2020]       Tainted: P           O      5.3.18-2-pve #1
[Mon Mar 23 21:13:32 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Mar 23 21:13:32 2020] kvm             D    0 20935      1 0x00000000
[Mon Mar 23 21:13:32 2020] Call Trace:
[Mon Mar 23 21:13:32 2020]  __schedule+0x2bb/0x660
[Mon Mar 23 21:13:32 2020]  schedule+0x33/0xa0
[Mon Mar 23 21:13:32 2020]  schedule_timeout+0x205/0x300
[Mon Mar 23 21:13:32 2020]  ? dm_make_request+0x56/0xb0
[Mon Mar 23 21:13:32 2020]  io_schedule_timeout+0x1e/0x50
[Mon Mar 23 21:13:32 2020]  wait_for_completion_io+0xb7/0x140
[Mon Mar 23 21:13:32 2020]  ? wake_up_q+0x80/0x80
[Mon Mar 23 21:13:32 2020]  submit_bio_wait+0x61/0x90
[Mon Mar 23 21:13:32 2020]  blkdev_issue_flush+0x8e/0xc0
[Mon Mar 23 21:13:32 2020]  blkdev_fsync+0x35/0x50
[Mon Mar 23 21:13:32 2020]  vfs_fsync_range+0x48/0x80
[Mon Mar 23 21:13:32 2020]  ? __fget_light+0x59/0x70
[Mon Mar 23 21:13:32 2020]  do_fsync+0x3d/0x70
[Mon Mar 23 21:13:32 2020]  __x64_sys_fdatasync+0x17/0x20
[Mon Mar 23 21:13:32 2020]  do_syscall_64+0x5a/0x130
[Mon Mar 23 21:13:32 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Mon Mar 23 21:13:32 2020] RIP: 0033:0x7f066c8b82e7
[Mon Mar 23 21:13:32 2020] Code: Bad RIP value.
[Mon Mar 23 21:13:32 2020] RSP: 002b:00007f05499fa780 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[Mon Mar 23 21:13:32 2020] RAX: ffffffffffffffda RBX: 000000000000001a RCX: 00007f066c8b82e7
[Mon Mar 23 21:13:32 2020] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000000001a
[Mon Mar 23 21:13:32 2020] RBP: 00007f065fc6c930 R08: 0000000000000000 R09: 00007f05499fa730
[Mon Mar 23 21:13:32 2020] R10: 000000005e795dee R11: 0000000000000293 R12: 00005650d158b612
[Mon Mar 23 21:13:32 2020] R13: 00007f065fc6c998 R14: 00007f065fd28930 R15: 00007f065fdc6410
[Mon Mar 23 21:13:32 2020] INFO: task kvm:21173 blocked for more than 120 seconds.
[Mon Mar 23 21:13:32 2020]       Tainted: P           O      5.3.18-2-pve #1
[Mon Mar 23 21:13:32 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Mar 23 21:13:32 2020] kvm             D    0 21173      1 0x00000000
[Mon Mar 23 21:13:32 2020] Call Trace:
[Mon Mar 23 21:13:32 2020]  __schedule+0x2bb/0x660
[Mon Mar 23 21:13:32 2020]  schedule+0x33/0xa0
[Mon Mar 23 21:13:32 2020]  schedule_timeout+0x205/0x300
[Mon Mar 23 21:13:32 2020]  ? dm_make_request+0x56/0xb0
[Mon Mar 23 21:13:32 2020]  io_schedule_timeout+0x1e/0x50
[Mon Mar 23 21:13:32 2020]  wait_for_completion_io+0xb7/0x140
[Mon Mar 23 21:13:32 2020]  ? wake_up_q+0x80/0x80
[Mon Mar 23 21:13:32 2020]  submit_bio_wait+0x61/0x90
[Mon Mar 23 21:13:32 2020]  blkdev_issue_flush+0x8e/0xc0
[Mon Mar 23 21:13:32 2020]  blkdev_fsync+0x35/0x50
[Mon Mar 23 21:13:32 2020]  vfs_fsync_range+0x48/0x80
[Mon Mar 23 21:13:32 2020]  ? __fget_light+0x59/0x70
[Mon Mar 23 21:13:32 2020]  do_fsync+0x3d/0x70
[Mon Mar 23 21:13:32 2020]  __x64_sys_fdatasync+0x17/0x20
[Mon Mar 23 21:13:32 2020]  do_syscall_64+0x5a/0x130
[Mon Mar 23 21:13:32 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Mon Mar 23 21:13:32 2020] RIP: 0033:0x7f1d59c512e7
[Mon Mar 23 21:13:32 2020] Code: Bad RIP value.
[Mon Mar 23 21:13:32 2020] RSP: 002b:00007f1b22ef8780 EFLAGS: 00000293 ORIG_RAX: 000000000000004b
[Mon Mar 23 21:13:32 2020] RAX: ffffffffffffffda RBX: 0000000000000020 RCX: 00007f1d59c512e7
[Mon Mar 23 21:13:32 2020] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000020
[Mon Mar 23 21:13:32 2020] RBP: 00007f1d4ce6c930 R08: 0000000000000000 R09: 00000000ffffffff
[Mon Mar 23 21:13:32 2020] R10: 00007f1b22ef8760 R11: 0000000000000293 R12: 000055d6bc7a2612
[Mon Mar 23 21:13:32 2020] R13: 00007f1d4ce6c998 R14: 00007f1d498541c0 R15: 00007f1b3b230010


Bash:
[Mon Mar 23 21:23:36 2020] INFO: task systemd-udevd:1292 blocked for more than 120 seconds.
[Mon Mar 23 21:23:36 2020]       Tainted: P           O      5.3.18-2-pve #1
[Mon Mar 23 21:23:36 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Mon Mar 23 21:23:36 2020] systemd-udevd   D    0  1292    734 0x00000324
[Mon Mar 23 21:23:36 2020] Call Trace:
[Mon Mar 23 21:23:36 2020]  __schedule+0x2bb/0x660
[Mon Mar 23 21:23:36 2020]  schedule+0x33/0xa0
[Mon Mar 23 21:23:36 2020]  schedule_preempt_disabled+0xe/0x10
[Mon Mar 23 21:23:36 2020]  __mutex_lock.isra.10+0x2c9/0x4c0
[Mon Mar 23 21:23:36 2020]  __mutex_lock_slowpath+0x13/0x20
[Mon Mar 23 21:23:36 2020]  mutex_lock+0x2c/0x30
[Mon Mar 23 21:23:36 2020]  __blkdev_get+0x7a/0x560
[Mon Mar 23 21:23:36 2020]  blkdev_get+0xe0/0x140
[Mon Mar 23 21:23:36 2020]  ? blkdev_get_by_dev+0x50/0x50
[Mon Mar 23 21:23:36 2020]  blkdev_open+0x92/0x100
[Mon Mar 23 21:23:36 2020]  do_dentry_open+0x143/0x3a0
[Mon Mar 23 21:23:36 2020]  vfs_open+0x2d/0x30
[Mon Mar 23 21:23:36 2020]  path_openat+0x2bf/0x1570
[Mon Mar 23 21:23:36 2020]  ? page_add_file_rmap+0x119/0x160
[Mon Mar 23 21:23:36 2020]  ? alloc_set_pte+0x104/0x5c0
[Mon Mar 23 21:23:36 2020]  do_filp_open+0x93/0x100
[Mon Mar 23 21:23:36 2020]  ? strncpy_from_user+0x57/0x1b0
[Mon Mar 23 21:23:36 2020]  ? __alloc_fd+0x46/0x150
[Mon Mar 23 21:23:36 2020]  do_sys_open+0x177/0x280
[Mon Mar 23 21:23:36 2020]  __x64_sys_openat+0x20/0x30
[Mon Mar 23 21:23:36 2020]  do_syscall_64+0x5a/0x130
[Mon Mar 23 21:23:36 2020]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[Mon Mar 23 21:23:36 2020] RIP: 0033:0x7f068eb911ae
[Mon Mar 23 21:23:36 2020] Code: Bad RIP value.
[Mon Mar 23 21:23:36 2020] RSP: 002b:00007ffdf5eb9f90 EFLAGS: 00000246 ORIG_RAX: 0000000000000101
[Mon Mar 23 21:23:36 2020] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f068eb911ae
[Mon Mar 23 21:23:36 2020] RDX: 0000000000080000 RSI: 00005578447bdd70 RDI: 00000000ffffff9c
[Mon Mar 23 21:23:36 2020] RBP: 00007f068e3b0c60 R08: 00005578427d9270 R09: 000000000000000f
[Mon Mar 23 21:23:36 2020] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
[Mon Mar 23 21:23:36 2020] R13: 0000000000000000 R14: 0000000000000000 R15: 00005578447f6240
 
I have the same issue with moving a disk.
For me it is not possible to move a disk to the lvm-thin storage.

"udevadm trigger" didn't work for me.


dmesg:
Code:
[Tue Apr 14 18:26:29 2020] INFO: task systemd-udevd:9740 blocked for more than 483 seconds.
[Tue Apr 14 18:26:29 2020]       Tainted: P           O      5.3.18-3-pve #1
[Tue Apr 14 18:26:29 2020] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Move Disk Log:
Code:
transferred: 536870912000 bytes remaining: 0 bytes total: 536870912000 bytes progression: 100.00 %
  WARNING: Device /dev/dm-29 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/dm-29 not initialized in udev database even after waiting 10000000 microseconds.
  Logical volume "vm-112-disk-0" successfully removed
  WARNING: Device /dev/dm-29 not initialized in udev database even after waiting 10000000 microseconds.
TASK ERROR: storage migration failed: command '/sbin/lvs --separator : --noheadings --units b --unbuffered --nosuffix --options lv_size /dev/raid/vm-112-disk-0' failed: got timeout

systemctl status udev:
Code:
root@proxmox:/etc/udev# systemctl status udev
● systemd-udevd.service - udev Kernel Device Manager
   Loaded: loaded (/lib/systemd/system/systemd-udevd.service; static; vendor preset: enabled)
   Active: active (running) since Tue 2020-04-14 17:30:20 CEST; 1h 50min ago
     Docs: man:systemd-udevd.service(8)
           man:udev(7)
Main PID: 839 (systemd-udevd)
   Status: "Processing with 136 children at max"
    Tasks: 51
   Memory: 62.5M
   CGroup: /system.slice/systemd-udevd.service
           ├─  839 /lib/systemd/systemd-udevd
           ├─19287 /lib/systemd/systemd-udevd
           ├─19353 /lib/systemd/systemd-udevd
           ├─19354 /lib/systemd/systemd-udevd
           ├─19355 /lib/systemd/systemd-udevd
           ├─XXXXXXXX




Apr 14 19:15:28 proxmox systemd-udevd[839]: dm-29: Worker [19287] processing SEQNUM=15586 killed
Apr 14 19:15:35 proxmox systemd-udevd[19423]: Using default interface naming scheme 'v240'.


@Dominic, you referred some threads to this one. Do you have a boot-persistent solution?
 
"udevadm trigger" didn't work for me.
You need to run this command only when the copy/move is at around 96 or 98%. You can also disable udev in lvm.conf as described above, but as mentioned, do it at your own risk, as I haven't tested this enough to know whether it causes issues (i.e. do not do this on production servers).

For now, manually running udevadm trigger at the last stage should make it work.
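
If you don't want to babysit the progress bar, a crude way to automate this is to poll lvs with a timeout while the move or clone is running and fire the trigger whenever LVM stops responding. This is only a sketch, not anything Proxmox ships; using a 5-second lvs timeout as a "stuck" detector is my own assumption:

Bash:
#!/bin/bash
# Rough workaround sketch: while a disk move/clone is running, poll lvs with a
# short timeout. If lvs itself hangs, assume udev missed the event and re-trigger it.
while true; do
    if ! timeout 5 /sbin/lvs >/dev/null 2>&1; then
        echo "$(date): lvs is hanging, running udevadm trigger"
        udevadm trigger
    fi
    sleep 10
done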
 
Hello, I can confirm that I'm also seeing this issue, but NOT on SSDs; in my case it's on a hardware RAID5. That tells me the udev trigger is not happening at the right time because "something" above my head is unable to read enough information about the underlying storage.

While moving a VM Disk from NFS to local LVM storage I end up with this output:

Bash:
transferred: 34359738368 bytes remaining: 0 bytes total: 34359738368 bytes progression: 100.00 %
  WARNING: Device /dev/dm-7 not initialized in udev database even after waiting 10000000 microseconds.
  WARNING: Device /dev/dm-7 not initialized in udev database even after waiting 10000000 microseconds.
  Logical volume "vm-112-disk-0" successfully removed
  WARNING: Device /dev/dm-7 not initialized in udev database even after waiting 10000000 microseconds.
TASK ERROR: storage migration failed: command '/sbin/lvs --separator : --noheadings --units b --unbuffered --nosuffix --options lv_size /dev/pve/vm-112-disk-0' failed: got timeout

Clearly, as in all the other output, lvs fails because /dev/pve/vm-112-disk-0 is actually a broken symbolic link to /dev/dm-7, and dm-7 has not been "initialized" because the trigger is not happening at the right time... maybe? Anyway, this is still affecting me on PVE 6.2.
Thanks,
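
To check whether that is really what is happening on your system, you can inspect the symlink and the dm mapping directly. The paths below are the ones from this log; adjust the VG/LV names for your setup:

Bash:
# Where does the LV symlink point, and does the target node exist?
readlink -f /dev/pve/vm-112-disk-0
ls -l /dev/dm-7

# Is the mapping present under /dev/mapper even though udev missed it?
# (device-mapper escapes dashes in names, hence the doubled dashes)
ls -l /dev/mapper/ | grep 'vm--112--disk--0'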
 
PVE 6.2-4

Moving from HDD to SSD - same issue.
It was giving me the same warnings.

udevadm trigger at 98% / 99% worked for me.
Thanks for this workaround.
 
Had the same issue on PVE 6.2-4 while updating, which coincidentally was right after I imported a drive via the "qm importdisk" command.

udevadm trigger worked for me.

Thx
 
Same for me after upgrading, but only when migrating from a node that has not yet been upgraded to an upgraded one. The udevadm trigger works for me. For the moment, there is no issue when migrating between nodes of the same "version". By the way, I also had this issue on two node upgrades (out of five) during "GRUB generation", and the udevadm trigger command solves that as well.

Thank you
 
Had the same problem with v6.1-8. udevadm trigger didn't work (but I didn't run it at the 98% mark). I moved the disk a bunch of times between local storage and a vz. I deleted my snapshots and was then able to migrate the VM. I think I have an underlying RAID controller or SSD issue, because under any sort of load this machine hangs and then boots into Dell's iDRAC complaining about things.
 
You have to run the udevadm trigger command at around the 98% mark, or it will not work. The problem only starts around that 95% point; from then on the entire LVM subsystem stops responding, and commands like lvdisplay will hang.

Run udevadm trigger and LVM will work again. And when LVM is working, your Proxmox operations work. You can also disable LVM's reliance on udev by referring to post #9...
 
