Can you please give advice on what to do in this situation? The message indicates that the kernel hung for some time, but it does not show the root cause (I guess faulty/slow hardware?).
can you post the complete output of 'dmesg' ?
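you can dump it to a file and attach it here, for example with something like:
# dmesg -T > /tmp/dmesg.txt
(the -T flag just prints human-readable timestamps)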
Aug 5 11:35:20 pmx01 kernel: [ 2056.950722] INFO: task kworker/u130:3:6748 blocked for more than 120 seconds.
Aug 5 11:35:20 pmx01 kernel: [ 2056.954918] INFO: task qemu-img:13875 blocked for more than 120 seconds.
basically it tells you that the process hung in io tasks for more than 2 minutes.
I've turned off all the VMs on the node, so nothing would use the disk IO, and tried to run several tests on both nodes in the cluster.
this is most often an indicator that the storage is too slow for the operations done on it (e.g. too many vms that do io, too many disk operations in parallel, some combination of those, etc.)
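if you want to verify that, you could watch the disk utilization while the load is running, for example with iostat from the sysstat package (not installed by default, so this is just a suggestion):
# apt install sysstat
# iostat -x 5
consistently high %util and long await times on the disks around the time of such a hang would point to the storage being the bottleneck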
Aug 5 11:34:10 pmx01 systemd-udevd[1167]: dm-43: Worker [1221] processing SEQNUM=22532 is taking a long time
Aug 5 11:35:20 pmx01 kernel: [ 2056.950722] INFO: task kworker/u130:3:6748 blocked for more than 120 seconds.
Aug 5 11:35:20 pmx01 kernel: [ 2056.952156] Tainted: P O 5.4.73-1-pve #1
Aug 5 11:35:20 pmx01 kernel: [ 2056.953393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 5 11:35:20 pmx01 kernel: [ 2056.954690] kworker/u130:3 D 0 6748 2 0x80004000
Aug 5 11:35:20 pmx01 kernel: [ 2056.954711] Workqueue: writeback wb_workfn
Aug 5 11:35:20 pmx01 kernel: [ 2056.954718] Call Trace:
Aug 5 11:35:20 pmx01 kernel: [ 2056.954734] __schedule+0x2e6/0x6f0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954738] schedule+0x33/0xa0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954741] io_schedule+0x16/0x40
Aug 5 11:35:20 pmx01 kernel: [ 2056.954750] __lock_page+0x122/0x220
Aug 5 11:35:20 pmx01 kernel: [ 2056.954756] ? file_fdatawait_range+0x30/0x30
Aug 5 11:35:20 pmx01 kernel: [ 2056.954760] write_cache_pages+0x22b/0x4a0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954764] ? __wb_calc_thresh+0x130/0x130
Aug 5 11:35:20 pmx01 kernel: [ 2056.954769] generic_writepages+0x56/0x90
Aug 5 11:35:20 pmx01 kernel: [ 2056.954777] blkdev_writepages+0xe/0x10
Aug 5 11:35:20 pmx01 kernel: [ 2056.954786] do_writepages+0x41/0xd0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954795] ? ttwu_do_wakeup+0x1e/0x150
Aug 5 11:35:20 pmx01 kernel: [ 2056.954798] ? ttwu_do_activate+0x5a/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954802] __writeback_single_inode+0x40/0x350
Aug 5 11:35:20 pmx01 kernel: [ 2056.954806] ? try_to_wake_up+0x67/0x650
Aug 5 11:35:20 pmx01 kernel: [ 2056.954810] writeback_sb_inodes+0x209/0x4a0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954818] __writeback_inodes_wb+0x66/0xd0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954825] wb_writeback+0x25b/0x2f0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954834] wb_workfn+0x33e/0x490
Aug 5 11:35:20 pmx01 kernel: [ 2056.954842] ? __switch_to_asm+0x40/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954848] ? __switch_to_asm+0x34/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954855] ? __switch_to_asm+0x40/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954861] ? __switch_to_asm+0x34/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954866] ? __switch_to_asm+0x40/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954871] ? __switch_to_asm+0x34/0x70
Aug 5 11:35:20 pmx01 kernel: [ 2056.954880] ? __switch_to+0x85/0x480
Aug 5 11:35:20 pmx01 kernel: [ 2056.954883] ? __schedule+0x2ee/0x6f0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954890] process_one_work+0x20f/0x3d0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954893] worker_thread+0x34/0x400
Aug 5 11:35:20 pmx01 kernel: [ 2056.954898] kthread+0x120/0x140
Aug 5 11:35:20 pmx01 kernel: [ 2056.954901] ? process_one_work+0x3d0/0x3d0
Aug 5 11:35:20 pmx01 kernel: [ 2056.954904] ? kthread_park+0x90/0x90
Aug 5 11:35:20 pmx01 kernel: [ 2056.954907] ret_from_fork+0x35/0x40
Aug 5 11:35:20 pmx01 kernel: [ 2056.954918] INFO: task qemu-img:13875 blocked for more than 120 seconds.
Aug 5 11:35:20 pmx01 kernel: [ 2056.956150] Tainted: P O 5.4.73-1-pve #1
what's your 'pveversion -v' and what kind of hardware do you run it on? (cpu/memory/storage/etc.)
# pveversion -v
proxmox-ve: 6.3-1 (running kernel: 5.4.73-1-pve)
pve-manager: 6.3-2 (running version: 6.3-2/22f57405)
pve-kernel-5.4: 6.3-1
pve-kernel-helper: 6.3-1
pve-kernel-5.4.73-1-pve: 5.4.73-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.4-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: residual config
ifupdown2: 3.0.0-1+pve3
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.16-pve1
libproxmox-acme-perl: 1.0.5
libproxmox-backup-qemu0: 1.0.2-1
libpve-access-control: 6.1-3
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.2-6
libpve-guest-common-perl: 3.1-3
libpve-http-server-perl: 3.0-6
libpve-storage-perl: 6.3-1
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.3-1
lxcfs: 4.0.3-pve3
novnc-pve: 1.1.0-1
proxmox-backup-client: 1.0.5-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.4-3
pve-cluster: 6.2-1
pve-container: 3.3-1
pve-docs: 6.3-1
pve-edk2-firmware: 2.20200531-1
pve-firewall: 4.1-3
pve-firmware: 3.1-3
pve-ha-manager: 3.1-1
pve-i18n: 2.2-2
pve-qemu-kvm: 5.1.0-7
pve-xtermjs: 4.7.0-3
qemu-server: 6.3-1
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-2
zfsutils-linux: 0.8.5-pve1
ok, first i'd upgrade your nodes to at least 6.4, but even better would be a supported version of proxmox (currently 7.2)
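the 6.3 -> 6.4 step is a normal minor upgrade (assuming your repositories are configured as usual):
# apt update
# apt dist-upgrade
for the major 6.4 -> 7.x upgrade there is a checklist script (pve6to7) and a step-by-step upgrade guide in the wiki that you should follow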
also, which models of the hdds? is your vm storage on the hdds?
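you can see the models e.g. with:
# lsblk -o NAME,MODEL,SIZE,ROTA
or with smartctl -a /dev/sdX from smartmontools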
Each node has 2 SSDs and 1 HDD. Each drive is configured as RAID0 and has an LVM-Thin pool on it (so a total of 3 LVM-Thin pools on each node). Generally all VMs are on the SSDs; the HDD is used for backups or as a secondary drive for a VM to keep some data.
I've already read a lot of articles on how to upgrade Proxmox, but I'm afraid that something could go wrong and after the upgrade the cluster will be broken or the VMs won't start. Is there a way to secure the cluster while upgrading?
ok the raid card may also play a role here.. any chance to remove that from the equation (so using a hba/onboard sata instead?)
the upgrades should normally go well if you follow the upgrade guide, but in any case it would be good to have proper working backups of your vms/configuration which you can restore if something goes wrong
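a manual backup of a vm can be made with vzdump, for example (the vmid and storage name here are just placeholders):
# vzdump 100 --storage <your-backup-storage> --mode snapshot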
Having a backup of the VMs is obvious, but what kind of configuration would you recommend saving from the node itself?
depends on what you configured, most probably the network config
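a minimal sketch of copying the most important config from the node (these are the standard proxmox locations):
# tar czf /root/pve-host-config.tar.gz /etc/network/interfaces /etc/hosts /etc/pve
/etc/pve is the cluster filesystem with the guest and storage configs, so having a copy of it also helps if you need to restore those later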