Hi,
I've just run into the same/similar issue.
I have two Proxmox VE nodes in a cluster, both running 6.3-6.
I tried to migrate a VM from one node to the other and the migration was extremely slow: it got to about 7% after 30 minutes, so I gave up. I then took a backup of the VM and tried to restore it on the other node. The restore started, but quickly slowed down to the point where it appeared to have hung. I tried the 'udevadm trigger' workaround, but it did not appear to have any effect (I know this was way earlier than the 95% point, but the restore doesn't seem to get that far anyway).
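For reference, this is the command I ran on the node (assuming I've understood the suggestion correctly; plain udevadm trigger with no arguments just re-triggers uevents for all devices):
Code:
# re-trigger udev events for all devices, as suggested for the hang near the end of a restore
udevadm trigger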
I got the following reported in /var/log/kern.log:
Code:
Apr 2 11:18:01 gold kernel: [ 1547.122225] EXT4-fs (dm-6): mounted filesystem with ordered data mode. Opts: (null)
Apr 2 11:20:27 gold kernel: [ 1692.606490] INFO: task jbd2/dm-6-8:9894 blocked for more than 120 seconds.
Apr 2 11:20:27 gold kernel: [ 1692.606520] Tainted: P O 5.4.101-1-pve #1
Apr 2 11:20:27 gold kernel: [ 1692.606533] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 2 11:20:27 gold kernel: [ 1692.606551] jbd2/dm-6-8 D 0 9894 2 0x80004000
Apr 2 11:20:27 gold kernel: [ 1692.606552] Call Trace:
Apr 2 11:20:27 gold kernel: [ 1692.606558] __schedule+0x2e6/0x6f0
Apr 2 11:20:27 gold kernel: [ 1692.606559] schedule+0x33/0xa0
Apr 2 11:20:27 gold kernel: [ 1692.606560] io_schedule+0x16/0x40
Apr 2 11:20:27 gold kernel: [ 1692.606562] wait_on_page_bit+0x141/0x210
Apr 2 11:20:27 gold kernel: [ 1692.606564] ? file_fdatawait_range+0x30/0x30
Apr 2 11:20:27 gold kernel: [ 1692.606565] wait_on_page_writeback+0x43/0x90
Apr 2 11:20:27 gold kernel: [ 1692.606566] __filemap_fdatawait_range+0xae/0x120
Apr 2 11:20:27 gold kernel: [ 1692.606569] ? submit_bio+0x46/0x1c0
Apr 2 11:20:27 gold kernel: [ 1692.606570] ? bio_add_page+0x67/0x90
Apr 2 11:20:27 gold kernel: [ 1692.606572] filemap_fdatawait_range_keep_errors+0x12/0x40
Apr 2 11:20:27 gold kernel: [ 1692.606574] jbd2_journal_commit_transaction+0xba2/0x1750
Apr 2 11:20:27 gold kernel: [ 1692.606575] ? __switch_to_asm+0x34/0x70
Apr 2 11:20:27 gold kernel: [ 1692.606578] kjournald2+0xc8/0x270
Apr 2 11:20:27 gold kernel: [ 1692.606580] ? wait_woken+0x80/0x80
Apr 2 11:20:27 gold kernel: [ 1692.606582] kthread+0x120/0x140
Apr 2 11:20:27 gold kernel: [ 1692.606583] ? commit_timeout+0x20/0x20
Apr 2 11:20:27 gold kernel: [ 1692.606584] ? kthread_park+0x90/0x90
Apr 2 11:20:27 gold kernel: [ 1692.606585] ret_from_fork+0x35/0x40
Both nodes have SSDs, with lvm-thin storage.
I also noticed that running 'lvs' on the node would hang while the restore was in progress, and I had to reboot the Proxmox node to get back to a normal state. I eventually gave up and just created a new VM from scratch, which worked fine.
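In case it helps with diagnosis, this is all I mean by 'lvs hanging'; the second command just adds standard LVM output fields to show how full the thin pool is, nothing Proxmox-specific:
Code:
# this simply hung on the node while the restore was stuck
lvs
# same, with standard fields showing thin pool data/metadata usage
lvs -a -o +data_percent,metadata_percent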
This is just a home lab, so it's not a major issue, but it is a pain. A proper fix would be really appreciated.
Thanks
Simon