Note: Already posted on https://forum.proxmox.com/threads/server-disk-i-o-delay-100-during-cloning-and-backup.173051/page-3
Hey,
I seem to have the same problem, or at least something similar, since I upgraded from Proxmox 8 to 9 (currently running kernel 7.0.2-4-pve).
My system freezes when I clone or restore a VM. Backups and normal operation are not affected.
Freeze:
- I can still access the web GUI, but sometimes the VM states are not populated (question mark).
- I can access the server through SSH, sometimes also via the web shell, but not always.
- A reboot can be initiated via the web GUI or SSH but hangs somewhere in the process; I need to force the reboot as shown below.
- Today I retried restoring an 80 GB backup: the restore reached 100% within ~2 minutes but hung after that. I had to reboot the server after ~30 minutes without progress.
Mostly the clone/restore works until 100% and THEN the system starts to hang, before the log shows TASK OK.
The other VMs start to freeze and I cannot reboot the server normally. I have to use the following commands via SSH/shell to reboot the server:
Code:
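# enable all SysRq functions
echo 1 > /proc/sys/kernel/sysrq
# s: sync all mounted filesystems
echo s > /proc/sysrq-trigger
sleep 2
# u: remount all filesystems read-only
echo u > /proc/sysrq-trigger
sleep 2
# b: reboot immediately, without a clean shutdown
echo b > /proc/sysrq-trigger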
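Before forcing the reboot, it might be worth checking how much dirty data is still waiting to be written out, since the restore reports 100% long before the host recovers. A minimal sketch, assuming the standard `Dirty:` and `Writeback:` fields in /proc/meminfo; the function name `writeback_kb` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the Dirty and Writeback counters from a
# meminfo-style file (defaults to /proc/meminfo).
writeback_kb() {
    meminfo="${1:-/proc/meminfo}"
    # Exact match on the field name, so WritebackTmp: is not included.
    awk '$1 == "Dirty:" || $1 == "Writeback:" { printf "%s %s kB\n", $1, $2 }' "$meminfo"
}
```

Running it in a loop during a clone, e.g. `while sleep 1; do writeback_kb; done`, would show whether these values keep growing or stay stuck while the system hangs.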
Could this have something to do with the following bug?
https://bugzilla.proxmox.com/show_bug.cgi?id=7052
Hardware: ProLiant DL360 G7
RAID: HP Smart Array G6 with spinning disks (one array for system and VMs), ssacli shows all disks OK
Configuration:
- System and VMs on LVM Thin
Addition:
I had the same or a similar problem on another server, which locked up multiple times.
I was able to resolve the lock-ups by moving the VM disks to another RAID array on the same server (still spinning disks).
Logfile excerpts:
iostat -xz 1 shows a very large w_await (>290000 ms) and 100% utilization
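To see which device is affected while a clone or restore runs, the iostat output can be filtered for high w_await values. A minimal sketch, assuming `iostat -x` style output with a header row starting with "Device"; the function name `slow_writers` and the 1000 ms default threshold are placeholders of mine, not part of sysstat:

```shell
#!/bin/sh
# Hypothetical helper: read `iostat -x` style text on stdin and print
# devices whose w_await (ms) exceeds the given threshold.
slow_writers() {
    threshold_ms="${1:-1000}"
    awk -v limit="$threshold_ms" '
        # Locate the w_await column from each header row, because the
        # column position differs between sysstat versions.
        $1 ~ /^Device/ {
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "w_await") col = i
            next
        }
        # Data rows: only lines that actually have that column.
        col && NF >= col && ($col + 0) > limit {
            printf "%s w_await=%s ms\n", $1, $col
        }
    '
}
```

Usage would be something like `iostat -xz 1 | slow_writers 1000`.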
Journalctl:
Code:
May 13 17:44:47 host kernel: INFO: task iou-wrk-1650:1748 blocked for more than 122 seconds.
May 13 17:44:47 host kernel: Tainted: P IO 7.0.2-2-pve #1
May 13 17:44:47 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 17:44:47 host kernel: task:iou-wrk-1650 state:D stack:0 pid:1748 tgid:1650 ppid:1 task_flags:0x84040d0 flags:0x00080000
May 13 17:44:47 host kernel: Call Trace:
May 13 17:44:47 host kernel: <TASK>
May 13 17:44:47 host kernel: __schedule+0x495/0x1760
May 13 17:44:47 host kernel: ? __blk_flush_plug+0xef/0x150
May 13 17:44:47 host kernel: schedule+0x27/0xf0
May 13 17:44:47 host kernel: io_schedule+0x4c/0x80
May 13 17:44:47 host kernel: folio_wait_bit_common+0x136/0x340
May 13 17:44:47 host kernel: ? __pfx_wake_page_function+0x10/0x10
May 13 17:44:47 host kernel: folio_wait_bit+0x18/0x30
May 13 17:44:47 host kernel: folio_wait_writeback+0x3d/0xb0
May 13 17:44:47 host kernel: writeback_iter+0xda/0x310
May 13 17:44:47 host kernel: blkdev_writepages+0x7f/0xd0
May 13 17:44:47 host kernel: do_writepages+0xc4/0x180
May 13 17:44:47 host kernel: filemap_writeback+0xd1/0x100
May 13 17:44:47 host kernel: file_write_and_wait_range+0x60/0xd0
May 13 17:44:47 host kernel: blkdev_fsync+0x36/0x60
May 13 17:44:47 host kernel: vfs_fsync_range+0x2d/0xa0
May 13 17:44:47 host kernel: io_fsync+0x3d/0x60
May 13 17:44:47 host kernel: __io_issue_sqe+0x43/0x1b0
May 13 17:44:47 host kernel: io_issue_sqe+0x3e/0x5b0
May 13 17:44:47 host kernel: io_wq_submit_work+0xdf/0x380
May 13 17:44:47 host kernel: io_worker_handle_work+0x13d/0x570
May 13 17:44:47 host kernel: io_wq_worker+0x101/0x3b0
May 13 17:44:47 host kernel: ? raw_spin_rq_unlock+0x14/0x50
May 13 17:44:47 host kernel: ? finish_task_switch.isra.0+0x95/0x2f0
May 13 17:44:47 host kernel: ? __pfx_io_wq_worker+0x10/0x10
May 13 17:44:47 host kernel: ret_from_fork+0x2dc/0x3a0
May 13 17:44:47 host kernel: ? __pfx_io_wq_worker+0x10/0x10
May 13 17:44:47 host kernel: ret_from_fork_asm+0x1a/0x30
May 13 17:44:47 host kernel: RIP: 0033:0x0
May 13 17:44:47 host kernel: RSP: 002b:0000000000000000 EFLAGS: 00000246 ORIG_RAX: 00000000000001aa
May 13 17:44:47 host kernel: RAX: 0000000000000000 RBX: 00005b32e77b52d8 RCX: 00007c13d6ce63ca
May 13 17:44:47 host kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000001a
May 13 17:44:47 host kernel: RBP: 00005b32e77b53c0 R08: 0000000000000000 R09: 0000000000000008
May 13 17:44:47 host kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00005b32e77b52d0
May 13 17:44:47 host kernel: R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000000
May 13 17:44:47 host kernel: </TASK>
May 13 19:19:16 host kernel: INFO: task worker:1889 blocked for more than 122 seconds.
May 13 19:19:16 host kernel: Tainted: P IO 7.0.2-2-pve #1
May 13 19:19:16 host kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 13 19:19:16 host kernel: task:worker state:D stack:0 pid:1889 tgid:1881 ppid:1854 task_flags:0x400040 flags:0x00080000
May 13 19:19:16 host kernel: Call Trace:
May 13 19:19:16 host kernel: <TASK>
May 13 19:19:16 host kernel: __schedule+0x495/0x1760
May 13 19:19:16 host kernel: ? __submit_bio+0x196/0x250
May 13 19:19:16 host kernel: ? __pfx_bit_wait_io+0x10/0x10
May 13 19:19:16 host kernel: schedule+0x27/0xf0
May 13 19:19:16 host kernel: io_schedule+0x4c/0x80
May 13 19:19:16 host kernel: bit_wait_io+0x11/0x80
May 13 19:19:16 host kernel: __wait_on_bit+0x34/0xa0
May 13 19:19:16 host kernel: out_of_line_wait_on_bit+0x8d/0xc0
May 13 19:19:16 host kernel: ? __pfx_wake_bit_function+0x10/0x10
May 13 19:19:16 host kernel: __block_write_begin_int+0x24f/0x560
May 13 19:19:16 host kernel: iomap_write_begin+0x4cf/0x790
May 13 19:19:16 host kernel: ? radix_tree_lookup+0xd/0x20
May 13 19:19:16 host kernel: iomap_file_buffered_write+0x1f8/0x4a0
May 13 19:19:16 host kernel: blkdev_write_iter+0x192/0x350
May 13 19:19:16 host kernel: ? rw_verify_area+0x57/0x190
May 13 19:19:16 host kernel: vfs_write+0x274/0x490
May 13 19:19:16 host kernel: __x64_sys_pwrite64+0x98/0xd0
May 13 19:19:16 host kernel: x64_sys_call+0x1d12/0x2390
May 13 19:19:16 host kernel: do_syscall_64+0x11c/0x14e0
May 13 19:19:16 host kernel: ? do_syscall_64+0x311/0x14e0
May 13 19:19:16 host kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
May 13 19:19:16 host kernel: RIP: 0033:0x7341e8ea69ee
May 13 19:19:16 host kernel: RSP: 002b:00007341dd7f5f28 EFLAGS: 00000246 ORIG_RAX: 0000000000000012
May 13 19:19:16 host kernel: RAX: ffffffffffffffda RBX: 00007341dd7fa6c0 RCX: 00007341e8ea69ee
May 13 19:19:16 host kernel: RDX: 0000000000200000 RSI: 00007341e4e3a000 RDI: 000000000000000a
May 13 19:19:16 host kernel: RBP: 00007341e4e3a000 R08: 0000000000000000 R09: 0000000000000000
May 13 19:19:16 host kernel: R10: 00000000db1ffe00 R11: 0000000000000246 R12: 0000000000000000
May 13 19:19:16 host kernel: R13: 00005b76c37f41de R14: 00005b76fb84cf58 R15: 00007341dcffa000
May 13 19:19:16 host kernel: </TASK>
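To get a quick overview of which tasks were reported as hung, without the full call traces, the kernel log can be reduced to the "blocked for more than" lines. A minimal sketch, assuming the default journalctl line format shown above; the function name `hung_tasks` is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print one
# summary line per hung-task report (timestamp, task name, pid, seconds).
hung_tasks() {
    # Matches lines such as:
    #   May 13 19:19:16 host kernel: INFO: task worker:1889 blocked for more than 122 seconds.
    sed -n 's/^\(.*\) [^ ]* kernel: INFO: task \(.*\):\([0-9]*\) blocked for more than \([0-9]*\) seconds\.$/\1 task=\2 pid=\3 blocked=\4s/p'
}
```

Usage would be something like `journalctl -k -b | hung_tasks`.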