Hello all,
I have noticed occasional issues with Ubuntu VMs running on Proxmox VE 8.4.14 servers. The VMs lag and report operational and I/O issues with disk/filesystem access, resulting in CPU hung tasks, during backups to a Proxmox Backup Server 4.1.1 using S3 datastores.
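For reference, the version numbers above were taken with the usual commands on the PVE node and the PBS server; a minimal check, nothing beyond the stock tooling:

# On the Proxmox VE node
pveversion -v | head -n 3
# On the Proxmox Backup Server
proxmox-backup-manager version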
On the VM I see the following errors:
[6635504.997115] INFO: task kworker/u4:0:2245157 blocked for more than 120 seconds.
[6635504.998911] Tainted: G W 5.15.0-161-generic #171-Ubuntu
[6635505.000654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[6635505.002548] task:kworker/u4:0 state:D stack: 0 pid:2245157 ppid: 2 flags:0x00004000
[6635505.002552] Workqueue: writeback wb_workfn (flush-253:2)
[6635505.002556] Call Trace:
[6635505.002557] <TASK>
[6635505.002558] __schedule+0x24e/0x590
[6635505.002561] schedule+0x69/0x110
[6635505.002562] io_schedule+0x46/0x80
[6635505.002564] ? wbt_cleanup_cb+0x20/0x20
[6635505.002566] rq_qos_wait+0xd0/0x170
[6635505.002568] ? wbt_rqw_done+0x110/0x110
[6635505.002570] ? sysv68_partition+0x280/0x280
[6635505.002572] ? wbt_cleanup_cb+0x20/0x20
[6635505.002574] wbt_wait+0x9f/0xf0
[6635505.002576] __rq_qos_throttle+0x28/0x40
[6635505.002578] blk_mq_submit_bio+0x127/0x610
[6635505.002581] __submit_bio+0x1ee/0x220
[6635505.002584] __submit_bio_noacct+0x85/0x200
[6635505.002586] submit_bio_noacct+0x4e/0x120
[6635505.002588] ? unlock_page_memcg+0x46/0x80
[6635505.002592] ? __test_set_page_writeback+0x75/0x2d0
[6635505.002595] submit_bio+0x4a/0x130
[6635505.002607] iomap_submit_ioend+0x53/0x90
[6635505.002609] iomap_writepage_map+0x1fa/0x370
[6635505.002611] iomap_do_writepage+0x6e/0x110
[6635505.002613] write_cache_pages+0x1a6/0x460
[6635505.002615] ? iomap_writepage_map+0x370/0x370
[6635505.002618] iomap_writepages+0x21/0x40
[6635505.002619] xfs_vm_writepages+0x84/0xc0 [xfs]
[6635505.002679] do_writepages+0xd7/0x200
[6635505.002682] ? check_preempt_curr+0x61/0x70
[6635505.002685] ? ttwu_do_wakeup+0x1c/0x170
[6635505.002687] __writeback_single_inode+0x44/0x290
[6635505.002690] writeback_sb_inodes+0x22a/0x500
[6635505.002692] __writeback_inodes_wb+0x56/0xf0
[6635505.002695] wb_writeback+0x1cc/0x290
[6635505.002697] wb_do_writeback+0x1a0/0x280
[6635505.002699] wb_workfn+0x77/0x260
[6635505.002701] ? psi_task_switch+0xc6/0x220
[6635505.002703] ? raw_spin_rq_unlock+0x10/0x30
[6635505.002705] ? finish_task_switch.isra.0+0x7e/0x280
[6635505.002708] process_one_work+0x22b/0x3d0
[6635505.002710] worker_thread+0x53/0x420
[6635505.002711] ? process_one_work+0x3d0/0x3d0
[6635505.002712] kthread+0x12a/0x150
[6635505.002714] ? set_kthread_struct+0x50/0x50
[6635505.002717] ret_from_fork+0x22/0x30
[6635505.002720] </TASK>
And dmesg shows disk access issues:
[6635676.000802] sd 2:0:0:0: [sda] tag#227 timing out command, waited 180s
[6635676.063181] sd 2:0:0:0: [sda] tag#227 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=399s
[6635676.063185] sd 2:0:0:0: [sda] tag#227 Sense Key : Aborted Command [current]
[6635676.063187] sd 2:0:0:0: [sda] tag#227 Add. Sense: I/O process terminated
[6635676.063196] sd 2:0:0:0: [sda] tag#227 CDB: Write(10) 2a 00 02 21 99 b8 00 00 08 00
[6635676.063197] blk_update_request: I/O error, dev sda, sector 35756472 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
[6635676.066252] dm-2: writeback error on inode 166, offset 36864, sector 98744
[6635676.066262] sd 2:0:0:0: [sda] tag#233 timing out command, waited 180s
[6635676.069684] sd 2:0:0:0: [sda] tag#233 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=399s
[6635676.069685] sd 2:0:0:0: [sda] tag#233 Sense Key : Aborted Command [current]
[6635676.069687] sd 2:0:0:0: [sda] tag#233 Add. Sense: I/O process terminated
[6635676.069688] sd 2:0:0:0: [sda] tag#233 CDB: Write(10) 2a 00 02 21 a3 f8 00 00 08 00
[6635676.069689] blk_update_request: I/O error, dev sda, sector 35759096 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
[6635676.072322] dm-2: writeback error on inode 166, offset 1380352, sector 101368
[6635676.072326] sd 2:0:0:0: [sda] tag#234 timing out command, waited 180s
[6635676.077039] sd 2:0:0:0: [sda] tag#234 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=399s
[6635676.077041] sd 2:0:0:0: [sda] tag#234 Sense Key : Aborted Command [current]
[6635676.077042] sd 2:0:0:0: [sda] tag#234 Add. Sense: I/O process terminated
[6635676.077044] sd 2:0:0:0: [sda] tag#234 CDB: Write(10) 2a 00 02 21 b0 88 00 00 08 00
In the Proxmox VE backup task window I had to interrupt the job, as you can see below, because the backup was stuck and did not progress past 3% for some time:
INFO: starting new backup job: vzdump 163 --notification-mode auto --remove 0 --notes-template '{{guestname}}' --mode snapshot --storage s3-store2 --mailto ops@openanswers.co.uk --node hlvbp011
INFO: Starting Backup of VM 163 (qemu)
INFO: Backup started at 2026-01-20 14:02:43
INFO: status = running
INFO: VM Name: TestVM
INFO: include disk 'scsi0' 'VolGroup01:vm-163-disk-0' 40G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating Proxmox Backup Server archive 'vm/163/2026-01-20T14:02:43Z'
INFO: issuing guest-agent 'fs-freeze' command
INFO: issuing guest-agent 'fs-thaw' command
INFO: started backup task 'cc5ced35-2950-4d0d-b952-c8513ce512c8'
INFO: resuming VM again
INFO: scsi0: dirty-bitmap status: existing bitmap was invalid and has been cleared
INFO: 0% (172.0 MiB of 40.0 GiB) in 3s, read: 57.3 MiB/s, write: 57.3 MiB/s
INFO: 1% (412.0 MiB of 40.0 GiB) in 7s, read: 60.0 MiB/s, write: 60.0 MiB/s
INFO: 2% (824.0 MiB of 40.0 GiB) in 1m 10s, read: 6.5 MiB/s, write: 4.0 MiB/s
INFO: 3% (1.3 GiB of 40.0 GiB) in 1m 48s, read: 12.1 MiB/s, write: 5.9 MiB/s
ERROR: interrupted by signal
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 163 failed - interrupted by signal
INFO: Failed at 2026-01-20 14:09:04
ERROR: Backup job failed - interrupted by signal
The PBS task would be stuck at a 'caching chunk' message.
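For completeness, this is roughly how I have been inspecting the stuck task on the PBS side; a minimal sketch, with <UPID> standing in for the actual task ID taken from the list output:

# List recent/running tasks on the PBS server
proxmox-backup-manager task list
# Show the log of the stuck backup task (it stops at 'caching chunk')
proxmox-backup-manager task log <UPID>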
My questions are:
When we see the 'INFO: resuming VM again' message, does that mean a lock on a disk resource was held, or does this relate to the VM lock file in /var/lock/qemu-server?
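For reference, this is roughly how I have been checking for leftover locks after an interrupted backup; a minimal sketch assuming VMID 163, and assuming the per-VM flock file is named lock-163.conf:

# Config-level lock set by vzdump ('lock: backup' while the job owns the VM)
qm config 163 | grep '^lock'
# Per-VM flock used to serialise qm/vzdump operations (path assumed)
ls -l /var/lock/qemu-server/lock-163.conf
fuser -v /var/lock/qemu-server/lock-163.conf
# Clear a stale config lock only once no backup task is still running
qm unlock 163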
Are there known issues with Proxmox VE backup processes holding access/locks on disk resources when backups stall? Even the Proxmox VE server itself shows CPU hung task and disk I/O messages (relating to the VM disk image).
I am used to seeing the 'INFO: resuming VM again' message before a backup starts, but is the VM hang related to seeing this message again on failed or interrupted backups?
If a backup is successful we never see the 'resuming VM again' message at the end of the backup task log, e.g.:
INFO: 99% (40.0 GiB of 40.0 GiB) in 2m 51s, read: 230.7 MiB/s, write: 0 B/s
INFO: 100% (40.0 GiB of 40.0 GiB) in 2m 52s, read: 33.4 MiB/s, write: 8.0 KiB/s
INFO: backup is sparse: 22.28 GiB (55%) total zero data
INFO: transferred 40.00 GiB in 172 seconds (238.1 MiB/s)
INFO: archive file size: 8.66GB
INFO: adding notes to backup
INFO: prune older backups with retention: keep-last=1
INFO: removing backup 'backup:backup/vzdump-qemu-166-2026_01_24-01_00_02.vma.zst'
INFO: pruned 1 backup(s) not covered by keep-retention policy
INFO: Finished Backup of VM 166 (00:02:54)
INFO: Backup finished at 2026-01-31 01:03:02
INFO: Backup job finished successfully
INFO: notified via target `<ops@openanswers.co.uk>`
TASK OK
Is the S3 datastore still under development (Technical Preview)? Could this have the knock-on effect of stalling a Proxmox VE VM?
Any info, help or hints would be greatly appreciated.
Regards, Dek