Hi,
Since we recently upgraded some hosts from PVE 6.4 to 7.2, some VMs become unresponsive while other VMs are being backed up via PBS. For example, one VM running Debian 9 had a maximum load of 6 on PVE 6, but on PVE 7.2 its load goes up to 180. We changed the VM's disk from VirtIO to SCSI with the VirtIO SCSI single controller and iothread=1, but that did not help.
For storage we use local zpools made of RAID1 (mirrored) NVMe disks and RAID1 SATA disks, with a SLOG on RAID1 SATA SSDs.
The issue seems to be tied to certain PBS jobs: when we move the schedule of those jobs, the time at which the issue occurs moves with them.
The PVE host itself logs errors to dmesg:
```
[351625.025281] INFO: task kvm:1607299 blocked for more than 121 seconds.
[351625.033610] Tainted: P O 5.15.39-1-pve #1
[351625.041888] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[351625.050197] task:kvm state:D stack: 0 pid:1607299 ppid: 1 flags:0x00000000
[351625.058515] Call Trace:
[351625.066812] <TASK>
[351625.075131] __schedule+0x33d/0x1750
[351625.083428] ? default_send_IPI_single_phys+0x4e/0x90
[351625.091737] ? native_send_call_func_single_ipi+0x1e/0x20
[351625.100054] ? send_call_function_single_ipi+0x70/0xd0
[351625.108352] ? __smp_call_single_queue+0x55/0x80
[351625.116671] schedule+0x4e/0xb0
[351625.124947] rwsem_down_write_slowpath+0x217/0x4d0
[351625.133245] down_write+0x43/0x50
[351625.141541] blkdev_common_ioctl+0x60b/0x8b0
[351625.149890] blkdev_ioctl+0xf6/0x270
[351625.158110] ? __fget_files+0x86/0xc0
[351625.166489] block_ioctl+0x46/0x50
[351625.174767] __x64_sys_ioctl+0x91/0xc0
[351625.183112] do_syscall_64+0x5c/0xc0
[351625.191409] ? syscall_exit_to_user_mode+0x27/0x50
[351625.199751] ? __x64_sys_write+0x1a/0x20
[351625.208017] ? do_syscall_64+0x69/0xc0
[351625.216314] ? do_syscall_64+0x69/0xc0
[351625.224612] ? exit_to_user_mode_prepare+0x37/0x1b0
[351625.232935] ? syscall_exit_to_user_mode+0x27/0x50
[351625.241288] ? __x64_sys_clone+0x25/0x30
[351625.249576] ? do_syscall_64+0x69/0xc0
[351625.257874] entry_SYSCALL_64_after_hwframe+0x44/0xae
[351625.266112] RIP: 0033:0x7f1025360cc7
[351625.274177] RSP: 002b:00007f0eeac7c518 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[351625.282795] RAX: ffffffffffffffda RBX: 00007f0ef18d8900 RCX: 00007f1025360cc7
[351625.291107] RDX: 00007f0eeac7c520 RSI: 000000000000127f RDI: 000000000000001c
[351625.299423] RBP: 000055e886be0650 R08: 0000000000000000 R09: 00007f0eea480700
[351625.307719] R10: 00007f0eeac7c520 R11: 0000000000000246 R12: 00007f0eeac7c520
[351625.316017] R13: 00007f100c016858 R14: 00007f100c05d8b0 R15: 0000000000802000
[351625.324316] </TASK>
[351625.332679] INFO: task kvm:1607300 blocked for more than 121 seconds.
[351625.333725] Tainted: P O 5.15.39-1-pve #1
[351625.334427] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[351625.335062] task:kvm state:D stack: 0 pid:1607300 ppid: 1 flags:0x00000000
[351625.335661] Call Trace:
[351625.336237] <TASK>
[351625.336789] __schedule+0x33d/0x1750
[351625.337339] ? default_wake_function+0x1a/0x30
[351625.337893] ? pollwake+0x72/0x90
[351625.338468] ? wake_up_q+0x90/0x90
[351625.339026] ? __wake_up_common+0x7e/0x140
[351625.339601] schedule+0x4e/0xb0
[351625.340128] rwsem_down_write_slowpath+0x217/0x4d0
[351625.340658] down_write+0x43/0x50
[351625.341171] blkdev_common_ioctl+0x60b/0x8b0
[351625.341660] blkdev_ioctl+0xf6/0x270
[351625.342183] ? __fget_files+0x86/0xc0
[351625.342693] block_ioctl+0x46/0x50
[351625.343189] __x64_sys_ioctl+0x91/0xc0
[351625.343682] do_syscall_64+0x5c/0xc0
```
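The register dump at the end of each trace shows which system call the blocked kvm thread was in. Below is a minimal sketch for decoding it, assuming the x86_64 syscall ABI (ORIG_RAX holds the syscall number, RDI and RSI the first two arguments) and the generic `_IOC` bit layout from `asm-generic/ioctl.h`. `decode_ioctl`, `describe_blocked_call` and the lookup table (a hand-picked subset of `linux/fs.h`, not a complete list) are hypothetical helpers for illustration. Under those assumptions the first trace decodes to a `BLKZEROOUT` ioctl on fd 28; this is only our reading of the trace, not a confirmed cause.

```python
import re

# x86_64 syscall number of ioctl(2); ORIG_RAX holds it in the register dump.
SYS_IOCTL_X86_64 = 16


def decode_ioctl(request: int) -> dict:
    """Split an ioctl request number using the generic _IOC layout (x86_64):
    nr = bits 0-7, type = bits 8-15, size = bits 16-29, dir = bits 30-31."""
    return {
        "nr": request & 0xFF,
        "type": (request >> 8) & 0xFF,
        "size": (request >> 16) & 0x3FFF,
        "dir": (request >> 30) & 0x3,
    }


# Hand-picked subset of block ioctls defined as _IO(0x12, nr) in linux/fs.h.
BLOCK_IO_IOCTLS = {
    119: "BLKDISCARD",
    125: "BLKSECDISCARD",
    127: "BLKZEROOUT",
}

REG_RE = re.compile(r"\b(ORIG_RAX|RDI|RSI): ([0-9a-f]+)")


def describe_blocked_call(register_dump: str) -> str:
    """Summarize the syscall in a hung-task register dump (hypothetical helper)."""
    regs = {name: int(value, 16) for name, value in REG_RE.findall(register_dump)}
    if regs.get("ORIG_RAX") != SYS_IOCTL_X86_64:
        return f"syscall {regs.get('ORIG_RAX')} (not ioctl)"
    request = regs["RSI"]
    fields = decode_ioctl(request)
    name = None
    # _IO() requests carry no direction bits and no argument size.
    if fields["type"] == 0x12 and fields["dir"] == 0 and fields["size"] == 0:
        name = BLOCK_IO_IOCTLS.get(fields["nr"])
    return f"ioctl(fd={regs['RDI']}, request={request:#x} -> {name or 'unknown'})"


# Register lines copied from the first trace above.
DUMP = (
    "RSP: 002b:00007f0eeac7c518 EFLAGS: 00000246 ORIG_RAX: 0000000000000010\n"
    "RDX: 00007f0eeac7c520 RSI: 000000000000127f RDI: 000000000000001c\n"
)
print(describe_blocked_call(DUMP))  # ioctl(fd=28, request=0x127f -> BLKZEROOUT)
```

The table lookup is deliberately restricted to `_IO(0x12, …)` requests, so anything outside that subset is reported as `unknown` rather than guessed.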
The issue in https://forum.proxmox.com/threads/p...2-vm-freeze-if-backing-up-large-disks.109272/ seems somewhat similar, but pve-qemu-kvm on our hosts is at `6.2.0-11`.
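To line the hung-task reports up against the PBS job schedule, the dmesg timestamps (seconds since boot) have to be converted to wall-clock time. Below is a minimal sketch, assuming the boot time is taken from the `btime` line of `/proc/stat`; `hung_task_events` and `boot_time_from_proc_stat` are hypothetical helper names. The result is only approximate, since the kernel log clock can drift from wall-clock time (the same caveat applies to `dmesg -T`).

```python
import re
from datetime import datetime, timedelta, timezone

# Matches e.g. "[351625.025281] INFO: task kvm:1607299 blocked for more than 121 seconds."
HUNG_TASK_RE = re.compile(
    r"\[\s*(\d+\.\d+)\] INFO: task (\S+):(\d+) blocked for more than (\d+) seconds"
)


def boot_time_from_proc_stat(path: str = "/proc/stat") -> datetime:
    """Read the boot time (seconds since the epoch) from the btime line."""
    with open(path) as f:
        for line in f:
            if line.startswith("btime "):
                return datetime.fromtimestamp(int(line.split()[1]), tz=timezone.utc)
    raise RuntimeError(f"no btime line in {path}")


def hung_task_events(dmesg_text: str, boot_time: datetime):
    """Return (comm, pid, blocked_since, reported_at) tuples in wall-clock time."""
    events = []
    for m in HUNG_TASK_RE.finditer(dmesg_text):
        uptime, comm, pid, secs = m.groups()
        reported_at = boot_time + timedelta(seconds=float(uptime))
        # The task had already been stuck for at least `secs` seconds when reported.
        blocked_since = reported_at - timedelta(seconds=int(secs))
        events.append((comm, int(pid), blocked_since, reported_at))
    return events


# Demo with two lines from the log above and an arbitrary, made-up boot time.
SAMPLE = (
    "[351625.025281] INFO: task kvm:1607299 blocked for more than 121 seconds.\n"
    "[351625.332679] INFO: task kvm:1607300 blocked for more than 121 seconds.\n"
)
DEMO_BOOT = datetime(2022, 7, 1, tzinfo=timezone.utc)
for comm, pid, since, reported in hung_task_events(SAMPLE, DEMO_BOOT):
    print(comm, pid, since.isoformat(), reported.isoformat())
```

On a real host one would pass `boot_time_from_proc_stat()` instead of the made-up demo boot time and compare `blocked_since` with the start times of the PBS jobs.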