massive load spikes since upgrade to 7.2

bytemine

Hi,
since we recently upgraded some hosts from PVE 6.4 to 7.2, some VMs become unresponsive while other VMs are being backed up via PBS. For example, one VM running Debian 9 had a maximum load of 6 on PVE 6; on PVE 7.2 the load goes up to 180. We switched the VM's disk from VirtIO block to VirtIO SCSI single with iothread=1, but no luck.
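
For reference, the change looks roughly like this (a sketch with a hypothetical VMID 101 and disk name, not our actual config):

Code:
# switch the controller to VirtIO SCSI single and enable an iothread on the disk
qm set 101 --scsihw virtio-scsi-single
qm set 101 --scsi0 nvmepool:vm-101-disk-0,iothread=1

# resulting lines in /etc/pve/qemu-server/101.conf
scsihw: virtio-scsi-single
scsi0: nvmepool:vm-101-disk-0,iothread=1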

As storage, local ZFS pools are used, consisting of a RAID1 of NVMe disks and a RAID1 of SATA disks (the latter with a SLOG on RAID1 SATA SSDs).
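
The layout corresponds roughly to the following (only a sketch, device names are placeholders):

Code:
# NVMe pool: two NVMe disks mirrored
zpool create nvmepool mirror /dev/nvme0n1 /dev/nvme1n1

# SATA pool: two SATA disks mirrored, plus a mirrored SSD SLOG
zpool create datapool mirror /dev/sda /dev/sdb log mirror /dev/sdc /dev/sdd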

The issue seems to be related to certain PBS backup jobs: when we move the schedule of those jobs, the time at which the problems occur moves with it.

The PVE host itself throws some errors into dmesg:

Code:
[351625.025281] INFO: task kvm:1607299 blocked for more than 121 seconds.
[351625.033610]       Tainted: P           O      5.15.39-1-pve #1
[351625.041888] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[351625.050197] task:kvm             state:D stack:    0 pid:1607299 ppid:     1 flags:0x00000000
[351625.058515] Call Trace:
[351625.066812]  <TASK>
[351625.075131]  __schedule+0x33d/0x1750
[351625.083428]  ? default_send_IPI_single_phys+0x4e/0x90
[351625.091737]  ? native_send_call_func_single_ipi+0x1e/0x20
[351625.100054]  ? send_call_function_single_ipi+0x70/0xd0
[351625.108352]  ? __smp_call_single_queue+0x55/0x80
[351625.116671]  schedule+0x4e/0xb0
[351625.124947]  rwsem_down_write_slowpath+0x217/0x4d0
[351625.133245]  down_write+0x43/0x50
[351625.141541]  blkdev_common_ioctl+0x60b/0x8b0
[351625.149890]  blkdev_ioctl+0xf6/0x270
[351625.158110]  ? __fget_files+0x86/0xc0
[351625.166489]  block_ioctl+0x46/0x50
[351625.174767]  __x64_sys_ioctl+0x91/0xc0
[351625.183112]  do_syscall_64+0x5c/0xc0
[351625.191409]  ? syscall_exit_to_user_mode+0x27/0x50
[351625.199751]  ? __x64_sys_write+0x1a/0x20
[351625.208017]  ? do_syscall_64+0x69/0xc0
[351625.216314]  ? do_syscall_64+0x69/0xc0
[351625.224612]  ? exit_to_user_mode_prepare+0x37/0x1b0
[351625.232935]  ? syscall_exit_to_user_mode+0x27/0x50
[351625.241288]  ? __x64_sys_clone+0x25/0x30
[351625.249576]  ? do_syscall_64+0x69/0xc0
[351625.257874]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[351625.266112] RIP: 0033:0x7f1025360cc7
[351625.274177] RSP: 002b:00007f0eeac7c518 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[351625.282795] RAX: ffffffffffffffda RBX: 00007f0ef18d8900 RCX: 00007f1025360cc7
[351625.291107] RDX: 00007f0eeac7c520 RSI: 000000000000127f RDI: 000000000000001c
[351625.299423] RBP: 000055e886be0650 R08: 0000000000000000 R09: 00007f0eea480700
[351625.307719] R10: 00007f0eeac7c520 R11: 0000000000000246 R12: 00007f0eeac7c520
[351625.316017] R13: 00007f100c016858 R14: 00007f100c05d8b0 R15: 0000000000802000
[351625.324316]  </TASK>
[351625.332679] INFO: task kvm:1607300 blocked for more than 121 seconds.
[351625.333725]       Tainted: P           O      5.15.39-1-pve #1
[351625.334427] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[351625.335062] task:kvm             state:D stack:    0 pid:1607300 ppid:     1 flags:0x00000000
[351625.335661] Call Trace:
[351625.336237]  <TASK>
[351625.336789]  __schedule+0x33d/0x1750
[351625.337339]  ? default_wake_function+0x1a/0x30
[351625.337893]  ? pollwake+0x72/0x90
[351625.338468]  ? wake_up_q+0x90/0x90
[351625.339026]  ? __wake_up_common+0x7e/0x140
[351625.339601]  schedule+0x4e/0xb0
[351625.340128]  rwsem_down_write_slowpath+0x217/0x4d0
[351625.340658]  down_write+0x43/0x50
[351625.341171]  blkdev_common_ioctl+0x60b/0x8b0
[351625.341660]  blkdev_ioctl+0xf6/0x270
[351625.342183]  ? __fget_files+0x86/0xc0
[351625.342693]  block_ioctl+0x46/0x50
[351625.343189]  __x64_sys_ioctl+0x91/0xc0
[351625.343682]  do_syscall_64+0x5c/0xc0
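
To see whether more tasks are stuck in this state during a backup window, the kernel can be asked to dump all blocked tasks (a generic diagnostic, assuming the 'w' sysrq function is enabled via kernel.sysrq):

Code:
# list all hung-task reports logged so far
dmesg | grep 'blocked for more than'

# dump stack traces of every task in uninterruptible (D) sleep
echo w > /proc/sysrq-trigger
dmesg | tail -n 100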

The issue in https://forum.proxmox.com/threads/p...2-vm-freeze-if-backing-up-large-disks.109272/ seemed somewhat similar, but our pve-qemu-kvm is already at `6.2.0-11`.
 
Oh, the good old hang checker ... it reports blocked processes that have been hanging for more than 2 minutes, which implies a storage problem. State D is also consistent with this: the process is in an uninterruptible sleep, waiting for the I/O subsystem. What about your ZFS pools? Are they too full?
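
To see which processes are currently stuck like that, something along these lines works (just a generic sketch):

Code:
# show all processes in uninterruptible sleep (state D) and where they wait
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'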
 
Hi,
the zfs pools are fine:

Code:
zpool list
NAME       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
datapool  14.5T  2.49T  12.1T        -         -    16%    17%  1.00x    ONLINE  -
nvmepool  6.97T  1.30T  5.67T        -         -    31%    18%  1.00x    ONLINE  -
rpool      236G  18.7G   217G        -         -    20%     7%  1.00x    ONLINE  -
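
During a backup window we can also watch per-vdev throughput and latency directly (generic command, not output from our hosts):

Code:
# per-vdev I/O statistics with latency columns, refreshed every second
zpool iostat -vl 1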

Maybe the arcstat output is also relevant:
Code:
time  read  miss  miss%  dmis  dm%  pmis  pm%  mmis  mm%  size     c  avail
10:44:08     0     0      0     0    0     0    0     0    0   63G   64G    52G

We changed some PBS job schedules hoping it would reduce the load, but it did not. The above-mentioned VM reached a load of 250 while a PBS job was snapshotting ~1 TB of another VM's disks on datapool (the SATA disks), even though the affected VM itself resides on nvmepool.

Our monitoring shows that the disk I/O pending queue length has had far more spikes since the upgrade to PVE 7 on 19.07:


[Attachment: diskio pending queue.png (disk I/O pending queue length over time)]
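
The pending queue can also be checked ad hoc with iostat from the sysstat package (generic example, device names are placeholders):

Code:
# extended statistics every 2 seconds; the average request queue length is the
# aqu-sz column (avgqu-sz on older sysstat), waits are r_await/w_await in ms
iostat -x 2 nvme0n1 sda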


We have changed aio to threads for the affected VM's disks, hoping it will help.
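
Again as a sketch with a hypothetical VMID and disk name, the change amounts to:

Code:
# switch the disk's asynchronous I/O mode to a thread pool
qm set 101 --scsi0 nvmepool:vm-101-disk-0,iothread=1,aio=threads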

Best regards
Bjoern from bytemine GmbH
 
