KVM guests freeze (hung tasks) during backup/restore/migrate

Thanks for confirming.

I did a check with our new cluster and all the drive caches are on by default.
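For anyone who wants to run the same check, a rough sketch of how it can look (device names are placeholders and need adapting; hdparm covers SATA drives, sdparm the SAS case):

# report whether the on-drive write cache is enabled (SATA drive, /dev/sda is a placeholder)
hdparm -W /dev/sda

# same check for a SAS drive via the SCSI caching mode page (WCE = write cache enable)
sdparm --get=WCE /dev/sda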

From what I've discovered so far, these proprietary cards from Dell and HP seem to be a common thread in these types of issues.

Cheers
g
You're welcome. It seems counterintuitive to disable any cache on the controller card to speed up the whole zpool. I guess this has something to do with ZFS needing access to the individual blocks. What I don't understand then is: why doesn't the same explanation apply to the disks' caches? Those caches have to be turned on - at least if you want to speed up your zpool.

Anyway, I am glad I could solve the initial issues with the PERC controller. I just wonder why this information is nowhere to be found in the various threads in different forums...
 
Hi @rakurtz

I would say that the controller cache is another level of cache that can't be controlled by ZFS, while the drive cache may act in a different way.

RAID controller cache is specifically designed to be a middle-man cache, while the drive cache sits directly on the drive.

ZFS uses RAM for cache, and with direct access to the drives it can manage what it needs to; with no direct access to the controller, it has no way to manage the controller cache.
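As a small illustration of the "ZFS uses RAM for cache" part: on a Linux host with OpenZFS you can look at the ARC directly (a rough sketch, assuming the standard kstat path and the arc_summary tool shipped with OpenZFS):

# current ARC size and configured maximum, in bytes
grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats

# or the human-readable summary
arc_summary | head -n 40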

This is my understanding; happy to be corrected by someone who knows the more granular details.

Reading through these forums and others on the TrueNAS website, I'm seeing the same issues pop up over and over with these controllers from HP and Dell that can be switched between RAID and HBA mode; they always run into some sort of problem.

The best controller cards are dedicated HBAs, like an LSI/Broadcom 9300 8-port SAS.

Cheers
G
 
Apparently, things are different with PVE 7.

All VMs now seem to use io_uring async I/O by default:

"QEMU 6.0: The latest QEMU version with new functionalities is included in Proxmox VE 7. This
includes support for the Linux IO interface ‘io_uring’. The asynchronous I/O engine for virtual drives
will be applied to all newly launched or migrated guest systems by default"


I need to manually set aio=threads in the VM config to make the workaround described above work again.
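For reference, roughly how that looks on my side (VMID 100, the storage name and disk size are placeholders; note that the full drive line has to be restated when using qm set):

# show the current drive line of the VM (VMID 100 is a placeholder)
qm config 100 | grep '^scsi0'

# set aio=threads on that drive, keeping the rest of the options as they were
qm set 100 --scsi0 local-lvm:vm-100-disk-0,aio=threads,size=32G

# or edit /etc/pve/qemu-server/100.conf directly so the line reads:
# scsi0: local-lvm:vm-100-disk-0,aio=threads,size=32G

# a full stop/start of the guest (not just a reboot from inside it) is needed
# before the new aio mode is actually used by the QEMU process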

It seems there is something weird with aio.

Why do VMs heavily stutter/lag when the disk subsystem is put under load AND the VMs do I/O through async interfaces like io_submit() or io_uring?

What can explain this behaviour?

Is it a bug or "by design"?

Is there a way to switch back to the previous default behaviour, i.e. use aio=threads instead of aio=io_uring?
 