Thank you Fiona, I edited my post.

CFQ was removed/replaced by BFQ in kernel 5.0. The wiki article hadn't been updated since 2019. But I'd try using a bandwidth limit first.
So Unspec, maybe you should try to change the scheduler algorithm to CFQ?
nano /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="... elevator=bfq"
update-grub
Unfortunately, this is also not possible anymore. The elevator command-line option has no effect nowadays; the scheduler can be set via udev instead. See the (now updated) wiki article: https://pve.proxmox.com/wiki/IO_Scheduler

Nice to read this!

So far, limiting bandwidth to 50 MB/s has prevented any failures. Will continue to monitor. Note that my failures only ever affected CTs - the VM snapshot backups never failed.
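For reference, the udev approach mentioned above usually looks something like the rule below - a generic sketch, not necessarily the exact rule from the wiki, and the KERNEL match should be adjusted to your devices:

# /etc/udev/rules.d/60-ioscheduler.rules (example)
ACTION=="add|change", KERNEL=="sd[a-z]*", ATTR{queue/scheduler}="bfq"

After adding the rule, reload udev (udevadm control --reload && udevadm trigger) or reboot, then confirm with cat /sys/block/sda/queue/scheduler.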
Hmmm... And what activity makes those IOs?
It seems like it's entirely centered around my uptime-kuma container. If I shut it down, IO delay stays below 2% consistently. It's doing something like 60 MB/s of reads. After multiple restarts of that container, it's down to 20 now - no idea why that container is being such a disk hog.
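In case it helps with tracking this down: per-process IO on the host can be watched with iotop (assuming it is installed); container processes show up like regular host processes:

apt install iotop     # if not already present
iotop -o -P -a        # only processes doing IO, per process, accumulated totals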
I've limited backup bandwidth to 25 MB/s and will continue monitoring.
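For anyone else reading along: a global backup bandwidth limit can be set in /etc/vzdump.conf; the value is given in KiB/s, so 25 MiB/s would be roughly the following (the number here is just an example):

# /etc/vzdump.conf
bwlimit: 25600

It can also be set per backup job or per storage in the GUI instead of globally.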
What is the output of zpool status -v? How full is your pool? Is there anything in the system logs/journal around the time of the issue? Could you share the configuration of the problematic container (pct config <ID>)?
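The pool-fullness and journal parts of that question can be checked with something like the following (the time window is just an example - pick one around the failed backup):

zpool list rpool                  # shows SIZE, ALLOC, FREE and CAP for the pool
journalctl --since "1 hour ago"   # adjust the window to the time of the failure
pct config <ID>                   # replace <ID> with the container's numeric ID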
  pool: rpool
 state: ONLINE
  scan: scrub repaired 0B in 00:11:19 with 0 errors on Sun Feb 9 00:35:20 2025
config:

        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            sda3       ONLINE       0     0     0
            nvme0n1p3  ONLINE       0     0     0
Are the disk speeds very different between these two? Of course, it shouldn't lead to issues, but I'd also not recommend having them in a mirror then.

I also noticed your storage is called pbs_local - is PBS running as a stand-alone node, a VM or a container?
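A quick way to see how different the two mirror members are (assuming the device names from the zpool output above) is to check their rotational flag and active scheduler:

cat /sys/block/sda/queue/rotational      # 1 = spinning disk, 0 = SSD
cat /sys/block/nvme0n1/queue/rotational
cat /sys/block/sda/queue/scheduler       # active scheduler is shown in brackets
cat /sys/block/nvme0n1/queue/scheduler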
Maybe you can try the elevator and the IO scheduling?

Kernel 6.11 did not fix it. I am stumped at this point. Looks like there's potentially an underlying ZFS bug.
What is the output of

fgrep -e vzsnap -e vzdump /proc/*/mounts

on the host? This is for checking if there is still a process using the dataset (of course, if there is still a backup running, it will show up there too), so you should check who the PID belongs to afterwards.
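A rough sketch of that PID check - <PID> below is a placeholder taken from whichever /proc/<PID>/mounts entry matched:

fgrep -e vzsnap -e vzdump /proc/*/mounts
# each match comes from /proc/<PID>/mounts; then see what that process is:
ps -o pid,user,cmd -p <PID>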