Thank you Fiona. I edited my post.

So Unspec, maybe you should try to change the scheduler algorithm to CFQ?
nano /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="... elevator=bfq"
update-grub
CFQ was removed/replaced by BFQ in kernel 5.0, and the wiki article hadn't been updated since 2019. But I'd try using a bandwidth limit first. Unfortunately, setting the scheduler this way also is not possible anymore: the elevator command-line option no longer has any effect. It can be done via udev nowadays. See the (now updated) wiki article: https://pve.proxmox.com/wiki/IO_Scheduler

Nice to read this.

So far, limiting bandwidth to 50MB has prevented any failures. Will continue to monitor. Note that my failures only ever affected CTs - the VM snapshot backups never failed.
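Regarding the udev approach mentioned a couple of posts up, a minimal sketch, assuming rotational disks should get BFQ; the rule file name and matching are my own choices, the wiki article has the authoritative version:

# /etc/udev/rules.d/60-ioscheduler.rules (example file name)
# use BFQ on rotational disks, leave NVMe on its default scheduler
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="bfq"

# apply without rebooting and verify
udevadm control --reload-rules && udevadm trigger
cat /sys/block/sda/queue/scheduler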
Hmmm... And what activity makes those IOs?
It seems like it's entirely centered around my uptime-kuma container. If I shut it down, IO delay stays below 2% consistently. It's doing something like 60M/s of reads. After multiple restarts of that container, it's down to 20 now - no idea why that container is being such a disk hog.
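Not something anyone suggested in the thread, but one way to see which process is actually generating those reads is iotop on the host (it sees container processes too); the flags below are just one reasonable combination:

apt install iotop
iotop -oPa    # -o: only processes currently doing IO, -P: per process, -a: accumulated totals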
I've limited backup bandwidth to 25MB and will continue monitoring
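For reference, a global backup bandwidth limit can be set in /etc/vzdump.conf (the value is in KiB/s); 25600 below is just a rough conversion of 25MB/s and not necessarily the exact value used here:

# /etc/vzdump.conf
bwlimit: 25600

The same limit can also be set per backup job (Datacenter -> Backup -> edit job -> Advanced) instead of globally.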
What is the output of zpool status -v? How full is your pool? Is there anything in the system logs/journal around the time of the issue? Could you share the configuration of the problematic container (pct config <ID>)?
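A rough sketch of how the journal part could be checked on the host; the time window is a placeholder around the failed backup, and on recent PVE versions the scheduled jobs are run by the pvescheduler service:

journalctl --since "2025-02-09 00:00" --until "2025-02-09 02:00"
journalctl -u pvescheduler --since yesterday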
pool: rpool
state: ONLINE
scan: scrub repaired 0B in 00:11:19 with 0 errors on Sun Feb 9 00:35:20 2025
config:
        NAME           STATE     READ WRITE CKSUM
        rpool          ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            sda3       ONLINE       0     0     0
            nvme0n1p3  ONLINE       0     0     0
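The status output doesn't show how full the pool is; if that's still needed, something along these lines would show capacity, fragmentation and per-dataset space usage:

zpool list -o name,size,alloc,free,cap,frag,health rpool
zfs list -o space rpool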
Are the disk speeds very different between these two? Of course, it shouldn't lead to issues, but I'd also not recommend having them in a mirror then.

I also noticed your storage is called pbs_local. Is PBS running as a stand-alone node, a VM or a container?
Maybe you can try the elevator and the IO scheduling?

Kernel 6.11 did not fix it. I am stumped at this point. Looks like there's potentially an underlying ZFS bug.
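For anyone following along, trying the opt-in kernel is roughly the following on PVE 8; the package name and pinned version are assumptions, check proxmox-boot-tool kernel list for the real ones:

apt update
apt install proxmox-kernel-6.11             # opt-in kernel meta-package (name assumed)
reboot
# optionally make it the default boot entry afterwards:
proxmox-boot-tool kernel list
proxmox-boot-tool kernel pin 6.11.0-1-pve   # placeholder version, use one from the list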
The next time the issue pops up, could you run
fgrep -e vzsnap -e vzdump /proc/*/mounts
on the host? This is for checking if there is still a process using the dataset (of course, if there is still a backup running, it will show up there too), so you should check who the PID belongs to afterwards.
Is there maybe a hung umount task? See: https://forum.proxmox.com/threads/b...shot-dataset-already-exists.52783/post-748864
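A small sketch of the "check who the PID belongs to" step and of looking for a hung umount; it only reuses the fgrep check above plus standard ps output:

# turn the matching /proc/<pid>/mounts paths into PIDs and show the owning processes
fgrep -l -e vzsnap -e vzdump /proc/*/mounts | cut -d/ -f3 | sort -u | while read -r pid; do
    ps -o pid,ppid,stat,lstart,cmd -p "$pid"
done

# look for tasks stuck in uninterruptible sleep (D state), e.g. a hung umount
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'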