Hello,
I've seen a lot of threads about the error "zfs error: cannot XYZ ... : dataset is busy", and it also happens quite often at our site (roughly every 3-4 weeks on a cluster with 6 nodes). As suggested in the other threads, a reboot solves the problem, but that is not an option on a production server.
The problem is that one can no longer interact with the affected containers. They are still running and working, but rollback, destroy, and start operations on them fail. For example:
Code:
TASK ERROR: zfs error: cannot destroy 'rpool/data/subvol-65002-disk-0': dataset is busy
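One thing that may be worth checking (an assumption on my side, not a confirmed cause on our nodes) is whether some process still has the dataset mounted in its own mount namespace, since that can make ZFS report a dataset as busy. A minimal sketch that scans the per-process mount tables; the function name `find_mount_holders` and its second parameter are hypothetical and only there for illustration:

```shell
#!/bin/sh
# Print the PIDs whose mount table still lists the given ZFS dataset.
# $1: dataset name, e.g. rpool/data/subvol-65002-disk-0
# $2: proc root (defaults to /proc; only a parameter so the function can be tested)
find_mount_holders() {
    dataset="$1"
    procroot="${2:-/proc}"
    for f in "$procroot"/[0-9]*/mounts; do
        # Processes can exit while we iterate, so skip unreadable entries.
        [ -r "$f" ] || continue
        # The first field of every mounts line is the mount source,
        # which for ZFS is the dataset name.
        if awk -v ds="$dataset" '$1 == ds { found = 1 } END { exit !found }' "$f" 2>/dev/null; then
            pid=${f%/mounts}
            echo "${pid##*/}"
        fi
    done
}
```

Run as root, for example: find_mount_holders rpool/data/subvol-65002-disk-0. An empty result only means that no process visible to the caller lists the dataset; it does not rule out other reasons for the error.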
This is the output of ps auxf | grep 65002 (65002 being the container that I tried to roll back):
Code:
ps auxf | grep 65002
root 2628155 0.0 0.0 6240 716 pts/0 S+ 13:27 0:00 \_ grep 65002
root 2225583 0.0 0.0 8832 4024 ? D Sep26 0:00 zfs unmount rpool/data/subvol-65002-disk-0
root 2799512 0.0 0.0 8832 4116 ? D Sep26 0:00 zfs mount rpool/data/subvol-65002-disk-0
root 2833006 0.0 0.0 8832 4088 ? D Sep26 0:00 zfs mount rpool/data/subvol-65002-disk-0
root 2838196 0.0 0.0 8832 4144 ? D Sep26 0:00 zfs mount rpool/data/subvol-65002-disk-0
root 2852238 0.0 0.0 8832 4116 ? D Sep26 0:00 zfs mount rpool/data/subvol-65002-disk-0
root 2647640 0.0 0.0 8832 4132 ? D Sep28 0:00 zfs mount rpool/data/subvol-65002-disk-0
root 524203 0.0 0.0 8832 4180 ? D 09:16 0:00 zfs mount rpool/data/subvol-65002-disk-0
Sadly, these processes seem to be stuck (all of them are in state D, uninterruptible sleep), and I cannot kill them.
Do you have any suggestions on how I can solve this problem without rebooting?
Does anybody know what causes this problem? I read that it may be due to heavy I/O load on the ZFS pools. Can anyone confirm this or explain what I can do to prevent it?
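In case it helps with diagnosing this, here is a rough sketch of how the stuck processes could be listed together with the kernel function they are waiting in (the WCHAN column of ps). The helper names `filter_dstate` and `list_stuck` are made up for this post, and I am assuming the procps version of ps that ships with Debian:

```shell
#!/bin/sh
# Keep only lines describing processes in uninterruptible sleep (STAT starts
# with "D") that mention the given pattern anywhere on the line.
# Expects input columns: PID STAT WCHAN COMMAND...
filter_dstate() {
    pattern="$1"
    awk -v pat="$pattern" '$2 ~ /^D/ && index($0, pat) { print }'
}

# List stuck commands that reference one dataset or container ID.
# The header line of ps is dropped automatically because "STAT" does not
# start with "D".
list_stuck() {
    ps -eo pid,stat,wchan:32,args | filter_dstate "$1"
}
```

For example, list_stuck rpool/data/subvol-65002-disk-0 should print the hanging zfs mount and zfs unmount calls. As root, reading /proc/PID/stack for one of the listed PIDs may additionally show where in the kernel it is blocked, if the kernel exposes that file.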
Thanks for your help!
Additional information:
Code:
~# pveversion -v
proxmox-ve: 7.4-1 (running kernel: 5.15.107-2-pve)
pve-manager: 7.4-3 (running version: 7.4-3/9002ab8a)
pve-kernel-5.15: 7.4-3
pve-kernel-5.11: 7.0-10
pve-kernel-5.4: 6.4-6
pve-kernel-5.15.107-2-pve: 5.15.107-2
pve-kernel-5.15.39-1-pve: 5.15.39-1
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.11.22-7-pve: 5.11.22-12
pve-kernel-5.11.22-5-pve: 5.11.22-10
pve-kernel-5.4.140-1-pve: 5.4.140-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4-2
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-1
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.6
libpve-storage-perl: 7.4-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.1-1
proxmox-backup-file-restore: 2.4.1-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.6.5
pve-cluster: 7.3-3
pve-container: 4.4-3
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-2
pve-firewall: 4.3-2
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-1
qemu-server: 7.4-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1