zfs error: cannot destroy : dataset is busy

svennd

I have a fresh install with ZFS raidz-3 and two LXC containers; one was for testing and I wanted to remove it. However, I got this error:
Code:
zfs error: cannot destroy 'rpool/ROOT/subvol-101-disk-1': dataset is busy

How can I force the removal? I have not rebooted the host, since that is not an acceptable solution ... The host is part of a cluster, but the container itself is obviously shut down.

cat /etc/pve/storage.cfg
Code:
dir: local
        path /var/lib/vz
        maxfiles 0
        content rootdir,images,iso,vztmpl

zfspool: lxc_storage
        pool rpool/ROOT
        content images,rootdir

Code:
root@noname:~# pveversion
pve-manager/4.1-5/f910ef5c (running kernel: 4.2.6-1-pve)
 
I have found that, yes; however, on the latest update this was already included:
Code:
    # Do not scan ZFS zvols (to avoid problems on ZFS zvols snapshots)
    filter = [ "r|/dev/zd*|" ]
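For reference, a sketch of where that filter lives (stock /etc/lvm/lvm.conf layout assumed) and a quick way to check that LVM no longer opens the zvol devices:
Code:
# /etc/lvm/lvm.conf -- inside the devices { ... } section
devices {
    # Do not scan ZFS zvols (to avoid problems on ZFS zvols snapshots)
    filter = [ "r|/dev/zd*|" ]
}

# afterwards this should not list any /dev/zd* devices any more
lvmdiskscan | grep /dev/zd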
 
Did you try to reboot the machine?
 
After shutting down both containers and rebooting, I could remove it. Then again, this is my test machine; I don't want to do that on a machine with multiple containers ... Is this a bug? How can I avoid this state? Thanks for the advice.

// edit
I took a snapshot of the other container (the one I did not want to remove); might that be the problem?
 
Can you give me a step-by-step description of how to reproduce this?
Please include the container's config.
 
- fresh install
- created a container (see the first post for /etc/pve/storage.cfg and the ZFS addition)
- created a snapshot
- added a server to the cluster
- on server 2, I added a container backup (from v3.x, OpenVZ)
- on server 1, I created a new container to test NFS
- after shutting down the new container, I tried to remove it ...

Rebooting on this setup is no problem, but it's not something I can do once I move the v3 cluster over to v4.
 
Same here. I powered the server off for maintenance on 19 March and started it again afterwards. Today one of the support guys killed MySQL in one of the LXC containers. I tried to recover it from an NFS backup, but got the same error:
TASK ERROR: zfs error: cannot destroy 'pool1/subvol-161-disk-1': dataset is busy
At the same time I tried to remove one of the unused LXC containers and got the error once more.

I have tried:
# zfs list -t snapshot
(no snapshots)
# zfs umount pool1/subvol-161-disk-1
(currently not mounted)
# zfs destroy pool1/subvol-161-disk-1
(same error)
None of it helped.

I then tried:
# grep pool1/subvol-161-disk-1 /proc/*/mounts
and got:
/proc/31344/mounts: pool1/subvol-161-disk-1 /pool1/subvol-161-disk-1 zfs rw,relatime,xattr,posixacl 0 0
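To see what that process actually is, something like this should do (31344 being the PID from the grep output above):
Code:
ps -fp 31344    # full listing for the PID that still has the subvol mounted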

I can't reboot the server to check right now, because it's a production server. I will try to reboot at midnight.
Any suggestions?

Thx.
 
The reboot fixed it for me, but it has held me back from moving our main production servers over to the latest version of Proxmox ... So wait for the next reboot ...
 
Yep, a reboot fixed it for me too. After the reboot I deleted everything manually using zfs destroy pool1/subvol_bla_bla_bla. Bugs like this are definitely not good on a production server.
 
Hi,

Same for me:

zfs destroy -r rpool/pve-container/vm-122-disk-1
cannot destroy 'rpool/pve-container/vm-122-disk-1': dataset is busy

ii proxmox-ve 4.2-60 all The Proxmox Virtual Environment

Linux ina-pmox-04 4.4.13-1-pve #1 SMP Tue Jun 28 10:16:33 CEST 2016 x86_64 GNU/Linux
 
Recently I tried to re-import some containers as unprivileged on our new Proxmox 4.3, but removing the containers gave me this error:

Task viewer: CT 1 - Destroy
TASK ERROR: zfs error: cannot destroy 'rpool/data/subvol-1-disk-1': dataset is busy

As the machine was stopped, no further mounts should have existed, but:

~# grep rpool/data/subvol-1-disk-1 /proc/*/mounts
/proc/16775/mounts:rpool/data/subvol-1-disk-1 /rpool/data/subvol-1-disk-1 zfs rw,noatime,xattr,posixacl 0 0

~# ps auxf | grep 16775
root 9718 0.0 0.0 14456 1696 pts/9 S+ 08:00 0:00 \_ grep 16775
root 16775 0.0 0.0 27876 2472 ? Ss 03:56 0:01 [lxc monitor] /var/lib/lxc 2

... so I restarted CT 2, and after that removing CT 1 was no longer a problem.
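As an alternative to restarting the other container, it might also work to drop the stale copy of the mount directly in that monitor's mount namespace (untested sketch; 16775 is the lxc monitor PID from the ps output above):
Code:
# unmount the leftover copy inside the mount namespace of PID 16775
nsenter --target 16775 --mount -- umount /rpool/data/subvol-1-disk-1
# the destroy should then go through without restarting the container
zfs destroy rpool/data/subvol-1-disk-1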
 
We're having the same issue here.

As Plexus has pointed out, the "lxc monitor" of every container seems to hold a handle on all mountpoints which are available when starting the container.

The issue only occurs after running a backup for another container.

Let's say we have two running containers: 1+2.
We'll try to remove 1 later. First we'll run a backup for 2.
During VZDUMP of 2, the root mountpoint is recursively made private ("mount --make-rprivate /"). 2 is started again after the backup.
Now the lxc-monitor of 2 holds a handle for the mountpoint of 1.

When you try to remove 1, the "umount" of the mountpoint isn't propagated to 2.
It won't be possible to delete the dataset, since 2 still holds the handle.
If you restart 2 now, then the dataset can be deleted.
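The effect can be reproduced without Proxmox at all. A rough illustration for a scratch machine (the dataset rpool/demo and the sleep stand-in are made up for this sketch, and it deliberately breaks mount propagation on /, so don't run it on a production host):
Code:
zfs create rpool/demo                                  # any filesystem you can mount/umount
mount --make-rprivate /                                # what vzdump did on the host
unshare --mount --propagation unchanged sleep 600 &    # stand-in for the lxc monitor of "2"
HOLDER=$!
umount /rpool/demo                                     # gone in the host namespace ...
grep rpool/demo /proc/$HOLDER/mounts                   # ... but the private copy is still there
zfs destroy rpool/demo                                 # -> cannot destroy: dataset is busy
kill $HOLDER                                           # "restart container 2"
zfs destroy rpool/demo                                 # now it works
mount --make-rshared /                                 # restore the default propagation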

@wolfgang: Did the commit b6c491ee4 on pve-container come from you? Would "--make-rslave" instead of "--make-rprivate" be enough to address the security issue you wanted to fix together with 2cfae16ee?
 
Thanks for catching this, a patch with a fix is on the pve-devel list!
 
I know this is a little old now, but for those still struggling with it: I was able to replicate and solve this without rebooting. Make sure the container you are trying to delete is not included in the current backup jobs. If it is, go to Datacenter > Backup, select the backup entry, click Edit, un-check the box for the container you want to delete, and click OK. I followed this process and was able to delete the ZFS volume without issues.
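If you prefer the shell, the scheduled backup jobs live in /etc/pve/vzdump.cron on these PVE versions, so a quick grep shows whether the container is still referenced by a job (101 is just an example VMID):
Code:
grep -n 101 /etc/pve/vzdump.cron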
 
I had a similar situation on a test server with an undestroyable "vm-137-disk-2" that was not mounted, had no snapshots, and whose VM 137 had not existed for a long time:

>zfs destroy -r -f rpool/data/vm-137-disk-2
cannot destroy 'rpool/data/vm-137-disk-2': dataset is busy

I fixed this with the following procedure:
>fuser -am /dev/rpool/data/vm-137-disk-2
/dev/zd64: 2633

PID 2633 was a KVM process.
I stopped all KVM VMs, and after that I was able to destroy vm-137-disk-2, even from the Proxmox GUI.
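For anyone who wants to double-check which zd device and which process are involved before stopping anything, roughly (the device and PID are the ones from my output above):
Code:
ls -l /dev/zvol/rpool/data/ | grep vm-137-disk-2   # which /dev/zdNN backs the zvol
fuser -vam /dev/zd64                               # verbose: shows user, access type and command
ps -fp 2633                                        # confirm what the PID is before stopping it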
 
Be careful not to destroy a disk that is still needed, as happened in my case :(. At least now I know whether my backup is working or not.
 
I ran into this same issue, and it turned out to be the SCSI multipath service.

After running

systemctl stop multipathd.service

I was able to destroy the disks.
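Rather than stopping multipathd altogether, blacklisting the zvol devices in /etc/multipath.conf should keep it from grabbing them in the first place (a sketch; merge this with whatever blacklist section you already have):
Code:
# /etc/multipath.conf -- keep multipathd away from ZFS zvols
blacklist {
    devnode "^zd[0-9]*"
}
Then restart the service (systemctl restart multipathd.service) so the new blacklist takes effect.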
 
