Can't remove replicated VM disk

alc

After launching a replication, the task failed and I received the following error:
Code:
end replication job with error: No common base snapshot on volume(s) local-zfs:vm-102-disk-0
Note that there are no snapshots associated with this VM.
I've also had this error with both VMs and CTs, neither of which had snapshots.

Then when trying to remove the destination disk, I get "zfs error: cannot destroy 'rpool/vm-102-disk-0': dataset is busy".
Also note that the replication seems to be borked, as the destination drive shows a size of 0B.

I've waited a few hours and tried again with the same results.
I've also removed the replication tasks (and they did go away successfully, but I still couldn't delete the target disk).

What else can I try?
 

1. Check if the dataset is in use

  • Run the following command to check active usage:
    lsof | grep rpool/vm-102-disk-0

  • Ensure no mounts or locks are holding it:
    zfs get mounted rpool/vm-102-disk-0

  • If mounted, unmount it:
    zfs unmount rpool/vm-102-disk-0
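  • Since vm-102-disk-0 is a VM disk, it is normally a zvol exposed as a block device rather than a mounted filesystem, so it can be more telling to check the device node directly (path assumed from the error message; a sketch):
    fuser -v /dev/zvol/rpool/vm-102-disk-0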

2. Verify Replication State

  • Check for lingering replication configurations:
    cat /etc/pve/replication.cfg

  • If there are still replication entries for the affected VM/CT, remove them manually.
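  • If a job for the VM is still listed, it can also be removed with the pvesr CLI instead of editing the file by hand (the job ID 102-0 is only an example; check pvesr list for the real one):
    pvesr list
    pvesr delete 102-0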

3. Force Delete the ZFS Dataset

  • If the dataset is still "busy":
    zfs destroy -f rpool/vm-102-disk-0

  • If it still refuses:
    zfs destroy -f -r rpool/vm-102-disk-0
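  • A failed transfer can also leave a partially received stream on the target, which keeps the dataset busy. It may be worth checking for that before forcing anything; only run the second command if the first one shows an actual token instead of "-" (dataset path taken from the error message; a sketch):
    zfs get receive_resume_token rpool/vm-102-disk-0
    zfs receive -A rpool/vm-102-disk-0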

4. Restart ZFS Services

  • Restart ZFS-related services:
    systemctl restart zfs.target

  • If the issue persists, reboot the node:
    reboot
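  • Restarting zfs.target alone may not restart every related service, so before rebooting it can help to see which ZFS units are actually loaded (a sketch):
    systemctl list-units 'zfs*' --all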

5. Fix Future Replication Issues

Since replication requires snapshots:

  • Create a manual snapshot of the underlying dataset before starting a new replication (note that local-zfs is the Proxmox storage ID, not a ZFS path; the actual dataset is usually something like rpool/data/vm-102-disk-0):
    zfs snapshot rpool/data/vm-102-disk-0@base
  • Then recreate the replication task in the Proxmox UI.
  • If issues persist, check system logs:
    journalctl -xe | grep zfs
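  • Proxmox replication normally manages its own snapshots for the job, so it can also help to check the job state from the scheduler side and trigger a fresh run once the target dataset is cleaned up (job ID 102-0 is only an example):
    pvesr status
    pvesr schedule-now 102-0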
 
Hi,
replication uses ZFS snapshots to transfer the data. If there is no common snapshot between source and target, replication will not be possible. Please share the output of zpool history | grep vm-102-disk-0 on the target, as well as pveversion -v and zfs list -r -t all rpool/data/vm-102-disk-0 on both source and target.

Is there any task referencing the disk still running? Check with ps aux | grep -e pvesr -e vm-102-disk-0 -e zfs.
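For reference, these are the requested commands as they could be run (dataset path as given above):
Code:
zpool history | grep vm-102-disk-0            # on the target
pveversion -v                                  # on both nodes
zfs list -r -t all rpool/data/vm-102-disk-0    # on both nodes
ps aux | grep -e pvesr -e vm-102-disk-0 -e zfs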