Move disk feature blocking behaviour with ZFS

layer7.net

Hi,

if you are moving a disk from Ceph to a ZFS storage for one VM, and you try to remove another VM on the same host whose disk is on ZFS, you will receive:

TASK ERROR: zfs error: cannot destroy 'local-zfs/vm-124-disk-0': dataset is busy
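
Roughly the sequence that triggers it (the VM IDs and the disk key here are just examples):

Code:
# VM 123: move its disk from the Ceph storage to local-zfs (still running)
qm move_disk 123 scsi0 local-zfs

# while that job is still running, removing another VM whose disk is on local-zfs fails:
qm destroy 124
# -> TASK ERROR: zfs error: cannot destroy 'local-zfs/vm-124-disk-0': dataset is busy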

pve-manager/7.2-4/ca9d43cc (running kernel: 5.15.35-1-pve)

Is this normal?

Greetings
Oliver
 
Hi,
no, that should not happen. But I'd rather guess the image to be removed was still in use by something else and the move disk just happened at the same time. Can you share an excerpt of /var/log/syslog and of zpool history | grep vm-124-disk-0 from before the failure occurred, as well as the output of pveversion -v?
 
Hello Fiona,

It seems that running migrations blocks several ZFS actions.

When trying to create new VMs by cloning an existing template, you will receive:

Code:
()
trying to acquire lock...
TASK ERROR: can't lock file '/var/lock/qemu-server/lock-265.conf' - got timeout

This is the output of pveversion -v:
Code:
proxmox-ve: 7.2-1 (running kernel: 5.15.39-4-pve)
pve-manager: 7.2-7 (running version: 7.2-7/d0dd0e85)
pve-kernel-5.15: 7.2-9
pve-kernel-helper: 7.2-9
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.39-4-pve: 5.15.39-4
pve-kernel-5.15.35-1-pve: 5.15.35-3
pve-kernel-5.13.19-6-pve: 5.13.19-15
pve-kernel-5.13.19-2-pve: 5.13.19-4
ceph-fuse: 16.2.9-pve1
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve1
libproxmox-acme-perl: 1.4.2
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.2-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.2-2
libpve-guest-common-perl: 4.1-2
libpve-http-server-perl: 4.1-3
libpve-storage-perl: 7.2-8
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.2.5-1
proxmox-backup-file-restore: 2.2.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.5.1
pve-cluster: 7.2-2
pve-container: 4.2-2
pve-docs: 7.2-2
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-5
pve-firmware: 3.5-1
pve-ha-manager: 3.4.0
pve-i18n: 2.7-2
pve-qemu-kvm: 7.0.0-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.2-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.5-pve1

From the syslog:

Code:
Aug 29 22:34:39 n3 pvedaemon[3293053]: <root@pam> starting task UPID:n3:0035B57E:0090CD08:630D22DF:qmstart:265:root@pam:
Aug 29 22:34:39 n3 pvedaemon[3519870]: start VM 265: UPID:n3:0035B57E:0090CD08:630D22DF:qmstart:265:root@pam:
Aug 29 22:34:49 n3 pvedaemon[3519870]: can't lock file '/var/lock/qemu-server/lock-265.conf' - got timeout
Aug 29 22:34:49 n3 pvedaemon[3293053]: <root@pam> end task UPID:n3:0035B57E:0090CD08:630D22DF:qmstart:265:root@pam: can't lock file '/var/lock/qemu-server/lock-265.conf' - got timeout
Aug 29 22:34:51 n3 pvedaemon[3293053]: <root@pam> starting task UPID:n3:00363E2C:0090D169:630D22EB:qmdestroy:265:root@pam:
Aug 29 22:34:51 n3 pvedaemon[3554860]: destroy VM 265: UPID:n3:00363E2C:0090D169:630D22EB:qmdestroy:265:root@pam:
Aug 29 22:34:59 n3 pmxcfs[1735]: [status] notice: received log
Aug 29 22:35:01 n3 pvedaemon[3554860]: can't lock file '/var/lock/qemu-server/lock-265.conf' - got timeout
Aug 29 22:35:01 n3 pvedaemon[3293053]: <root@pam> end task UPID:n3:00363E2C:0090D169:630D22EB:qmdestroy:265:root@pam: can't lock file '/var/lock/qemu-server/lock-265.conf' - got timeout

On the ZFS side:

Code:
root@n3:~# zpool history | grep vm-265-diks-0
root@n3:~# zfs list
NAME                         USED  AVAIL     REFER  MOUNTPOINT
local-zfs                    443G  4.88T     47.1K  /local-zfs
local-zfs/vm-265-cloudinit  35.9K  4.88T     35.9K  -
local-zfs/vm-265-disk-0      673M  4.88T      673M  -
root@n3:~#


The delay on all ZFS commands is huge. On the CLI I have to wait 60 seconds. From the Proxmox GUI a start of this VM 265 took 18 minutes.
 
Doing a migration will prevent other operations on that VM, because that's the only way to ensure that no bad interaction happens. Otherwise, for example, a disk might already be moved to the other node while a clone runs. That's why Proxmox VE won't give you the lock for other operations. You have to wait for the migration to finish.
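
As a rough illustration of what the lock timeout means (this only mimics the per-VM lock with flock on a scratch file, it is not the actual code path Proxmox VE uses):

Code:
# illustration only: per-VM operations are serialized via a lock file;
# a scratch file stands in for /var/lock/qemu-server/lock-265.conf here
flock /tmp/demo-lock-265.conf -c 'sleep 600' &    # long-running task holds the lock

# a second task waits a few seconds for the same lock and then gives up:
flock --wait 10 /tmp/demo-lock-265.conf -c 'true' \
  || echo "can't lock file - got timeout"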
 
Hi Fiona,

It's clear that a migration will and must block all operations for the migrating VM, to ensure a consistent state of the VM (incl. its disks).

What is not clear to me is why a migration of VM A to node X will actually block/harm all other operations on the ZFS storage of node X.

As I already explained, creating a _new_ VM or destroying _new_ VMs will likely run into a timeout.

This is nothing that comes from ZFS itself: ZFS will of course allow operations (creating/destroying datasets/volumes) as long as they do not collide with another operation on the _same_ dataset/volume.
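
For example (the volume names are made up), two operations on different volumes can normally run side by side without blocking each other:

Code:
zfs create -V 1G local-zfs/test-vol-a &
zfs create -V 1G local-zfs/test-vol-b &
wait
zfs destroy local-zfs/test-vol-a
zfs destroy local-zfs/test-vol-b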

But what I can see here is that the migration of VM ID 123 will block/harm any other operation (on any _other_ VM) that involves the ZFS storage.

And to me that does not seem to be intended behaviour.

Greetings
Oliver
 
Oh, I see. Since your root partition is also on ZFS, trying to acquire the lock file /var/lock/qemu-server/lock-265.conf is actually also a ZFS operation.
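
You can double-check which filesystem those paths actually live on, e.g.:

Code:
df -T / /var/lock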


The delay on all ZFS commands is huge. On the CLI I have to wait 60 seconds. From the Proxmox GUI a start of this VM 265 took 18 minutes.
What type of disks are you using for ZFS? Usually if you use consumer SSDs this will happen.
Yes, if all your other ZFS operations hang when some other operation is going on, it could be unfit hardware or some other hardware/firmware/configuration issue.
 
Hi,

Yes,

/var/lock/qemu-server/lock-265.conf is actually also a ZFS operation.

but why should this block the creation/destruction of any other VM on the same storage?

---------

What type of disks are you using for ZFS? Usually if you use consumer SSDs this will happen.

it could be unfit hardware or some other hardware/firmware/configuration issue.


Yes, it's of course natural that if you are using weak hardware, you get what you paid for.

And without doubt, moving this 2 TB disk over 10G is quite unpleasant for ZFS with just SAS disks in a raidz2.

But as I already wrote:

On the CLI I have to wait 60 seconds for an operation to take place. From the Proxmox GUI a start of a VM took 18 minutes.

I don't know how exactly Proxmox interacts with/accesses ZFS. But to me it seems strange that CLI operations with ZFS seem a lot faster, without timeouts, while Proxmox operations either take a _lot_ more time or are simply blocked or time out.
 
but why should this block the creation/destruction of any other VM on the same storage?

It shouldn't, but likely the storage is completely overloaded so that even acquiring a lock fails.
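
You can watch the pool and the underlying disks while the move disk runs to confirm that, for example:

Code:
zpool iostat -v local-zfs 5
iostat -x 5        # from the sysstat package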

On the CLI I have to wait 60 seconds for an operation to take place. From the Proxmox GUI a start of a VM took 18 minutes.

I don't know how exactly Proxmox interacts with/accesses ZFS. But to me it seems strange that CLI operations with ZFS seem a lot faster, without timeouts, while Proxmox operations either take a _lot_ more time or are simply blocked or time out.
The UI will also just trigger the same API calls as the CLI handlers. Or do you mean CLI operations with the zfs command directly? Proxmox VE needs to access other files too, and when your root filesystem hangs, that of course takes longer.

EDIT: To avoid/reduce such issues in the future, you might want to use bandwidth limits for IO-heavy operations. There is a bwlimit option that can be set in the storage configuration. See man pvesm for details.
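
For example, something along these lines (the values are placeholders, in KiB/s):

Code:
pvesm set local-zfs --bwlimit move=51200,migration=51200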
 
Hi,

no, the root filesystem is _not_ on ZFS! Please excuse me for not clarifying this earlier. The OS is running on independent disks dedicated to the OS.

And yes, I am talking about CLI operations with the zfs/zpool commands. They are fast(er), don't time out and don't show any issues.
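
For example, a plain create/destroy of a test volume can be timed directly with the zfs command while such a job is running (the volume name is just an example):

Code:
time zfs create -V 1G local-zfs/test-timing
time zfs destroy local-zfs/test-timing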

Proxmox (for some reason I would like to understand) has huge problems.

And of course I could just work around this problem (stronger hardware / limiting the amount of data flowing), but that would not actually solve or explain the root cause of this behaviour.

That's why I made it a topic here. Maybe there is an issue with the way Proxmox talks to ZFS, because it's strange to me that Proxmox's ZFS operations seem to have a much higher delay, or even block/time out, compared to ZFS operations from the CLI (zfs/zpool commands).

And by the way: Thank you for your time! I do appreciate it!
 
ZFS commands work directly with the disks and the pool, while Proxmox commands usually go through the API, which is prone to delays. I still don't see that you have posted the disk model and count; can you share that for a start?
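
For example, something like this would show them:

Code:
lsblk -d -o NAME,MODEL,SIZE,ROTA
smartctl -i /dev/sda     # repeat for each disk in the pool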