Most ZFS operations lead to a timeout

dirk.nilius
Member, Berlin, Germany
Nov 5, 2015
Most operations I do lead to a ZFS timeout error when there is notable I/O traffic on the node. The timeouts are far too small and, in my opinion, useless. Why don't you wait for the zfs command to complete? Why do you produce these errors artificially? I have never seen a zfs command fail to return, but I have seen an inconsistent state many times because you abort a task with this timeout :(
 
Can you elaborate a bit more? What kind of operations are running into timeouts? What is your storage setup like?
 
Things like creating a dataset (while creating a new VM/CT), or creating, destroying, and rolling back snapshots. These often take a little longer to complete than the timeout allows. ZFS gives no guarantee that such operations finish within a given time-box, so why is there such a VERY small timeout? Could you answer this?
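
Concretely, I mean roughly these kinds of commands (the dataset and snapshot names are just examples from my pool):

zfs create -o refquota=8G rpool/subvol-201-disk-1    # new CT subvolume
zfs create -V 32G rpool/vm-108-disk-1                # new VM disk (zvol)
zfs snapshot rpool/subvol-201-disk-1@before-upgrade
zfs rollback rpool/subvol-201-disk-1@before-upgrade
zfs destroy rpool/subvol-201-disk-1@before-upgrade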

My setup:

- 3 nodes cluster
- ZFS RAID 10 over 4x1TB SAS HDD
- additional ZIL SSD
- additional Cache SSD
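
(For reference, the log and cache SSDs were added in the usual way; the device paths here are placeholders, not my real disk IDs:)

zpool add rpool log /dev/disk/by-id/ata-LOG-SSD-part1
zpool add rpool cache /dev/disk/by-id/ata-CACHE-SSD-part1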

proxmox-ve: 4.1-39 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-15 (running version: 4.1-15/8cd55b52)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.2.8-1-pve: 4.2.8-39
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-33
qemu-server: 4.0-62
pve-firmware: 1.1-7
libpve-common-perl: 4.0-49
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-42
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-9
pve-container: 1.0-46
pve-firewall: 2.0-18
pve-ha-manager: 1.0-24
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie
 
Yes, the timeout is there because otherwise we would need to spawn worker processes for such tasks (otherwise the web interface would block). The timeout used in the ZFSPool plugin is set to 5 seconds, which should be more than enough for a normal setup.
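
Conceptually it behaves like this; a simplified shell sketch, not the actual Perl code in the ZFSPool plugin, and the dataset name is hypothetical:

timeout 5 zfs create rpool/subvol-999-disk-1
if [ $? -eq 124 ]; then
    # coreutils timeout exits with status 124 when it kills the command
    echo "zfs command did not finish within 5 seconds" >&2
fi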

Can you give us more detailed information about your configuration?
  1. Do you have an unusually large number of subvols/zvols/snapshots?
  2. How long does manually creating a subvol or snapshot take? (simply run "time zfs <command>" for a rough estimate; see the example below)
  3. Are you talking about local ZFS pools or ZFS over iSCSI?
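
For example (the snapshot name is arbitrary):

time zfs snapshot rpool/subvol-201-disk-1@timing-test
time zfs destroy rpool/subvol-201-disk-1@timing-test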
 
1)

zfs list -t all
NAME                     USED   AVAIL  REFER  MOUNTPOINT
rpool                    238G   1.52T  104K   /rpool
rpool/ROOT               911M   1.52T  96K    /rpool/ROOT
rpool/ROOT/pve-1         911M   1.52T  911M   /
rpool/subvol-201-disk-1  8.37G  16.6G  8.37G  /rpool/subvol-201-disk-1
rpool/subvol-202-disk-1  26.7G  173G   26.7G  /rpool/subvol-202-disk-1
rpool/subvol-205-disk-1  3.49G  21.5G  3.49G  /rpool/subvol-205-disk-1
rpool/subvol-206-disk-1  3.14G  21.9G  3.14G  /rpool/subvol-206-disk-1
rpool/subvol-208-disk-2  14.0G  236G   14.0G  /rpool/subvol-208-disk-2
rpool/subvol-209-disk-1  3.46G  96.5G  3.46G  /rpool/subvol-209-disk-1
rpool/subvol-210-disk-1  576M   4.44G  576M   /rpool/subvol-210-disk-1
rpool/subvol-212-disk-1  777M   15.2G  777M   /rpool/subvol-212-disk-1
rpool/swap               65.9G  1.56T  27.2G  -
rpool/vm-108-disk-1      33.0G  1.55T  4.29G  -
rpool/vm-109-disk-1      77.4G  1.56T  44.2G  -

2)

On an idle node, about 1-2 seconds. On a node with notable I/O, it takes about 4-8 seconds. But I have also seen operations take up to a minute.

3)

all local
 
The timeout used in the ZFSPool plugin is set to 5 seconds, which should be more than enough for a normal setup.

Sorry, but I don't agree. What does a 'normal' setup mean? It is not unusual for such operations to take long on larger datasets/snapshots. Would you say that is not 'normal'?
 
1)

rpool/swap               65.9G  1.56T  27.2G  -

Sidenote: you should probably think about moving that swap to a non-ZFS disk.
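
If you do move it, the rough sequence would be something like this (/dev/sdX2 is a placeholder for a real swap partition; double-check device names first):

swapoff /dev/zvol/rpool/swap    # disable the zvol-backed swap
mkswap /dev/sdX2                # format the new swap partition
swapon /dev/sdX2                # enable it
# update /etc/fstab accordingly, then: zfs destroy rpool/swap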

2)
On an idle node, about 1-2 seconds. On a node with notable I/O, it takes about 4-8 seconds. But I have also seen operations take up to a minute.

We might be able to bump the timeout to 10 seconds (although that will need discussion first). Can you describe in more detail which operations take up to a minute, and under which circumstances? This seems highly unusual to me, unless you are talking about things like send/receive ;)
 
I cannot follow your argument. All these operations are done in a background task, so there is no blocking UI. I don't remember exactly what took so long, but I think the longest-running operations are destroying snapshots and datasets. Destroy time depends on the number of used blocks that have to be freed. https://blogs.oracle.com/ahrens/entry/is_it_magic has some background information. There you can see that this can be a long-running operation, depending on hardware, configuration, and I/O traffic.
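
You can even watch this happening: with the async_destroy pool feature, the blocks of a destroyed dataset are freed in the background, and the pool reports how much space is still pending:

zpool get freeing rpool    # space that still has to be freed asynchronously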
 
