Most ZFS operations lead to a timeout

dirk.nilius
Member, Berlin, Germany
Nov 5, 2015
Most operations I do lead to a ZFS timeout error when there is notable I/O traffic on the node. The timeouts are far too small and, in my opinion, useless. Why don't you wait for the zfs command to complete? Why do you produce these errors artificially? I have never seen a zfs command fail to return, but I have seen an inconsistent state many times because you abort a task with this timeout :(
 
Can you elaborate a bit more? What kind of operations are running into timeouts? What is your storage setup like?
 
Things like creating a dataset (while creating a new VM/CT), or creating, destroying, and rolling back snapshots. These often take a little longer to complete than the timeout allows. ZFS gives no guarantee that such operations finish within a given time-box, so why is there such a VERY small timeout? Could you answer this?
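
Concretely, I mean roughly these kinds of commands (the dataset and snapshot names are just examples from my pool):

zfs create -o refquota=8G rpool/subvol-201-disk-1    # new CT subvolume
zfs create -V 32G rpool/vm-108-disk-1                # new VM disk (zvol)
zfs snapshot rpool/subvol-201-disk-1@before-upgrade
zfs rollback rpool/subvol-201-disk-1@before-upgrade
zfs destroy rpool/subvol-201-disk-1@before-upgrade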

My setup:

- 3 nodes cluster
- ZFS RAID 10 over 4x1TB SAS HDD
- additional ZIL SSD
- additional Cache SSD
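
(For reference, the log and cache SSDs were added in the usual way; the device paths here are placeholders, not my real disk IDs:)

zpool add rpool log /dev/disk/by-id/ata-LOG-SSD-part1
zpool add rpool cache /dev/disk/by-id/ata-CACHE-SSD-part1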

proxmox-ve: 4.1-39 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-15 (running version: 4.1-15/8cd55b52)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.2.8-1-pve: 4.2.8-39
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-33
qemu-server: 4.0-62
pve-firmware: 1.1-7
libpve-common-perl: 4.0-49
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-42
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-9
pve-container: 1.0-46
pve-firewall: 2.0-18
pve-ha-manager: 1.0-24
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve1
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie
 
Yes, the timeout is there because otherwise we would need to spawn worker processes for such tasks (otherwise the web interface would block). The timeout used in the ZFSPool plugin is set to 5 seconds, which should be more than enough for a normal setup.
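
Conceptually it behaves like this; a simplified shell sketch, not the actual Perl code in the ZFSPool plugin, and the dataset name is hypothetical:

timeout 5 zfs create rpool/subvol-999-disk-1
if [ $? -eq 124 ]; then
    # coreutils timeout exits with status 124 when it kills the command
    echo "zfs command did not finish within 5 seconds" >&2
fi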

Can you give us more detailed information about your configuration?
  1. Do you have an unusually large number of subvols/zvols/snapshots?
  2. How long does manually creating a subvol or snapshot take? (simply run "time zfs <command>" for a rough estimate; see the example below)
  3. Are you talking about local ZFS pools or ZFS over iSCSI?
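
For example (the snapshot name is arbitrary):

time zfs snapshot rpool/subvol-201-disk-1@timing-test
time zfs destroy rpool/subvol-201-disk-1@timing-test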
 
1)

zfs list -t all
NAME                     USED   AVAIL  REFER  MOUNTPOINT
rpool                    238G   1.52T  104K   /rpool
rpool/ROOT               911M   1.52T  96K    /rpool/ROOT
rpool/ROOT/pve-1         911M   1.52T  911M   /
rpool/subvol-201-disk-1  8.37G  16.6G  8.37G  /rpool/subvol-201-disk-1
rpool/subvol-202-disk-1  26.7G  173G   26.7G  /rpool/subvol-202-disk-1
rpool/subvol-205-disk-1  3.49G  21.5G  3.49G  /rpool/subvol-205-disk-1
rpool/subvol-206-disk-1  3.14G  21.9G  3.14G  /rpool/subvol-206-disk-1
rpool/subvol-208-disk-2  14.0G  236G   14.0G  /rpool/subvol-208-disk-2
rpool/subvol-209-disk-1  3.46G  96.5G  3.46G  /rpool/subvol-209-disk-1
rpool/subvol-210-disk-1  576M   4.44G  576M   /rpool/subvol-210-disk-1
rpool/subvol-212-disk-1  777M   15.2G  777M   /rpool/subvol-212-disk-1
rpool/swap               65.9G  1.56T  27.2G  -
rpool/vm-108-disk-1      33.0G  1.55T  4.29G  -
rpool/vm-109-disk-1      77.4G  1.56T  44.2G  -

2)

On an idle node, about 1-2 seconds. On a node with notable I/O, it takes about 4-8 seconds. But I have also seen operations take up to a minute.

3)

all local
 
The timeout used in the ZFSPool plugin is set to 5 seconds, which should be more than enough for a normal setup.

Sorry, but I don't agree. What does a 'normal' setup mean? It is not unusual for such operations to take long on larger datasets/snapshots. Would you say that is not 'normal'?
 
1)

rpool/swap               65.9G  1.56T  27.2G  -

Sidenote: you should probably think about moving that swap to a non-ZFS disk.
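
If you do move it, the rough sequence would be something like this (/dev/sdX2 is a placeholder for a real swap partition; double-check device names first):

swapoff /dev/zvol/rpool/swap    # disable the zvol-backed swap
mkswap /dev/sdX2                # format the new swap partition
swapon /dev/sdX2                # enable it
# update /etc/fstab accordingly, then: zfs destroy rpool/swap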

2)
On an idle node, about 1-2 seconds. On a node with notable I/O, it takes about 4-8 seconds. But I have also seen operations take up to a minute.

We might be able to bump the timeout to 10 seconds (although that will need discussion first). Can you describe in more detail which operations take up to a minute, and under which circumstances? This seems highly unusual to me, unless you are talking about things like send/receive ;)
 
I cannot follow your argument. All these operations are done in a background task, so there is no blocking UI. I don't remember exactly what took so long, but I think the longest-running operations are destroying snapshots and datasets. Destroy time depends on the number of used blocks that have to be freed. https://blogs.oracle.com/ahrens/entry/is_it_magic has some background information. There you can see that this can be a long-running operation, depending on hardware, configuration, and I/O traffic.
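
You can even watch this happening: with the async_destroy pool feature, the blocks of a destroyed dataset are freed in the background, and the pool reports how much space is still pending:

zpool get freeing rpool    # space that still has to be freed asynchronously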
 
