ZFS plugin timeouts

johnplv

Renowned Member
Jul 19, 2013
I cannot migrate a disk and I cannot restore from backup.
After 5 seconds the operation fails with: got timeout

Code:
TASK ERROR: storage migration failed: command '/usr/bin/ssh -o 'BatchMode=yes' -i /etc/pve/priv/zfs/172.16.99.15_id_rsa root@172.16.99.15 zfs create -b 4k -V 62914560k datapool/vm-1105-disk-3' failed: got timeout
Code:
TASK ERROR: command 'lzop -d -c /mnt/pve/node3-NFS/dump/vzdump-qemu-1105-2014_07_03-20_43_44.vma.lzo|vma extract -v -r /var/tmp/vzdumptmp74702.fifo - /var/tmp/vzdumptmp74702' failed: command '/usr/bin/ssh -o 'BatchMode=yes' -i /etc/pve/priv/zfs/172.16.99.15_id_rsa root@172.16.99.15 zfs create -b 4k -V 62914560k datapool/vm-1105-disk-2' failed: got timeout


The disk (datapool/vm-1105-disk-2) is actually created completely on the Nexenta side.
Apparently my Nexenta is not fast enough and the ZFS plugin does not wait long enough for an answer from it.
Can I increase the wait time for ZFS plugin operations?
 
Did you remember to set these options in /etc/ssh/sshd_config on the Nexenta?

LookupClientHostnames no
VerifyReverseMapping no
GSSAPIAuthentication no
 
Just to be certain: did you remember to do this step from the wiki?


  • Log in once to the ZFS SAN from each Proxmox node:
ssh -i /etc/pve/priv/zfs/192.168.1.1_id_rsa root@192.168.1.1

The authenticity of host '192.168.1.1 (192.168.1.1)' can't be established.
RSA key fingerprint is 8c:f9:46:5e:40:65:b4:91:be:41:a0:25:ef:7f:80:5f.
Are you sure you want to continue connecting (yes/no)? yes

When logging in for the first time, ssh will wait indefinitely for this confirmation.
 
Yes, I did all of that.

Small operations do not cause problems:
I can see the contents of the storage (the list of raw disks) in the web interface,
I can resize a disk in the web interface,
I can log in to the Nexenta via ssh without a password,
and I can create a disk via ssh:
Code:
root@pve-node1:/etc/ssh# time ssh -o 'BatchMode=yes' -i /etc/pve/priv/zfs/172.16.99.15_id_rsa root@172.16.99.15 zfs  create -b 4k -V 62914560k datapool/vm-1105-disk-3


real    0m13.095s
user    0m0.006s
sys     0m0.005s
 
Then your problem is related to the time it takes for your NexentaStor to respond to a request, combined with the default timeout in pvedaemon. How to change that is beyond my knowledge. How is your Nexenta actually configured (controller, disks, RAID, memory, etc.)?
 
An optimization you could try: performance benchmarks show that using 8k as the block size gives a considerable performance boost.
 
Then your problem is related to the time it takes for your NexentaStor to respond to a request, combined with the default timeout in pvedaemon. How to change that is beyond my knowledge. How is your Nexenta actually configured (controller, disks, RAID, memory, etc.)?
Could one of the developers tell me where in the source code it is possible to increase the timeout?

An optimization you could try: performance benchmarks show that using 8k as the block size gives a considerable performance boost.
If I change "blocksize 4k" to "blocksize 8k" in /etc/pve/storage.cfg, will it cause problems with the disks that are already working?
 
Before changing anything you could try the following:
time ssh -o 'BatchMode=yes' -i /etc/pve/priv/zfs/172.16.99.15_id_rsa root@172.16.99.15 zfs create -b 8k -V 62914560k datapool/vm-1105-disk-3
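
If that turns out to be fast, the change itself is just the blocksize property of the storage definition. As a rough sketch, a ZFS-over-iSCSI entry in /etc/pve/storage.cfg with 8k could look something like this (the storage name and target below are placeholders, use your own values):

Code:
zfs: nexenta-zfs
        blocksize 8k
        iscsiprovider comstar
        pool datapool
        portal 172.16.99.15
        target iqn.2010-08.org.illumos:02:your-target-name
        content images

As far as I know the blocksize only applies to newly created zvols; existing disks keep the volblocksize they were created with, since that cannot be changed after creation.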
 
Yeah, that works just great!
But it does not solve the problem.

Hi,
I think we have run into similar problems, although in different environments.
My problem was on a low-end server: an HP ProLiant ML315 with 1x Xeon, 16 GB RAM, 2x 1 TB Seagate SATA3 disks, plus a PCIe Plextor SSD for SLOG and L2ARC, and the embedded SATA RAID card in AHCI mode.

When system I/O was idle, a qm snapshot, which in turn does a zfs create for a zvol, was fine and took just a couple of seconds.
But whenever a file copy of several GB was running, a snapshot from the web interface, or a qm snapshot run directly, failed with a timeout error on the zfs create -V command.
However, the zvol was indeed created. The same happened when trying to add a new ZFS-plugin-based disk to a VM: an error was raised and the VM config file was not updated to include the new disk, but the zvol was indeed created.

But if I issue a zfs create -V command, it takes something like 15 seconds but succeeds.

So I finally found the code for the ZFS plugin at /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm. In sub zfs_request I simply overwrite whatever value is passed for the timeout with an arbitrarily high value of 500. My problem is solved now, although it should be further investigated why zfs takes so long, or the timeout values should be adjusted.
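
In case it helps others, this is roughly the kind of edit I made. Note this is only a sketch: the exact signature of zfs_request may differ between pve-storage versions, and the file will be overwritten on the next package update, so check your local copy and treat it as a temporary workaround.

Code:
# /usr/share/perl5/PVE/Storage/ZFSPoolPlugin.pm (sketch, check against your version)
sub zfs_request {
    my ($class, $scfg, $timeout, $method, @params) = @_;

    # Workaround: ignore whatever timeout the caller passed in and always
    # allow the zfs/zpool command up to 500 seconds to complete.
    $timeout = 500;

    # ... rest of the original subroutine unchanged ...
}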

Regards
Emilio Arrufat
 
