pct restore, zfs list timeout

Mar 8, 2016
Hello:

My host is moving some ZFS datasets around, which brings IO delay to a little above 6%. While this was happening I brought over a CT from another host: I backed it up with vzdump, copied the .gz file over to the 'busy' host with scp, and then tried to pct restore it there. The target disk for the restore was not one of the busy ones, but I guess the busy ones slowed down the zfs list part of the restore. I can confirm that a zfs list took about 6 seconds in this state (the host CPU was not highly loaded).

The error message at either the command line or gui was "command 'zfs list -o name,volsize,origin,type,refquota -t volume,filesystem -Hr' failed: got timeout"

With persistent attempts I got it to work, and the vzdump -> pct restore is a very nice mechanism, but the experience was let down by the error.

I suggest you give the zfs list operation at least 10 seconds before your script/program gives up on the zfs list part of the restore.
 
Hi,

The point is that zfs command-line operations have low priority in ZFS.
So when one pool is busy and you query another, idle one, the command will still take longer.
 
Hello Wolfgang:

I think you missed the point of my post. There seems to be a bug with your software where pct restore (by command line or GUI) will fail to execute because it does not wait long enough for the zfs list operation it calls to complete. Is there a better place for me to report this to the developers?

Thank you.
 
There seems to be a bug with your software where pct restore
No, this is not a bug. It is a timeout, and the timeout is there for a reason.
The timeout can't be increased, because if we do that we run into other problems.
 
Thank you for the quick responses. I'm not a developer, so I can't really argue with you about whether such a short timeout for your script to run a zfs list operation is justified. My thinking (as a 'lay person') is that zfs list needs to query all of the pools on the system, but if I'm not trying to restore to the very busy pool and am instead restoring to a different one, what is the problem with continuing to execute the restore? But I don't need to waste your time discussing that, so no need to reply further.

However, I can then instead suggest that you improve the error output:

Unable to restore container because command 'zfs list -o name,volsize,origin,type,refquota -t volume,filesystem -Hr' took more than 5 seconds to return. Please try the restore again when none of your storage pools is very busy.
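
To illustrate the kind of message I mean, here is a minimal Python sketch of running a command with a time limit and turning a timeout into the friendlier error above. This is not how pct is implemented; the names `run_with_timeout`, `CommandTimeout` and `ZFS_LIST_CMD` are made up for this example, and the 5 second default is only my guess based on the behaviour I saw.

```python
import subprocess

# The command quoted in the error message (assumed to be what the restore runs).
ZFS_LIST_CMD = [
    "zfs", "list",
    "-o", "name,volsize,origin,type,refquota",
    "-t", "volume,filesystem",
    "-Hr",
]


class CommandTimeout(RuntimeError):
    """Hypothetical error raised when a command exceeds its time limit."""


def run_with_timeout(cmd, timeout=5.0, action="restore container"):
    """Run cmd and return its stdout, or raise CommandTimeout with a hint.

    subprocess.run() kills the child process and raises TimeoutExpired
    once `timeout` seconds have passed; we translate that into a message
    that says what was being attempted and what the user can do about it.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout, check=True
        )
    except subprocess.TimeoutExpired as exc:
        raise CommandTimeout(
            f"Unable to {action} because command '{' '.join(cmd)}' took more "
            f"than {timeout:g} seconds to return. Please try again when none "
            f"of your storage pools is very busy."
        ) from exc
    return result.stdout
```

On a host with ZFS installed, `run_with_timeout(ZFS_LIST_CMD, timeout=10.0)` would give the listing 10 seconds before giving up, which is the limit I suggested earlier. Whether a longer limit is safe for the restore itself is a separate question that I can't judge.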

 
