ZFS Import hangs/freezes the whole system

ninjagp

New Member
Aug 3, 2015
I am using the latest Proxmox 4.2 with all packages up to date, using the pve-no-subscription repository.

I create a zpool on a new drive (zpool create Backup sdc).

Then I export that pool (zpool export Backup).

Then I try to import the zpool (zpool import Backup) and the zpool command freezes...

Dmesg trace:

[ 360.020970] INFO: task zpool:3748 blocked for more than 120 seconds.
[ 360.020991] Tainted: P O 4.4.6-1-pve #1
[ 360.021006] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 360.021026] zpool D ffff88009a9e3b58 0 3748 2513 0x00000000
[ 360.021029] ffff88009a9e3b58 ffff88009a322aa0 ffffffff81e12580 ffff880118143300
[ 360.021030] ffff88009a9e4000 0000000000000002 0000000000000001 ffff88009a322ac8
[ 360.021032] 0000000000000008 ffff88009a9e3b70 ffffffff818435c5 ffff88009a322a00
[ 360.021034] Call Trace:
[ 360.021039] [<ffffffff818435c5>] schedule+0x35/0x80
[ 360.021047] [<ffffffffc00c4bf4>] taskq_wait+0x74/0xe0 [spl]
[ 360.021051] [<ffffffff810c3d30>] ? wait_woken+0x90/0x90
[ 360.021055] [<ffffffffc00c4cab>] taskq_destroy+0x4b/0x100 [spl]
[ 360.021094] [<ffffffffc01e922d>] vdev_open_children+0x12d/0x180 [zfs]
[ 360.021119] [<ffffffffc01f2ddc>] vdev_root_open+0x3c/0xc0 [zfs]
[ 360.021141] [<ffffffffc01e8d25>] vdev_open+0xf5/0x4d0 [zfs]
[ 360.021163] [<ffffffffc01de97f>] ? spa_config_enter+0xdf/0x120 [zfs]
[ 360.021184] [<ffffffffc01d39f0>] spa_load+0x3a0/0x1b70 [zfs]
[ 360.021187] [<ffffffff810c3d30>] ? wait_woken+0x90/0x90
[ 360.021208] [<ffffffffc01d0838>] ? spa_activate+0x1b8/0x440 [zfs]
[ 360.021229] [<ffffffffc01de657>] ? spa_add+0x627/0x670 [zfs]
[ 360.021249] [<ffffffffc01d5efd>] spa_tryimport+0xad/0x460 [zfs]
[ 360.021272] [<ffffffffc0208594>] zfs_ioc_pool_tryimport+0x64/0xa0 [zfs]
[ 360.021296] [<ffffffffc020adbb>] zfsdev_ioctl+0x44b/0x4f0 [zfs]
[ 360.021298] [<ffffffff812204f2>] do_vfs_ioctl+0x2d2/0x4b0
[ 360.021301] [<ffffffff8109f13b>] ? task_work_run+0x7b/0x90
[ 360.021303] [<ffffffff81003226>] ? exit_to_usermode_loop+0xa6/0xd0
[ 360.021304] [<ffffffff81220749>] SyS_ioctl+0x79/0x90
[ 360.021306] [<ffffffff81003c28>] ? syscall_return_slowpath+0x98/0x110
[ 360.021310] [<ffffffff818476f6>] entry_SYSCALL_64_fastpath+0x16/0x75
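
To check whether another machine is hitting the same hang, the kernel log can be scanned for hung-task reports like the one above. This is only a minimal sketch: the function name hung_tasks is made up for illustration, and it assumes the message keeps the "INFO: task NAME:PID blocked for more than N seconds" wording seen in this trace.

```shell
#!/bin/sh
# hung_tasks (hypothetical helper): read kernel log text on stdin and print
# "NAME PID SECONDS" for every hung-task report it finds.
# Assumes the message format shown in the trace above.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}

# Typical use on a live system (commented out so sourcing this file runs nothing):
#   dmesg | hung_tasks
```

If zpool appears in the output shortly after a "zpool import", the symptom matches the one reported here.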


I tried different hard disks, connected via SATA and also via USB, and got the same problem.

It is possibly a ZFSONLINUX issue...

Any ideas??
 
I tried installing an older kernel (pve-kernel-4.2.8-1-pve / zfsonlinux v.0.6.5.4-1) and it works like a charm...

So it looks like a bug in pve-kernel-4.4.6-1-pve / zfsonlinux v.0.6.5.6-1.
 
I'm not sure what you are trying to achieve with the export/import. I could try it on my own 4.2 system with ZoL, which might give some more insight. It works like a charm when sharing ZFS over the network.
 
The main purpose of the export/import is to use a USB drive to back up files, using the ZFS snapshot/incremental feature.
Thanks for the response, but I think it is a bug in the kernel/zfsonlinux combination.
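
For readers unfamiliar with this workflow, one backup cycle to a removable pool might look roughly like the sketch below. The helper backup_plan and the dataset and snapshot names (rpool/data, monday, tuesday) are hypothetical; only the pool name Backup comes from this thread. The function prints the commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# backup_plan (hypothetical helper): print, but do not run, the commands for
# one incremental backup cycle to a removable pool.
# Arguments: source dataset, previous snapshot, new snapshot, backup pool.
backup_plan() {
    src="$1"; prev="$2"; new="$3"; pool="$4"
    # Receive into a dataset named after the last path component of the source.
    target="$pool/${src##*/}"
    printf 'zpool import %s\n' "$pool"
    printf 'zfs snapshot %s@%s\n' "$src" "$new"
    printf 'zfs send -i %s@%s %s@%s | zfs receive %s\n' "$src" "$prev" "$src" "$new" "$target"
    printf 'zpool export %s\n' "$pool"
}

# Example (names are placeholders):
#   backup_plan rpool/data monday tuesday Backup
```

The incremental send only works if the previous snapshot already exists on the backup pool and the target dataset has not been modified since, so the very first run needs a full send instead.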
 
I have a similar issue. I had to do a full shutdown, and afterwards my external USB enclosure containing a couple of drives was not imported or mounted. Now that pool is missing every time I boot up, and when I try to import it, or even run zpool status, the command hangs forever. The Proxmox GUI also goes strange: it gives me the red X and looks as if all my machines are shut down. However, as far as I can tell the system is still OK aside from not booting.
 
This seems to have corrected itself, at least for me, with the latest kernel update. My external USB 3.0 drives are importing again.
 
I had this same issue after a fresh install of Proxmox 4.2, which used pve-kernel-4.4.6. After many attempts I had almost given up hope, then found ZFSguru, which I booted from my PXE setup. Its pool import could see the missing pool, but warnings about the pool being from another file system put me off trying a zpool import -f. Since I am a ZFS newbie, I didn't want to take any chances. Since this pointed to a kernel problem, I rebooted, installed pve-kernel-4.4.16, updated GRUB, and selected the new kernel during boot. Same problem. Then I saw the post above about pve-kernel-4.2.8, so I installed that one and voilà, zpool import worked.
 
I had this issue as well running kernel version 4.4.19-66. I downgraded to 4.2.8-41 and `zpool import` finally worked for me.

I just noticed a newer kernel is available on the pve-no-subscription repository, pve-kernel-4.4.24-1-pve. Is this issue fixed with that kernel?
 
there was a bug (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1636517) affecting kernels based on / from Ubuntu's 4.4 series that caused a hang in the kernel when running "zpool import" in combination with nested pools (i.e., a vdev of a pool being itself a zvol of another pool, like you might have with ZFS in VMs backed by ZFS). that bug was fixed in our 4.4.21 kernel in version 4.4.21-71, and the 4.4.24 kernel was not affected at all.

that bug only affected "zpool import"s after the initial import at boot (because before the "outer" pool is imported, the zvols from it are not visible to the kernel, and thus are not scanned by zpool import), which is a rather rare operation on most setups, so the bug went unnoticed for quite a while.

if you encounter a zpool/zfs hang with the current kernel, please file a bug report at bugzilla.proxmox.com!
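
Going by the versions mentioned in this thread (hangs reported on 4.4.6, 4.4.16 and 4.4.19-66, fixed in 4.4.21-71, 4.2.8 and 4.4.24 unaffected), a rough check for an affected pve-kernel package version could look like the sketch below. The function name is_affected is made up, the comparison relies on GNU "sort -V" as an approximation of Debian version ordering, and treating every 4.4 package older than 4.4.21-71 as affected is an assumption drawn from the reports here, not an official list.

```shell
#!/bin/sh
# is_affected (hypothetical helper): succeed (exit status 0) if the given
# pve-kernel package version is in the 4.4 series and older than 4.4.21-71,
# the version named above as containing the fix.
# Assumption: GNU "sort -V" orders these version strings like dpkg would.
is_affected() {
    ver="$1"
    fixed="4.4.21-71"
    case "$ver" in
        4.4.*) ;;            # only the 4.4 series is considered
        *) return 1 ;;
    esac
    [ "$ver" = "$fixed" ] && return 1
    lowest=$(printf '%s\n%s\n' "$ver" "$fixed" | sort -V | head -n 1)
    [ "$lowest" = "$ver" ]
}

# Typical use (commented out; assumes the package for the running kernel is
# named pve-kernel-<uname -r>, as with pve-kernel-4.4.6-1-pve in this thread):
#   v=$(dpkg-query -W -f='${Version}' "pve-kernel-$(uname -r)")
#   is_affected "$v" && echo "kernel package $v may hit the zpool import hang"
```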
 
@fabian, thank you for that information!

It looks like 4.4.21-71 was released on 2016/10/28, but the last time the Proxmox ISO was updated was 2016/09/26. `zpool import` is a very basic command, something that shouldn't ever break a system. I'm surprised the PVE ISO installer wasn't updated for it, especially since there isn't a straightforward process in place for updating PVE.

When I can I'll give the latest kernel a try. What is the proper way to install it? Can I install just that package (pve-kernel-4.4.24-1-pve) or should I do a dist-upgrade?
 
installer isos are usually regenerated for every minor release - the expectation is that users upgrade the systems to the current version afterwards (not only after installation, but in general - we only support the current version). updating is very easy - configure your subscription key or enable the no-subscription repository and either use the GUI to upgrade (node -> Updates) or "apt-get update; apt-get dist-upgrade" on the command line / shell. see http://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysadmin_package_repositories for details
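
The command-line route described above could be sketched as follows. The helper name pve_no_subscription_line and the file name under /etc/apt/sources.list.d are made up for illustration, and the repository line with codename "jessie" is an assumption that fits Proxmox VE 4.x only; the linked admin guide is the authoritative source for the exact line for a given release.

```shell
#!/bin/sh
# pve_no_subscription_line (hypothetical helper): print an APT source line for
# the pve-no-subscription repository.
# Assumption: the default codename "jessie" matches Proxmox VE 4.x.
pve_no_subscription_line() {
    codename="${1:-jessie}"
    printf 'deb http://download.proxmox.com/debian %s pve-no-subscription\n' "$codename"
}

# Typical use, as root (commented out so sourcing this file changes nothing;
# the target file name is an arbitrary choice):
#   pve_no_subscription_line > /etc/apt/sources.list.d/pve-no-subscription.list
#   apt-get update
#   apt-get dist-upgrade
#   reboot    # needed to boot into a newly installed kernel
```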

we fix too many bugs to regenerate the iso every second day - especially because the installer requires a lot of extra testing.
 
Thanks for confirming the procedure.

I'm not suggesting regenerating the ISO every other day. When your installer ships with a major issue, something that essentially renders the system unusable and cannot be corrected without an unclean shutdown, it should probably be fixed. Or, at the very least, there should be some kind of indicator on the downloads page that a repository has to be manually enabled and the software updated to a stable release.

Overall I'm impressed with proxmox, but this has definitely soured my evaluation of the product. I wasted a lot of time on this issue. Ideally I shouldn't have ever encountered it since I only started evaluating it after the pve-kernel package was fixed.
 
Seems you missed a page to add that information to.

https://gfycat.com/IllfatedTiredKissingbug

You are right, this shortcut does not show it.

But you get a notification on each login that you should upload a subscription key for updates. You will then automatically get emails telling you all about available updates.
 
Okay, so it feels like we're going around in circles here and you're avoiding the primary issue.

The ISO provided for installation is broken. It is not stable. It is not production ready. There isn't a clear indication of this. Saying updates are available in sporadic places isn't helpful, especially when it isn't made clear that they're required to make things function properly.

If you still don't understand I don't know what else to say.

I'm just evaluating a new software product and trying to determine whether or not it's worth the licensing costs. I've been provided inherently broken software and tried providing feedback on it, and what I get in return is deflections about the product's shortcomings and a dismissive, snarky attitude from a representative of that company. This is a pretty terrible first impression.
 
First, thanks for your feedback. But please accept my comments and take the help offered in this community forum.

Proxmox VE is free software, so if you are evaluating license costs you can calm down: it's zero.

You got all the help and all the answers for free, so your summary that everything is terrible seems a bit exaggerated.
 
Thank you for your responses. It was very educational. Based on this discussion I can see that buying subscription plans from you for our 100+ hosts with dual sockets isn't in our best interest.
 
I am sorry - but this is how every operating system out there works. After installation, you should install updates to get the current version (with hopefully fewer bugs, both of the security and stability kind). Debian does point releases every couple of months and only then do you get new official install images. When you install Windows it is not unusual to spend a whole afternoon getting it to the current update status (although this might have changed/improved since I last had to go through that experience, I am not a Windows user), or you have to create your own install images with (most) updates included. There's no other way to do this without wasting an immense amount of resources on testing installation media.

This bug occurred so rarely that it went unnoticed in our kernel (and Ubuntu's LTS kernel) from February until October (November in Ubuntu's case) - it was promptly triaged and fixed when it was discovered, and there was an easy workaround for it that was posted here on the forum for people affected by the bug at the time. I am afraid calling it a "major issue" is vastly overstating the severity of this bug. Calling it a major issue with the installer ISO would be doubly so, since it does not prevent installing or upgrading (or testing without upgrading, unless you have a very specific niche setup - 99% of people don't "zpool import" on a running system, and when they do, almost all of them pass "-d /dev/disk/by-id", which completely avoids this bug).
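
The workaround mentioned above, limiting the device scan to /dev/disk/by-id, might be wrapped like this. It is a sketch only: the helper name import_by_id and the ZPOOL override variable are made up, and the pool name Backup is taken from the first post.

```shell
#!/bin/sh
# import_by_id (hypothetical helper): import a pool while searching for its
# devices only under /dev/disk/by-id, as suggested above.
# ZPOOL is a made-up override for dry runs; it is left unquoted on purpose so
# that a value like "echo zpool" splits into a command and its first argument.
import_by_id() {
    pool="$1"
    ${ZPOOL:-zpool} import -d /dev/disk/by-id "$pool"
}

# Dry run (prints the command instead of executing it):
#   ZPOOL='echo zpool' import_by_id Backup
# Real import, as root:
#   import_by_id Backup
```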
 
