We just experienced a nasty crash whenever the kernel touched our ZFS pool. This occurred after we replaced a faulty drive and resilvered, but in fact it had nothing to do with that.
The crash occurs when ZFS tries to replay the ZIL (ZFS intent log) after a previous power loss. The issues linked below document the problem in detail, along with symptoms and potential solutions. They helped us recover from it.
Partial trace (like this: https://github.com/zfsonlinux/zfs/issues/7151#issuecomment-518045491):
Code:
kernel:[ 149.314630] VERIFY3(0 == dmu_object_claim_dnsize(zfsvfs->z_os, obj, DMU_OT_PLAIN_FILE_CONTENTS, 0, obj_type, bonuslen, dnodesize, tx)) failed (0 == 28)
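To read the trace above: the trailing "(0 == 28)" shows the two values the assertion compared, so dmu_object_claim_dnsize() returned 28 where 0 was expected. Here is a minimal Python sketch for decoding that number, assuming it is an ordinary Linux errno value. The helper name failed_value is made up for illustration and is not part of any ZFS tooling.

```python
import errno
import os
import re

# Trace line copied from the post above.
TRACE = ("kernel:[ 149.314630] VERIFY3(0 == dmu_object_claim_dnsize("
         "zfsvfs->z_os, obj, DMU_OT_PLAIN_FILE_CONTENTS, 0, obj_type, "
         "bonuslen, dnodesize, tx)) failed (0 == 28)")


def failed_value(line):
    """Return the right-hand integer from 'failed (A == B)', or None."""
    match = re.search(r"failed \((-?\d+) == (-?\d+)\)", line)
    return int(match.group(2)) if match else None


code = failed_value(TRACE)
if code is not None:
    # Assumption: the value is a standard errno number on this platform.
    name = errno.errorcode.get(code, "unknown")
    print(code, name, os.strerror(code))
```

On Linux, errno 28 is ENOSPC ("No space left on device"). That is how the returned value reads if the assumption holds; the linked issues explain why it comes up during ZIL replay.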
Related PRs/issues:
* tuxoko/zfs commit fixing the issue: https://github.com/tuxoko/zfs/commit/698ba75a05b09e2b7dd460e5564e3c9d6a9df1f2 (refers to 3 issues against zfsonlinux/zfs listed below)
* Issue 1: https://github.com/zfsonlinux/zfs/issues/7151
* Issue 2: https://github.com/zfsonlinux/zfs/issues/8910
* Issue 3: https://github.com/zfsonlinux/zfs/issues/9123
* Merged PR in zfsonlinux/zfs: https://github.com/zfsonlinux/zfs/pull/9061
* Helpful workarounds: https://github.com/zfsonlinux/zfs/issues/8910#issuecomment-502381050 and https://github.com/zfsonlinux/zfs/issues/8910#issuecomment-504147847
Since the bug has been fixed in zfsonlinux/zfs, it is now up to the Proxmox team to release a kernel version that includes the patch. It will probably be a backport, since zfsonlinux releases are infrequent.