ZFS error on pool

Aug 20, 2018
Hello,
I am getting errors on one of our ZFS pools. I cannot perform any ZFS operation on it, such as a pool scrub or taking snapshots.

Is anyone able to help?

The node is one of our old nodes with consumer hardware (Asus H170M-PLUS, Core i7, no ECC RAM). ZFS is running on Samsung EVO SSDs.
zpool status looks good so far. Maybe the problem is somehow related to the hardware, but please have a look.

proxmox-ve: 6.2-1 (running kernel: 5.4.60-1-pve)
zfsutils-linux: 0.8.4-pve1

zpool status
pool: zpool1
state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
the pool may no longer be accessible by software that does not support
the features. See zpool-features(5) for details.
scan: scrub repaired 0B in 0 days 00:51:30 with 0 errors on Sat Sep 12 23:15:31 2020
config:

NAME STATE READ WRITE CKSUM
zpool1 ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
7f9ce18d-634e-4315-8504-ba65ed8f6327 ONLINE 0 0 0
b673d8e7-7995-457e-a7a3-bf30aefb89a6 ONLINE 0 0 0

errors: No known data errors


Output of dmesg:

[ 821.067625] ata5.00: Enabling discard_zeroes_data
[ 850.836877] ata6.00: Enabling discard_zeroes_data
[ 1005.452750] zd0: p1
[ 1005.455945] zd16: p1 p2 p3
[ 1005.460989] zd32: p1 p2
[ 1005.468935] zd48: p1 p2 p3 p4 < p5 p6 >
[ 1005.474918] zd64: p1 p3
[ 1005.708613] PANIC: blkptr at 00000000baa0bfe2 has invalid TYPE 108
[ 1005.708615] PANIC: blkptr at 00000000c30871fb DVA 0 has invalid VDEV 131072
[ 1005.708616] Showing stack for process 11383
[ 1005.708617] Showing stack for process 11320
[ 1005.708618] CPU: 1 PID: 11383 Comm: z_wr_int Tainted: P O 5.4.0-42-generic #46-Ubuntu
[ 1005.708618] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[ 1005.708619] Call Trace:
[ 1005.708624] dump_stack+0x6d/0x9a
[ 1005.708631] spl_dumpstack+0x29/0x2b [spl]
[ 1005.708636] vcmn_err.cold+0x60/0x99 [spl]
[ 1005.708638] ? check_preempt_curr+0x20/0x90
[ 1005.708639] ? ttwu_do_wakeup+0x1e/0x150
[ 1005.708641] ? ttwu_do_activate+0x5b/0x70
[ 1005.708642] ? try_to_wake_up+0x224/0x6a0
[ 1005.708698] zfs_panic_recover+0x6f/0x90 [zfs]
[ 1005.708744] zfs_blkptr_verify+0x380/0x440 [zfs]
[ 1005.708786] zio_free+0x22/0xf0 [zfs]
[ 1005.708825] dsl_free+0x11/0x20 [zfs]
[ 1005.708860] dsl_dataset_block_kill+0x2ba/0x480 [zfs]
[ 1005.708891] dbuf_write_done+0x1b2/0x1e0 [zfs]
[ 1005.708921] arc_write_done+0x235/0x440 [zfs]
[ 1005.708964] zio_done+0x3aa/0xe20 [zfs]
[ 1005.709006] zio_execute+0x91/0xe0 [zfs]
[ 1005.709012] taskq_thread+0x245/0x430 [spl]
[ 1005.709014] ? __switch_to_asm+0x40/0x70
[ 1005.709016] ? wake_up_q+0x70/0x70
[ 1005.709059] ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[ 1005.709061] kthread+0x104/0x140
[ 1005.709065] ? task_done+0x90/0x90 [spl]
[ 1005.709066] ? kthread_park+0x90/0x90
[ 1005.709067] ret_from_fork+0x35/0x40
[ 1005.709069] CPU: 5 PID: 11320 Comm: z_wr_int Tainted: P O 5.4.0-42-generic #46-Ubuntu
[ 1005.709070] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[ 1005.709070] Call Trace:
[ 1005.709073] dump_stack+0x6d/0x9a
[ 1005.709079] spl_dumpstack+0x29/0x2b [spl]
[ 1005.709085] vcmn_err.cold+0x60/0x99 [spl]
[ 1005.709086] ? check_preempt_curr+0x7a/0x90
[ 1005.709088] ? ttwu_do_wakeup+0x1e/0x150
[ 1005.709089] ? ttwu_do_activate+0x5b/0x70
[ 1005.709091] ? try_to_wake_up+0x224/0x6a0
[ 1005.709093] ? dm_make_request+0x58/0xb0
[ 1005.709137] zfs_panic_recover+0x6f/0x90 [zfs]
[ 1005.709185] zfs_blkptr_verify+0x3ab/0x440 [zfs]
[ 1005.709236] zio_free+0x22/0xf0 [zfs]
[ 1005.709288] dsl_free+0x11/0x20 [zfs]
[ 1005.709336] dsl_dataset_block_kill+0x2ba/0x480 [zfs]
[ 1005.709380] dbuf_write_done+0x1b2/0x1e0 [zfs]
[ 1005.709416] arc_write_done+0x235/0x440 [zfs]
[ 1005.709463] zio_done+0x3aa/0xe20 [zfs]
[ 1005.709510] zio_execute+0x91/0xe0 [zfs]
[ 1005.709514] taskq_thread+0x245/0x430 [spl]
[ 1005.709516] ? wake_up_q+0x70/0x70
[ 1005.709567] ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[ 1005.709568] kthread+0x104/0x140
[ 1005.709572] ? task_done+0x90/0x90 [spl]
[ 1005.709573] ? kthread_park+0x90/0x90
[ 1005.709574] ret_from_fork+0x35/0x40
 
You're right, that output was from a ZFS-enabled rescue system.
I have booted the server again and imported the pool (this time with the pve kernel). Here is the outcome:

Message from syslogd@srvhost2 at Oct 2 11:34:55 ...
kernel:[78438.927457] PANIC: blkptr at 000000000e8ad38c DVA 0 has invalid VDEV 131072

Message from syslogd@srvhost2 at Oct 2 11:34:55 ...
kernel:[78438.927458] PANIC: blkptr at 000000005c4cc310 has invalid TYPE 108

Message from syslogd@srvhost2 at Oct 2 11:34:55 ...
kernel:[78438.928361] PANIC: blkptr at 0000000005a25a22 DVA 0 has invalid VDEV 65536

Message from syslogd@srvhost2 at Oct 2 11:34:55 ...
kernel:[78438.928370] PANIC: blkptr at 00000000123a078d DVA 0 has invalid VDEV 4194304

Message from syslogd@srvhost2 at Oct 2 11:34:55 ...
kernel:[78438.928524] PANIC: blkptr at 00000000aaecb7ff DVA 0 has invalid VDEV 11534336
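
To make the values in these PANIC lines easier to compare, here is a minimal Python sketch that pulls the field name and number out of each line and prints the number in hex with a count of set bits. The regular expression, the function name `parse_panics` and the record layout are my own assumptions for illustration; they are not part of ZFS or any of its tools. The sample text is the five messages above. The output only reformats what the log already says and is not a diagnosis.

```python
import re

# Assumed pattern, based only on the PANIC lines quoted in this thread.
PANIC_RE = re.compile(
    r"PANIC: blkptr at (?P<addr>[0-9a-f]+) "
    r"(?:DVA (?P<dva>\d+) )?has invalid (?P<field>[A-Z]+) (?P<value>\d+)"
)


def parse_panics(log_text):
    """Return one dict per 'PANIC: blkptr ...' line found in log_text."""
    records = []
    for match in PANIC_RE.finditer(log_text):
        value = int(match.group("value"))
        records.append(
            {
                "addr": match.group("addr"),
                "field": match.group("field"),
                "value": value,
                "hex": hex(value),
                # Number of 1-bits in the reported value.
                "bits_set": bin(value).count("1"),
            }
        )
    return records


# The five syslog messages quoted above.
SAMPLE = """\
kernel:[78438.927457] PANIC: blkptr at 000000000e8ad38c DVA 0 has invalid VDEV 131072
kernel:[78438.927458] PANIC: blkptr at 000000005c4cc310 has invalid TYPE 108
kernel:[78438.928361] PANIC: blkptr at 0000000005a25a22 DVA 0 has invalid VDEV 65536
kernel:[78438.928370] PANIC: blkptr at 00000000123a078d DVA 0 has invalid VDEV 4194304
kernel:[78438.928524] PANIC: blkptr at 00000000aaecb7ff DVA 0 has invalid VDEV 11534336
"""

if __name__ == "__main__":
    for rec in parse_panics(SAMPLE):
        print(rec["field"], rec["value"], rec["hex"], "bits set:", rec["bits_set"])
```

The same function can be fed the full text of `dmesg` saved to a file, since it ignores every line that does not match the pattern.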


dmesg:

[78358.755191] ata5.00: Enabling discard_zeroes_data
[78370.136169] ata6.00: Enabling discard_zeroes_data
[78436.716999] zd0: p1
[78436.719978] zd16: p1 p2 p3
[78436.724023] zd32: p1 p2
[78436.732161] zd48: p1 p2 p3 p4 < p5 p6 >
[78436.738724] zd64: p1 p3
[78438.927457] PANIC: blkptr at 000000000e8ad38c DVA 0 has invalid VDEV 131072
[78438.927458] PANIC: blkptr at 000000005c4cc310 has invalid TYPE 108
[78438.927459] Showing stack for process 28017
[78438.927484] Showing stack for process 28022
[78438.927503] CPU: 1 PID: 28017 Comm: z_wr_int Tainted: P O 5.4.60-1-pve #1
[78438.927550] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[78438.927575] Call Trace:
[78438.927588] dump_stack+0x6d/0x9a
[78438.927603] spl_dumpstack+0x29/0x2b [spl]
[78438.927619] vcmn_err.cold.1+0x60/0x94 [spl]
[78438.927673] ? range_tree_add+0x11/0x20 [zfs]
[78438.927689] ? _cond_resched+0x19/0x30
[78438.927701] ? mutex_lock+0x12/0x30
[78438.927734] ? dmu_buf_will_dirty_impl+0x95/0x130 [zfs]
[78438.927751] ? try_to_wake_up+0x67/0x650
[78438.927779] ? _cond_resched+0x19/0x30
[78438.927819] zfs_panic_recover+0x6f/0x90 [zfs]
[78438.927863] zfs_blkptr_verify+0x265/0x400 [zfs]
[78438.927878] ? __mutex_lock_slowpath+0x13/0x20
[78438.927921] zio_free+0x21/0xe0 [zfs]
[78438.927958] dsl_free+0x11/0x20 [zfs]
[78438.927992] dsl_dataset_block_kill+0x292/0x460 [zfs]
[78438.928026] dbuf_write_done+0x171/0x210 [zfs]
[78438.928057] arc_write_done+0x8f/0x410 [zfs]
[78438.928099] zio_done+0x440/0x1030 [zfs]
[78438.928141] zio_execute+0x99/0xf0 [zfs]
[78438.928157] taskq_thread+0x2ec/0x4d0 [spl]
[78438.928171] ? wake_up_q+0x80/0x80
[78438.928211] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[78438.928231] kthread+0x120/0x140
[78438.928244] ? task_done+0xb0/0xb0 [spl]
[78438.928257] ? kthread_park+0x90/0x90
[78438.928270] ret_from_fork+0x35/0x40
[78438.928283] CPU: 5 PID: 28022 Comm: z_wr_int Tainted: P O 5.4.60-1-pve #1
[78438.928311] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[78438.928337] Call Trace:
[78438.928345] dump_stack+0x6d/0x9a
[78438.928356] spl_dumpstack+0x29/0x2b [spl]
[78438.928361] PANIC: blkptr at 0000000005a25a22 DVA 0 has invalid VDEV 65536
[78438.928370] vcmn_err.cold.1+0x60/0x94 [spl]
[78438.928370] PANIC: blkptr at 00000000123a078d DVA 0 has invalid VDEV 4194304
[78438.928371] Showing stack for process 28016
[78438.928397] Showing stack for process 28019
[78438.928399] ? check_preempt_curr+0x68/0x90
[78438.928524] PANIC: blkptr at 00000000aaecb7ff DVA 0 has invalid VDEV 11534336
[78438.929078] ? ttwu_do_wakeup+0x1e/0x150
[78438.929819] Showing stack for process 28021
[78438.930403] ? ttwu_do_activate+0x5a/0x70
[78438.930404] ? try_to_wake_up+0x223/0x650
[78438.930438] zfs_panic_recover+0x6f/0x90 [zfs]
[78438.930469] zfs_blkptr_verify+0x34f/0x400 [zfs]
[78438.934550] zio_free+0x21/0xe0 [zfs]
[78438.935111] dsl_free+0x11/0x20 [zfs]
[78438.935630] dsl_dataset_block_kill+0x292/0x460 [zfs]
[78438.936142] dbuf_write_done+0x171/0x210 [zfs]
[78438.936641] arc_write_done+0x8f/0x410 [zfs]
[78438.937151] zio_done+0x440/0x1030 [zfs]
[78438.937676] zio_execute+0x99/0xf0 [zfs]
[78438.938161] taskq_thread+0x2ec/0x4d0 [spl]
[78438.938649] ? wake_up_q+0x80/0x80
[78438.939166] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[78438.939660] kthread+0x120/0x140
[78438.940155] ? task_done+0xb0/0xb0 [spl]
[78438.940663] ? kthread_park+0x90/0x90
[78438.941150] ret_from_fork+0x35/0x40
[78438.941636] CPU: 0 PID: 28016 Comm: z_wr_int Tainted: P O 5.4.60-1-pve #1
[78438.942290] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[78438.942797] Call Trace:
[78438.943298] dump_stack+0x6d/0x9a
[78438.943781] spl_dumpstack+0x29/0x2b [spl]
[78438.944254] vcmn_err.cold.1+0x60/0x94 [spl]
[78438.944742] ? check_preempt_curr+0x79/0x90
[78438.945216] ? ttwu_do_wakeup+0x1e/0x150
[78438.945689] ? ttwu_do_activate+0x5a/0x70
[78438.946189] ? try_to_wake_up+0x223/0x650
[78438.946702] zfs_panic_recover+0x6f/0x90 [zfs]
[78438.947214] zfs_blkptr_verify+0x34f/0x400 [zfs]
[78438.947712] zio_free+0x21/0xe0 [zfs]
[78438.948201] dsl_free+0x11/0x20 [zfs]
[78438.948699] dsl_dataset_block_kill+0x292/0x460 [zfs]
[78438.949182] dbuf_write_done+0x171/0x210 [zfs]
[78438.949666] arc_write_done+0x8f/0x410 [zfs]
[78438.950191] zio_done+0x440/0x1030 [zfs]
[78438.950705] zio_execute+0x99/0xf0 [zfs]
[78438.951164] taskq_thread+0x2ec/0x4d0 [spl]
[78438.951633] ? wake_up_q+0x80/0x80
[78438.952112] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[78438.952591] kthread+0x120/0x140
[78438.953052] ? task_done+0xb0/0xb0 [spl]
[78438.953515] ? kthread_park+0x90/0x90
[78438.953977] ret_from_fork+0x35/0x40
[78438.954501] CPU: 3 PID: 28021 Comm: z_wr_int Tainted: P O 5.4.60-1-pve #1
[78438.955205] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[78438.955854] Call Trace:
[78438.956484] dump_stack+0x6d/0x9a
[78438.957133] spl_dumpstack+0x29/0x2b [spl]
[78438.957769] vcmn_err.cold.1+0x60/0x94 [spl]
[78438.958464] ? check_preempt_curr+0x68/0x90
[78438.959082] ? ttwu_do_wakeup+0x1e/0x150
[78438.959689] ? ttwu_do_activate+0x5a/0x70
[78438.960310] ? try_to_wake_up+0x223/0x650
[78438.960945] zfs_panic_recover+0x6f/0x90 [zfs]
[78438.961592] zfs_blkptr_verify+0x34f/0x400 [zfs]
[78438.962252] zio_free+0x21/0xe0 [zfs]
[78438.962873] dsl_free+0x11/0x20 [zfs]
[78438.963468] dsl_dataset_block_kill+0x292/0x460 [zfs]
[78438.964084] dbuf_write_done+0x171/0x210 [zfs]
[78438.964684] arc_write_done+0x8f/0x410 [zfs]
[78438.965302] zio_done+0x440/0x1030 [zfs]
[78438.965900] zio_execute+0x99/0xf0 [zfs]
[78438.966521] taskq_thread+0x2ec/0x4d0 [spl]
[78438.967081] ? wake_up_q+0x80/0x80
[78438.967662] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[78438.968280] kthread+0x120/0x140
[78438.968845] ? task_done+0xb0/0xb0 [spl]
[78438.969418] ? kthread_park+0x90/0x90
[78438.969983] ret_from_fork+0x35/0x40
[78438.970593] CPU: 7 PID: 28019 Comm: z_wr_int Tainted: P O 5.4.60-1-pve #1
[78438.971144] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[78438.971597] Call Trace:
[78438.972037] dump_stack+0x6d/0x9a
[78438.972492] spl_dumpstack+0x29/0x2b [spl]
[78438.972936] vcmn_err.cold.1+0x60/0x94 [spl]
[78438.973378] ? check_preempt_curr+0x79/0x90
[78438.973832] ? ttwu_do_wakeup+0x1e/0x150
[78438.974343] ? ttwu_do_activate+0x5a/0x70
[78438.974780] ? try_to_wake_up+0x223/0x650
[78438.975253] zfs_panic_recover+0x6f/0x90 [zfs]
[78438.975713] zfs_blkptr_verify+0x34f/0x400 [zfs]
[78438.976171] zio_free+0x21/0xe0 [zfs]
[78438.976631] dsl_free+0x11/0x20 [zfs]
[78438.977059] dsl_dataset_block_kill+0x292/0x460 [zfs]
[78438.977489] dbuf_write_done+0x171/0x210 [zfs]
[78438.977934] arc_write_done+0x8f/0x410 [zfs]
[78438.978429] zio_done+0x440/0x1030 [zfs]
[78438.978896] zio_execute+0x99/0xf0 [zfs]
[78438.979328] taskq_thread+0x2ec/0x4d0 [spl]
[78438.979745] ? wake_up_q+0x80/0x80
[78438.980184] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[78438.980621] kthread+0x120/0x140
[78438.981045] ? task_done+0xb0/0xb0 [spl]
[78438.981466] ? kthread_park+0x90/0x90
[78438.981900] ret_from_fork+0x35/0x40
 
I keep seeing this output on my Proxmox console too:

Code:
[78436.716999] zd0: p1
[78436.719978] zd16: p1 p2 p3
[78436.724023] zd32: p1 p2
[78436.732161] zd48: p1 p2 p3 p4 < p5 p6 >
[78436.738724] zd64: p1 p3

Do I need to be concerned? Can't find anything else on Google about this apart from this post!
 
Hey,

I have some errors on one of our zfs pools. I cannot perform any action that is related to ZFS operations like pool scrub, snapshots,

Two or three pve-kernel releases ago, root lost its direct access to the zpool / zfs commands. Have you tried an account whose rights are granted through sudo?
If not, check whether you can run the zpool/zfs commands that way.
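
A quick way to check this from a script is sketched below in Python, assuming a Unix-like system. The helper name `describe_access` and the returned dictionary are made up for illustration. It only reports whether the command is on the current user's PATH and whether the script is running as root; it does not run any ZFS command itself.

```python
import os
import shutil


def describe_access(tool, which=shutil.which):
    """Report whether `tool` is on PATH and whether we are running as root."""
    path = which(tool)
    return {
        "tool": tool,
        "path": path,                      # None if the command was not found
        "found": path is not None,
        "is_root": os.geteuid() == 0,      # Unix-only check of the effective UID
    }


if __name__ == "__main__":
    for name in ("zpool", "zfs"):
        print(describe_access(name))
```

Running it once as root and once as the sudo-enabled user shows whether both accounts can see the commands.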

Best regards,