ZFS error on pool

Aug 20, 2018
Hello,
I have some errors on one of our ZFS pools. I cannot perform any ZFS-related operation on it, such as a pool scrub or taking snapshots.

Maybe someone is able to help?

The node is one of our older nodes with consumer hardware (Asus H170M-PLUS, Core i7, no ECC RAM). ZFS is running on Samsung EVO SSDs.
The zpool status output looks good so far. Maybe the problem is somehow related to the hardware, but please have a look.

proxmox-ve: 6.2-1 (running kernel: 5.4.60-1-pve)
zfsutils-linux: 0.8.4-pve1

zpool status
  pool: zpool1
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(5) for details.
  scan: scrub repaired 0B in 0 days 00:51:30 with 0 errors on Sat Sep 12 23:15:31 2020
config:

        NAME                                      STATE     READ WRITE CKSUM
        zpool1                                    ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            7f9ce18d-634e-4315-8504-ba65ed8f6327  ONLINE       0     0     0
            b673d8e7-7995-457e-a7a3-bf30aefb89a6  ONLINE       0     0     0

errors: No known data errors
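To double-check a status listing like this one without reading it by eye, a small script can pull out the config rows and flag any device that is not ONLINE or has a non-zero counter. This is only a rough sketch: the function names `parse_config_rows` and `devices_with_errors` are made up for illustration, and the list of states in the regular expression is an assumption about what `zpool status` may print.

```python
import re

# One config row: name, state, then the READ/WRITE/CKSUM counters.
# The set of state keywords is an assumption, not an exhaustive list.
ROW = re.compile(
    r"^\s*(\S+)\s+(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s*$"
)


def parse_config_rows(status_text):
    """Return one dict per device row found in `zpool status` output."""
    rows = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m:
            name, state, read, write, cksum = m.groups()
            # Counters stay strings because zpool may abbreviate large values.
            rows.append({"name": name, "state": state,
                         "read": read, "write": write, "cksum": cksum})
    return rows


def devices_with_errors(rows):
    """Names of rows that are not ONLINE or show a non-zero counter."""
    return [
        r["name"] for r in rows
        if r["state"] != "ONLINE"
        or any(r[k] != "0" for k in ("read", "write", "cksum"))
    ]
```

Fed the status output above, `devices_with_errors` returns an empty list, which matches the impression that the pool looks healthy at this level even though the kernel log says otherwise.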


Output of dmesg:

[ 821.067625] ata5.00: Enabling discard_zeroes_data
[ 850.836877] ata6.00: Enabling discard_zeroes_data
[ 1005.452750] zd0: p1
[ 1005.455945] zd16: p1 p2 p3
[ 1005.460989] zd32: p1 p2
[ 1005.468935] zd48: p1 p2 p3 p4 < p5 p6 >
[ 1005.474918] zd64: p1 p3
[ 1005.708613] PANIC: blkptr at 00000000baa0bfe2 has invalid TYPE 108
[ 1005.708615] PANIC: blkptr at 00000000c30871fb DVA 0 has invalid VDEV 131072
[ 1005.708616] Showing stack for process 11383
[ 1005.708617] Showing stack for process 11320
[ 1005.708618] CPU: 1 PID: 11383 Comm: z_wr_int Tainted: P O 5.4.0-42-generic #46-Ubuntu
[ 1005.708618] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[ 1005.708619] Call Trace:
[ 1005.708624] dump_stack+0x6d/0x9a
[ 1005.708631] spl_dumpstack+0x29/0x2b [spl]
[ 1005.708636] vcmn_err.cold+0x60/0x99 [spl]
[ 1005.708638] ? check_preempt_curr+0x20/0x90
[ 1005.708639] ? ttwu_do_wakeup+0x1e/0x150
[ 1005.708641] ? ttwu_do_activate+0x5b/0x70
[ 1005.708642] ? try_to_wake_up+0x224/0x6a0
[ 1005.708698] zfs_panic_recover+0x6f/0x90 [zfs]
[ 1005.708744] zfs_blkptr_verify+0x380/0x440 [zfs]
[ 1005.708786] zio_free+0x22/0xf0 [zfs]
[ 1005.708825] dsl_free+0x11/0x20 [zfs]
[ 1005.708860] dsl_dataset_block_kill+0x2ba/0x480 [zfs]
[ 1005.708891] dbuf_write_done+0x1b2/0x1e0 [zfs]
[ 1005.708921] arc_write_done+0x235/0x440 [zfs]
[ 1005.708964] zio_done+0x3aa/0xe20 [zfs]
[ 1005.709006] zio_execute+0x91/0xe0 [zfs]
[ 1005.709012] taskq_thread+0x245/0x430 [spl]
[ 1005.709014] ? __switch_to_asm+0x40/0x70
[ 1005.709016] ? wake_up_q+0x70/0x70
[ 1005.709059] ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[ 1005.709061] kthread+0x104/0x140
[ 1005.709065] ? task_done+0x90/0x90 [spl]
[ 1005.709066] ? kthread_park+0x90/0x90
[ 1005.709067] ret_from_fork+0x35/0x40
[ 1005.709069] CPU: 5 PID: 11320 Comm: z_wr_int Tainted: P O 5.4.0-42-generic #46-Ubuntu
[ 1005.709070] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[ 1005.709070] Call Trace:
[ 1005.709073] dump_stack+0x6d/0x9a
[ 1005.709079] spl_dumpstack+0x29/0x2b [spl]
[ 1005.709085] vcmn_err.cold+0x60/0x99 [spl]
[ 1005.709086] ? check_preempt_curr+0x7a/0x90
[ 1005.709088] ? ttwu_do_wakeup+0x1e/0x150
[ 1005.709089] ? ttwu_do_activate+0x5b/0x70
[ 1005.709091] ? try_to_wake_up+0x224/0x6a0
[ 1005.709093] ? dm_make_request+0x58/0xb0
[ 1005.709137] zfs_panic_recover+0x6f/0x90 [zfs]
[ 1005.709185] zfs_blkptr_verify+0x3ab/0x440 [zfs]
[ 1005.709236] zio_free+0x22/0xf0 [zfs]
[ 1005.709288] dsl_free+0x11/0x20 [zfs]
[ 1005.709336] dsl_dataset_block_kill+0x2ba/0x480 [zfs]
[ 1005.709380] dbuf_write_done+0x1b2/0x1e0 [zfs]
[ 1005.709416] arc_write_done+0x235/0x440 [zfs]
[ 1005.709463] zio_done+0x3aa/0xe20 [zfs]
[ 1005.709510] zio_execute+0x91/0xe0 [zfs]
[ 1005.709514] taskq_thread+0x245/0x430 [spl]
[ 1005.709516] ? wake_up_q+0x70/0x70
[ 1005.709567] ? zio_execute_stack_check.constprop.0+0x10/0x10 [zfs]
[ 1005.709568] kthread+0x104/0x140
[ 1005.709572] ? task_done+0x90/0x90 [spl]
[ 1005.709573] ? kthread_park+0x90/0x90
[ 1005.709574] ret_from_fork+0x35/0x40
 
You're right, this output is from a ZFS-enabled rescue system.
I have booted the server again and imported the pool (this time with the PVE kernel). Here is the outcome:

Message from syslogd@srvhost2 at Oct 2 11:34:55 ...
kernel:[78438.927457] PANIC: blkptr at 000000000e8ad38c DVA 0 has invalid VDEV 131072

Message from syslogd@srvhost2 at Oct 2 11:34:55 ...
kernel:[78438.927458] PANIC: blkptr at 000000005c4cc310 has invalid TYPE 108

Message from syslogd@srvhost2 at Oct 2 11:34:55 ...
kernel:[78438.928361] PANIC: blkptr at 0000000005a25a22 DVA 0 has invalid VDEV 65536

Message from syslogd@srvhost2 at Oct 2 11:34:55 ...
kernel:[78438.928370] PANIC: blkptr at 00000000123a078d DVA 0 has invalid VDEV 4194304

Message from syslogd@srvhost2 at Oct 2 11:34:55 ...
kernel:[78438.928524] PANIC: blkptr at 00000000aaecb7ff DVA 0 has invalid VDEV 11534336


dmesg:

[78358.755191] ata5.00: Enabling discard_zeroes_data
[78370.136169] ata6.00: Enabling discard_zeroes_data
[78436.716999] zd0: p1
[78436.719978] zd16: p1 p2 p3
[78436.724023] zd32: p1 p2
[78436.732161] zd48: p1 p2 p3 p4 < p5 p6 >
[78436.738724] zd64: p1 p3
[78438.927457] PANIC: blkptr at 000000000e8ad38c DVA 0 has invalid VDEV 131072
[78438.927458] PANIC: blkptr at 000000005c4cc310 has invalid TYPE 108
[78438.927459] Showing stack for process 28017
[78438.927484] Showing stack for process 28022
[78438.927503] CPU: 1 PID: 28017 Comm: z_wr_int Tainted: P O 5.4.60-1-pve #1
[78438.927550] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[78438.927575] Call Trace:
[78438.927588] dump_stack+0x6d/0x9a
[78438.927603] spl_dumpstack+0x29/0x2b [spl]
[78438.927619] vcmn_err.cold.1+0x60/0x94 [spl]
[78438.927673] ? range_tree_add+0x11/0x20 [zfs]
[78438.927689] ? _cond_resched+0x19/0x30
[78438.927701] ? mutex_lock+0x12/0x30
[78438.927734] ? dmu_buf_will_dirty_impl+0x95/0x130 [zfs]
[78438.927751] ? try_to_wake_up+0x67/0x650
[78438.927779] ? _cond_resched+0x19/0x30
[78438.927819] zfs_panic_recover+0x6f/0x90 [zfs]
[78438.927863] zfs_blkptr_verify+0x265/0x400 [zfs]
[78438.927878] ? __mutex_lock_slowpath+0x13/0x20
[78438.927921] zio_free+0x21/0xe0 [zfs]
[78438.927958] dsl_free+0x11/0x20 [zfs]
[78438.927992] dsl_dataset_block_kill+0x292/0x460 [zfs]
[78438.928026] dbuf_write_done+0x171/0x210 [zfs]
[78438.928057] arc_write_done+0x8f/0x410 [zfs]
[78438.928099] zio_done+0x440/0x1030 [zfs]
[78438.928141] zio_execute+0x99/0xf0 [zfs]
[78438.928157] taskq_thread+0x2ec/0x4d0 [spl]
[78438.928171] ? wake_up_q+0x80/0x80
[78438.928211] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[78438.928231] kthread+0x120/0x140
[78438.928244] ? task_done+0xb0/0xb0 [spl]
[78438.928257] ? kthread_park+0x90/0x90
[78438.928270] ret_from_fork+0x35/0x40
[78438.928283] CPU: 5 PID: 28022 Comm: z_wr_int Tainted: P O 5.4.60-1-pve #1
[78438.928311] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[78438.928337] Call Trace:
[78438.928345] dump_stack+0x6d/0x9a
[78438.928356] spl_dumpstack+0x29/0x2b [spl]
[78438.928361] PANIC: blkptr at 0000000005a25a22 DVA 0 has invalid VDEV 65536
[78438.928370] vcmn_err.cold.1+0x60/0x94 [spl]
[78438.928370] PANIC: blkptr at 00000000123a078d DVA 0 has invalid VDEV 4194304
[78438.928371] Showing stack for process 28016
[78438.928397] Showing stack for process 28019
[78438.928399] ? check_preempt_curr+0x68/0x90
[78438.928524] PANIC: blkptr at 00000000aaecb7ff DVA 0 has invalid VDEV 11534336
[78438.929078] ? ttwu_do_wakeup+0x1e/0x150
[78438.929819] Showing stack for process 28021
[78438.930403] ? ttwu_do_activate+0x5a/0x70
[78438.930404] ? try_to_wake_up+0x223/0x650
[78438.930438] zfs_panic_recover+0x6f/0x90 [zfs]
[78438.930469] zfs_blkptr_verify+0x34f/0x400 [zfs]
[78438.934550] zio_free+0x21/0xe0 [zfs]
[78438.935111] dsl_free+0x11/0x20 [zfs]
[78438.935630] dsl_dataset_block_kill+0x292/0x460 [zfs]
[78438.936142] dbuf_write_done+0x171/0x210 [zfs]
[78438.936641] arc_write_done+0x8f/0x410 [zfs]
[78438.937151] zio_done+0x440/0x1030 [zfs]
[78438.937676] zio_execute+0x99/0xf0 [zfs]
[78438.938161] taskq_thread+0x2ec/0x4d0 [spl]
[78438.938649] ? wake_up_q+0x80/0x80
[78438.939166] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[78438.939660] kthread+0x120/0x140
[78438.940155] ? task_done+0xb0/0xb0 [spl]
[78438.940663] ? kthread_park+0x90/0x90
[78438.941150] ret_from_fork+0x35/0x40
[78438.941636] CPU: 0 PID: 28016 Comm: z_wr_int Tainted: P O 5.4.60-1-pve #1
[78438.942290] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[78438.942797] Call Trace:
[78438.943298] dump_stack+0x6d/0x9a
[78438.943781] spl_dumpstack+0x29/0x2b [spl]
[78438.944254] vcmn_err.cold.1+0x60/0x94 [spl]
[78438.944742] ? check_preempt_curr+0x79/0x90
[78438.945216] ? ttwu_do_wakeup+0x1e/0x150
[78438.945689] ? ttwu_do_activate+0x5a/0x70
[78438.946189] ? try_to_wake_up+0x223/0x650
[78438.946702] zfs_panic_recover+0x6f/0x90 [zfs]
[78438.947214] zfs_blkptr_verify+0x34f/0x400 [zfs]
[78438.947712] zio_free+0x21/0xe0 [zfs]
[78438.948201] dsl_free+0x11/0x20 [zfs]
[78438.948699] dsl_dataset_block_kill+0x292/0x460 [zfs]
[78438.949182] dbuf_write_done+0x171/0x210 [zfs]
[78438.949666] arc_write_done+0x8f/0x410 [zfs]
[78438.950191] zio_done+0x440/0x1030 [zfs]
[78438.950705] zio_execute+0x99/0xf0 [zfs]
[78438.951164] taskq_thread+0x2ec/0x4d0 [spl]
[78438.951633] ? wake_up_q+0x80/0x80
[78438.952112] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[78438.952591] kthread+0x120/0x140
[78438.953052] ? task_done+0xb0/0xb0 [spl]
[78438.953515] ? kthread_park+0x90/0x90
[78438.953977] ret_from_fork+0x35/0x40
[78438.954501] CPU: 3 PID: 28021 Comm: z_wr_int Tainted: P O 5.4.60-1-pve #1
[78438.955205] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[78438.955854] Call Trace:
[78438.956484] dump_stack+0x6d/0x9a
[78438.957133] spl_dumpstack+0x29/0x2b [spl]
[78438.957769] vcmn_err.cold.1+0x60/0x94 [spl]
[78438.958464] ? check_preempt_curr+0x68/0x90
[78438.959082] ? ttwu_do_wakeup+0x1e/0x150
[78438.959689] ? ttwu_do_activate+0x5a/0x70
[78438.960310] ? try_to_wake_up+0x223/0x650
[78438.960945] zfs_panic_recover+0x6f/0x90 [zfs]
[78438.961592] zfs_blkptr_verify+0x34f/0x400 [zfs]
[78438.962252] zio_free+0x21/0xe0 [zfs]
[78438.962873] dsl_free+0x11/0x20 [zfs]
[78438.963468] dsl_dataset_block_kill+0x292/0x460 [zfs]
[78438.964084] dbuf_write_done+0x171/0x210 [zfs]
[78438.964684] arc_write_done+0x8f/0x410 [zfs]
[78438.965302] zio_done+0x440/0x1030 [zfs]
[78438.965900] zio_execute+0x99/0xf0 [zfs]
[78438.966521] taskq_thread+0x2ec/0x4d0 [spl]
[78438.967081] ? wake_up_q+0x80/0x80
[78438.967662] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[78438.968280] kthread+0x120/0x140
[78438.968845] ? task_done+0xb0/0xb0 [spl]
[78438.969418] ? kthread_park+0x90/0x90
[78438.969983] ret_from_fork+0x35/0x40
[78438.970593] CPU: 7 PID: 28019 Comm: z_wr_int Tainted: P O 5.4.60-1-pve #1
[78438.971144] Hardware name: System manufacturer System Product Name/H170M-PLUS, BIOS 0704 02/15/2016
[78438.971597] Call Trace:
[78438.972037] dump_stack+0x6d/0x9a
[78438.972492] spl_dumpstack+0x29/0x2b [spl]
[78438.972936] vcmn_err.cold.1+0x60/0x94 [spl]
[78438.973378] ? check_preempt_curr+0x79/0x90
[78438.973832] ? ttwu_do_wakeup+0x1e/0x150
[78438.974343] ? ttwu_do_activate+0x5a/0x70
[78438.974780] ? try_to_wake_up+0x223/0x650
[78438.975253] zfs_panic_recover+0x6f/0x90 [zfs]
[78438.975713] zfs_blkptr_verify+0x34f/0x400 [zfs]
[78438.976171] zio_free+0x21/0xe0 [zfs]
[78438.976631] dsl_free+0x11/0x20 [zfs]
[78438.977059] dsl_dataset_block_kill+0x292/0x460 [zfs]
[78438.977489] dbuf_write_done+0x171/0x210 [zfs]
[78438.977934] arc_write_done+0x8f/0x410 [zfs]
[78438.978429] zio_done+0x440/0x1030 [zfs]
[78438.978896] zio_execute+0x99/0xf0 [zfs]
[78438.979328] taskq_thread+0x2ec/0x4d0 [spl]
[78438.979745] ? wake_up_q+0x80/0x80
[78438.980184] ? zio_taskq_member.isra.12.constprop.17+0x70/0x70 [zfs]
[78438.980621] kthread+0x120/0x140
[78438.981045] ? task_done+0xb0/0xb0 [spl]
[78438.981466] ? kthread_park+0x90/0x90
[78438.981900] ret_from_fork+0x35/0x40
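Since the stack traces are interleaved and hard to read, it may help to reduce the log to just the distinct PANIC messages. The sketch below does that and also lists the reported VDEV ids that fall outside the pool layout. The names `extract_panics` and `out_of_range_vdevs` are hypothetical, and the default of one top-level vdev is an assumption taken from the zpool status earlier in the thread (a single mirror-0), under which only vdev id 0 would be expected.

```python
import re

# Matches both message shapes seen in the log:
#   PANIC: blkptr at <addr> has invalid TYPE <n>
#   PANIC: blkptr at <addr> DVA <i> has invalid VDEV <n>
PANIC = re.compile(
    r"PANIC: blkptr at (?P<addr>[0-9a-f]+) "
    r"(?:DVA (?P<dva>\d+) )?has invalid (?P<field>TYPE|VDEV) (?P<value>\d+)"
)


def extract_panics(log_text):
    """Return the distinct blkptr PANIC messages in order of first appearance."""
    seen = set()
    panics = []
    for m in PANIC.finditer(log_text):
        key = (m["addr"], m["field"], int(m["value"]))
        if key in seen:
            continue  # same message repeated by syslog and dmesg
        seen.add(key)
        panics.append({
            "addr": m["addr"],
            "dva": None if m["dva"] is None else int(m["dva"]),
            "field": m["field"],
            "value": int(m["value"]),
        })
    return panics


def out_of_range_vdevs(panics, top_level_vdevs=1):
    """VDEV panics whose id is not below the assumed number of top-level vdevs."""
    return [p for p in panics
            if p["field"] == "VDEV" and p["value"] >= top_level_vdevs]
```

Printing `hex(p["value"])` for each entry gives a more compact view of the reported ids than the decimal numbers in the log.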
 
I keep seeing this output on my Proxmox console too:

Code:
[78436.716999] zd0: p1
[78436.719978] zd16: p1 p2 p3
[78436.724023] zd32: p1 p2
[78436.732161] zd48: p1 p2 p3 p4 < p5 p6 >
[78436.738724] zd64: p1 p3

Do I need to be concerned? Can't find anything else on Google about this apart from this post!
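For reference, lines of this shape look like the kernel's ordinary partition scan of zvol block devices (zd0, zd16, ...), with the entries between `<` and `>` being logical partitions inside an extended partition. A small sketch for turning them into device names follows. The function name `zvol_partitions` is made up, and the `zdNpM` naming is an assumption about how the partitions show up under /dev.

```python
import re

# Optional dmesg timestamp (with or without the opening bracket),
# then "zdN:" followed by the partition list.
ZD_LINE = re.compile(r"^(?:\[?\s*[\d.]+\]\s*)?(zd\d+):\s*(.*)$")


def zvol_partitions(log_text):
    """Map each zvol device in the log to its list of partition names."""
    result = {}
    for line in log_text.splitlines():
        m = ZD_LINE.match(line.strip())
        if not m:
            continue
        dev, rest = m.groups()
        # "p1 p2 < p5 p6 >" -> ["p1", "p2", "p5", "p6"]; brackets are ignored.
        parts = re.findall(r"p\d+", rest)
        result[dev] = [dev + p for p in parts]
    return result
```

On the five lines above this yields one entry per zvol, for example six partitions for zd48, which would simply reflect the partition tables inside the guests' virtual disks.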
 
Hey,

I have some errors on one of our zfs pools. I cannot perform any action that is related to ZFS operations like pool scrub, snapshots,

Two or three pve-kernel releases ago, root lost its direct access to the zpool / zfs commands. Have you tried with an account that has rights granted via sudo?
If not, check whether you can access the zpool/zfs commands that way.

Best regards,
 
