[SOLVED] ZFS boot mirror: ZIL/L2ARC device crashed

nasenmann72

Hi,

this seems to be a ZFS issue rather than a Proxmox issue. Anyway, I think it is worth knowing about for Proxmox users.

Issue:
The SSD used as the ZIL/L2ARC device for the boot/VM mirror crashed two days ago. After that I was no longer able to boot the system (boot stopped at GRUB), so I wanted to repair GRUB. I booted the PVE install ISO in debug mode to get a console.

When I tried to import the pool, I got a kernel panic (see the attached screenshot):
Code:
zpool import -mf rpool
Feb 14 11:20:56 pve kernel: PANIC: zfs: allocating allocated segment(offset=109738323968 size=131072)
Feb 14 11:20:56 pve kernel: Showing stack for process 7320
Feb 14 11:20:56 pve kernel: CPU: 1 PID: 7320 Comm: z_wr_iss Tainted: P           O    4.2.8-1-pve #1
Feb 14 11:20:56 pve kernel: Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
Feb 14 11:20:56 pve kernel:  0000000000000000 00000000b4e48d13 ffff8800cf707848 ffffffff81802a68
Feb 14 11:20:56 pve kernel:  0000000000000000 0000000000000003 ffff8800cf707858 ffffffffc03bfd24
Feb 14 11:20:56 pve kernel:  ffff8800cf707988 ffffffffc03bfe8c 00004210ffffffff 6c6c61203a73667a
Feb 14 11:20:56 pve kernel: Call Trace:
Feb 14 11:20:56 pve kernel:  [<ffffffff81802a68>] dump_stack+0x45/0x57
Feb 14 11:20:56 pve kernel:  [<ffffffffc03bfd24>] spl_dumpstack+0x44/0x50 [spl]
Feb 14 11:20:56 pve kernel:  [<ffffffffc03bfe8c>] vcmn_err+0x6c/0x110 [spl]
Feb 14 11:20:56 pve kernel:  [<ffffffff811dd496>] ? __slab_free+0xb6/0x2d0
Feb 14 11:20:56 pve kernel:  [<ffffffff811de4df>] ? kmem_cache_alloc+0x18f/0x200
Feb 14 11:20:56 pve kernel:  [<ffffffffc03bc592>] ? spl_kmem_cache_alloc+0x72/0x7f0 [spl]
Feb 14 11:20:56 pve kernel:  [<ffffffffc023f15b>] ? avl_find+0x5b/0xa0 [zavl]
Feb 14 11:20:56 pve kernel:  [<ffffffffc08256b1>] zfs_panic_recover+0x61/0x80 [zfs]
Feb 14 11:20:56 pve kernel:  [<ffffffffc080d61d>] range_tree_add+0x2dd/0x2f0 [zfs]
Feb 14 11:20:56 pve kernel:  [<ffffffffc07d6525>] ? dmu_read+0x135/0x190 [zfs]
Feb 14 11:20:56 pve kernel:  [<ffffffffc08276c6>] space_map_load+0x3a6/0x560 [zfs]
Feb 14 11:20:56 pve kernel:  [<ffffffffc080d8f2>] ? range_tree_remove+0x2c2/0x2f0 [zfs]
Feb 14 11:20:56 pve kernel:  [<ffffffffc0809ed6>] metaslab_load+0x36/0xd0 [zfs]
Feb 14 11:20:56 pve kernel:  [<ffffffffc080a0f9>] metaslab_activate+0x89/0xb0 [zfs]
Feb 14 11:20:56 pve kernel:  [<ffffffffc080b99d>] metaslab_alloc+0x59d/0xbb0 [zfs]
Feb 14 11:20:56 pve kernel:  [<ffffffff810acb9c>] ? __enqueue_entity+0x6c/0x70
Feb 14 11:20:56 pve kernel:  [<ffffffff811de4df>] ? kmem_cache_alloc+0x18f/0x200
Feb 14 11:20:56 pve kernel:  [<ffffffffc0874127>] zio_dva_allocate+0x97/0x410 [zfs]
Feb 14 11:20:56 pve kernel:  [<ffffffffc03bbd6d>] ? spl_kmem_cache_free+0x14d/0x1e0 [spl]
Feb 14 11:20:56 pve kernel:  [<ffffffffc086fdca>] ? zio_buf_free+0x5a/0x60 [zfs]
Feb 14 11:20:56 pve kernel:  [<ffffffffc03bd22c>] ? taskq_member+0x5c/0x70 [spl]
Feb 14 11:20:56 pve kernel:  [<ffffffffc0871a2e>] zio_execute+0xde/0x190 [zfs]
Feb 14 11:20:56 pve kernel:  [<ffffffffc03bdfe2>] taskq_thread+0x262/0x470 [spl]
Feb 14 11:20:56 pve kernel:  [<ffffffff810a6650>] ? wake_up_q+0x70/0x70
Feb 14 11:20:56 pve kernel:  [<ffffffffc03bdd80>] ? taskq_cancel_id+0x140/0x140 [spl]
Feb 14 11:20:56 pve kernel:  [<ffffffff8109b1fa>] kthread+0xea/0x100
Feb 14 11:20:56 pve kernel:  [<ffffffff8109b110>] ? kthread_create_on_node+0x1f0/0x1f0
Feb 14 11:20:56 pve kernel:  [<ffffffff81809e5f>] ret_from_fork+0x3f/0x70
Feb 14 11:20:56 pve kernel:  [<ffffffff8109b110>] ? kthread_create_on_node+0x1f0/0x1f0
I tried many ways to import the pool again (PVE ISO in debug mode, ZFSGuru ISO, a clean PVE install on a separate disk), but I was not able to import it. It always complained about a missing device (the ZIL/log device).

Then I tried to import the pool read-only - which I could have tried much earlier, but sometimes you can't see the wood for the trees - and voilà, I could import the pool and at least copy my VMs and configuration to a safe place.
Code:
zpool import -m -R /mnt/pool -o readonly=on rpool
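With the pool imported read-only under the altroot, data can be copied off with ordinary tools. A minimal sketch of what such a rescue copy could look like, assuming hypothetical dataset and target names (a read-only pool cannot take new snapshots, so raw zvol contents are read via /dev/zvol rather than zfs send):

```shell
# Proxmox keeps the VM/cluster configuration in a database file;
# with the pool mounted read-only under /mnt/pool it can be copied as-is
cp -a /mnt/pool/var/lib/pve-cluster /root/rescue/

# zvol-backed VM disks are exposed (read-only) as block devices under
# /dev/zvol; "vm-100-disk-1" and the target path are hypothetical examples
dd if=/dev/zvol/rpool/vm-100-disk-1 of=/root/rescue/vm-100-disk-1.raw bs=1M
```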
I do not know if there is a way to import/repair a zpool whose ZIL/log device has crashed. I think a failing ZIL/log device *SHOULD NOT* be fatal for a zpool. Perhaps a real ZFS expert knows? Or is it a bug in ZFS?
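For reference, the usual documented path when a dedicated log device dies is to import with -m and then permanently remove the dead log vdev; a sketch (the GUID is a hypothetical placeholder, to be read from the zpool status output):

```shell
# tolerate the missing log device at import time
zpool import -m rpool

# the failed log vdev shows up as UNAVAIL under the "logs" section;
# when the disk itself is gone, its GUID stands in for the device name
zpool status rpool
zpool remove rpool 1234567890123456789
```

In my case the read-write import itself panicked, so the pool apparently had damage beyond the missing log device.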

Anyway, I did a fresh install of Proxmox and restored the config/VMs from backup so I am back on the road again.