Hi,
this seems to be a ZFS issue rather than a Proxmox issue, but I think it is worth knowing about for Proxmox users.
Issue:
The SSD that was used as the ZIL/L2ARC device for the boot/VM mirror died two days ago. After that I was not able to boot the system any more (boot stopped at GRUB), so I wanted to repair GRUB. I booted the PVE install ISO in debug mode to get a console.
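For context, this kind of setup is usually created by adding one SSD, split into two partitions, to the mirror as a separate log and cache device - roughly like this (the device names are only examples, not my exact layout):
Code:
# small partition as the ZIL (separate log), larger partition as L2ARC (cache)
zpool add rpool log /dev/sdc1
zpool add rpool cache /dev/sdc2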
When I tried to import the pool I got a kernel panic (also see the screenshot):
Code:
zpool import -mf rpool
Feb 14 11:20:56 pve kernel: PANIC: zfs: allocating allocated segment(offset=109738323968 size=131072)
Feb 14 11:20:56 pve kernel: Showing stack for process 7320
Feb 14 11:20:56 pve kernel: CPU: 1 PID: 7320 Comm: z_wr_iss Tainted: P O 4.2.8-1-pve #1
Feb 14 11:20:56 pve kernel: Hardware name: MSI MS-7798/B75MA-P45 (MS-7798), BIOS V1.9 09/30/2013
Feb 14 11:20:56 pve kernel: 0000000000000000 00000000b4e48d13 ffff8800cf707848 ffffffff81802a68
Feb 14 11:20:56 pve kernel: 0000000000000000 0000000000000003 ffff8800cf707858 ffffffffc03bfd24
Feb 14 11:20:56 pve kernel: ffff8800cf707988 ffffffffc03bfe8c 00004210ffffffff 6c6c61203a73667a
Feb 14 11:20:56 pve kernel: Call Trace:
Feb 14 11:20:56 pve kernel: [<ffffffff81802a68>] dump_stack+0x45/0x57
Feb 14 11:20:56 pve kernel: [<ffffffffc03bfd24>] spl_dumpstack+0x44/0x50 [spl]
Feb 14 11:20:56 pve kernel: [<ffffffffc03bfe8c>] vcmn_err+0x6c/0x110 [spl]
Feb 14 11:20:56 pve kernel: [<ffffffff811dd496>] ? __slab_free+0xb6/0x2d0
Feb 14 11:20:56 pve kernel: [<ffffffff811de4df>] ? kmem_cache_alloc+0x18f/0x200
Feb 14 11:20:56 pve kernel: [<ffffffffc03bc592>] ? spl_kmem_cache_alloc+0x72/0x7f0 [spl]
Feb 14 11:20:56 pve kernel: [<ffffffffc023f15b>] ? avl_find+0x5b/0xa0 [zavl]
Feb 14 11:20:56 pve kernel: [<ffffffffc08256b1>] zfs_panic_recover+0x61/0x80 [zfs]
Feb 14 11:20:56 pve kernel: [<ffffffffc080d61d>] range_tree_add+0x2dd/0x2f0 [zfs]
Feb 14 11:20:56 pve kernel: [<ffffffffc07d6525>] ? dmu_read+0x135/0x190 [zfs]
Feb 14 11:20:56 pve kernel: [<ffffffffc08276c6>] space_map_load+0x3a6/0x560 [zfs]
Feb 14 11:20:56 pve kernel: [<ffffffffc080d8f2>] ? range_tree_remove+0x2c2/0x2f0 [zfs]
Feb 14 11:20:56 pve kernel: [<ffffffffc0809ed6>] metaslab_load+0x36/0xd0 [zfs]
Feb 14 11:20:56 pve kernel: [<ffffffffc080a0f9>] metaslab_activate+0x89/0xb0 [zfs]
Feb 14 11:20:56 pve kernel: [<ffffffffc080b99d>] metaslab_alloc+0x59d/0xbb0 [zfs]
Feb 14 11:20:56 pve kernel: [<ffffffff810acb9c>] ? __enqueue_entity+0x6c/0x70
Feb 14 11:20:56 pve kernel: [<ffffffff811de4df>] ? kmem_cache_alloc+0x18f/0x200
Feb 14 11:20:56 pve kernel: [<ffffffffc0874127>] zio_dva_allocate+0x97/0x410 [zfs]
Feb 14 11:20:56 pve kernel: [<ffffffffc03bbd6d>] ? spl_kmem_cache_free+0x14d/0x1e0 [spl]
Feb 14 11:20:56 pve kernel: [<ffffffffc086fdca>] ? zio_buf_free+0x5a/0x60 [zfs]
Feb 14 11:20:56 pve kernel: [<ffffffffc03bd22c>] ? taskq_member+0x5c/0x70 [spl]
Feb 14 11:20:56 pve kernel: [<ffffffffc0871a2e>] zio_execute+0xde/0x190 [zfs]
Feb 14 11:20:56 pve kernel: [<ffffffffc03bdfe2>] taskq_thread+0x262/0x470 [spl]
Feb 14 11:20:56 pve kernel: [<ffffffff810a6650>] ? wake_up_q+0x70/0x70
Feb 14 11:20:56 pve kernel: [<ffffffffc03bdd80>] ? taskq_cancel_id+0x140/0x140 [spl]
Feb 14 11:20:56 pve kernel: [<ffffffff8109b1fa>] kthread+0xea/0x100
Feb 14 11:20:56 pve kernel: [<ffffffff8109b110>] ? kthread_create_on_node+0x1f0/0x1f0
Feb 14 11:20:56 pve kernel: [<ffffffff81809e5f>] ret_from_fork+0x3f/0x70
Feb 14 11:20:56 pve kernel: [<ffffffff8109b110>] ? kthread_create_on_node+0x1f0/0x1f0
I tried many ways to import the pool (PVE ISO in debug mode, ZFSGuru ISO, a clean PVE install on a separate disk), but I was not able to import it. It always complained about a missing device (the ZIL/log).
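As far as I understand, a dead standalone log device is normally handled by importing with -m and then removing the log vdev from the pool - a sketch of what should work in the normal case (the GUID is a placeholder; zpool status shows it for the UNAVAIL log device):
Code:
zpool import -m rpool                      # import despite the missing log vdev
zpool status rpool                         # the dead log shows up as UNAVAIL with its GUID
zpool remove rpool <guid-of-failed-log>    # drop the dead log vdev from the pool
In my case the import with -m was exactly what triggered the panic above, so I never got that far.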
Then I tried to import the pool read-only - which I could have tried much earlier, but sometimes you can't see the wood for the trees - and voila, I could import the pool and at least copy my VMs and configuration to a safe place.
Code:
zpool import -m -R /mnt/pool -o rdonly=on rpool
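With the pool mounted read-only under /mnt/pool the data can then be copied off, roughly like this (paths and the VM disk name are only examples, assuming the default Proxmox ZFS layout):
Code:
# host configuration (pmxcfs database with the VM configs) from the old root filesystem
cp -a /mnt/pool/var/lib/pve-cluster /backup/
# raw copy of a VM disk zvol (the read-only zvol device nodes should still appear)
dd if=/dev/zvol/rpool/data/vm-100-disk-1 of=/backup/vm-100-disk-1.raw bs=1M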
I do not know if there is a way of importing/repairing a zpool whose ZIL/log device has crashed. I think a failing ZIL/log device *SHOULD NOT* be a problem for a zpool. Perhaps a real ZFS expert knows? Or is it a bug in ZFS?
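One last-resort option that might be worth trying in this situation (untested by me, use at your own risk): the ZFS module parameters zfs_recover and zil_replay_disable are meant for exactly this kind of rescue - zfs_recover downgrades the "allocating allocated segment" panic to a warning, and zil_replay_disable skips replaying the intent log that lived on the dead SSD:
Code:
# set before importing the pool
echo 1 > /sys/module/zfs/parameters/zfs_recover
echo 1 > /sys/module/zfs/parameters/zil_replay_disable
zpool import -mf -R /mnt/pool rpool
If that import succeeds, the dead log vdev could then be removed with zpool remove and the data copied off or scrubbed.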
Anyway, I did a fresh install of Proxmox and restored the config/VMs from backup so I am back on the road again.