Proxmox 7.0-11, encrypted zfs, kernel panic

sergerdn

Sep 22, 2021
I recently hit a bug in ZFS with password-protected aes-256-gcm encryption. It happened on a freshly installed system with ECC RAM while I was copying a big file from one VM to another.

Code:
kernel:[72353.044289] VERIFY3(c < SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT) failed (36028797018963967 < 32768)

Sep 22 07:39:36 pve4 kernel: [72353.044340] PANIC at zio.c:341:zio_data_buf_alloc()
Sep 22 07:39:36 pve4 kernel: [72353.044361] Showing stack for process 1640884
Sep 22 07:39:36 pve4 kernel: [72353.044381] CPU: 11 PID: 1640884 Comm: zvol Tainted: P           O      5.11.22-4-pve #1
Sep 22 07:39:36 pve4 kernel: [72353.044407] Hardware name: Hetzner /B565D4-V1L, BIOS L0.20 06/18/2021
Sep 22 07:39:36 pve4 kernel: [72353.044424] Call Trace:
Sep 22 07:39:36 pve4 kernel: [72353.045000]  dump_stack+0x70/0x8b
Sep 22 07:39:36 pve4 kernel: [72353.045473]  spl_dumpstack+0x29/0x2b [spl]
Sep 22 07:39:36 pve4 kernel: [72353.045942]  spl_panic+0xd4/0xfc [spl]
Sep 22 07:39:36 pve4 kernel: [72353.046384]  ? spl_kmem_cache_alloc+0x79/0x790 [spl]
Sep 22 07:39:36 pve4 kernel: [72353.046844]  ? kmem_cache_alloc+0xf1/0x200
Sep 22 07:39:36 pve4 kernel: [72353.047272]  ? spl_kmem_cache_alloc+0x9c/0x790 [spl]
Sep 22 07:39:36 pve4 kernel: [72353.047669]  ? _cond_resched+0x1a/0x50
Sep 22 07:39:36 pve4 kernel: [72353.048070]  ? mutex_lock+0x13/0x40
Sep 22 07:39:36 pve4 kernel: [72353.048459]  ? aggsum_add+0x187/0x1a0 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.048890]  ? kmem_cache_alloc+0xf1/0x200
Sep 22 07:39:36 pve4 kernel: [72353.049263]  ? _cond_resched+0x1a/0x50
Sep 22 07:39:36 pve4 kernel: [72353.049633]  zio_data_buf_alloc+0x5e/0x60 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.050035]  abd_alloc_linear+0x91/0xd0 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.050421]  abd_alloc+0x95/0xd0 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.050799]  arc_hdr_alloc_abd+0xe6/0x200 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.051172]  arc_hdr_alloc+0xfd/0x170 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.051595]  arc_alloc_buf+0x4a/0x150 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.051997]  dbuf_alloc_arcbuf_from_arcbuf+0xd4/0x160 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.052444]  ? mutex_lock+0x13/0x40
Sep 22 07:39:36 pve4 kernel: [72353.052800]  ? _cond_resched+0x1a/0x50
Sep 22 07:39:36 pve4 kernel: [72353.053152]  dbuf_hold_copy.constprop.0+0x36/0xb0 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.053616]  dbuf_hold_impl+0x480/0x680 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.054089]  dbuf_hold+0x33/0x60 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.054536]  dmu_buf_hold_array_by_dnode+0xeb/0x580 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.055007]  dmu_read_impl+0xa8/0x1c0 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.055433]  dmu_read_by_dnode+0xe/0x10 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.055884]  zvol_get_data+0xa2/0x1a0 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.056350]  zil_commit_impl+0xaa0/0xfa0 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.056763]  zil_commit+0x40/0x60 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.057167]  zvol_write+0x325/0x4c0 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.057573]  ? finish_task_switch.isra.0+0x16a/0x290
Sep 22 07:39:36 pve4 kernel: [72353.057931]  taskq_thread+0x2b2/0x4f0 [spl]
Sep 22 07:39:36 pve4 kernel: [72353.058340]  ? wake_up_q+0xa0/0xa0
Sep 22 07:39:36 pve4 kernel: [72353.058720]  ? zvol_discard+0x330/0x330 [zfs]
Sep 22 07:39:36 pve4 kernel: [72353.059161]  ? taskq_thread_spawn+0x60/0x60 [spl]
Sep 22 07:39:36 pve4 kernel: [72353.059533]  kthread+0x12b/0x150
Sep 22 07:39:36 pve4 kernel: [72353.059869]  ? set_kthread_struct+0x50/0x50
Sep 22 07:39:36 pve4 kernel: [72353.060242]  ret_from_fork+0x22/0x30
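
A note on the odd number in the VERIFY3 line above: it looks like an unsigned underflow rather than a real buffer size. The sketch below assumes that zio_data_buf_alloc() derives its cache index as (size - 1) >> SPA_MINBLOCKSHIFT, with SPA_MINBLOCKSHIFT = 9 and SPA_MAXBLOCKSIZE = 16 MiB. Those values are assumptions taken from the OpenZFS sources, not from this thread, and the helper name zio_cache_index is made up for illustration.

```shell
#!/usr/bin/env bash
# Assumed constants from the OpenZFS sources; they are not quoted in this thread.
SPA_MINBLOCKSHIFT=9            # 512-byte minimum block
SPA_MAXBLOCKSIZE=$((1 << 24))  # 16 MiB maximum block

# Hypothetical helper: emulate the unsigned 64-bit expression
# (size - 1) >> SPA_MINBLOCKSHIFT.
# Bash arithmetic is signed 64-bit, so the high bits are masked off after
# the shift to get the same result as an unsigned (logical) shift.
zio_cache_index() {
    local size=$1
    local mask=$(( (1 << (64 - SPA_MINBLOCKSHIFT)) - 1 ))
    printf '%u\n' $(( ((size - 1) >> SPA_MINBLOCKSHIFT) & mask ))
}

echo "limit:            $(( SPA_MAXBLOCKSIZE >> SPA_MINBLOCKSHIFT ))"
echo "index(size=0):    $(zio_cache_index 0)"
echo "index(size=4096): $(zio_cache_index 4096)"
```

Under these assumptions the limit comes out as 32768 and a requested size of 0 gives 36028797018963967, the same pair of numbers as in the failed check. That is consistent with a zero-length buffer being requested somewhere along the dbuf_hold_copy path, but it is only arithmetic, not a diagnosis.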

Code:
# zfs version
zfs-2.0.5-pve1
zfs-kmod-2.0.5-pve1

Code:
# zfs get encryption
NAME                                PROPERTY    VALUE        SOURCE
rpool                               encryption  off          default
rpool/ROOT                          encryption  off          default
rpool/ROOT/pve-1                    encryption  off          default
rpool/data                          encryption  off          default
rpool/data/encrypted                encryption  aes-256-gcm  -
rpool/data/encrypted/vm-100-disk-0  encryption  aes-256-gcm  -
rpool/data/encrypted/vm-101-disk-0  encryption  aes-256-gcm  -
rpool/data/encrypted/vm-102-disk-0  encryption  aes-256-gcm  -

Code:
# zfs list
NAME                                 USED  AVAIL     REFER  MOUNTPOINT
rpool                               48.1G  1.63T      104K  /rpool
rpool/ROOT                          8.10G  1.63T       96K  /rpool/ROOT
rpool/ROOT/pve-1                    8.10G  1.63T     8.10G  /
rpool/data                          40.0G  1.63T       96K  /rpool/data
rpool/data/encrypted                40.0G  1.63T      192K  /rpool/data/encrypted
rpool/data/encrypted/vm-100-disk-0  3.78G  1.63T     3.78G  -
rpool/data/encrypted/vm-101-disk-0  35.8G  1.63T     35.8G  -
rpool/data/encrypted/vm-102-disk-0   414M  1.63T      414M  -


Code:
# zpool list rpool -v
NAME                                                  SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
rpool                                                1.73T  48.1G  1.69T        -         -     0%     2%  1.00x    ONLINE  -
  mirror                                             1.73T  48.1G  1.69T        -         -     0%  2.70%      -    ONLINE
    nvme-eui.34333930523011810025384300000001-part3      -      -      -        -         -      -      -      -    ONLINE
    nvme-eui.34333930523011790025384300000001-part3      -      -      -        -         -      -      -      -    ONLINE

Code:
# zpool get all
NAME   PROPERTY                       VALUE                          SOURCE
rpool  size                           1.73T                          -
rpool  capacity                       2%                             -
rpool  altroot                        -                              default
rpool  health                         ONLINE                         -
rpool  guid                           9087423920877254020            -
rpool  version                        -                              default
rpool  bootfs                         rpool/ROOT/pve-1               local
rpool  delegation                     on                             default
rpool  autoreplace                    off                            default
rpool  cachefile                      -                              default
rpool  failmode                       wait                           default
rpool  listsnapshots                  off                            default
rpool  autoexpand                     off                            default
rpool  dedupratio                     1.00x                          -
rpool  free                           1.69T                          -
rpool  allocated                      48.1G                          -
rpool  readonly                       off                            -
rpool  ashift                         12                             local
rpool  comment                        -                              default
rpool  expandsize                     -                              -
rpool  freeing                        0                              -
rpool  fragmentation                  0%                             -
rpool  leaked                         0                              -
rpool  multihost                      off                            default
rpool  checkpoint                     -                              -
rpool  load_guid                      13633783822077360766           -
rpool  autotrim                       off                            default
rpool  feature@async_destroy          enabled                        local
rpool  feature@empty_bpobj            active                         local
rpool  feature@lz4_compress           active                         local
rpool  feature@multi_vdev_crash_dump  enabled                        local
rpool  feature@spacemap_histogram     active                         local
rpool  feature@enabled_txg            active                         local
rpool  feature@hole_birth             active                         local
rpool  feature@extensible_dataset     active                         local
rpool  feature@embedded_data          active                         local
rpool  feature@bookmarks              enabled                        local
rpool  feature@filesystem_limits      enabled                        local
rpool  feature@large_blocks           enabled                        local
rpool  feature@large_dnode            enabled                        local
rpool  feature@sha512                 enabled                        local
rpool  feature@skein                  enabled                        local
rpool  feature@edonr                  enabled                        local
rpool  feature@userobj_accounting     active                         local
rpool  feature@encryption             active                         local
rpool  feature@project_quota          active                         local
rpool  feature@device_removal         enabled                        local
rpool  feature@obsolete_counts        enabled                        local
rpool  feature@zpool_checkpoint       enabled                        local
rpool  feature@spacemap_v2            active                         local
rpool  feature@allocation_classes     enabled                        local
rpool  feature@resilver_defer         enabled                        local
rpool  feature@bookmark_v2            enabled                        local
rpool  feature@redaction_bookmarks    enabled                        local
rpool  feature@redacted_datasets      enabled                        local
rpool  feature@bookmark_written       enabled                        local
rpool  feature@log_spacemap           active                         local
rpool  feature@livelist               enabled                        local
rpool  feature@device_rebuild         enabled                        local
rpool  feature@zstd_compress          enabled                        local

Please have a look:
- https://github.com/openzfs/zfs/issues/11531
- https://github.com/openzfs/zfs/issues/12494
- https://github.com/openzfs/zfs/pull/12346


I'm pretty new to ZFS and have a few questions:
- Is there any way to get the latest patch from ZFS upstream into Proxmox?
- Can anyone suggest how I can avoid this bug, by changing the encryption type or something else?

I ask because I hit it twice in just a few days.
 
We're planning on including the fix with one of the next kernel releases. I'll update the thread here once a patched version of the 5.11 kernel series is available.

Thanks for the report!
 
Thank you for your reply. Is there any public roadmap?
 
Is there any public roadmap?
Not at the level of when a particular patch gets included and merged, packaged, tested, and released to a particular repository (there are too many moving parts, which would make most time estimates wrong).

If you want to know what's happening in the sources - the pve-devel mailing list (https://lists.proxmox.com/cgi-bin/mailman/listinfo/pve-devel) is the place to follow discussion, and https://git.proxmox.com/ is the authoritative source of the repositories.

As for this particular issue: we're working on it (mostly deciding whether we simply cherry-pick the patch that fixes it, or wait to see if a new ZFS 2.0 release containing it is coming in the next few days).

I hope this explains it!
 
Thank you for the detailed explanation. I much appreciate it. Please have a look at https://github.com/openzfs/zfs/releases/tag/zfs-2.0.6
 
Hello. I just noticed a new version of pve-kernel:

Code:
Changelog: pve-kernel-5.11.22-5-pve

pve-kernel (5.11.22-10) bullseye; urgency=medium
* update sources to Ubuntu-5.11.0-38.42
* update ZFS to 2.0.6   <<<<< THIS IS THE IMPORTANT LINE for me
* bump ABI to 5.11.22-5
* fix #3558: backport "bnx2x: Fix enabling network interfaces without VFs"
-- Proxmox Support Team <support@proxmox.com>  Tue, 28 Sep 2021 08:15:41 +0200

Did you fix this bug?
 
Yes, the issue should be resolved by the changes included there (at least one other user who reported this through our enterprise support seems not to have run into it again).

I hope this helps!
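
For anyone following along, a quick way to tell whether a host has moved from the affected 2.0.5 to 2.0.6 is to compare version strings. This is a minimal sketch: the helper names (upstream_version, version_ge, zfs_at_least_2_0_6) are made up for illustration, the "zfs-kmod-2.0.6-pve1" string is an assumed example of what an updated system might print, and the check only compares version numbers within the 2.0 series. It does not prove that a particular patch is present.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of ZFS or Proxmox tooling.

# Reduce strings such as "zfs-kmod-2.0.5-pve1" or "2.0.6-pve1" to "2.0.5" / "2.0.6".
upstream_version() {
    printf '%s\n' "$1" | sed -E 's/^zfs-(kmod-)?//; s/-.*$//'
}

# Succeed if version $1 >= version $2 (uses GNU sort's -V version ordering).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if the given version string is 2.0.6 or newer.
zfs_at_least_2_0_6() {
    version_ge "$(upstream_version "$1")" "2.0.6"
}

# First string is from the "zfs version" output earlier in this thread;
# the second is an assumed example for an updated system.
for v in "zfs-kmod-2.0.5-pve1" "zfs-kmod-2.0.6-pve1"; do
    if zfs_at_least_2_0_6 "$v"; then
        echo "$v: at or above 2.0.6"
    else
        echo "$v: older than 2.0.6"
    fi
done
```

On a live host the argument could come from the zfs-kmod line of "zfs version" or from /sys/module/zfs/version. Both report the kernel module that is actually loaded, so the new version only shows up after rebooting into the updated kernel.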
 
