ZFS Bug / array-index-out-of-bounds

stuartbh

Proxmox users, developers, et alia:

Below follows an excerpt from my /var/log/kern.log showing what appears to be a problem in the ZFS code. In researching the issue I found some information that seems related, but I am not fully sure yet (see link #1 below). Has anyone else seen this behavior with Proxmox running on an IBM x3650 M3?


Code:
================================================================================
2023-08-28T18:09:26.977526-07:00 pve4-x3650-m3 kernel: [26946.138017] UBSAN: array-index-out-of-bounds in /build/proxmox-kernel-6.2-dH6W6b/proxmox-kernel-6.2-6.2.16/modules/pkg-zfs/module/zstd/lib/zstd.c:20390:24
2023-08-28T18:09:26.982889-07:00 pve4-x3650-m3 kernel: [26946.151942] index 36 is out of range for type 'U32 [36]'
2023-08-28T18:09:26.982905-07:00 pve4-x3650-m3 kernel: [26946.157303] CPU: 9 PID: 475005 Comm: z_wr_iss Tainted: P O 6.2.16-10-pve #1
2023-08-28T18:09:27.131393-07:00 pve4-x3650-m3 kernel: [26946.165703] Hardware name: IBM System x3650 M3 -[7945AC1]-/69Y4438, BIOS -[D6E164AUS-1.22]- 06/04/2018
2023-08-28T18:09:27.131421-07:00 pve4-x3650-m3 kernel: [26946.175058] Call Trace:
2023-08-28T18:09:27.131424-07:00 pve4-x3650-m3 kernel: [26946.177526] <TASK>
2023-08-28T18:09:27.131425-07:00 pve4-x3650-m3 kernel: [26946.179649] dump_stack_lvl+0x48/0x70
2023-08-28T18:09:27.131426-07:00 pve4-x3650-m3 kernel: [26946.183352] dump_stack+0x10/0x20
2023-08-28T18:09:27.131428-07:00 pve4-x3650-m3 kernel: [26946.186695] __ubsan_handle_out_of_bounds+0xc6/0x110
2023-08-28T18:09:27.131430-07:00 pve4-x3650-m3 kernel: [26946.191697] ZSTD_initStats_ultra+0x4b2f/0x4ba0 [zzstd]
2023-08-28T18:09:27.131431-07:00 pve4-x3650-m3 kernel: [26946.196993] zfs_ZSTD_compressBlock_btultra2+0x40ff/0x42b0 [zzstd]
2023-08-28T18:09:27.131432-07:00 pve4-x3650-m3 kernel: [26946.203232] ? zstd_mempool_alloc+0x242/0x2b0 [zzstd]
2023-08-28T18:09:27.131434-07:00 pve4-x3650-m3 kernel: [26946.208338] ? zstd_alloc+0x19/0x40 [zzstd]
2023-08-28T18:09:27.131435-07:00 pve4-x3650-m3 kernel: [26946.212577] ? ZSTD_resetCCtx_internal+0xee4/0x1290 [zzstd]
2023-08-28T18:09:27.131436-07:00 pve4-x3650-m3 kernel: [26946.218216] ZSTD_buildSeqStore+0x18b/0x310 [zzstd]
2023-08-28T18:09:27.131438-07:00 pve4-x3650-m3 kernel: [26946.223148] ZSTD_compressBlock_internal+0x34/0x3e0 [zzstd]
2023-08-28T18:09:27.131439-07:00 pve4-x3650-m3 kernel: [26946.228784] ZSTD_compressContinue_internal+0x433/0x6d0 [zzstd]
2023-08-28T18:09:27.131440-07:00 pve4-x3650-m3 kernel: [26946.234768] zfs_ZSTD_compressEnd+0x28/0x180 [zzstd]
2023-08-28T18:09:27.131441-07:00 pve4-x3650-m3 kernel: [26946.239789] zfs_ZSTD_compressStream2+0x1e6/0x860 [zzstd]
2023-08-28T18:09:27.131443-07:00 pve4-x3650-m3 kernel: [26946.245248] zfs_ZSTD_compress2+0x63/0xb0 [zzstd]
2023-08-28T18:09:27.131445-07:00 pve4-x3650-m3 kernel: [26946.250013] ? zfs_ZSTD_compress2+0x63/0xb0 [zzstd]
2023-08-28T18:09:27.131446-07:00 pve4-x3650-m3 kernel: [26946.254948] zfs_zstd_compress+0x103/0x1e0 [zzstd]
2023-08-28T18:09:27.131447-07:00 pve4-x3650-m3 kernel: [26946.259789] zio_compress_data+0xd3/0x130 [zfs]
2023-08-28T18:09:27.131447-07:00 pve4-x3650-m3 kernel: [26946.264694] zio_write_compress+0x59b/0xa40 [zfs]
2023-08-28T18:09:27.131449-07:00 pve4-x3650-m3 kernel: [26946.269652] zio_execute+0x97/0x170 [zfs]
2023-08-28T18:09:27.131450-07:00 pve4-x3650-m3 kernel: [26946.273931] taskq_thread+0x2af/0x4d0 [spl]
2023-08-28T18:09:27.131451-07:00 pve4-x3650-m3 kernel: [26946.278172] ? __pfx_default_wake_function+0x10/0x10
2023-08-28T18:09:27.131453-07:00 pve4-x3650-m3 kernel: [26946.283175] ? __pfx_zio_execute+0x10/0x10 [zfs]
2023-08-28T18:09:27.131454-07:00 pve4-x3650-m3 kernel: [26946.288148] ? __pfx_taskq_thread+0x10/0x10 [spl]
2023-08-28T18:09:27.131456-07:00 pve4-x3650-m3 kernel: [26946.292917] kthread+0xe9/0x110
2023-08-28T18:09:27.131457-07:00 pve4-x3650-m3 kernel: [26946.296090] ? __pfx_kthread+0x10/0x10
2023-08-28T18:09:27.131458-07:00 pve4-x3650-m3 kernel: [26946.299871] ret_from_fork+0x2c/0x50
2023-08-28T18:09:27.131459-07:00 pve4-x3650-m3 kernel: [26946.303479] </TASK>
2023-08-28T18:09:27.131460-07:00 pve4-x3650-m3 kernel: [26946.305721] ================================================================================

Link #1: https://bugzilla.kernel.org/show_bug.cgi?id=215943

Thanks in advance!

Stuart
 
Fiona, et alia:

Yes, I’ve hit this issue multiple times.

When might the patch land in Proxmox?

Stuart
 
Then I'm surprised nobody else seems to have.

I'm afraid I can't give you any time window for that. Let's see what upstream ZFS says.

I understand.

Thank you for your research and for opening an issue with ZFS.


Stuart
 
Fiona, et alia:

I noticed that no one has actioned or commented on the bug you raised with upstream ZFS. Interesting; I have always seen ZFS as a very active project.

Stuart
 
Fiona, et alia:

I presume that by now the current version of Proxmox ships a newer version of ZFS and thus includes the fix for the aforementioned issue?

Stuart
 
What's the effect of this bug? Does it cause data corruption?
If it did, I'm pretty sure it would have been addressed a long time ago. Disclaimer: I have no idea about zstd, but judging from the function names, it might just cause a non-ideal cost calculation during compression.
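If you want to rule anything out on your side, a scrub verifies every block in the pool against its checksum. A minimal sketch (the pool name is just an example):

Code:
# Read back every block in the pool and verify it against its checksum.
zpool scrub tank
# Inspect the result; a healthy pool reports "errors: No known data errors".
zpool status -v tank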
 
FYI, I just got this message, but I'm not sure whether it's just a side effect of a more general crash.
The full log is attached.

Code:
2024-02-22T14:08:55.044549+01:00 heimdall kernel: [5447587.192484] show_signal_msg: 13 callbacks suppressed
2024-02-22T14:08:55.044566+01:00 heimdall kernel: [5447587.192489] CPU 6/KVM[897247]: segfault at 9000000 ip 0000000009000000 sp 00007fb2089aff88 error 14 in qemu-system-x86_64[558a46e1e000+316000] likely on CPU 18 (core 2, socket 0)
2024-02-22T14:08:55.044567+01:00 heimdall kernel: [5447587.192510] Code: Unable to access opcode bytes at 0x8ffffd6.
2024-02-22T14:08:55.046982+01:00 heimdall systemd[1]: Looping too fast. Throttling execution a little.
2024-02-22T14:08:55.060538+01:00 heimdall kernel: [5447587.208708] traps: worker[2862804] trap invalid opcode ip:5556c6a13aca sp:7fe2d9ff61f0 error:0 in qemu-system-x86_64[5556c6447000+612000]
2024-02-22T14:08:55.072693+01:00 heimdall kernel: [5447587.218787] spiceproxy work[2330788]: segfault at 5585e19d9370 ip 000055865fd14285 sp 00007ffe3d5c0590 error 4 in perl[55865fb96000+195000] likely on CPU 7 (core 7, socket 0)
2024-02-22T14:08:55.072699+01:00 heimdall kernel: [5447587.218802] Code: 21 00 49 8b 04 24 49 8b 6c 24 10 4c 8b 40 10 4c 8a 44 24 10 48 8b 54 24 08 4c 89 33 49 01 c8 4c 05 fa 41 f6 44 24 0f 20 7d 11 <48> 8b 83 d0 00 00 80 8b 40 38 f7 d0 83 e0 98 09 c5 41 89 e9 4c 89
2024-02-22T14:08:55.076557+01:00 heimdall kernel: [5447587.225639] traps: spiceproxy[3422] trap invalid opcode ip:55865fd09a8b sp:7ffe3d5bfff8 error:0 in perl[55865fb96000+195000]
2024-02-22T14:08:55.180616+01:00 heimdall kernel: [5447587.255496] ================================================================================
2024-02-22T14:08:55.180628+01:00 heimdall kernel: [5447587.255510] UBSAN: array-index-out-of-bounds in /home/tom/sources/pve/pve-kernel/proxmox-kernel-6.2.16/modules/pkg-zfs/module/zfs/dbuf.c:2870:32
2024-02-22T14:08:55.180630+01:00 heimdall kernel: [5447587.255520] index 16777216 is out of range for type 'dbuf_cache_t [2]'
2024-02-22T14:08:55.180630+01:00 heimdall kernel: [5447587.255526] CPU: 18 PID: 897247 Comm: CPU 6/KVM Tainted: P           O       6.2.16-15-pve #1
2024-02-22T14:08:55.180632+01:00 heimdall kernel: [5447587.255530] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
 

Attachments

  • syslog_crash.log (91.9 KB)
Hi,
FYI, I just got this message, but I'm not sure whether it's just a side effect of a more general crash.
The full log is attached.
You got segfaults in two different processes and complaints about invalid opcodes. I'd run a memtest and make sure the latest BIOS and CPU microcode updates are installed: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_firmware_cpu
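For the microcode part, a rough sketch of the install (assuming Proxmox VE 8 on Debian 12 "bookworm" and the AMD X399 platform from your log; use intel-microcode on Intel hosts):

Code:
# Microcode packages live in Debian's non-free-firmware component.
echo 'deb http://deb.debian.org/debian bookworm non-free-firmware' > /etc/apt/sources.list.d/firmware.list
apt update
apt install amd64-microcode   # intel-microcode for Intel CPUs
reboot                        # the new microcode is loaded early during boot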
Code:
2024-02-22T14:08:55.180628+01:00 heimdall kernel: [5447587.255510] UBSAN: array-index-out-of-bounds in /home/tom/sources/pve/pve-kernel/proxmox-kernel-6.2.16/modules/pkg-zfs/module/zfs/dbuf.c:2870:32
Did you compile the kernel yourself?
Code:
2024-02-22T14:08:55.180630+01:00 heimdall kernel: [5447587.255520] index 16777216 is out of range for type 'dbuf_cache_t [2]'
2024-02-22T14:08:55.180630+01:00 heimdall kernel: [5447587.255526] CPU: 18 PID: 897247 Comm: CPU 6/KVM Tainted: P           O       6.2.16-15-pve #1
Also, I'm guessing you are running Proxmox VE 8 (since 6.2.16-15 was only released for Proxmox VE 8), but that uses kernel 6.5 nowadays. Please upgrade; see the sketch at the end of this post.
Code:
2024-02-22T14:08:55.180632+01:00 heimdall kernel: [5447587.255530] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
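Regarding the upgrade, the usual apt flow is enough; a minimal sketch (assuming a configured Proxmox VE 8 package repository):

Code:
apt update
apt full-upgrade   # pulls in the current proxmox-kernel series (6.5 on Proxmox VE 8)
reboot             # boot into the new kernel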
 

Fiona,

It seems that ZFS 2.2.3 has come out. Has the version of zstd been updated therein? Moreover, will Proxmox soon update to ZFS 2.2.3? I presume that if I simply switch my compression from zstd to a different algorithm this issue goes away, correct?

Thanks!

Stuart
 
It seems that ZFS 2.2.3 has come out. Has the version of zstd been updated therein?
I don't think so, judging from a quick search. And the relevant issue is still open too: https://github.com/openzfs/zfs/issues/15219

Moreover, will Proxmox soon update to ZFS 2.2.3?
Yes, the patch for that has already been sent (but not yet applied): https://lists.proxmox.com/pipermail/pve-devel/2024-March/062112.html

I presume that if I simply switch my compression from zstd to a different algorithm this issue goes away, correct?
Have you seen any actual issue? Otherwise, I don't think you need to worry about the UBSAN warning in this case.

AFAIK, there are no plans on our side to deviate from the zstd version bundled with upstream ZFS.
 
I don't think so, judging from a quick search. And the relevant issue is still open too: https://github.com/openzfs/zfs/issues/15219

That was my assessment too; thanks for confirming.

Yes, the patch for that has already been sent (but not yet applied): https://lists.proxmox.com/pipermail/pve-devel/2024-March/062112.html


Have you seen any actual issue? Otherwise, I don't think you need to worry about the UBSAN warning in this case.

Well, when the bug hits for me, it crashes the entire server, both on Proxmox and on other Linux machines running ZFS.

AFAIK, there are no plans on our side to deviate from the zstd version bundled with upstream ZFS.

Oh, no, I meant something else: it might be better (for now) if I simply rewrote my files using a compression algorithm other than zstd as a workaround.
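Something along these lines is what I have in mind (pool and dataset names purely illustrative); since the compression property only affects newly written blocks, the existing data has to be rewritten:

Code:
# Only blocks written from now on use the new algorithm.
zfs set compression=lz4 tank/data
# Rewrite existing data, e.g. by replicating into a fresh dataset:
zfs snapshot tank/data@recompress
zfs send tank/data@recompress | zfs receive -o compression=lz4 tank/data-lz4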

Thanks for your thoughts on this.

Stuart
 
