ZFS Bug / array-index-out-of-bounds

stuartbh

Proxmox users, developers, et alia:

What follows is an excerpt from my /var/log/kern.log showing what appears to be a problem in the ZFS code. While researching the issue I found some information that seems related, though I am not yet fully sure (see link #1 below). Has anyone seen this behavior before with Proxmox running on an IBM x3650 M3?


================================================================================
2023-08-28T18:09:26.977526-07:00 pve4-x3650-m3 kernel: [26946.138017] UBSAN: array-index-out-of-bounds in /build/proxmox-kernel-6.2-dH6W6b/proxmox-kernel-6.2-6.2.16/modules/pkg-zfs/module/zstd/lib/zstd.c:20390:24
2023-08-28T18:09:26.982889-07:00 pve4-x3650-m3 kernel: [26946.151942] index 36 is out of range for type 'U32 [36]'
2023-08-28T18:09:26.982905-07:00 pve4-x3650-m3 kernel: [26946.157303] CPU: 9 PID: 475005 Comm: z_wr_iss Tainted: P O 6.2.16-10-pve #1
2023-08-28T18:09:27.131393-07:00 pve4-x3650-m3 kernel: [26946.165703] Hardware name: IBM System x3650 M3 -[7945AC1]-/69Y4438, BIOS -[D6E164AUS-1.22]- 06/04/2018
2023-08-28T18:09:27.131421-07:00 pve4-x3650-m3 kernel: [26946.175058] Call Trace:
2023-08-28T18:09:27.131424-07:00 pve4-x3650-m3 kernel: [26946.177526] <TASK>
2023-08-28T18:09:27.131425-07:00 pve4-x3650-m3 kernel: [26946.179649] dump_stack_lvl+0x48/0x70
2023-08-28T18:09:27.131426-07:00 pve4-x3650-m3 kernel: [26946.183352] dump_stack+0x10/0x20
2023-08-28T18:09:27.131428-07:00 pve4-x3650-m3 kernel: [26946.186695] __ubsan_handle_out_of_bounds+0xc6/0x110
2023-08-28T18:09:27.131430-07:00 pve4-x3650-m3 kernel: [26946.191697] ZSTD_initStats_ultra+0x4b2f/0x4ba0 [zzstd]
2023-08-28T18:09:27.131431-07:00 pve4-x3650-m3 kernel: [26946.196993] zfs_ZSTD_compressBlock_btultra2+0x40ff/0x42b0 [zzstd]
2023-08-28T18:09:27.131432-07:00 pve4-x3650-m3 kernel: [26946.203232] ? zstd_mempool_alloc+0x242/0x2b0 [zzstd]
2023-08-28T18:09:27.131434-07:00 pve4-x3650-m3 kernel: [26946.208338] ? zstd_alloc+0x19/0x40 [zzstd]
2023-08-28T18:09:27.131435-07:00 pve4-x3650-m3 kernel: [26946.212577] ? ZSTD_resetCCtx_internal+0xee4/0x1290 [zzstd]
2023-08-28T18:09:27.131436-07:00 pve4-x3650-m3 kernel: [26946.218216] ZSTD_buildSeqStore+0x18b/0x310 [zzstd]
2023-08-28T18:09:27.131438-07:00 pve4-x3650-m3 kernel: [26946.223148] ZSTD_compressBlock_internal+0x34/0x3e0 [zzstd]
2023-08-28T18:09:27.131439-07:00 pve4-x3650-m3 kernel: [26946.228784] ZSTD_compressContinue_internal+0x433/0x6d0 [zzstd]
2023-08-28T18:09:27.131440-07:00 pve4-x3650-m3 kernel: [26946.234768] zfs_ZSTD_compressEnd+0x28/0x180 [zzstd]
2023-08-28T18:09:27.131441-07:00 pve4-x3650-m3 kernel: [26946.239789] zfs_ZSTD_compressStream2+0x1e6/0x860 [zzstd]
2023-08-28T18:09:27.131443-07:00 pve4-x3650-m3 kernel: [26946.245248] zfs_ZSTD_compress2+0x63/0xb0 [zzstd]
2023-08-28T18:09:27.131445-07:00 pve4-x3650-m3 kernel: [26946.250013] ? zfs_ZSTD_compress2+0x63/0xb0 [zzstd]
2023-08-28T18:09:27.131446-07:00 pve4-x3650-m3 kernel: [26946.254948] zfs_zstd_compress+0x103/0x1e0 [zzstd]
2023-08-28T18:09:27.131447-07:00 pve4-x3650-m3 kernel: [26946.259789] zio_compress_data+0xd3/0x130 [zfs]
2023-08-28T18:09:27.131447-07:00 pve4-x3650-m3 kernel: [26946.264694] zio_write_compress+0x59b/0xa40 [zfs]
2023-08-28T18:09:27.131449-07:00 pve4-x3650-m3 kernel: [26946.269652] zio_execute+0x97/0x170 [zfs]
2023-08-28T18:09:27.131450-07:00 pve4-x3650-m3 kernel: [26946.273931] taskq_thread+0x2af/0x4d0 [spl]
2023-08-28T18:09:27.131451-07:00 pve4-x3650-m3 kernel: [26946.278172] ? __pfx_default_wake_function+0x10/0x10
2023-08-28T18:09:27.131453-07:00 pve4-x3650-m3 kernel: [26946.283175] ? __pfx_zio_execute+0x10/0x10 [zfs]
2023-08-28T18:09:27.131454-07:00 pve4-x3650-m3 kernel: [26946.288148] ? __pfx_taskq_thread+0x10/0x10 [spl]
2023-08-28T18:09:27.131456-07:00 pve4-x3650-m3 kernel: [26946.292917] kthread+0xe9/0x110
2023-08-28T18:09:27.131457-07:00 pve4-x3650-m3 kernel: [26946.296090] ? __pfx_kthread+0x10/0x10
2023-08-28T18:09:27.131458-07:00 pve4-x3650-m3 kernel: [26946.299871] ret_from_fork+0x2c/0x50
2023-08-28T18:09:27.131459-07:00 pve4-x3650-m3 kernel: [26946.303479] </TASK>
2023-08-28T18:09:27.131460-07:00 pve4-x3650-m3 kernel: [26946.305721] ================================================================================

Link #1: https://bugzilla.kernel.org/show_bug.cgi?id=215943
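
For context, the call trace goes through zio_write_compress and zfs_zstd_compress, so I would expect this to surface only when writing to datasets that actually compress with zstd. Something along the following lines should show which datasets those are and which ZFS build is in play (the pool name "rpool" is only an example, substitute your own):

Code:
# list datasets/zvols whose compression property is set to zstd
zfs get -r -t filesystem,volume compression rpool | grep -i zstd

# record the ZFS module and userland versions for the report
zfs version

# check whether the UBSAN complaint has recurred since boot
dmesg | grep -A 2 "array-index-out-of-bounds"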

Thanks in advance!

Stuart
 
Fiona, et alia:

Yes, I’ve hit this issue multiple times.

When might the patch land in Proxmox?

Stuart
 
Then I'm surprised nobody else seems to have.

I'm afraid I can't give you any time window for that. Let's see what upstream ZFS says.

I understand.

Thank you for your research and for opening an issue with ZFS.


Stuart
 
Fiona, et alia:

I took notice that no one has actioned or commented on the bug you raised with upstream ZFS. That is interesting, as I have always seen ZFS as a very active project.

Stuart
 
Fiona, et alia:

I presume that at this juncture the current version of Proxmox ships a newer version of ZFS and, as such, includes the fix for the aforementioned issue?

Stuart
 
What's the effect of this bug? Does it cause data corruption?
If it did, I'm pretty sure it would've been addressed a long time ago. Disclaimer: I have no idea about zstd internals, but judging from the function names, it might just cause a non-ideal cost calculation during compression.
 
FYI, I just got this message, but I'm not sure whether it's just a side effect of a more global crash.
The full log is attached.

Code:
2024-02-22T14:08:55.044549+01:00 heimdall kernel: [5447587.192484] show_signal_msg: 13 callbacks suppressed
2024-02-22T14:08:55.044566+01:00 heimdall kernel: [5447587.192489] CPU 6/KVM[897247]: segfault at 9000000 ip 0000000009000000 sp 00007fb2089aff88 error 14 in qemu-system-x86_64[558a46e1e000+316000] likely on CPU 18 (core 2, socket 0)
2024-02-22T14:08:55.044567+01:00 heimdall kernel: [5447587.192510] Code: Unable to access opcode bytes at 0x8ffffd6.
2024-02-22T14:08:55.046982+01:00 heimdall systemd[1]: Looping too fast. Throttling execution a little.
2024-02-22T14:08:55.060538+01:00 heimdall kernel: [5447587.208708] traps: worker[2862804] trap invalid opcode ip:5556c6a13aca sp:7fe2d9ff61f0 error:0 in qemu-system-x86_64[5556c6447000+612000]
2024-02-22T14:08:55.072693+01:00 heimdall kernel: [5447587.218787] spiceproxy work[2330788]: segfault at 5585e19d9370 ip 000055865fd14285 sp 00007ffe3d5c0590 error 4 in perl[55865fb96000+195000] likely on CPU 7 (core 7, socket 0)
2024-02-22T14:08:55.072699+01:00 heimdall kernel: [5447587.218802] Code: 21 00 49 8b 04 24 49 8b 6c 24 10 4c 8b 40 10 4c 8a 44 24 10 48 8b 54 24 08 4c 89 33 49 01 c8 4c 05 fa 41 f6 44 24 0f 20 7d 11 <48> 8b 83 d0 00 00 80 8b 40 38 f7 d0 83 e0 98 09 c5 41 89 e9 4c 89
2024-02-22T14:08:55.076557+01:00 heimdall kernel: [5447587.225639] traps: spiceproxy[3422] trap invalid opcode ip:55865fd09a8b sp:7ffe3d5bfff8 error:0 in perl[55865fb96000+195000]
2024-02-22T14:08:55.180616+01:00 heimdall kernel: [5447587.255496] ================================================================================
2024-02-22T14:08:55.180628+01:00 heimdall kernel: [5447587.255510] UBSAN: array-index-out-of-bounds in /home/tom/sources/pve/pve-kernel/proxmox-kernel-6.2.16/modules/pkg-zfs/module/zfs/dbuf.c:2870:32
2024-02-22T14:08:55.180630+01:00 heimdall kernel: [5447587.255520] index 16777216 is out of range for type 'dbuf_cache_t [2]'
2024-02-22T14:08:55.180630+01:00 heimdall kernel: [5447587.255526] CPU: 18 PID: 897247 Comm: CPU 6/KVM Tainted: P           O       6.2.16-15-pve #1
2024-02-22T14:08:55.180632+01:00 heimdall kernel: [5447587.255530] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
 

Hi,
FYI, I just got this message, but I'm not sure whether it's just a side effect of a more global crash.
The full log is attached.

Code:
2024-02-22T14:08:55.044549+01:00 heimdall kernel: [5447587.192484] show_signal_msg: 13 callbacks suppressed
2024-02-22T14:08:55.044566+01:00 heimdall kernel: [5447587.192489] CPU 6/KVM[897247]: segfault at 9000000 ip 0000000009000000 sp 00007fb2089aff88 error 14 in qemu-system-x86_64[558a46e1e000+316000] likely on CPU 18 (core 2, socket 0)
2024-02-22T14:08:55.044567+01:00 heimdall kernel: [5447587.192510] Code: Unable to access opcode bytes at 0x8ffffd6.
2024-02-22T14:08:55.046982+01:00 heimdall systemd[1]: Looping too fast. Throttling execution a little.
2024-02-22T14:08:55.060538+01:00 heimdall kernel: [5447587.208708] traps: worker[2862804] trap invalid opcode ip:5556c6a13aca sp:7fe2d9ff61f0 error:0 in qemu-system-x86_64[5556c6447000+612000]
2024-02-22T14:08:55.072693+01:00 heimdall kernel: [5447587.218787] spiceproxy work[2330788]: segfault at 5585e19d9370 ip 000055865fd14285 sp 00007ffe3d5c0590 error 4 in perl[55865fb96000+195000] likely on CPU 7 (core 7, socket 0)
2024-02-22T14:08:55.072699+01:00 heimdall kernel: [5447587.218802] Code: 21 00 49 8b 04 24 49 8b 6c 24 10 4c 8b 40 10 4c 8a 44 24 10 48 8b 54 24 08 4c 89 33 49 01 c8 4c 05 fa 41 f6 44 24 0f 20 7d 11 <48> 8b 83 d0 00 00 80 8b 40 38 f7 d0 83 e0 98 09 c5 41 89 e9 4c 89
2024-02-22T14:08:55.076557+01:00 heimdall kernel: [5447587.225639] traps: spiceproxy[3422] trap invalid opcode ip:55865fd09a8b sp:7ffe3d5bfff8 error:0 in perl[55865fb96000+195000]
2024-02-22T14:08:55.180616+01:00 heimdall kernel: [5447587.255496] ================================================================================
You got segfaults in two different processes and complaints about invalid opcodes. I'd run a memtest and make sure the latest BIOS update and CPU microcode are installed: https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#sysadmin_firmware_cpu
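As a rough sketch of those steps on a Debian-based Proxmox VE host (the package name assumes an AMD CPU, which the X399 platform suggests; use intel-microcode on Intel systems, and note the microcode packages live in the non-free-firmware component):

Code:
# show the microcode revision the kernel loaded at boot
dmesg | grep -i microcode

# install the CPU microcode package (AMD example; intel-microcode for Intel)
apt install amd64-microcode

# a thorough RAM test is best done offline by booting memtest86+ from USB;
# a coarse online check is possible with memtester, e.g. one pass over 1 GiB
apt install memtester
memtester 1G 1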
Code:
2024-02-22T14:08:55.180628+01:00 heimdall kernel: [5447587.255510] UBSAN: array-index-out-of-bounds in /home/tom/sources/pve/pve-kernel/proxmox-kernel-6.2.16/modules/pkg-zfs/module/zfs/dbuf.c:2870:32
Did you compile the kernel yourself?
Code:
2024-02-22T14:08:55.180630+01:00 heimdall kernel: [5447587.255520] index 16777216 is out of range for type 'dbuf_cache_t [2]'
2024-02-22T14:08:55.180630+01:00 heimdall kernel: [5447587.255526] CPU: 18 PID: 897247 Comm: CPU 6/KVM Tainted: P           O       6.2.16-15-pve #1
Also, I'm guessing you are running Proxmox VE 8 (since 6.2.16-15 was only released for Proxmox VE 8), but that uses kernel 6.5 nowadays. Please upgrade.
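The usual way to get there is a full distribution upgrade rather than updating individual packages; roughly, assuming the repositories are already configured:

Code:
apt update
apt dist-upgrade    # pulls in the current default kernel (6.5 at the time of writing)
reboot

# afterwards, verify what is actually running
uname -r
pveversion -v | head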
Code:
2024-02-22T14:08:55.180632+01:00 heimdall kernel: [5447587.255530] Hardware name: Gigabyte Technology Co., Ltd. X399 DESIGNARE EX/X399 DESIGNARE EX-CF, BIOS F12 12/11/2019
 

Fiona,

It seems that ZFS 2.2.3 has come out. Has the version of zstd bundled therein been updated? Moreover, will Proxmox soon update to ZFS 2.2.3? I presume that if I simply switch my compression from zstd to a different algorithm this issue goes away, correct?

Thanks!

Stuart
 
It seems that ZFS 2.2.3 has come out. Has the version of zstd bundled therein been updated?
I don't think so, judging from a quick search. And the relevant issue is still open too: https://github.com/openzfs/zfs/issues/15219

Moreover, will Proxmox soon update to ZFS 2.2.3?
Yes, the patch for that has already been sent (but not yet applied): https://lists.proxmox.com/pipermail/pve-devel/2024-March/062112.html

I presume that if I simply switch my compression from zstd to a different algorithm this issue goes away, correct?
Have you seen any actual issue? Otherwise, I don't think you need to worry about the UBSAN warning in this case.

AFAIK, there are no plans on our side to deviate from the zstd version bundled with upstream ZFS.
 
I don't think so, judging from a quick search. And the relevant issue is still open too: https://github.com/openzfs/zfs/issues/15219

That was my assessment too, thanks for confirming for me.

Yes, the patch for that has already been sent (but not yet applied): https://lists.proxmox.com/pipermail/pve-devel/2024-March/062112.html


Have you seen any actual issue? Otherwise, I don't think you need to worry about the UBSAN warning in this case.

Well, when the bug hits for me it crashes the entire server, both on Proxmox and on other Linux machines running ZFS.

AFAIK, there are no plans on our side to deviate from the zstd version bundled with upstream ZFS.

Oh no, I meant something else: it might be better (for now) if I simply decompress and recompress my files using a compression algorithm other than zstd as a workaround.
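Something along these lines is what I have in mind (the dataset names and the choice of lz4 are only examples; as I understand it, changing the property only affects newly written blocks, so existing data has to be rewritten, e.g. via send/receive or by copying the files):

Code:
# stop compressing new writes with zstd on this dataset
zfs set compression=lz4 tank/data

# existing blocks stay zstd-compressed until rewritten; one way to force a
# rewrite is to replicate into a fresh dataset that uses the new setting
zfs snapshot tank/data@recompress
zfs send tank/data@recompress | zfs receive -o compression=lz4 tank/data_lz4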

Thanks for your thoughts on this.

Stuart
 
Fiona,

I see that with respect to the bug filed with OpenZFS (https://github.com/openzfs/zfs/issues/15219) there seems to be a question for you awaiting a reply?

Can you kindly look into whether the OpenZFS folks have resolved this issue yet? I am running the latest Proxmox and would like to know whether I have to switch to a different compression algorithm. I do not wish to keep having system crashes when I use the ZFS array (it is currently offline, but I need to use it again soon).

Thank you for tracking this issue!

Stuart
 
That's not a question for me; it was asked of the OpenZFS developers by @fabian, about which approach to take for fixing the issue upstream.
 
Fiona,

Ah, I was not sure to whom it was directed. Inasmuch as it was not directed to you, do you have any suggestions as to how we can get the OpenZFS developers to put some attention on this?

Stuart
 
Unfortunately for you, it doesn't seem like many people are affected by this. You can of course ping the GitHub issue, but I'm not sure there's much more that can be done until they tell us what the desired approach is.
 
Unfortunately for you, it doesn't seem like many people are affected by this. You can of course ping the GitHub issue, but I'm not sure there's much more that can be done until they tell us what the desired approach is.

I posted a message on GitHub asking why it has taken almost a year to resolve the issue, or at least to decide how to resolve it.

Stuart
 
