MariaDB database corruption.

Jan-Willem Spuij · 2025-02-05T13:11:15+0100

Hello,

We are experiencing database corruption in an Ubuntu 24.04 LTS container running the latest MariaDB, 10.11.8-MariaDB-ubuntu.24.04.1. PVE kernel version is: 6.8.12-6-pve.
Bugs are reported in the MariaDB Jira:

https://jira.mariadb.org/browse/MDEV-35334
https://jira.mariadb.org/browse/MDEV-35886

However, the hangs seem to be connected to a series of patches backported to kernel version 6.1 (LTS), whereas we are running 6.8.12-5-pve. I'm at a loss as to what to do now. Is this known within the Proxmox community? Is anybody else experiencing corruption of InnoDB tables (especially the first bytes, which hold the checksum, ending up as 01 00 00 00)?

fabian · 2025-02-05T13:29:58+0100

the linked bug report seems to indicate this was an issue limited to the 6.1 backport.. do you really have the same symptoms? how are you running mariadb, in a VM or in a container?

Jan-Willem Spuij · 2025-02-05T13:51:03+0100

It's a container, not a VM, and I'm seeing one of the symptoms: in the IBD file, the first four bytes of the header are changed to 01 00 00 00. The hang does not occur. However, within those Jira issues it's speculated that the hang and the corruption could have the same origin, namely those patches. In my case (not using row compression), it seems that the checksum is stored elsewhere, so I can just change the 01 00 00 00 to 00 00 00 00 and MariaDB can read the table again.

I would, however, like to avoid editing table files manually after every server restart. The corruption only occurs when the MariaDB server is stopped; as long as the server is running, it does not occur. Only the first byte is affected. Correcting that byte lets MariaDB read the table again immediately, and the checksum reported by innochecksum then matches again.
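For anyone who wants to check whether their own tablespaces show the same pattern, here is a minimal read-only sketch in Python. It only looks for .ibd files that start with the bytes 01 00 00 00 and never modifies anything. The function names and the example data directory are my own assumptions, not part of any MariaDB tooling, and a match is only a hint: a valid checksum could in principle start with the same bytes, so confirm with innochecksum before touching a file.

```python
from pathlib import Path

# Byte pattern described above: the first four bytes of the file
# end up as 01 00 00 00 after the server has been stopped.
SUSPECT_PREFIX = bytes([0x01, 0x00, 0x00, 0x00])


def has_suspect_header(ibd_path):
    """Return True if the file starts with the bytes 01 00 00 00."""
    with open(ibd_path, "rb") as f:
        return f.read(4) == SUSPECT_PREFIX


def find_suspect_tablespaces(datadir):
    """Yield .ibd files below datadir whose first four bytes match.

    datadir is whatever directory your MariaDB instance uses;
    nothing is written, files are only opened for reading.
    """
    for path in sorted(Path(datadir).rglob("*.ibd")):
        if has_suspect_header(path):
            yield path
```

Usage, assuming the data directory is /var/lib/mysql (adjust to your setup) and the server is stopped: `for p in find_suspect_tablespaces("/var/lib/mysql"): print(p)`.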

The servers use ECC memory and ZFS storage in RAID1; scrubs reveal no errors.

fabian · 2025-02-05T14:22:06+0100

okay, so a container should definitely use the host kernel and thus not be affected by the broken backport (the code in question looks quite different in 6.8). you could try our 6.11-based kernel, maybe there is a second bug that already got fixed. otherwise we may need to narrow it down a bit more..

Jan-Willem Spuij · 2025-02-05T14:27:38+0100

The Linux kernel maintainers more or less came to the same conclusion (that the corruption is not triggered by the recent patches, only the hangs).

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093243

It seems that it's either something else in the kernel, or something MariaDB specific. I'll try a more recent kernel at least.

fabian · 2025-02-05T14:41:16+0100

keep us posted! if you find any more leads on the kernel side and can reproduce the corruption easily, we can also build test kernels if needed.
