MariaDB in LXC on ZFS fails after a day

cpzengel

Hi,

For a few days now, my MariaDB has been getting killed by something.

I am running Debian 9 with current patches on Proxmox 4.4.
After a ZFS rollback everything is fine for a few hours, then it crashes again.

---


root@pve1:~# pveversion -v

proxmox-ve: 4.4-87 (running kernel: 4.4.59-1-pve)

pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)

pve-kernel-4.4.35-1-pve: 4.4.35-77

pve-kernel-4.4.59-1-pve: 4.4.59-87

pve-kernel-4.4.44-1-pve: 4.4.44-84

pve-kernel-4.4.49-1-pve: 4.4.49-86

lvm2: 2.02.116-pve3

corosync-pve: 2.4.2-2~pve4+1

libqb0: 1.0.1-1

pve-cluster: 4.0-49

qemu-server: 4.0-110

pve-firmware: 1.1-11

libpve-common-perl: 4.0-94

libpve-access-control: 4.0-23

libpve-storage-perl: 4.0-76

pve-libspice-server1: 0.12.8-2

vncterm: 1.3-2

pve-docs: 4.4-4

pve-qemu-kvm: 2.7.1-4

pve-container: 1.0-99

pve-firewall: 2.0-33

pve-ha-manager: 1.0-40

ksm-control-daemon: 1.2-1

glusterfs-client: 3.5.2-2+deb8u3

lxc-pve: 2.0.7-4

lxcfs: 2.0.6-pve1

criu: 1.6.0-1

novnc-pve: 0.5-9

smartmontools: 6.5+svn4324-1~pve80

zfsutils: 0.6.5.9-pve15~bpo80



----

-- Unit mariadb.service has begun starting up.

May 23 08:35:46 ISPCONFIG mysqld[11584]: 2017-05-23 8:35:46 4144297984 [Note] /usr/sbin/mysqld (mysqld 10.1.23-MariaDB-8) starting as process 11584 ..

May 23 08:35:46 ISPCONFIG systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV

May 23 08:35:46 ISPCONFIG systemd[1]: Failed to start MariaDB database server.

-- Subject: Unit mariadb.service has failed

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has failed.

--

-- The result is failed.

May 23 08:35:46 ISPCONFIG systemd[1]: mariadb.service: Unit entered failed state.

May 23 08:35:46 ISPCONFIG systemd[1]: mariadb.service: Failed with result 'signal'.

May 23 08:35:52 ISPCONFIG systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.

May 23 08:35:52 ISPCONFIG systemd[1]: Stopped MariaDB database server.

-- Subject: Unit mariadb.service has finished shutting down

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has finished shutting down.

May 23 08:35:52 ISPCONFIG systemd[1]: Failed to set devices.allow on /system.slice/mariadb.service: Operation not permitted

May 23 08:35:52 ISPCONFIG systemd[1]: Failed to set devices.allow on /system.slice/mariadb.service: Operation not permitted

May 23 08:35:52 ISPCONFIG systemd[1]: Starting MariaDB database server...

-- Subject: Unit mariadb.service has begun start-up

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has begun starting up.

May 23 08:35:53 ISPCONFIG mysqld[11714]: 2017-05-23 8:35:53 4144052224 [Note] /usr/sbin/mysqld (mysqld 10.1.23-MariaDB-8) starting as process 11714 ..

May 23 08:35:55 ISPCONFIG systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV

May 23 08:35:55 ISPCONFIG systemd[1]: Failed to start MariaDB database server.

-- Subject: Unit mariadb.service has failed

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has failed.

--

-- The result is failed.

May 23 08:35:55 ISPCONFIG systemd[1]: mariadb.service: Unit entered failed state.

May 23 08:35:55 ISPCONFIG systemd[1]: mariadb.service: Failed with result 'signal'.
 
This could be the problem, but it does not have to be:

ZFS does not support O_SYNC on a filesystem dataset (as opposed to a zvol). An open with O_SYNC will therefore fail, and the software has to deal with that. If you can get a core dump of your killed mysqld, you may be able to determine whether this is the case. Debugging with strace could also help.
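One way to check this hypothesis before digging through a core dump is to try the open flags directly on the dataset that holds the database files. The following is only a minimal sketch: the function name `probe_open_flags` is made up for illustration, and O_DSYNC and O_DIRECT are included as an assumption of mine (they are other flags a database might pass), not something established in this thread.

```python
import errno
import os
import tempfile


def probe_open_flags(directory, flag_names=("O_SYNC", "O_DSYNC", "O_DIRECT")):
    """Open a scratch file in `directory` once per flag and record the outcome.

    Returns a dict mapping each flag name to "ok", "unavailable" (the flag is
    not defined on this platform), or the errno name of the failure.
    """
    results = {}
    for name in flag_names:
        flag = getattr(os, name, None)
        if flag is None:
            results[name] = "unavailable"
            continue
        # Create the scratch file normally, then reopen it with the flag.
        fd, path = tempfile.mkstemp(dir=directory)
        os.close(fd)
        try:
            fd = os.open(path, os.O_RDWR | flag)
        except OSError as exc:
            results[name] = errno.errorcode.get(exc.errno, str(exc.errno))
        else:
            os.close(fd)
            results[name] = "ok"
        finally:
            os.unlink(path)
    return results


if __name__ == "__main__":
    # Replace the temp directory with the MariaDB data directory to test it.
    for name, outcome in probe_open_flags(tempfile.gettempdir()).items():
        print(f"{name}: {outcome}")
```

Run it inside the container against the data directory (typically /var/lib/mysql on Debian, but check your datadir setting). A flag that reports an errno such as EINVAL there, but "ok" on a non-ZFS path, would support the theory. If every flag reports "ok", the SEGV probably has another cause.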

I tried to solve this problem by monkey-patching glibc's open call, but that failed in the long run (e.g. when trying to get Oracle Database running "properly").
 
Therefore an access with O_SYNC will fail and the software will have to deal with this.


But I guess you can modify your MariaDB configuration (this works for Percona Server) and tell it to use fsync instead of O_SYNC. I can also mention that Percona Server uses fsync by default if O_SYNC is not possible. Another idea is to set up a test VM, uninstall MariaDB, install Percona Server, and look at its very verbose log. Maybe you will see what the problem is.
In the worst case, if Percona Server works and MariaDB does not, you can switch to Percona Server. There is not a big difference between them.
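The InnoDB option that controls this is `innodb_flush_method`, set in the `[mysqld]` section of the server configuration. Whether changing it cures this particular crash is unconfirmed. Before editing anything, it may help to see what the current configuration sets. Here is a rough sketch that scans option-file text for the setting. The helper name `effective_flush_method` and the sample text are hypothetical, and the parser is simplified: it only looks at `[mysqld]`, ignores `!include` directives, and does not strip trailing comments.

```python
def effective_flush_method(config_text):
    """Return the last innodb_flush_method value under [mysqld], or None."""
    section = None
    value = None
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments (# or ;) and !include / !includedir directives.
        if not line or line[0] in "#;!":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section != "mysqld":
            continue
        key, sep, val = line.partition("=")
        # Option names may be written with dashes or underscores.
        if sep and key.strip().lower().replace("-", "_") == "innodb_flush_method":
            value = val.strip()  # a later occurrence overrides an earlier one
    return value


# Hypothetical sample, not taken from the affected system.
SAMPLE = """
[client]
port = 3306

[mysqld]
datadir = /var/lib/mysql
innodb_flush_method = O_DIRECT
innodb-flush-method = fsync
"""

if __name__ == "__main__":
    print(effective_flush_method(SAMPLE))  # prints "fsync"
```

If this returns None for all of your option files, the server is using its built-in default. In that case an explicit `innodb_flush_method = fsync` line under `[mysqld]` is the change being suggested here.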

Good luck - you need it :(
 
But I guess you can modify your MariaDB configuration (this works for Percona Server) and tell it to use fsync instead of O_SYNC. I can also mention that Percona Server uses fsync by default if O_SYNC is not possible.

Oh, that's great! Unfortunately, this is not possible with the "real" Oracle Database. I spent over 16 hours on that topic... then I pulled the plug.