LXC on ZFS is failing MariaDB after a day

cpzengel

Hi,

for a few days now my MariaDB has been getting killed by something.

I am on Debian 9 with current patches and Proxmox 4.4.
After a ZFS rollback everything is fine for a few hours, then it crashes again.

---


root@pve1:~# pveversion -v

proxmox-ve: 4.4-87 (running kernel: 4.4.59-1-pve)

pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)

pve-kernel-4.4.35-1-pve: 4.4.35-77

pve-kernel-4.4.59-1-pve: 4.4.59-87

pve-kernel-4.4.44-1-pve: 4.4.44-84

pve-kernel-4.4.49-1-pve: 4.4.49-86

lvm2: 2.02.116-pve3

corosync-pve: 2.4.2-2~pve4+1

libqb0: 1.0.1-1

pve-cluster: 4.0-49

qemu-server: 4.0-110

pve-firmware: 1.1-11

libpve-common-perl: 4.0-94

libpve-access-control: 4.0-23

libpve-storage-perl: 4.0-76

pve-libspice-server1: 0.12.8-2

vncterm: 1.3-2

pve-docs: 4.4-4

pve-qemu-kvm: 2.7.1-4

pve-container: 1.0-99

pve-firewall: 2.0-33

pve-ha-manager: 1.0-40

ksm-control-daemon: 1.2-1

glusterfs-client: 3.5.2-2+deb8u3

lxc-pve: 2.0.7-4

lxcfs: 2.0.6-pve1

criu: 1.6.0-1

novnc-pve: 0.5-9

smartmontools: 6.5+svn4324-1~pve80

zfsutils: 0.6.5.9-pve15~bpo80



----

-- Unit mariadb.service has begun starting up.

May 23 08:35:46 ISPCONFIG mysqld[11584]: 2017-05-23 8:35:46 4144297984 [Note] /usr/sbin/mysqld (mysqld 10.1.23-MariaDB-8) starting as process 11584 ..

May 23 08:35:46 ISPCONFIG systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV

May 23 08:35:46 ISPCONFIG systemd[1]: Failed to start MariaDB database server.

-- Subject: Unit mariadb.service has failed

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has failed.

--

-- The result is failed.

May 23 08:35:46 ISPCONFIG systemd[1]: mariadb.service: Unit entered failed state.

May 23 08:35:46 ISPCONFIG systemd[1]: mariadb.service: Failed with result 'signal'.

May 23 08:35:52 ISPCONFIG systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.

May 23 08:35:52 ISPCONFIG systemd[1]: Stopped MariaDB database server.

-- Subject: Unit mariadb.service has finished shutting down

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has finished shutting down.

May 23 08:35:52 ISPCONFIG systemd[1]: Failed to set devices.allow on /system.slice/mariadb.service: Operation not permitted

May 23 08:35:52 ISPCONFIG systemd[1]: Failed to set devices.allow on /system.slice/mariadb.service: Operation not permitted

May 23 08:35:52 ISPCONFIG systemd[1]: Starting MariaDB database server...

-- Subject: Unit mariadb.service has begun start-up

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has begun starting up.

May 23 08:35:53 ISPCONFIG mysqld[11714]: 2017-05-23 8:35:53 4144052224 [Note] /usr/sbin/mysqld (mysqld 10.1.23-MariaDB-8) starting as process 11714 ..

May 23 08:35:55 ISPCONFIG systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV

May 23 08:35:55 ISPCONFIG systemd[1]: Failed to start MariaDB database server.

-- Subject: Unit mariadb.service has failed

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has failed.

--

-- The result is failed.

May 23 08:35:55 ISPCONFIG systemd[1]: mariadb.service: Unit entered failed state.

May 23 08:35:55 ISPCONFIG systemd[1]: mariadb.service: Failed with result 'signal'.
 
I don't think it's related to ZFS. Try to launch MariaDB from the console:
Code:
/usr/sbin/mysqld
and watch what happens.
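A minimal sketch of such a foreground run, assuming the stock Debian paths from the mariadb-server package (/usr/sbin/mysqld, user mysql). Running it under strace records the last syscalls before the SIGSEGV:

```shell
# Hypothetical debugging run; paths assume Debian's mariadb-server package.
# -f follows forked threads, -o writes the trace to a file.
strace -f -o /tmp/mysqld.trace /usr/sbin/mysqld --user=mysql

# After the crash, the tail of the trace shows the last syscalls
# (and any failing open() flags) before the signal.
tail -n 20 /tmp/mysqld.trace
```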
 
This could be the problem, but it does not have to be:

ZFS does not support O_SYNC on a filesystem (as opposed to a zvol). An open with O_SYNC will therefore fail, and the software has to deal with that. If you can get a core dump of the killed mysqld, you may be able to determine whether this is the case. Debugging via strace could also be helpful.

I tried to work around this by monkey-patching glibc's open() call, but failed in the long run (e.g. getting Oracle Database to run "properly").
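The claim above can be probed without touching MariaDB at all. This is a hedged sketch using dd, whose oflag=sync opens the output file with O_SYNC; the path /tmp/osync_probe is just an example and should point at the ZFS-backed filesystem in question:

```shell
# dd's oflag=sync opens the output file with O_SYNC.
# Point the of= path at the ZFS dataset you want to test; per the post
# above it may fail there, while it succeeds on e.g. ext4 or tmpfs.
dd if=/dev/zero of=/tmp/osync_probe bs=4k count=1 oflag=sync \
    && echo "O_SYNC write ok"
rm -f /tmp/osync_probe
```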
 
An open with O_SYNC will therefore fail, and the software has to deal with that.


But I guess you can modify your MariaDB config (for PerconaDB it works) to tell it to use fsync instead of O_SYNC. I can also mention that PerconaDB, by default, falls back to fsync if O_SYNC is not possible. Another idea is to set up a test VM, uninstall MariaDB, install PerconaDB, and read the very verbose log; maybe you will see what the problem is.
In the worst case, if PerconaDB works and MariaDB does not, you can switch to PerconaDB. There is not a big difference between them.

Good luck - you need it :(
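The fsync suggestion above can be written as a my.cnf fragment. This is a sketch assuming InnoDB/XtraDB as the storage engine; innodb_flush_method controls how the data files and logs are flushed, and fsync (the default on Linux builds) avoids opening them with O_SYNC/O_DSYNC:

```ini
# Hypothetical excerpt for /etc/mysql/my.cnf (or a conf.d/ snippet).
# fsync is the default flush method on Linux; setting it explicitly
# guards against a config that selects O_DSYNC or O_DIRECT instead.
[mysqld]
innodb_flush_method = fsync
```

After changing it, restart the service (systemctl restart mariadb) and confirm with SHOW VARIABLES LIKE 'innodb_flush_method'; from a client.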
 
But I guess you can modify your MariaDB config (for PerconaDB it works) to tell it to use fsync instead of O_SYNC. I can also mention that PerconaDB, by default, falls back to fsync if O_SYNC is not possible.

Oh, that's great! Unfortunately, this is not possible with the "real" Oracle Database. I spent over 16 hours on that topic... then I pulled it.
 
Have you tested the same environment with non-ZFS-backed storage for this VM, to see whether the problem is <host> vs. <datastore = ZFS>?
 
