LXC on ZFS is failing MariaDB after a day

cpzengel

Hi,

for a few days now my MariaDB has been getting killed by something.

I am on Debian 9 with current patches and Proxmox 4.4.
After a ZFS rollback everything is fine for a few hours, then it crashes again.

---


root@pve1:~# pveversion -v

proxmox-ve: 4.4-87 (running kernel: 4.4.59-1-pve)

pve-manager: 4.4-13 (running version: 4.4-13/7ea56165)

pve-kernel-4.4.35-1-pve: 4.4.35-77

pve-kernel-4.4.59-1-pve: 4.4.59-87

pve-kernel-4.4.44-1-pve: 4.4.44-84

pve-kernel-4.4.49-1-pve: 4.4.49-86

lvm2: 2.02.116-pve3

corosync-pve: 2.4.2-2~pve4+1

libqb0: 1.0.1-1

pve-cluster: 4.0-49

qemu-server: 4.0-110

pve-firmware: 1.1-11

libpve-common-perl: 4.0-94

libpve-access-control: 4.0-23

libpve-storage-perl: 4.0-76

pve-libspice-server1: 0.12.8-2

vncterm: 1.3-2

pve-docs: 4.4-4

pve-qemu-kvm: 2.7.1-4

pve-container: 1.0-99

pve-firewall: 2.0-33

pve-ha-manager: 1.0-40

ksm-control-daemon: 1.2-1

glusterfs-client: 3.5.2-2+deb8u3

lxc-pve: 2.0.7-4

lxcfs: 2.0.6-pve1

criu: 1.6.0-1

novnc-pve: 0.5-9

smartmontools: 6.5+svn4324-1~pve80

zfsutils: 0.6.5.9-pve15~bpo80



----

-- Unit mariadb.service has begun starting up.

May 23 08:35:46 ISPCONFIG mysqld[11584]: 2017-05-23 8:35:46 4144297984 [Note] /usr/sbin/mysqld (mysqld 10.1.23-MariaDB-8) starting as process 11584 ..

May 23 08:35:46 ISPCONFIG systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV

May 23 08:35:46 ISPCONFIG systemd[1]: Failed to start MariaDB database server.

-- Subject: Unit mariadb.service has failed

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has failed.

--

-- The result is failed.

May 23 08:35:46 ISPCONFIG systemd[1]: mariadb.service: Unit entered failed state.

May 23 08:35:46 ISPCONFIG systemd[1]: mariadb.service: Failed with result 'signal'.

May 23 08:35:52 ISPCONFIG systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.

May 23 08:35:52 ISPCONFIG systemd[1]: Stopped MariaDB database server.

-- Subject: Unit mariadb.service has finished shutting down

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has finished shutting down.

May 23 08:35:52 ISPCONFIG systemd[1]: Failed to set devices.allow on /system.slice/mariadb.service: Operation not permitted

May 23 08:35:52 ISPCONFIG systemd[1]: Failed to set devices.allow on /system.slice/mariadb.service: Operation not permitted

May 23 08:35:52 ISPCONFIG systemd[1]: Starting MariaDB database server...

-- Subject: Unit mariadb.service has begun start-up

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has begun starting up.

May 23 08:35:53 ISPCONFIG mysqld[11714]: 2017-05-23 8:35:53 4144052224 [Note] /usr/sbin/mysqld (mysqld 10.1.23-MariaDB-8) starting as process 11714 ..

May 23 08:35:55 ISPCONFIG systemd[1]: mariadb.service: Main process exited, code=killed, status=11/SEGV

May 23 08:35:55 ISPCONFIG systemd[1]: Failed to start MariaDB database server.

-- Subject: Unit mariadb.service has failed

-- Defined-By: systemd

-- Support: https://www.debian.org/support

--

-- Unit mariadb.service has failed.

--

-- The result is failed.

May 23 08:35:55 ISPCONFIG systemd[1]: mariadb.service: Unit entered failed state.

May 23 08:35:55 ISPCONFIG systemd[1]: mariadb.service: Failed with result 'signal'.
 
I don't think it's related to ZFS. Try to launch MariaDB from the console:
Code:
/usr/sbin/mysqld
and watch what happens.
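A minimal sketch of such a foreground run, assuming the stock Debian paths from the mariadb-server package (/usr/sbin/mysqld, user mysql). Running it under strace records the last syscalls before the SIGSEGV:

```shell
# Hypothetical debugging run; paths assume Debian's mariadb-server package.
# -f follows forked threads, -o writes the trace to a file.
strace -f -o /tmp/mysqld.trace /usr/sbin/mysqld --user=mysql

# After the crash, the tail of the trace shows the last syscalls
# (and any failing open() flags) before the signal.
tail -n 20 /tmp/mysqld.trace
```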
 
This could be the problem, but it does not have to be:

ZFS does not support O_SYNC on a filesystem (as opposed to a zvol). An open with O_SYNC will therefore fail, and the software has to deal with that. If you can get a core dump of the killed mysqld, you may be able to determine whether this is the case. Debugging via strace could also be helpful.

I tried to work around this by monkey-patching glibc's open() call, but failed in the long run (e.g. getting Oracle Database to run "properly").
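The claim above can be probed without touching MariaDB at all. This is a hedged sketch using dd, whose oflag=sync opens the output file with O_SYNC; the path /tmp/osync_probe is just an example and should point at the ZFS-backed filesystem in question:

```shell
# dd's oflag=sync opens the output file with O_SYNC.
# Point the of= path at the ZFS dataset you want to test; per the post
# above it may fail there, while it succeeds on e.g. ext4 or tmpfs.
dd if=/dev/zero of=/tmp/osync_probe bs=4k count=1 oflag=sync \
    && echo "O_SYNC write ok"
rm -f /tmp/osync_probe
```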
 
An open with O_SYNC will therefore fail, and the software has to deal with that.


But I guess you can modify your MariaDB config (for PerconaDB it works) to tell it to use fsync instead of O_SYNC. I can also mention that PerconaDB, by default, falls back to fsync if O_SYNC is not possible. Another idea is to set up a test VM, uninstall MariaDB, install PerconaDB, and read the very verbose log; maybe you will see what the problem is.
In the worst case, if PerconaDB works and MariaDB does not, you can switch to PerconaDB. There is not a big difference between them.

Good luck - you need it :(
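The fsync suggestion above can be written as a my.cnf fragment. This is a sketch assuming InnoDB/XtraDB as the storage engine; innodb_flush_method controls how the data files and logs are flushed, and fsync (the default on Linux builds) avoids opening them with O_SYNC/O_DSYNC:

```ini
# Hypothetical excerpt for /etc/mysql/my.cnf (or a conf.d/ snippet).
# fsync is the default flush method on Linux; setting it explicitly
# guards against a config that selects O_DSYNC or O_DIRECT instead.
[mysqld]
innodb_flush_method = fsync
```

After changing it, restart the service (systemctl restart mariadb) and confirm with SHOW VARIABLES LIKE 'innodb_flush_method'; from a client.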
 
But I guess you can modify your MariaDB config (for PerconaDB it works) to tell it to use fsync instead of O_SYNC. I can also mention that PerconaDB, by default, falls back to fsync if O_SYNC is not possible.

Oh, that's great! Unfortunately, this is not possible with the "real" Oracle Database. I spent over 16 hours on that topic... then I pulled it.
 
Have you tested the same environment with non-ZFS-backed storage for this VM, to see whether the problem is <host> vs. <datastore = ZFS>?
 
