Crash in an OpenVZ VM

massivescale

May 15, 2012
I have imported a cPanel server into an OpenVZ container on the latest no-subscription PVE. After an hour or two of running, the container reproducibly crashes with a filesystem-related error, leaving processes stuck in D state (uninterruptible sleep).

The container then behaves as if it has no disk access, while the hardware node keeps working fine. All drives are local. I haven't found any high I/O spikes in Graphite, so I don't think a badly performing disk is the cause.

Because of the D-state processes, the container cannot be stopped, and rebooting the node is the only way out.
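To see which processes are stuck at a given moment, one option is to scan /proc for tasks whose state field is `D`. The following is a minimal Python sketch, assuming the standard Linux `/proc/<pid>/stat` layout (`pid (comm) state ...`); `parse_stat` and `d_state_processes` are hypothetical helper names used only for illustration.

```python
import os
from typing import Iterator, Tuple


def parse_stat(line: str) -> Tuple[int, str, str]:
    """Split a /proc/<pid>/stat line into (pid, comm, state).

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the first '(' and the last ')'.
    """
    pid_part, rest = line.split("(", 1)
    comm, after = rest.rsplit(")", 1)
    state = after.split()[0]
    return int(pid_part.strip()), comm, state


def d_state_processes(proc_root: str = "/proc") -> Iterator[Tuple[int, str]]:
    """Yield (pid, comm) for every process currently in state 'D'."""
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'self' or 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat"), errors="replace") as f:
                pid, comm, state = parse_stat(f.read())
        except (FileNotFoundError, ProcessLookupError, PermissionError):
            continue  # process exited (or is unreadable) between listdir() and read()
        if state == "D":
            yield pid, comm


if __name__ == "__main__":
    for pid, comm in d_state_processes():
        print(pid, comm)
```

Running it a few times in a row shows whether the same PIDs stay in D state, which distinguishes a permanent hang from short I/O waits.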

Can somebody help me with this?

Code:
root@le03:~# pveversion 
pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve)
root@le03:~# pveversion -v
proxmox-ve-2.6.32: 3.2-126 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-28-pve: 2.6.32-124
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

Code:
May 26 17:51:32 le03 kernel: php           D ffff8807e73e2900     0 14918  14913 3001 0x00020000
May 26 17:51:32 le03 kernel: ffff880815c1bd78 0000000000000082 0000000000000000 ffff8807cfc46000
May 26 17:51:32 le03 kernel: ffff880815c1bd38 ffffffff811c69c7 ffff880028310770 0000000064642c64
May 26 17:51:32 le03 kernel: ffff880028310760 00000001003ce4c1 ffff8807e73e2ec8 000000000001ec80
May 26 17:51:32 le03 kernel: Call Trace:
May 26 17:51:32 le03 kernel: [<ffffffff811c69c7>] ? __d_lookup+0xa7/0x150
May 26 17:51:32 le03 kernel: [<ffffffff8155db1e>] __mutex_lock_slowpath+0x13e/0x180
May 26 17:51:32 le03 kernel: [<ffffffff8155d9bb>] mutex_lock+0x2b/0x50
May 26 17:51:32 le03 kernel: [<ffffffff81136901>] generic_file_aio_write+0x71/0x100
May 26 17:51:32 le03 kernel: [<ffffffffa00ce1c8>] ext4_file_write+0x58/0x190 [ext4]
May 26 17:51:32 le03 kernel: [<ffffffff811abf72>] do_sync_write+0xf2/0x140
May 26 17:51:32 le03 kernel: [<ffffffff811ac258>] vfs_write+0xb8/0x1a0
May 26 17:51:32 le03 kernel: [<ffffffff811acb51>] sys_write+0x51/0x90
May 26 17:51:32 le03 kernel: [<ffffffff810520c0>] cstar_dispatch+0x7/0x2e
May 26 17:51:32 le03 kernel: php           D ffff8807e73733f0     0 14925  14922 3001 0x00020000
May 26 17:51:32 le03 kernel: ffff8808188f1d78 0000000000000086 0000000000000000 ffff880816c97000
May 26 17:51:32 le03 kernel: ffff8808188f1d38 ffffffff811c69c7 ffff8808188f1d08 00000000c8e69303
May 26 17:51:32 le03 kernel: 0000000000000000 00000001003ce4be ffff8807e73739b8 000000000001ec80
May 26 17:51:32 le03 kernel: Call Trace:
May 26 17:51:32 le03 kernel: [<ffffffff811c69c7>] ? __d_lookup+0xa7/0x150
May 26 17:51:32 le03 kernel: [<ffffffff8155db1e>] __mutex_lock_slowpath+0x13e/0x180
May 26 17:51:32 le03 kernel: [<ffffffff8155d9bb>] mutex_lock+0x2b/0x50
May 26 17:51:32 le03 kernel: [<ffffffff81136901>] generic_file_aio_write+0x71/0x100
May 26 17:51:32 le03 kernel: [<ffffffffa00ce1c8>] ext4_file_write+0x58/0x190 [ext4]
May 26 17:51:32 le03 kernel: [<ffffffff811abf72>] do_sync_write+0xf2/0x140
May 26 17:51:32 le03 kernel: [<ffffffff811ac258>] vfs_write+0xb8/0x1a0
May 26 17:51:32 le03 kernel: [<ffffffff811acb51>] sys_write+0x51/0x90
May 26 17:51:32 le03 kernel: [<ffffffff810520c0>] cstar_dispatch+0x7/0x2e
May 27 14:15:23 le03 kernel: sshd          D ffff8807e4aa2d30     0 13441   3974 3001 0x00020000
May 27 14:15:23 le03 kernel: ffff88068fd33d78 0000000000200086 0000000000000000 ffff88081bb78ed0
May 27 14:15:23 le03 kernel: ffff88068fd33d08 ffffffff811cfba0 ffff88068fd33d08 0000000064f1173c
May 27 14:15:23 le03 kernel: 0000000000000000 000000010021613f ffff8807e4aa32f8 000000000001ec80
May 27 14:15:23 le03 kernel: Call Trace:
May 27 14:15:23 le03 kernel: [<ffffffff811cfba0>] ? mntput_no_expire+0x30/0x110
May 27 14:15:23 le03 kernel: [<ffffffff8155db1e>] __mutex_lock_slowpath+0x13e/0x180
May 27 14:15:23 le03 kernel: [<ffffffff811bfa68>] ? do_filp_open+0x788/0xc60
May 27 14:15:23 le03 kernel: [<ffffffff8155d9bb>] mutex_lock+0x2b/0x50
May 27 14:15:23 le03 kernel: [<ffffffff81136901>] generic_file_aio_write+0x71/0x100
May 27 14:15:23 le03 kernel: [<ffffffffa00ce1c8>] ext4_file_write+0x58/0x190 [ext4]
May 27 14:15:23 le03 kernel: [<ffffffff811abf72>] do_sync_write+0xf2/0x140
May 27 14:15:23 le03 kernel: [<ffffffff811ac258>] vfs_write+0xb8/0x1a0
May 27 14:15:23 le03 kernel: [<ffffffff811acb51>] sys_write+0x51/0x90
May 27 14:15:23 le03 kernel: [<ffffffff814a60b2>] ? compat_sys_socketcall+0x192/0x210
May 27 14:15:23 le03 kernel: [<ffffffff810520c0>] cstar_dispatch+0x7/0x2e
May 27 14:17:23 le03 kernel: sshd          D ffff8807e4aa2d30     0 13441   3974 3001 0x00020000
May 27 14:17:23 le03 kernel: ffff88068fd33d78 0000000000200086 0000000000000000 ffff88081bb78ed0
May 27 14:17:23 le03 kernel: ffff88068fd33d08 ffffffff811cfba0 ffff88068fd33d08 0000000064f1173c
May 27 14:17:23 le03 kernel: 0000000000000000 000000010021613f ffff8807e4aa32f8 000000000001ec80
May 27 14:17:23 le03 kernel: Call Trace:
May 27 14:17:23 le03 kernel: [<ffffffff811cfba0>] ? mntput_no_expire+0x30/0x110
May 27 14:17:23 le03 kernel: [<ffffffff8155db1e>] __mutex_lock_slowpath+0x13e/0x180
May 27 14:17:23 le03 kernel: [<ffffffff811bfa68>] ? do_filp_open+0x788/0xc60
May 27 14:17:23 le03 kernel: [<ffffffff8155d9bb>] mutex_lock+0x2b/0x50
May 27 14:17:23 le03 kernel: [<ffffffff81136901>] generic_file_aio_write+0x71/0x100
May 27 14:17:23 le03 kernel: [<ffffffffa00ce1c8>] ext4_file_write+0x58/0x190 [ext4]
May 27 14:17:23 le03 kernel: [<ffffffff811abf72>] do_sync_write+0xf2/0x140
May 27 14:17:23 le03 kernel: [<ffffffff811ac258>] vfs_write+0xb8/0x1a0
May 27 14:17:23 le03 kernel: [<ffffffff811acb51>] sys_write+0x51/0x90
May 27 14:17:23 le03 kernel: [<ffffffff814a60b2>] ? compat_sys_socketcall+0x192/0x210
May 27 14:17:23 le03 kernel: [<ffffffff810520c0>] cstar_dispatch+0x7/0x2e
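Each blocked task above has the same reliable call path: it sleeps in `mutex_lock()` reached from `generic_file_aio_write()` during `ext4_file_write()`. When the log is long, a small parser can summarize which tasks hang and where. This is a sketch under assumptions: the regular expressions are fitted to the exact line format shown here, and reading the two numbers after the free-stack column as PID and parent PID is an assumption based on the 2.6.32 task-dump layout (the extra OpenVZ column is ignored). `blocked_tasks` is a hypothetical helper name.

```python
import re
import sys
from typing import Dict, List

# Header of a blocked-task dump, e.g.
# "... kernel: php           D ffff8807e73e2900     0 14918  14913 3001 0x00020000"
# ASSUMPTION: the two numbers after the free-stack field are PID and parent PID.
HEADER = re.compile(
    r"kernel: (?P<comm>\S+)\s+D\s+[0-9a-f]+\s+\d+\s+(?P<pid>\d+)\s+(?P<ppid>\d+)"
)
# A call-trace frame; a leading "? " marks an unreliable frame, which is skipped.
FRAME = re.compile(r"kernel: \[<[0-9a-f]+>\] (?P<unreliable>\? )?(?P<func>[\w.]+)\+")


def blocked_tasks(lines: List[str]) -> List[Dict]:
    """Group a hung-task log into one dict per blocked task with its reliable frames."""
    tasks: List[Dict] = []
    for line in lines:
        m = HEADER.search(line)
        if m:
            tasks.append({"comm": m["comm"], "pid": int(m["pid"]), "frames": []})
            continue
        f = FRAME.search(line)
        if f and tasks and not f["unreliable"]:
            tasks[-1]["frames"].append(f["func"])
    return tasks


if __name__ == "__main__":
    # Usage: python blocked.py /var/log/kern.log
    with open(sys.argv[1], errors="replace") as fh:
        for t in blocked_tasks(fh.readlines()):
            print(t["comm"], t["pid"], " <- ".join(t["frames"][:4]))
```

Printing only the innermost frames keeps the summary short; here every task would show `__mutex_lock_slowpath <- mutex_lock <- generic_file_aio_write <- ext4_file_write`.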