Crash in an OpenVZ VM

massivescale

I imported a cPanel server into an OpenVZ container on the latest no-subscription PVE. After an hour or two of running, the container reproducibly hangs with a filesystem-related error, leaving processes stuck in D state (uninterruptible sleep).

The container then behaves as if it has no disk access, while the hardware node keeps working fine. All drives are local. I haven't found any high I/O spikes in the Graphite logs, so I don't think a badly performing disk is the cause.

Because of the D-state processes, the container cannot be stopped, and a reboot is the only way to recover.
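
To catch the stuck tasks before the hung-task warnings show up in the log, something like the following minimal sketch could be run on the hardware node. It lists processes currently in D state by reading `/proc/<pid>/stat`. It assumes Python is available on the node; the function names `parse_stat_line` and `list_d_state` are made up for this example and are not part of any Proxmox or OpenVZ tool.

```python
import os

def parse_stat_line(text):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_idx = text.index("(")
    close_idx = text.rindex(")")
    pid = int(text[:open_idx].strip())
    comm = text[open_idx + 1:close_idx]
    state = text[close_idx + 1:].split()[0]
    return pid, comm, state

def list_d_state(proc_root="/proc"):
    """Return sorted (pid, comm) pairs for tasks in uninterruptible sleep."""
    if not os.path.isdir(proc_root):
        return []
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or the line was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)

if __name__ == "__main__":
    for pid, comm in list_d_state():
        print(pid, comm)
```

Run from cron every minute or so, this would show roughly when the first process gets stuck, which can then be compared against the Graphite I/O data.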

Can somebody help me with this?

Code:
root@le03:~# pveversion 
pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve)
root@le03:~# pveversion -v
proxmox-ve-2.6.32: 3.2-126 (running kernel: 2.6.32-29-pve)
pve-manager: 3.2-4 (running version: 3.2-4/e24a91c1)
pve-kernel-2.6.32-28-pve: 2.6.32-124
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.5-1
pve-cluster: 3.0-12
qemu-server: 3.1-16
pve-firmware: 1.1-3
libpve-common-perl: 3.0-18
libpve-access-control: 3.0-11
libpve-storage-perl: 3.0-19
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-6
vzctl: 4.0-1pve5
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.7-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.2-1

Code:
May 26 17:51:32 le03 kernel: php           D ffff8807e73e2900     0 14918  14913 3001 0x00020000
May 26 17:51:32 le03 kernel: ffff880815c1bd78 0000000000000082 0000000000000000 ffff8807cfc46000
May 26 17:51:32 le03 kernel: ffff880815c1bd38 ffffffff811c69c7 ffff880028310770 0000000064642c64
May 26 17:51:32 le03 kernel: ffff880028310760 00000001003ce4c1 ffff8807e73e2ec8 000000000001ec80
May 26 17:51:32 le03 kernel: Call Trace:
May 26 17:51:32 le03 kernel: [<ffffffff811c69c7>] ? __d_lookup+0xa7/0x150
May 26 17:51:32 le03 kernel: [<ffffffff8155db1e>] __mutex_lock_slowpath+0x13e/0x180
May 26 17:51:32 le03 kernel: [<ffffffff8155d9bb>] mutex_lock+0x2b/0x50
May 26 17:51:32 le03 kernel: [<ffffffff81136901>] generic_file_aio_write+0x71/0x100
May 26 17:51:32 le03 kernel: [<ffffffffa00ce1c8>] ext4_file_write+0x58/0x190 [ext4]
May 26 17:51:32 le03 kernel: [<ffffffff811abf72>] do_sync_write+0xf2/0x140
May 26 17:51:32 le03 kernel: [<ffffffff811ac258>] vfs_write+0xb8/0x1a0
May 26 17:51:32 le03 kernel: [<ffffffff811acb51>] sys_write+0x51/0x90
May 26 17:51:32 le03 kernel: [<ffffffff810520c0>] cstar_dispatch+0x7/0x2e
May 26 17:51:32 le03 kernel: php           D ffff8807e73733f0     0 14925  14922 3001 0x00020000
May 26 17:51:32 le03 kernel: ffff8808188f1d78 0000000000000086 0000000000000000 ffff880816c97000
May 26 17:51:32 le03 kernel: ffff8808188f1d38 ffffffff811c69c7 ffff8808188f1d08 00000000c8e69303
May 26 17:51:32 le03 kernel: 0000000000000000 00000001003ce4be ffff8807e73739b8 000000000001ec80
May 26 17:51:32 le03 kernel: Call Trace:
May 26 17:51:32 le03 kernel: [<ffffffff811c69c7>] ? __d_lookup+0xa7/0x150
May 26 17:51:32 le03 kernel: [<ffffffff8155db1e>] __mutex_lock_slowpath+0x13e/0x180
May 26 17:51:32 le03 kernel: [<ffffffff8155d9bb>] mutex_lock+0x2b/0x50
May 26 17:51:32 le03 kernel: [<ffffffff81136901>] generic_file_aio_write+0x71/0x100
May 26 17:51:32 le03 kernel: [<ffffffffa00ce1c8>] ext4_file_write+0x58/0x190 [ext4]
May 26 17:51:32 le03 kernel: [<ffffffff811abf72>] do_sync_write+0xf2/0x140
May 26 17:51:32 le03 kernel: [<ffffffff811ac258>] vfs_write+0xb8/0x1a0
May 26 17:51:32 le03 kernel: [<ffffffff811acb51>] sys_write+0x51/0x90
May 26 17:51:32 le03 kernel: [<ffffffff810520c0>] cstar_dispatch+0x7/0x2e
May 27 14:15:23 le03 kernel: sshd          D ffff8807e4aa2d30     0 13441   3974 3001 0x00020000
May 27 14:15:23 le03 kernel: ffff88068fd33d78 0000000000200086 0000000000000000 ffff88081bb78ed0
May 27 14:15:23 le03 kernel: ffff88068fd33d08 ffffffff811cfba0 ffff88068fd33d08 0000000064f1173c
May 27 14:15:23 le03 kernel: 0000000000000000 000000010021613f ffff8807e4aa32f8 000000000001ec80
May 27 14:15:23 le03 kernel: Call Trace:
May 27 14:15:23 le03 kernel: [<ffffffff811cfba0>] ? mntput_no_expire+0x30/0x110
May 27 14:15:23 le03 kernel: [<ffffffff8155db1e>] __mutex_lock_slowpath+0x13e/0x180
May 27 14:15:23 le03 kernel: [<ffffffff811bfa68>] ? do_filp_open+0x788/0xc60
May 27 14:15:23 le03 kernel: [<ffffffff8155d9bb>] mutex_lock+0x2b/0x50
May 27 14:15:23 le03 kernel: [<ffffffff81136901>] generic_file_aio_write+0x71/0x100
May 27 14:15:23 le03 kernel: [<ffffffffa00ce1c8>] ext4_file_write+0x58/0x190 [ext4]
May 27 14:15:23 le03 kernel: [<ffffffff811abf72>] do_sync_write+0xf2/0x140
May 27 14:15:23 le03 kernel: [<ffffffff811ac258>] vfs_write+0xb8/0x1a0
May 27 14:15:23 le03 kernel: [<ffffffff811acb51>] sys_write+0x51/0x90
May 27 14:15:23 le03 kernel: [<ffffffff814a60b2>] ? compat_sys_socketcall+0x192/0x210
May 27 14:15:23 le03 kernel: [<ffffffff810520c0>] cstar_dispatch+0x7/0x2e
May 27 14:17:23 le03 kernel: sshd          D ffff8807e4aa2d30     0 13441   3974 3001 0x00020000
May 27 14:17:23 le03 kernel: ffff88068fd33d78 0000000000200086 0000000000000000 ffff88081bb78ed0
May 27 14:17:23 le03 kernel: ffff88068fd33d08 ffffffff811cfba0 ffff88068fd33d08 0000000064f1173c
May 27 14:17:23 le03 kernel: 0000000000000000 000000010021613f ffff8807e4aa32f8 000000000001ec80
May 27 14:17:23 le03 kernel: Call Trace:
May 27 14:17:23 le03 kernel: [<ffffffff811cfba0>] ? mntput_no_expire+0x30/0x110
May 27 14:17:23 le03 kernel: [<ffffffff8155db1e>] __mutex_lock_slowpath+0x13e/0x180
May 27 14:17:23 le03 kernel: [<ffffffff811bfa68>] ? do_filp_open+0x788/0xc60
May 27 14:17:23 le03 kernel: [<ffffffff8155d9bb>] mutex_lock+0x2b/0x50
May 27 14:17:23 le03 kernel: [<ffffffff81136901>] generic_file_aio_write+0x71/0x100
May 27 14:17:23 le03 kernel: [<ffffffffa00ce1c8>] ext4_file_write+0x58/0x190 [ext4]
May 27 14:17:23 le03 kernel: [<ffffffff811abf72>] do_sync_write+0xf2/0x140
May 27 14:17:23 le03 kernel: [<ffffffff811ac258>] vfs_write+0xb8/0x1a0
May 27 14:17:23 le03 kernel: [<ffffffff811acb51>] sys_write+0x51/0x90
May 27 14:17:23 le03 kernel: [<ffffffff814a60b2>] ? compat_sys_socketcall+0x192/0x210
May 27 14:17:23 le03 kernel: [<ffffffff810520c0>] cstar_dispatch+0x7/0x2e
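
To make the traces above easier to compare, a small sketch like this one groups the hung-task reports by process and counts the first reliable frame of each. The regular expressions are assumptions fitted to the syslog format shown in this post, and `summarize` and `blocking_frames` are hypothetical helper names.

```python
import re
from collections import Counter

# Header line of a hung-task report: "<comm> D <addr> 0 <pid> <ppid> ..."
HEADER = re.compile(r"kernel: (\S+)\s+D\s+[0-9a-f]{16}\s+\d+\s+(\d+)\s+(\d+)")
# Stack frame "[<addr>] symbol+0x..."; frames marked "?" are unreliable and skipped.
FRAME = re.compile(r"kernel: \[<[0-9a-f]+>\] (?!\? )(\w+)\+0x")

def summarize(lines):
    """Return one dict per report with the command, PID and reliable frames."""
    tasks = []
    current = None
    for line in lines:
        m = HEADER.search(line)
        if m:
            current = {"comm": m.group(1), "pid": int(m.group(2)), "frames": []}
            tasks.append(current)
            continue
        m = FRAME.search(line)
        if m and current is not None:
            current["frames"].append(m.group(1))
    return tasks

def blocking_frames(tasks):
    """Count the innermost reliable frame across all reports."""
    return Counter(t["frames"][0] for t in tasks if t["frames"])

if __name__ == "__main__":
    import sys
    reports = summarize(sys.stdin)
    for task in reports:
        print(task["pid"], task["comm"], " <- ".join(task["frames"][:4]))
    print(blocking_frames(reports))
```

Feeding it the log (for example `python hung.py < /var/log/syslog`, where the script name is arbitrary) should make it obvious that every report here is waiting in `__mutex_lock_slowpath`, reached from `generic_file_aio_write` via `ext4_file_write`. That suggests the writers are blocked on a lock rather than directly on the disk.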