We have a new critical problem that started to appear on more than one of our PVE 3.0 nodes.
During nightly vzdump snapshot backups, possibly when a heavy file I/O operation occurs inside a container, the entire server freezes in iowait. Processes still run, but seemingly no disk operations finish, and the load average climbs into the hundreds. Only a hard reset solves the problem (even shutdown -rn is unable to restart the server).
The problem started to appear after we upgraded to the new 2.6.32-20 kernel last week!
htop shows huge kernel iowait
iotop shows no userland io operations
console messages show hung task timeout
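For anyone trying to confirm the same symptoms, here is a minimal sketch (not something we ran as part of this report) that lists processes in uninterruptible sleep, state D, which is the state the kernel's hung task detector complains about. It only reads /proc, so it may still work while disk I/O is stalled, provided a shell is responsive and Python is already in the page cache. The function names parse_stat and blocked_tasks are made up for this example.

```python
import glob


def parse_stat(text):
    """Return (pid, comm, state) parsed from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the LAST closing parenthesis.
    head, _, tail = text.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]


def blocked_tasks():
    """Return a list of (pid, comm) for processes currently in state D."""
    found = []
    for path in glob.glob("/proc/[0-9]*/stat"):
        try:
            with open(path) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            # The process exited while we were scanning, or the line was odd.
            continue
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in sorted(blocked_tasks()):
        print(pid, comm)
```

This looks only at each process's main thread; a more thorough check would also walk /proc/<pid>/task/*/stat.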
Environment
Intel Core i7, Adaptec HW RAID
Proxmox VE 3.0
ext4 filesystem, deadline scheduler
Code:
pve-manager: 3.0-23 (pve-manager/3.0/957f0862)
running kernel: 2.6.32-20-pve
proxmox-ve-2.6.32: 3.0-100
pve-kernel-2.6.32-20-pve: 2.6.32-100
pve-kernel-2.6.32-19-pve: 2.6.32-96
pve-kernel-2.6.32-18-pve: 2.6.32-88
lvm2: 2.02.95-pve3
clvm: 2.02.95-pve3
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-1
pve-cluster: 3.0-4
qemu-server: 3.0-20
pve-firmware: 1.0-22
libpve-common-perl: 3.0-4
libpve-access-control: 3.0-4
libpve-storage-perl: 3.0-8
vncterm: 1.1-4
vzctl: 4.0-1pve3
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-13
ksm-control-daemon: 1.1-1