I've been running Proxmox at OVH for years with very few problems. I've now hit something that I'm stuck on.
Whenever I move a large-ish volume of data between folders on a single Proxmox node I end up with iowait problems. The CPU is next to idle, but the load shoots up to ridiculous levels; if I'm not careful it can reach over 80!
I've tried multiple ways of moving data: using cp to move a directory of ~66GB containing ~5300 files; using rsync to move a directory of ~21GB containing ~12000 files; using tar to back up a directory of ~21GB containing ~12000 files; and restoring data to a Zimbra server running in an OpenVZ container.
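For reference, the moves above were along these lines. This is a hedged reconstruction run on a tiny temp tree (the real moves were ~21-66GB; the exact flags and paths I used aren't shown here, so treat them as assumptions):

```shell
#!/bin/sh
# Sketch of the three copy methods that trigger the problem, on a small
# throwaway tree. Flags are typical choices, not necessarily mine.
set -e
src=$(mktemp -d); dst=$(mktemp -d)
mkdir -p "$src/data"; echo hello > "$src/data/file1"

cp -a "$src/data" "$dst/data-cp"                  # plain recursive copy
tar -cf - -C "$src" data | tar -xf - -C "$dst"    # tar stream copy

# rsync copy, only if rsync is installed on this host
command -v rsync >/dev/null 2>&1 && rsync -a "$src/data/" "$dst/data-rsync/"
```

All three methods hit the same load spike at some point during the run, so it doesn't look tool-specific.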
The high load doesn't appear straight away and seems to occur at a random point in the move. On occasions when I've missed it, the load has fallen back to normal levels while the process is still running, before it finishes. I now have to run a script during larger data moves to monitor the load and kill the process if it goes too high, but this shouldn't be necessary.
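The watchdog I run is along these lines. This is a minimal sketch, not my exact script; the threshold, polling interval, and the example cp path are assumptions:

```shell
#!/bin/sh
# Watch the 1-minute load average and kill a target process if it climbs
# past a threshold. Sketch only; threshold and interval are assumptions.
watch_load() {
    pid=$1
    threshold=${2:-40}                       # default threshold is a guess
    while kill -0 "$pid" 2>/dev/null; do
        load=$(cut -d' ' -f1 /proc/loadavg)  # e.g. "12.34"
        # compare the integer part against the threshold
        if [ "${load%%.*}" -ge "$threshold" ]; then
            echo "load $load >= $threshold, killing pid $pid"
            kill "$pid"
            break
        fi
        sleep 5
    done
}

# Usage (hypothetical paths):
#   cp -a /big/dir /other/dir & watch_load $! 40
```

Killing the copy does bring the load back down, which is why I suspect the I/O from the move itself rather than anything the containers are doing.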
I thought it might be a hardware issue, but the OVH control panel reports nothing. A couple of weeks ago I booted into rescue mode and again nothing was reported. I have contacted OVH support, who agree, after a long email chain, that there are no hardware problems.
I have also run multiple tests on the mdadm RAID array and can find nothing wrong there.
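The checks were along these lines (a sketch of typical mdadm health checks; I'm not listing my exact test runs here, and the commands below are read-only):

```shell
#!/bin/sh
# Typical read-only mdadm health checks; device names are discovered,
# not assumed. Reports only, never modifies the array.
if [ -r /proc/mdstat ]; then
    cat /proc/mdstat                 # array state, degraded members, resync progress
fi
checked=0
for md in /dev/md[0-9]*; do
    [ -b "$md" ] || continue
    mdadm --detail "$md"             # per-member health, event counts, sync status
    checked=$((checked + 1))
done
# A scrub can then be started per array with:
#   echo check > /sys/block/mdN/md/sync_action
echo "checked $checked array(s)"
```

Everything comes back clean: the arrays are in sync, no degraded members, no mismatches.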
I've attached some files for extra information: fdisk.txt, lspci.txt, mdstat.txt, parted.txt, pveperf-root.txt.
Code:
# pveversion -v
proxmox-ve-2.6.32: 3.4-150 (running kernel: 2.6.32-37-pve)
pve-manager: 3.4-3 (running version: 3.4-3/2fc72fee)
pve-kernel-2.6.32-37-pve: 2.6.32-150
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-2
pve-cluster: 3.0-16
qemu-server: 3.4-3
pve-firmware: 1.1-4
libpve-common-perl: 3.0-24
libpve-access-control: 3.0-16
libpve-storage-perl: 3.0-32
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.2-8
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1