High IO Load

tincboy

Renowned Member
Apr 13, 2010
I have a Proxmox 1.9 server whose load has risen to 20 right now.
I rebooted it once and it was OK for half an hour, but then the load rose to 20 again.
I used "atop -l -D" to find the process ID that is doing the most IO, but it only shows me the init process (PID 1) reading at 512 MBps, which keeps the disk busy and raises the load.
Does anyone have a suggestion for fixing this issue?
 
I've never used atop; I usually use iotop when I'm looking into IO-related issues.

apt-get install iotop

I usually add a delay of 3-5 seconds, e.g.:
iotop -d 3

That command causes iotop to monitor io for 3 seconds, then display the processes using io with the one using the most at the top.
It will refresh every three seconds.

Also take a look at the output of free.
If you are using a lot of swap, that could be part of your problem.
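
The swap check above can also be done from a script. This is only a rough sketch: the function name swap_used_kb is made up for illustration, and it assumes a Linux-style /proc/meminfo with SwapTotal and SwapFree lines.

```shell
# Print the amount of swap in use, in kB.
# $1 is a meminfo-format file; it defaults to /proc/meminfo.
# swap_used_kb is a hypothetical helper name, not part of any tool.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END           { print total - free }' "${1:-/proc/meminfo}"
}

# For an unattended capture, iotop also has a batch mode, for example:
#   iotop -b -o -d 3 -n 3
# (-b batch output, -o only processes doing IO, -d delay, -n iterations)
```

A result of 0 means no swap is in use, or that no swap is configured at all.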
 
Thanks, I will take a look with iotop instead of atop, as you advised.
This server has 2 GB of free memory and there is no swap.
Does anyone know what causes that huge amount of reads by the init process (PID 1)?
 
I've found out that this situation occurs when I start one specific VM; while it is stopped, everything is all right.
This VM is on top of an LVM volume.
Even when I try to back up that VM with the vzdump command, the load rises to 25.
Does anyone have a suggestion for how to back up that VM or solve the issue?
 
Does the load go that high if you back up that VM while it is stopped?
Are there any IO-related errors if you run the command: dmesg

I find that setting the cache mode for LVM disks to none helps with performance.
Stop the VM.
Edit the qemu config for that VM: /etc/qemu-server/XXX.conf, where XXX is your VM ID.

After each LVM disk entry, add ,cache=none so it looks something like this:

virtio0: DRBD1:vm-101-disk-1,cache=none

Lastly, the problem could be that the VM is simply doing too much disk IO and you need to investigate what is wrong inside that VM.
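
For anyone who wants to script the cache=none edit described above, here is a minimal sketch. The function name add_cache_none is made up, and the assumption that disk entries start with virtio, ide or scsi followed by a number is mine; check it against your own config. It only prints the modified config, so you can review it before replacing the real file.

```shell
# Print the given qemu-server config with ,cache=none appended to disk lines.
# add_cache_none is a hypothetical helper; the disk-line pattern is an assumption.
# Lines that already set cache= or that describe a CD-ROM are left alone.
add_cache_none() {
    sed -E '/^(virtio|ide|scsi)[0-9]+:/{
        /cache=|media=cdrom/!s/$/,cache=none/
    }' "$1"
}
```

Stop the VM first, run it against a copy of the config, and compare the output with the original before putting it in place.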
 
Yes, the load rises even when the VM is stopped and I just try to back it up.
This is the output of "dmesg | grep error":
Code:
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
sd 1:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 2356718606
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
sd 1:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 2356718606
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
         res 51/40:00:0e:ac:78/40:00:8c:00:00/00 Emask 0x9 (media error)
ata2.00: error: { UNC }
sd 1:0:0:0: [sdb] Add. Sense: Unrecovered read error - auto reallocate failed
end_request: I/O error, dev sdb, sector 2356718606

Running fsck -f -p didn't help much.
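
The repeated kernel messages above are easier to read when condensed to one line per failing device and sector. A rough sketch, assuming the kernel prints lines containing "I/O error, dev NAME, sector N" as in the output above; the function name summarize_io_errors is made up.

```shell
# Read dmesg-style text on stdin and print "COUNT DEVICE SECTOR" for each
# distinct failing sector. summarize_io_errors is a hypothetical helper name.
summarize_io_errors() {
    sed -n -E 's/.*I\/O error, dev ([^,]+), sector ([0-9]+).*/\1 \2/p' |
        awk '{ count[$0]++ } END { for (k in count) print count[k], k }' |
        sort -k2
}

# Usage: dmesg | summarize_io_errors
```

If the same sector on the same device keeps coming back, as it does here with sdb, that points to a bad spot on the disk rather than a file system problem.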
 
I've moved all the other VMs to another server.
But what can I do about the damaged VM?
Hi,
you have some choices. I would go this way:
Power down the node and move the disk to a computer that has a second, blank disk.
Boot a live distro there (like grml) and try to copy the whole disk to the blank disk with dd or, better, with ddrescue/gddrescue.
Sometimes you have a real chance of getting the data back if the HDD has been powered down for a while (temperature).

If the logical volume is much smaller than the whole disk, you should copy only the LV (also with dd or ddrescue).

If you don't have experience with dd and LVM, ask someone who has.

Udo
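
A possible shape for the ddrescue copy suggested above, as a sketch only. The function name rescue_plan is made up and the two-pass approach is one common way to use GNU ddrescue, not necessarily what was meant above. It only prints the commands so they can be checked before anything is run, because mixing up source and destination would overwrite the disk you are trying to save.

```shell
# Print a two-pass GNU ddrescue plan for copying SOURCE to DEST.
# rescue_plan is a hypothetical helper; nothing is executed here.
rescue_plan() {
    src=${1:-} dst=${2:-} map=${3:-}
    if [ -z "$src" ] || [ -z "$dst" ] || [ -z "$map" ]; then
        echo "usage: rescue_plan SOURCE DEST MAPFILE" >&2
        return 2
    fi
    if [ "$src" = "$dst" ]; then
        echo "source and destination must differ" >&2
        return 1
    fi
    # Pass 1: copy everything that reads cleanly and skip scraping bad areas.
    echo "ddrescue -f -n $src $dst $map"
    # Pass 2: return to the bad areas with direct disk access, up to 3 retries.
    echo "ddrescue -f -d -r3 $src $dst $map"
}

# Example (the device names are assumptions, check yours with fdisk -l):
#   rescue_plan /dev/sdb /dev/sdc rescue.map
```

The map file lets ddrescue resume where it stopped, so keep it on a healthy disk. To copy only the logical volume, pass the LV device path as SOURCE instead of the whole disk.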