Disk IO

dad311

Member
Jul 20, 2009
Twice in the last two days, my Proxmox server has had IO delays of 60-90%. These delays came just after restarting an OpenVZ container.

CPU usage is low (5-10%), but disk activity is nonstop. The only fix I have found is to reboot. I recently clustered two servers together; could this cause the issue? One server is on 1.6 and the other on 1.7.

I'm looking for any troubleshooting tips for tracking down this issue next time it happens.

This is a basic home server, nothing mission critical.
 
If you join a node to the cluster, all ISO images and templates stored in the local directory are copied to the new node, so it depends on the amount of data you have there.
(see 'du -h /var/lib/vz/templates')
 
I just joined the 2nd server to the cluster. At this time, all the VMs are on the first server. There shouldn't be too much data passing between the servers. I'm puzzled why just restarting an OpenVZ container can bring down the whole server. :confused:
 
Hi,
It depends on your I/O load, what you do inside the VM, and how good your disks (RAID controller) are.

BTW: your cluster nodes should run the same version.

Udo
 
My I/O load is about 1%. I run a MythTV backend, a few VPNs, a PBX, and a backup server.

RAID controller? I can't afford one of those! :(

Today, I noticed that any large files transferred to my second Proxmox server (unclustered now) have wrong md5sums. After a lot of troubleshooting, I found bad memory.
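
For anyone wanting to repeat this kind of check, here is a minimal sketch of comparing two copies of a file by MD5. The function name `same_md5` is made up for illustration; it only assumes `md5sum` and `awk` are installed.

```shell
# Succeeds (exit status 0) if both files have the same MD5 checksum.
# same_md5 is a hypothetical helper name, not part of any tool.
same_md5() {
    # Read via stdin so md5sum prints "<hash>  -" and only the hash is kept.
    a=$(md5sum < "$1" | awk '{ print $1 }')
    b=$(md5sum < "$2" | awk '{ print $1 }')
    [ "$a" = "$b" ]
}
```

For copies on two different servers, run `md5sum` on each side and compare the hashes by eye. If the same file gives different hashes on repeated runs on one machine, hardware such as memory is a more likely suspect than the network.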

After unclustering, I ran several backups and restarted a few machines without any issues. I'm starting to think it might have been the bad memory in the second server or the clustering of two different versions.
 
Late last night I added a new OpenVZ container to my newly unclustered Proxmox server. I installed the container, hit the start button, and disk IO hit 80%; the server required a reboot 10 minutes later. How can I find out what is accessing the disk and why?
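
One rough way to approach that question is to look for processes stuck in uninterruptible sleep (state `D` in `ps`), since those are usually the ones waiting on the disk. The sketch below is only a starting point: the function names `filter_blocked` and `sample_blocked` are made up for illustration, and it assumes a `ps` that accepts the standard `-e` and `-o` options.

```shell
# Keep only processes whose state starts with "D" (uninterruptible sleep,
# usually blocked on disk I/O). Expects "STATE PID COMMAND" lines on stdin.
# Only the first word of the command name is printed.
filter_blocked() {
    awk '$1 ~ /^D/ { print $2, $3 }'
}

# Sample the process table once per second and count how often each
# PID/command pair was seen blocked. Frequent entries are the suspects.
sample_blocked() {
    samples=${1:-5}
    i=0
    while [ "$i" -lt "$samples" ]; do
        ps -e -o stat= -o pid= -o comm= | filter_blocked
        sleep 1
        i=$((i + 1))
    done | sort | uniq -c | sort -rn
}
```

Running `sample_blocked 10` as root on the host while the IO delay is high should show which commands keep turning up in state `D`. It does not say why they are reading or writing, but it narrows down where to look.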
 
This is a bit of a shot in the dark, but if any of you are running Fail2ban on your VMs, take note of the log scanning at boot. I had a similar issue when a lightly to moderately used Apache VM running Fail2ban was rebooted in the middle of the day. Apache's access logs were 50+ MB.

At boot, Fail2ban (and other tools like Rootkit Hunter) does a full scan of these logs to catch up after the reboot.

If you can eventually get in, kill the process, manually run a log rotation, and then start the services back up again.
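
To check whether oversized logs are a likely culprit, something like the sketch below can list the big files first. The function name `large_logs`, the `/var/log` default, and the 50 MB default threshold are assumptions for illustration (the threshold just mirrors the log size mentioned above).

```shell
# Print files under a directory that are larger than a threshold in MB.
# large_logs is a hypothetical helper; defaults are illustrative only.
large_logs() {
    dir=${1:-/var/log}
    min_mb=${2:-50}
    # "-size +Nc" matches files bigger than N bytes.
    find "$dir" -type f -size +"$((min_mb * 1048576))"c -print
}
```

Run it inside the container, for example `large_logs /var/log 50`. If it turns up huge access logs, `logrotate -f /etc/logrotate.conf` forces a rotation, assuming the distribution keeps its main logrotate config at that path. Stop the log-scanning service first and start it again afterwards.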