Performance issue with Proxmox 4 - high IO delay

alex3137

Member
Jan 19, 2015
Hello.

I had been running Proxmox 3.x on this server for a year, but I have been experiencing slow performance since upgrading to Proxmox 4 (I did a clean install from the OVH Proxmox 4 template and restored the containers manually).

I am hosting basic apps for my own use (owncloud, seafile, openvpn, emby, sonarr, couchpotato, deluge...). It used to work like a charm, but now everything is slow to respond.

For example, connecting via SSH to the host or a container takes unusually long (a couple of seconds just to get a reply from the server), connecting to my VPN container can take up to 30 seconds when it usually takes 5, installing a simple package like "tree" can take a minute or two in a container, and listing files or bash auto-completion takes a couple of seconds...

I also noticed an unusually high IO delay in the Proxmox UI. With all containers stopped, just running aptitude update on the host pushes IO delay to between 5 and 10%. With all containers running idle, there is a permanent IO delay of 1 to 5% (it used to be 0 all the time with Proxmox 3). Downloading one or two torrents with Deluge makes IO delay climb to 70%!

So I don't understand what is happening: IO delay used to be low all the time with Proxmox 3, except when I was doing backups or copying files (which is expected behaviour).
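
For reference, here is how I am watching the wait numbers outside of the web UI (a rough sketch; vmstat is available by default, iostat needs the sysstat package):

Code:
# CPU iowait over time: watch the "wa" column, sampled every second
vmstat 1
# Per-device latency and utilisation (await, %util); requires "apt-get install sysstat"
iostat -x 1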

My server is a Kimsufi with an Intel i5 (4 cores), 16 GB of memory (around 5 GB used at the moment) and a 2 TB hard drive.

Here is the result of pveperf when the server is idle with no containers running:

Code:
CPU BOGOMIPS:      21331.24
REGEX/SECOND:      1355841
HD SIZE:           19.10 GB (/dev/sda2)
BUFFERED READS:    155.27 MB/sec
AVERAGE SEEK TIME: 7.15 ms
FSYNCS/SECOND:     34.23
DNS EXT:           46.06 ms
DNS INT:           1005.14 ms (xxxx.me)


Running "iotop" on the host while torrents are downloading shows processes consuming all the IOs:

Code:
Total DISK READ :  5.16 M/s | Total DISK WRITE :  4.09 M/s
Actual DISK READ:  0.00 B/s | Actual DISK WRITE:  2.42 M/s
  TID  PRIO  USER  DISK READ  DISK WRITE  SWAPIN  IO>  COMMAND
21322 be/4 messageb  5.16 M/s  2.29 M/s  0.00 % 96.37 % python /usr/bin/deluged --port=58846 --config=/var/lib/deluge/.config/deluge
  574 be/3 root  0.00 B/s  0.00 B/s  0.00 % 92.26 % [jbd2/dm-0-8]
11111 be/3 root  0.00 B/s  0.00 B/s  0.00 % 88.65 % [jbd2/loop10-8]
12310 be/4 root  0.00 B/s  61.28 K/s  0.00 % 18.55 % [nfsd]
12306 be/4 root  0.00 B/s  91.92 K/s  0.00 % 18.52 % [nfsd]
12307 be/4 root  0.00 B/s  153.20 K/s  0.00 % 18.52 % [nfsd]
12308 be/4 root  0.00 B/s  76.60 K/s  0.00 % 11.59 % [nfsd]
12309 be/4 root  0.00 B/s  61.28 K/s  0.00 % 11.58 % [nfsd]
12313 be/4 root  0.00 B/s  107.24 K/s  0.00 % 10.30 % [nfsd]
12311 be/4 root  0.00 B/s  107.24 K/s  0.00 %  9.56 % [nfsd]
12312 be/4 root  0.00 B/s  107.24 K/s  0.00 %  7.10 % [nfsd]
 5043 be/0 root  0.00 B/s 1053.25 K/s  0.00 %  0.75 % [kworker/u17:53]
  208 be/3 root  0.00 B/s  0.00 B/s  0.00 %  0.62 % [jbd2/sda2-8]
  364 be/4 root  0.00 B/s  0.00 B/s  0.00 %  0.02 % [kmmpd-loop5]
26999 be/0 root  0.00 B/s  3.83 K/s  0.00 %  0.00 % [kworker/u17:6]
15367 be/4 root  0.00 B/s  3.83 K/s  0.00 %  0.00 % pmxcfs
31593 be/4 root  0.00 B/s  7.66 K/s  0.00 %  0.00 % rsyslogd -c5 [rs:main Q:Reg]
 1432 be/4 root  0.00 B/s  11.49 K/s  0.00 %  0.00 % pmxcfs
  1 be/4 root  0.00 B/s  0.00 B/s  0.00 %  0.00 % init
  2 be/4 root  0.00 B/s  0.00 B/s  0.00 %  0.00 % [kthreadd]
  3 be/4 root  0.00 B/s  0.00 B/s  0.00 %  0.00 % [ksoftirqd/0]
  5 be/0 root  0.00 B/s  0.00 B/s  0.00 %  0.00 % [kworker/0:0H]
  7 be/4 root  0.00 B/s  0.00 B/s  0.00 %  0.00 % [rcu_sched]
  8 be/4 root  0.00 B/s  0.00 B/s  0.00 %  0.00 % [rcu_bh]

I don't know how to identify the root cause. Any suggestions?
 
When I had an unusually high IO delay (even with all VMs turned off), I just rebooted my server. That helped in my case.
 
So I converted all my containers from raw disk images to a chroot (directory) layout. IO wait is better and performance has improved, but it is still not as good as it was under OpenVZ. Just have a look at my example below.

I looked at the backup logs from when I was running PVE 3.4 and compared them with the duration in last night's backup log, now that the server is on PVE 4.1. The container is 185 GB:
  • when I was under Proxmox 3.4 / OpenVZ: 03:27:15
  • and Proxmox 4.1 / LXC: 09:24:58 !!!!
There is definitely something wrong with storage performance, and I see other users reporting the same issue on this forum. Can somebody from the Proxmox staff give their feedback / recommendations please? Thanks
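
For anyone who wants to check which layout a container currently uses, the rootfs line of its configuration shows it (a quick sketch; 101 is just an example container ID and the exact volume notation depends on the storage type and PVE version):

Code:
# A rootfs entry pointing at a vm-101-disk-1.raw file means a raw image;
# a subvolume/directory entry means a directory ("chroot") root filesystem
pct config 101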
 
I regret upgrading to Proxmox 4. If I had known I would have such performance issues, I would have stayed on version 3... Many users are reporting the same issue and I haven't seen any official reply from the Proxmox staff... I don't know if they are just ignoring the problem or are too busy to investigate...
 

Hi mate,
I have the same problem as you. But I also run a Proxmox 4 box at home, so I can compare. Let's be clear about one thing: if it is a driver problem, then it is a Debian problem; if it is a process/system problem, it is again a Debian problem. Proxmox is just a software layer on top of a Debian system.
As it turned out, my problem appeared after a data-consistency error. I guess your system is not in RAID mode? If it is not, then we can't be sure your problem is the same, but I'm thinking more of a problem in your system files. And in my humble opinion, I don't think a file system check is 100% accurate about the contents of the data. So the only thing I can suggest is to uninstall the Proxmox packages and see if you still have IO wait with nothing running. If you don't, then it was a problem in your files and you will have to reinstall the whole thing.
As for me, I'm backing up all my data; they are going to exchange every part of my server except the disks, and I will see if the IO wait still occurs. If it does, then I will have to reinstall the whole thing too.
 
Is it really the disk IO that is causing the long backup duration? There can be problems with the suspend mode of backups of LXC containers, where the suspension itself can hang for a long time.
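
One way to rule that out is to run a one-off backup of a single container in each mode and compare the per-phase timings in the logs (a sketch; the container ID and dump directory are placeholders):

Code:
# Stop-mode backup: no suspend/resume phase, the container is briefly shut down
vzdump 101 --mode stop --dumpdir /var/lib/vz/dump
# Suspend-mode backup for comparison
vzdump 101 --mode suspend --dumpdir /var/lib/vz/dump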

Long wait times on connection are often a sign of DNS problems.
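
The pveperf output above already hints at this: the internal DNS lookup took over a second. A quick check (a sketch; replace the example hostname with the host's own FQDN):

Code:
# Resolving the host's own name should be near-instant; a delay of around a second
# usually means a slow or unreachable nameserver in /etc/resolv.conf
time getent hosts myhost.example.org
cat /etc/resolv.conf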
 
He should give us the number of seconds between each percentage reading to be sure. But there are a lot of causes other than the disks that could explain the problem.
 
@fabian I'm not sure that was really a good answer to alex's problem, since you only posted a link about the devs' decision to change the default filesystem, not about the performance and I/O delays in this particular case.
Anyway @alex3137, OVH has replaced my server and apparently it was not a hardware problem. Since both your problem and mine appeared after some time of use, to me it is clearly a case of file system corruption. So back up your VMs and reinstall your server with the latest OVH Proxmox 4.0 template; I'm sure that right after the installation you won't have any I/O delays. I'm reinstalling now.
 
The switch from ext3 to ext4 changed a default mount option which does affect I/O performance, which is why I posted it (so that users who are experiencing I/O performance issues can check whether they are using ext4 with barriers and whether they want to turn them off or not, the linked thread provides more details about this topic). If you are using ext4, it is something worth investigating.
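
For anyone who wants to test this, barriers can be toggled at runtime and the effect measured with pveperf (a sketch only; barrier=0 trades crash safety on power loss for speed, so think twice before making it permanent in /etc/fstab):

Code:
# Check the current mount options of the root filesystem
mount | grep ' / '
# Temporarily disable write barriers on an ext4 root, then re-run pveperf
mount -o remount,barrier=0 /
pveperf
# Re-enable them afterwards
mount -o remount,barrier=1 /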
 
@fabian OK, but apparently his problem appeared after some time, not directly after the installation or upgrade. Plus, I am experiencing the same thing, and it appeared after some time on my OVH server.
It does not appear on my Proxmox home server, so a file corruption problem seems more likely. The choice of ext4 and the barrier setting don't seem to matter much in the benchmarks I've seen. Of course it's always worth checking, but what alex is experiencing looks more like my problem:
https://forum.proxmox.com/threads/problem-io-delay-after-reconstruction-of-raid.26432/
 
So, after extensive tests of the disk which revealed no errors, a replacement of all the electronic parts except the disks (motherboard, RAM, controller, CPU), and a reinstallation of the system, there are no more IO delay problems; it is now in the normal range. So it was almost certainly a corrupted system file.
Dev team, could you provide a better way to check the contents of the file system? Something like a hash of all critical system files?
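
Something close to that already exists on Debian: debsums compares the installed files against the MD5 sums shipped in their packages (a sketch; it only covers files owned by packages, not manually edited configs or your own data):

Code:
# Install the tool and report only files whose checksum no longer matches
apt-get install debsums
debsums -s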
 
How do you conclude it was a corrupted file system? Because you are running out of ideas?

My server comes from a fresh Proxmox 4 install. I don't want to do that again (I have a lot of data to back up and restore) if there is no way to confirm this is actually the issue.

I am using ext3 by the way.
 
@alex3137 Yes, of course, by elimination, but not only that. The hardware was replaced, and after a fresh installation everything was restored with good performance, so clearly it was file corruption. Need I remind you that fsck does not check the contents of files? So if corruption occurs on the disks of a RAID, it will simply be replicated, which is why you should definitely go through with the reinstall. And since you normally don't modify many files in the Proxmox system itself (because normally you don't have to, especially with the OVH installation process), restoring each VM and its configuration file will be fine. But you may run into the same problem as me and have to restore the same VM twice.

There is no way to troubleshoot corruption unless you have a hash of all the system files for exactly the version you run, including all the modifications you made, or unless you check the contents of every system file.
 
I have the same problem with the IO delay; even after a fresh reinstall with the OVH Proxmox 4.2 template it does not go away. My LXC VMs are in raw format. I read that changing this to chroot helps; how do I do that? If I back up and restore the VM in the Proxmox web interface, I cannot choose the format.
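
From the command line you can at least pick the target storage at restore time (a rough sketch; the container ID, archive path and storage name are placeholders, and whether you end up with a plain directory or a raw image still depends on the storage type and PVE version, so treat this as an assumption to verify rather than a confirmed fix):

Code:
# Restore a container backup onto a chosen storage defined in /etc/pve/storage.cfg
pct restore 101 /var/lib/vz/dump/vzdump-lxc-101.tar.lzo --storage local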
 
