Open files issue on PVE node

bjsko

Well-Known Member
Hi,

we experienced an issue with VMs running on Proxmox 6.x where, from the VM's perspective, one or more disks reported 100% busy while no data was read or written according to iostat. Our storage backend is an external Ceph Nautilus cluster.

This seemed to happen when a VM had many filesystems/several disk images attached and high activity. But we were also able to trigger the issue on pretty much any VM by reading and writing a lot of data and then running an fstrim or an mkfs. The only way we found to recover the VM was to reboot it.

We have about 250 VMs on two clusters, both on the same version, and could reproduce the issue on either cluster. We were somewhat misled into thinking it had something to do with the number of disk images per VM, as the issue was first seen during the go-live phase of a VM with more disks than our standard setup - but still only 9 disk images. Obviously the go-live did not happen, and we spent a lot of time trying to figure out what was going on.

To cut a long story short: after a while we noticed that when we triggered the issue on a VM, the corresponding kvm process on the PVE node was running at 100% CPU. Running strace on the kvm process id revealed:
Code:
ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}, {fd=202, events=POLLIN}, {fd=206, events=POLLIN}, {fd=207, events=POLLIN}, {fd=208, events=POLLIN}], 11, {tv_sec=0, tv_nsec=771556}, NULL, 8) = 2 ([{fd=10, revents=POLLIN}, {fd=11, revents=POLLIN}], left {tv_sec=0, tv_nsec=770679})
read(10, "\2\0\0\0\0\0\0\0", 16)        = 8
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
accept4(11, 0x7f6f26dc8c60, [128], SOCK_CLOEXEC) = -1 EMFILE (Too many open files)
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8

Finally we had a clue.

Looking at /proc/<pid>/limits we saw that "Max open files" was 1024. Further, counting the file descriptors in /proc/<pid>/fd showed us that as soon as the number of file descriptors went above 1024, the issue occurred on the VM.
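
For reference, these are standard proc filesystem checks run against the kvm process id of the affected VM (replace <pid> with the actual pid):
Code:
# current limit of the running kvm process
grep "Max open files" /proc/<pid>/limits

# number of file descriptors the process currently holds
ls /proc/<pid>/fd | wc -l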

We are now setting a higher number of open files per process in a file under /etc/security/limits.d/ and also for systemd in a file under /etc/systemd/system.conf.d/.
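
As an example, such files could look like this - the file names and the limit of 1048576 are just placeholders, not a recommendation for every setup:
Code:
# /etc/security/limits.d/90-nofile.conf (example)
*       soft    nofile  1048576
*       hard    nofile  1048576
root    soft    nofile  1048576
root    hard    nofile  1048576

# /etc/systemd/system.conf.d/90-nofile.conf (example)
[Manager]
DefaultLimitNOFILE=1048576

Already running kvm processes keep their old limit until the VM is stopped and started again.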

The question is whether we have done something incorrect from the start by not tuning open files, or whether the default should be set higher by the installation, or at least be mentioned in the documentation (if it isn't already). Another question is what a better default than 1024 would be.

Any comments/insights on this would be highly appreciated!

BR
Bjørn
 
The actual limits are often setup-specific and require manual tuning. That said, Proxmox VE 6.2 (released this week) includes some increases of these limits.
https://pve.proxmox.com/pipermail/pve-user/2020-May/171669.html

You can also try using iothreads for the VM disks. By default (off), all disks are handled by one thread only.
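
As a rough example (VM id, storage and disk names are placeholders): iothread is a per-disk option, and for SCSI disks it is used together with the VirtIO SCSI single controller:
Code:
# /etc/pve/qemu-server/<vmid>.conf (excerpt)
scsihw: virtio-scsi-single
scsi0: <storage>:vm-<vmid>-disk-0,iothread=1
scsi1: <storage>:vm-<vmid>-disk-1,iothread=1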
 
It seems this is still an issue on 6.2. Is there any specific reason why the number of open files for a kvm guest would be limited to 1024? Any decent webserver guest would already get into trouble with this limit, since the host then blocks most network traffic. Also, there doesn't seem to be any "good" way to set the number of open files (unless you change both limits.conf and the systemd config by hand).
Even older plain kvm servers I managed had more open files per guest (1048576). So maybe we can at least have a howto on how to correctly increase this limit in a future-proof way? Even better would be if the default were also increased in a future Proxmox version.
 
I could not see that the number of open files changed when updating to 6.2, but as I had been tinkering with it myself, I wasn't sure. I haven't tried a fresh install of 6.2, though.

We use ansible, and as I already had a playbook for "doing stuff" to PVE nodes, I just added the required changes there. To set the number of open files, we add a config file under /etc/security/limits.d/ and another under /etc/systemd/system.conf.d/. By adding them in these locations, we hopefully prevent the config from "disappearing" during upgrades.

I don't know if this is the preferred way to do it or not, but it seems to work for us.

Tuning is system-specific, indeed - but still, some clarification or a mention of tuning parameters like this in the docs or a howto article would have been really helpful.

BR
Bjørn
 

Thanks for your reply and sorry for totally forgetting to reply...

As I have mentioned earlier in this thread, we added config for number of open files and haven't seen the described issues after that.
We also experimented with iothreads as part of the troubleshooting process, but did not see any difference for this exact issue.

A list of the values that were changed with 6.2 (and their before and after values) would also be great! :)

Having said that, overall we are very happy with the way Proxmox works in our environment and keep on moving systems from our old virtualisation platform to PVE.

Many thanks
Bjørn
 
As I have mentioned earlier in this thread, we added config for number of open files and haven't seen the described issues after that.
We also experimented with iothreads as part of the troubleshooting process, but did not see any difference for this exact issue.
For iothreads, more than one disk is needed. When activated, QEMU uses one thread per disk.

A list of the values that were changed with 6.2 (and their before and after values) would also be great! :)
See here.
https://git.proxmox.com/?p=pve-cont...1;hp=7808afef1ab75950d0043b69239e7b02626b098a
 
@Alwin : like I said, it would be nice if Proxmox documented somewhere how to increase the open files per process (or provided a command for this, or a per-server setting, ...) and told people that this parameter can currently cause load issues / server hangs if it is too low. In my case it blocked all disk I/O (since the fds were taken by network connections) and even qemu-ga started to freak out at 100% because of this.
 
The workaround is the following:
1. **/etc/sysctl.d/90-rs-proxmox.conf**:
```
# Default: 1048576
fs.nr_open = 2097152
# Default: 8388608
fs.inotify.max_queued_events = 8388608
# Default: 65536
fs.inotify.max_user_instances = 1048576
# Default: 4194304
fs.inotify.max_user_watches = 4194304
# Default: 262144
vm.max_map_count = 262144
```
2. **/etc/security/limits.d/90-rs-proxmox.conf**:
```
* soft nofile 1048576
* hard nofile 2097152
root soft nofile 1048576
root hard nofile 2097152
* soft memlock 1048576
* hard memlock 2097152
```
3. `mkdir -p /etc/systemd/system/pvedaemon.service.d && touch /etc/systemd/system/pvedaemon.service.d/limits.conf`
Add the content of **/etc/systemd/system/pvedaemon.service.d/limits.conf**:
```ini
[Service]
LimitNOFILE=infinity
LimitMEMLOCK=infinity
LimitNPROC=infinity
TasksMax=infinity
```
4. `mkdir -p /etc/systemd/system/pve-guests.service.d && touch /etc/systemd/system/pve-guests.service.d/limits.conf`
Add the content of **/etc/systemd/system/pve-guests.service.d/limits.conf**:
```ini
[Service]
LimitNOFILE=infinity
LimitMEMLOCK=infinity
LimitNPROC=infinity
TasksMax=infinity
```
5. `mkdir -p /etc/systemd/system/pve-ha-lrm.service.d && touch /etc/systemd/system/pve-ha-lrm.service.d/limits.conf`
Add the content of **/etc/systemd/system/pve-ha-lrm.service.d/limits.conf**:
```ini
[Service]
LimitNOFILE=infinity
LimitMEMLOCK=infinity
LimitNPROC=infinity
TasksMax=infinity
```
6. Restart processes:
```
systemctl daemon-reload
systemctl restart pvedaemon.service pve-ha-lrm.service
```
7. By design, you are not allowed to restart **pve-guests.service** manually. So you might need to reboot your hypervisor machine.

Keep in mind that `fs.nr_open` should be greater than or equal to the value you are going to set via limits.conf.
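
A quick way to sanity-check that the new values are in effect (just a verification idea, not part of the workaround itself):
```
# kernel-wide ceiling that the limits.conf values must not exceed
sysctl fs.nr_open

# limit that systemd will apply to guests started via pve-guests.service
systemctl show -p LimitNOFILE pve-guests.service
```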
 
Thank you, this fixed it for us!
 
