Open files issue on PVE node

bjsko · May 14, 2020

Hi,

we experienced an issue with VM's running in Proxmox 6.x where one or more disks from the VM perspective reported 100% busy, but no data was read or written according to iostat. Our storage backend is an external CEPH nautilus cluster.

This seemed to happen if the VM had many filesystems/several disk images attached and high activity. But we were also able to trigger the issue on pretty much any VM by reading and writing a lot of data and maybe run a fstrim or an mkfs. The only way we found to recover the VM was to reboot it.

We have about 250 VM's on two clusters, both on the same version, and could reproduce the issue on either cluster. We were a bit mislead to think it had something to do with the number of disk images for a VM as the issue was first seen in the go-live phase of a VM with more disks than our standard setup - but still only 9 disk images. Obviously the go-live did not happen and we spent a lot of time trying to figure out what was going on.

To cut a long story short; After a while we noticed that when we triggered the issue on a VM, the kvm process on the PVE node for the VM was running with 100% CPU. Running a strace of the kvm process id revealed:

Code:

ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}, {fd=202, events=POLLIN}, {fd=206, events=POLLIN}, {fd=207, events=POLLIN}, {fd=208, events=POLLIN}], 11, {tv_sec=0, tv_nsec=771556}, NULL, 8) = 2 ([{fd=10, revents=POLLIN}, {fd=11, revents=POLLIN}], left {tv_sec=0, tv_nsec=770679})
read(10, "\2\0\0\0\0\0\0\0", 16)        = 8
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8
accept4(11, 0x7f6f26dc8c60, [128], SOCK_CLOEXEC) = -1 EMFILE (Too many open files)
write(10, "\1\0\0\0\0\0\0\0", 8)        = 8

Finally we had an clue.

Looking at /proc/<pid>/limits we saw "Max Open Files" were 1024. Further, counting the file descriptors in /proc/<pid>/fd showed us that as soon as the number of file descriptors went above 1024, the issue occurred on the VM.

We are now setting a higher number of open files pr process in a file in /etc/security/limits.d/ and also for systemd in a file in /etc/systemd/system.conf.d

The question is if we have done something incorrect from the start by not tuning open files or if the default should be set higher by the installation, or at least be mentioned in the documentation (if it isn't already). Another question is what would be a better default that 1024.

Any comment/insights to this would be highly appreciated!

BR
Bjørn

Alwin · May 14, 2020

The actual limits are often setup specific and require manual tuning. With that said, Proxmox VE 6.2 (released this week) includes some increase of these limits.
https://pve.proxmox.com/pipermail/pve-user/2020-May/171669.html

You can also try to use iothreads for the VM disks. By default (off) all disks will be handled by one thread only.

liedekef · Aug 18, 2020

It seems this is still an issue on 6.2. Is there any specific reason why the number of open files for a kvm guest would be limited to 1024? Any decent webserver guest would already get into trouble with this limit since the host blocks most network traffic. Also there doesn't seem to be any "good" way to set the number of open files (unless you change both limits.conf and systemd config by hand).
Even older plain kvm servers I managed had more open files per guest (1048576). So maybe can we at least have a howto on how to correctly and future-proof increase this limit? Even better would be that the default would also increase in a future proxmox version.

bjsko · Aug 19, 2020

liedekef said:
It seems this is still an issue on 6.2. Is there any specific reason why the number of open files for a kvm guest would be limited to 1024? Any decent webserver guest would already get into trouble with this limit since the host blocks most network traffic. Also there doesn't seem to be any "good" way to set the number of open files (unless you change both limits.conf and systemd config by hand).
Even older plain kvm servers I managed had more open files per guest (1048576). So maybe can we at least have a howto on how to correctly and future-proof increase this limit? Even better would be that the default would also increase in a future proxmox version.

I could not see that number of open files changed when updating to 6.2, but as I had been tinkering with it myself, I wasn't sure. I haven't tried a fresh install of 6.2, though.

We use ansible, and as I already had a playbook for "doing stuff" to PVE nodes, I just added the required changes there. To set number of open files, we add a config file under /etc/security/limits.d/ and another under /etc/systemd/system.conf.d/. By adding them in these locations, we hopefully prevent the config to "disappear" during upgrades.

I don't know if this is the preferred way to do it or not, but it seems to work for us.

Tuning is system specific, indeed - but still, some clarification or mentioning tuning parameters like this in the docs or a howto article would have been really helpful.

BR
Bjørn

bjsko · Aug 19, 2020

Alwin said:
The actual limits are often setup specific and require manual tuning. With that said, Proxmox VE 6.2 (released this week) includes some increase of these limits.
https://pve.proxmox.com/pipermail/pve-user/2020-May/171669.html

You can also try to use iothreads for the VM disks. By default (off) all disks will be handled by one thread only.

Thanks for your reply and sorry for totally forgetting to reply...

As I have mentioned earlier in this thread, we added config for number of open files and haven't seen the described issues after that.
We also experimented with iothreads as part of the troubleshooting process, but did not see any difference for this exact issue.

A list of values that was changed with 6.2 (and their before and after values) would also be great!

Having said that, overall we are very happy with the way Proxmox works in our environment and keep on moving systems from our old virtualisation platform to PVE.

Many thanks
Bjørn

Alwin · Aug 19, 2020

bjsko said:
As I have mentioned earlier in this thread, we added config for number of open files and haven't seen the described issues after that.
We also experimented with iothreads as part of the troubleshooting process, but did not see any difference for this exact issue.

For IOthreads more then one disk is needed. When activated, Qemu uses on thread per disk.

bjsko said:
A list of values that was changed with 6.2 (and their before and after values) would also be great!

See here.
https://git.proxmox.com/?p=pve-cont...1;hp=7808afef1ab75950d0043b69239e7b02626b098a

liedekef · Aug 19, 2020

@Alwin : like I said, it would be nice if proxmox somewhere documented on how to increase the open files per process (or provide a command for this, or per server a setting, ...) and tell people that currently this parameter can cause load issues / server hangs if too low. In my case it blocked all disk io (since the fd's were taken by network connections) and even qemu-ga started to freak out at 100% because of this.

Andrey Zentavr · Dec 23, 2021

I have the same issue when I have DB server with 10 virtio drives per VM.

Andrey Zentavr · Dec 23, 2021

The workaround is the next:


1. **/etc/sysctl.d/90-rs-proxmox.conf**:
   ```
   # Default: 1048576
   fs.nr_open = 2097152
   # Default: 8388608
   fs.inotify.max_queued_events = 8388608
   # Default: 65536
   fs.inotify.max_user_instances = 1048576
   # Default: 4194304
   fs.inotify.max_user_watches = 4194304
   # Default: 262144
   vm.max_map_count = 262144
   ```
2. **/etc/security/limits.d/90-rs-proxmox.conf**:
   ```
   *       soft    nofile  1048576
   *       hard    nofile  2097152
   root    soft    nofile  1048576
   root    hard    nofile  2097152
   *       soft    memlock 1048576
   *       hard    memlock 2097152
   ```
3. `mkdir -p /etc/systemd/system/pvedaemon.service.d && touch /etc/systemd/system/pvedaemon.service.d/limits.conf`
   Add the content of **/etc/systemd/system/pvedaemon.service.d/limits.conf**:
   ```ini
   [Service]
   LimitNOFILE=infinity
   LimitMEMLOCK=infinity
   LimitNPROC=infinity
   TasksMax=infinity
   ```
4. `mkdir -p /etc/systemd/system/pve-guests.service.d && touch /etc/systemd/system/pve-guests.service.d/limits.conf`
   Add the content of **/etc/systemd/system/pve-guests.service.d/limits.conf**:
   ```ini
   [Service]
   LimitNOFILE=infinity
   LimitMEMLOCK=infinity
   LimitNPROC=infinity
   TasksMax=infinity
   ```
5. `mkdir -p /etc/systemd/system/pve-ha-lrm.service.d && touch /etc/systemd/system/pve-ha-lrm.service.d/limits.conf`
   Add the content of **/etc/systemd/system/pve-ha-lrm.service.d/limits.conf**:
   ```ini
   [Service]
   LimitNOFILE=infinity
   LimitMEMLOCK=infinity
   LimitNPROC=infinity
   TasksMax=infinity
   ```
6. Restart processes:
   ```
   systemctl daemon-reload
   systemctl restart pvedaemon.service pve-ha-lrm.service
   ```
7. By design, you are not allowed to restart **pve-guests.service** manually. So you might need to reboot your
   hypervisor machine.

Keep in mind that `fs.nr_open` should be greater-or-equal to the value you are going to set via limits.conf.

hotelrwanda · Sep 26, 2022

Andrey Zentavr said:
The workaround is the next:
1. **/etc/sysctl.d/90-rs-proxmox.conf**: ``` # Default: 1048576 fs.nr_open = 2097152 # Default: 8388608 fs.inotify.max_queued_events = 8388608 # Default: 65536 fs.inotify.max_user_instances = 1048576 # Default: 4194304 fs.inotify.max_user_watches = 4194304 # Default: 262144 vm.max_map_count = 262144 ``` 2. **/etc/security/limits.d/90-rs-proxmox.conf**: ``` * soft nofile 1048576 * hard nofile 2097152 root soft nofile 1048576 root hard nofile 2097152 * soft memlock 1048576 * hard memlock 2097152 ``` 3. `mkdir -p /etc/systemd/system/pvedaemon.service.d && touch /etc/systemd/system/pvedaemon.service.d/limits.conf` Add the content of **/etc/systemd/system/pvedaemon.service.d/limits.conf**: ```ini [Service] LimitNOFILE=infinity LimitMEMLOCK=infinity LimitNPROC=infinity TasksMax=infinity ``` 4. `mkdir -p /etc/systemd/system/pve-guests.service.d && touch /etc/systemd/system/pve-guests.service.d/limits.conf` Add the content of **/etc/systemd/system/pve-guests.service.d/limits.conf**: ```ini [Service] LimitNOFILE=infinity LimitMEMLOCK=infinity LimitNPROC=infinity TasksMax=infinity ``` 5. `mkdir -p /etc/systemd/system/pve-ha-lrm.service.d && touch /etc/systemd/system/pve-ha-lrm.service.d/limits.conf` Add the content of **/etc/systemd/system/pve-ha-lrm.service.d/limits.conf**: ```ini [Service] LimitNOFILE=infinity LimitMEMLOCK=infinity LimitNPROC=infinity TasksMax=infinity ``` 6. Restart processes: ``` systemctl daemon-reload systemctl restart pvedaemon.service pve-ha-lrm.service ``` 7. By design, you are not allowed to restart **pve-guests.service** manually. So you might need to reboot your hypervisor machine.

Keep in mind that `fs.nr_open` should be greater-or-equal to the value you are going to set via limits.conf.

Thank you, this fixed it for us!

Search

Search

Open files issue on PVE node

bjsko

Well-Known Member

Alwin

Proxmox Retired Staff

liedekef

Member

bjsko

Well-Known Member

bjsko

Well-Known Member

Alwin

Proxmox Retired Staff

liedekef

Member

Andrey Zentavr

Active Member

Andrey Zentavr

Active Member

hotelrwanda

Member

We value your privacy