Hi,
We experienced an issue with VMs running on Proxmox 6.x where one or more disks, from the VM's perspective, reported 100% busy while no data was actually being read or written according to iostat. Our storage backend is an external Ceph Nautilus cluster.
This seemed to happen on VMs with many filesystems/several disk images attached and high activity, but we were also able to trigger the issue on pretty much any VM by reading and writing a lot of data and then running fstrim or mkfs. The only way we found to recover a VM was to reboot it.
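For what it's worth, this is roughly the kind of load we used to trigger it inside a guest (the mount point and size are just examples, any filesystem on one of the VM's disks will do):
Code:
# generate a lot of write I/O on one of the VM's filesystems
dd if=/dev/zero of=/mnt/data/testfile bs=1M count=50000 oflag=direct
sync
# then discard the blocks again
fstrim -v /mnt/data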
We have about 250 VMs on two clusters, both on the same version, and could reproduce the issue on either cluster. We were misled into thinking it had something to do with the number of disk images attached to a VM, as the issue was first seen during the go-live phase of a VM with more disks than our standard setup - but still only 9 disk images. Obviously the go-live did not happen, and we spent a lot of time trying to figure out what was going on.
To cut a long story short: after a while we noticed that when we triggered the issue on a VM, the kvm process for that VM on the PVE node was running at 100% CPU. Running strace against the kvm process ID revealed:
Code:
ppoll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}, {fd=10, events=POLLIN}, {fd=11, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}, {fd=202, events=POLLIN}, {fd=206, events=POLLIN}, {fd=207, events=POLLIN}, {fd=208, events=POLLIN}], 11, {tv_sec=0, tv_nsec=771556}, NULL, 8) = 2 ([{fd=10, revents=POLLIN}, {fd=11, revents=POLLIN}], left {tv_sec=0, tv_nsec=770679})
read(10, "\2\0\0\0\0\0\0\0", 16) = 8
write(10, "\1\0\0\0\0\0\0\0", 8) = 8
accept4(11, 0x7f6f26dc8c60, [128], SOCK_CLOEXEC) = -1 EMFILE (Too many open files)
write(10, "\1\0\0\0\0\0\0\0", 8) = 8
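For reference, attaching was nothing fancier than something like this, with <pid> being the kvm process of the affected VM:
Code:
strace -p <pid>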
Finally we had a clue.
Looking at /proc/<pid>/limits we saw that "Max open files" was 1024. Counting the file descriptors in /proc/<pid>/fd confirmed that as soon as the number of open file descriptors went above 1024, the issue occurred on the VM.
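For anyone who wants to check their own nodes, this is roughly what we used, again with <pid> being the kvm process of the VM:
Code:
grep "Max open files" /proc/<pid>/limits
ls /proc/<pid>/fd | wc -l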
We are now setting a higher open-files limit per process via a file in /etc/security/limits.d/, and also for systemd via a file in /etc/systemd/system.conf.d/.
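In case it is useful to others, the files look roughly like this (the file names and the 65536 value are just what we picked, not a recommendation):
Code:
# /etc/security/limits.d/90-nofile.conf
root    soft    nofile    65536
root    hard    nofile    65536

# /etc/systemd/system.conf.d/90-nofile.conf
[Manager]
DefaultLimitNOFILE=65536
As far as we can tell, already-running kvm processes keep their old limit, so the VMs (and the services that start them) have to be restarted, or the node rebooted, before the new limit applies.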
The question is whether we did something wrong from the start by not tuning the open-files limit, or whether the default should be set higher by the installer, or at least mentioned in the documentation (if it isn't already). Another question is what a better default than 1024 would be.
Any comments/insights on this would be highly appreciated!
BR
Bjørn