LXCFS strange problem again...

mstgeo

Member
May 15, 2022
HI!

I am running into some LXCFS issues again. Maybe someone has had a similar problem and can help solve it... That would be great! ;)

root@linux-server:/# journalctl -u lxcfs


Sep 25 10:05:18 linux-server systemd[1]: Started FUSE filesystem for LXC.
Sep 25 10:05:18 linux-server lxcfs[2601]: Running constructor lxcfs_init to reload liblxcfs
Sep 25 10:05:18 linux-server lxcfs[2601]: mount namespace: 4
Sep 25 10:05:18 linux-server lxcfs[2601]: hierarchies:
Sep 25 10:05:18 linux-server lxcfs[2601]: 0: fd: 5:
Sep 25 10:05:18 linux-server lxcfs[2601]: 1: fd: 6: name=systemd
Sep 25 10:05:18 linux-server lxcfs[2601]: 2: fd: 7: cpu,cpuacct
Sep 25 10:05:18 linux-server lxcfs[2601]: 3: fd: 8: hugetlb
Sep 25 10:05:18 linux-server lxcfs[2601]: 4: fd: 9: perf_event
Sep 25 10:05:18 linux-server lxcfs[2601]: 5: fd: 10: cpuset
Sep 25 10:05:18 linux-server lxcfs[2601]: 6: fd: 11: blkio
Sep 25 10:05:18 linux-server lxcfs[2601]: 7: fd: 12: pids
Sep 25 10:05:18 linux-server lxcfs[2601]: 8: fd: 13: net_cls,net_prio
Sep 25 10:05:18 linux-server lxcfs[2601]: 9: fd: 14: freezer
Sep 25 10:05:18 linux-server lxcfs[2601]: 10: fd: 15: memory
Sep 25 10:05:18 linux-server lxcfs[2601]: 11: fd: 16: devices
Sep 25 10:05:18 linux-server lxcfs[2601]: 12: fd: 17: rdma
Sep 25 10:05:18 linux-server lxcfs[2601]: Kernel supports pidfds
Sep 25 10:05:18 linux-server lxcfs[2601]: Kernel supports swap accounting
Sep 25 10:05:18 linux-server lxcfs[2601]: api_extensions:
Sep 25 10:05:18 linux-server lxcfs[2601]: - cgroups
Sep 25 10:05:18 linux-server lxcfs[2601]: - sys_cpu_online
Sep 25 10:05:18 linux-server lxcfs[2601]: - proc_cpuinfo
Sep 25 10:05:18 linux-server lxcfs[2601]: - proc_diskstats
Sep 25 10:05:18 linux-server lxcfs[2601]: - proc_loadavg
Sep 25 10:05:18 linux-server lxcfs[2601]: - proc_meminfo
Sep 25 10:05:18 linux-server lxcfs[2601]: - proc_stat
Sep 25 10:05:18 linux-server lxcfs[2601]: - proc_swaps
Sep 25 10:05:18 linux-server lxcfs[2601]: - proc_uptime
Sep 25 10:05:18 linux-server lxcfs[2601]: - shared_pidns
Sep 25 10:05:18 linux-server lxcfs[2601]: - cpuview_daemon
Sep 25 10:05:18 linux-server lxcfs[2601]: - loadavg_daemon
Sep 25 10:05:18 linux-server lxcfs[2601]: - pidfds


Sep 26 00:00:14 linux-server lxcfs[2601]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Sep 26 00:00:14 linux-server lxcfs[2601]: utils.c: 291: send_creds: Invalid argument - Failed getting reply from server over socketpair: 2
Sep 26 00:00:16 linux-server lxcfs[2601]: utils.c: 315: send_creds: Connection refused - Failed at sendmsg: 2
Sep 26 00:00:18 linux-server lxcfs[2601]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process


What the heck is this? Anyone? ;)

Thanks in advance!

see ya !
 
what exactly is the problem besides those warnings in the logs?
 
HI!

Thanks for a fast answer...

Below is what syslog says...


Sep 26 00:00:14 linux-server lxcfs[2601]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process
Sep 26 00:00:14 linux-server lxcfs[2601]: utils.c: 291: send_creds: Invalid argument - Failed getting reply from server over socketpair: 2
Sep 26 00:00:16 linux-server lxcfs[2601]: utils.c: 315: send_creds: Connection refused - Failed at sendmsg: 2
Sep 26 00:00:16 linux-server spiceproxy[3252]: worker 3253 finished
Sep 26 00:00:17 linux-server pvestatd[59259]: status update time (11.109 seconds)
Sep 26 00:00:18 linux-server lxcfs[2601]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process


Just FYI, this is a big server with 120+ LXC containers and 15+ KVM machines... I attached a screenshot of the machine specs...

Hmm... What could it be?

Maybe I should wait and see whether it occurs again... ;)

cya!
 

Attachments

  • obraz_2022-09-28_121011313.png
Hi,

The issue occurred again tonight. It happens roughly every 2 weeks... The only way to get the environment working again is to restart the physical node.

Below is the dump from syslog; I hope someone can help with this... Thanks in advance!

FYI, a backup of the LXC containers / KVM machines was running at the time, using the vzdump tool from Proxmox.

Any ideas how to get rid of this issue and the downtimes? The machine has 120+ LXC / 15+ KVM, 2 x Intel Gold / 96 vCPUs / 768 GB of RAM / 6 x NVMe. The backup goes to an NFS storage server...

Attaching the errors from syslog as a file here...

see ya !
 

Attachments

  • SYSLOG.LOG
those are now completely different errors (they indicate you are running into some kind of resource limit, maybe the number of running processes?). also, the logs only start after the symptoms have appeared; the more interesting part is likely at least a few minutes before that..
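Following this advice means looking at the log from a few minutes before the first error, not only after it. A minimal sketch, assuming syslog-style lines as pasted in this thread (`Oct 25 22:43:08 host ...`). These carry no year, so the year is passed in explicitly. The function names and the 10-minute default are hypothetical choices for illustration, not part of any Proxmox tool.

```python
from datetime import datetime, timedelta

def parse_stamp(line, year):
    """Parse the 'Mon DD HH:MM:SS' prefix of a syslog line.

    Syslog timestamps omit the year, so the caller supplies it (assumption)."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")

def lines_before_first_error(lines, marker, year, minutes=10):
    """Return every line from `minutes` before the first line containing
    `marker` through the end of the log."""
    stamped = []
    prev = None
    for line in lines:
        try:
            prev = parse_stamp(line, year)
        except ValueError:
            pass  # truncated/continuation line: reuse the previous timestamp
        stamped.append((prev, line))
    first = next((ts for ts, line in stamped
                  if ts is not None and marker in line), None)
    if first is None:
        return []
    start = first - timedelta(minutes=minutes)
    return [line for ts, line in stamped if ts is not None and ts >= start]
```

On the host itself, `journalctl --since "2022-10-25 22:30" --until "2022-10-25 22:50"` selects the same kind of window without any script.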

is this a 6.4 machine? 6.4 is EOL, so I would suggest upgrading to 7.x..
 
HI!

@fabian We're going to upgrade, but first we need to get rid of this issue... Any ideas what can be adjusted to avoid it? I suppose it starts with a backup... but the backup runs every night. So why does this happen only every ~2 weeks? Hmm... Any ideas? ;)

All backups during that time are OK: no waiting, no congestion, no CPU load, plenty of free RAM...

I already boosted the AIO settings, and I also adjusted the LXC process settings, which we had problems with when our machine got big...

I'd be more than happy if someone could post a good solution to this ;)

see ya!
 
well, the things you posted initially and the log you posted now are completely different things.. a good start would be to clearly describe what the issue is, and to include all relevant logs (not only from after the problem started) and anything you modified that deviates from a standard setup.. but likely not much effort will go into analyzing and fixing a rare issue on an EOL version.
 
HI!

What should I provide so you can take a look at what it might be? Any system info / logs / etc.?

Thx!

cya!
 
then I'd suggest trying the workaround mentioned there ;)
 
OK, done!

I set the following in /etc/systemd/system.conf:

DefaultTasksMax=infinity
DefaultLimitNOFILE=infinity
DefaultLimitNPROC=infinity
DefaultLimitMEMLOCK=infinity

Let's hope this will solve my case, too.
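Since /etc/systemd/system.conf ships with its settings commented out, it is easy to end up with a value that looks set but is not. A minimal sketch, using Python's standard `configparser`, that lists which of the `Default*` limits above are actually uncommented in the `[Manager]` section. The helper names are hypothetical. This only checks the file: services that were already running keep the limits they were started with until they are restarted.

```python
import configparser

LIMIT_KEYS = ("DefaultTasksMax", "DefaultLimitNOFILE",
              "DefaultLimitNPROC", "DefaultLimitMEMLOCK")

def manager_settings(text):
    """Return the uncommented [Manager] settings from system.conf-style text."""
    parser = configparser.ConfigParser(
        interpolation=None,  # treat '%' (e.g. "15%") literally
        strict=False,        # tolerate repeated keys; the last one wins
    )
    parser.optionxform = str  # keep key case (configparser lowercases by default)
    parser.read_string(text)  # lines starting with '#' are skipped as comments
    if not parser.has_section("Manager"):
        return {}
    return dict(parser.items("Manager"))

def missing_limits(text, keys=LIMIT_KEYS):
    """List which of the expected limit keys are not actually set."""
    settings = manager_settings(text)
    return [k for k in keys if k not in settings]
```

For example, `missing_limits(open("/etc/systemd/system.conf").read())` returns an empty list when all four settings are active.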

Thx !
 
HI!

14/15 days and the problem is back again... what the heck is this?

Oct 25 22:43:08 linux-server pvestatd[3189]: command 'lxc-info -n 126 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:08 linux-server pvestatd[3189]: command 'lxc-info -n 195 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:08 linux-server pve-firewall[3174]: status update error: command 'ipset save' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:08 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:43:08 linux-server pvestatd[3189]: command 'lxc-info -n 205 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:08 linux-server pvestatd[3189]: command 'lxc-info -n 228 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:08 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:43:08 linux-server pvestatd[3189]: command 'lxc-info -n 214 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:08 linux-server pvestatd[3189]: command 'lxc-info -n 241 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:08 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:43:09 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:43:09 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:43:09 linux-server pvestatd[3189]: command 'lxc-info -n 178 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:09 linux-server pvestatd[3189]: command 'lxc-info -n 233 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:09 linux-server pvestatd[3189]: command 'lxc-info -n 160 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:09 linux-server pvestatd[3189]: command 'lxc-info -n 125 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:09 linux-server pvestatd[3189]: command 'lxc-info -n 172 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:09 linux-server pvestatd[3189]: command 'lxc-info -n 131 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:43:09 linux-server pvestatd[3189]: command 'lxc-info -n 115 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
/usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:45:07 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 149 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 161 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 150 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 244 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 235 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 155 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 135 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 184 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:45:07 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 207 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 245 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 234 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 138 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 164 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 152 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 132 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 171 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 158 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 185 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 134 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 145 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:07 linux-server pvestatd[3189]: command 'lxc-info -n 168 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
/usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:08 linux-server pvestatd[3189]: fork failed: Resource temporarily unavailable
Oct 25 22:45:08 linux-server pvestatd[3189]: command 'lxc-info -n 205 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:08 linux-server pve-firewall[3174]: status update error: command 'iptables-save' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:08 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:08 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:08 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:08 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:08 linux-server lxcfs[2585]: fuse: error creating thread: Resource temporarily unavailable
Oct 25 22:45:08 linux-server pvestatd[3189]: command 'lxc-info -n 130 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:08 linux-server pvestatd[3189]: command 'lxc-info -n 215 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:08 linux-server pvestatd[3189]: command 'lxc-info -n 106 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:08 linux-server pvestatd[3189]: command 'lxc-info -n 150 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
Oct 25 22:45:08 linux-server pvestatd[3189]: command 'lxc-info -n 161 -p' failed: open3: fork failed: Resource temporarily unavailable at /usr/share/perl5/PVE/Tools.pm line 449.
/usr/share/perl5/PVE/Tools.pm line 449.

Anyone can help here ?

Thanks in advance!

see ya!
 
did you verify the updated limits are actually in place? are you running very large numbers of containers, or workloads that spawn a huge number of processes?
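One way to verify this for the running lxcfs daemon is to read its `/proc/<pid>/limits` file, where the kernel reports a process's soft and hard rlimits (the PID comes from e.g. `pidof lxcfs`). systemd's `TasksMax` is enforced through the pids cgroup rather than an rlimit, so it does not appear there; `systemctl show lxcfs -p TasksMax` reports it instead. A minimal parsing sketch follows. `parse_limits` and `read_limits` are hypothetical helper names.

```python
import re

def parse_limits(text):
    """Parse /proc/<pid>/limits into {name: (soft, hard)}.

    Columns are separated by runs of two or more spaces, so names such as
    'Max processes' (which contain single spaces) stay intact."""
    limits = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) >= 3:
            limits[fields[0]] = (fields[1], fields[2])
    return limits

def read_limits(pid):
    """Read and parse the limits of a live process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_limits(f.read())
```

`read_limits(pid)["Max processes"]` then shows the effective RLIMIT_NPROC of that process.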
 
Hi Fabian,

Yes, everything is OK as far as the limits are concerned. I applied them all.

The server behaves well for 14-15 days; then, WHILE the VZDUMP backup is running on the 14th/15th day, we get the errors mentioned above... Before that, journalctl -u lxcfs is clean...

I was forced to reboot the physical machine on Monday, 24 October at 3:00 a.m., when I got a notification from our external Nagios... Everything was RED. Here is what the log shows after that:

-- Logs begin at Tue 2022-10-25 23:05:06 BST, end at Fri 2022-10-28 09:18:23 BST. --
Oct 25 23:05:12 linux-server systemd[1]: Started FUSE filesystem for LXC.
Oct 25 23:05:12 linux-server lxcfs[2652]: Running constructor lxcfs_init to reload liblxcfs
Oct 25 23:05:12 linux-server lxcfs[2652]: mount namespace: 4
Oct 25 23:05:12 linux-server lxcfs[2652]: hierarchies:
Oct 25 23:05:12 linux-server lxcfs[2652]: 0: fd: 5:
Oct 25 23:05:12 linux-server lxcfs[2652]: 1: fd: 6: name=systemd
Oct 25 23:05:12 linux-server lxcfs[2652]: 2: fd: 7: blkio
Oct 25 23:05:12 linux-server lxcfs[2652]: 3: fd: 8: rdma
Oct 25 23:05:12 linux-server lxcfs[2652]: 4: fd: 9: cpu,cpuacct
Oct 25 23:05:12 linux-server lxcfs[2652]: 5: fd: 10: hugetlb
Oct 25 23:05:12 linux-server lxcfs[2652]: 6: fd: 11: freezer
Oct 25 23:05:12 linux-server lxcfs[2652]: 7: fd: 12: perf_event
Oct 25 23:05:12 linux-server lxcfs[2652]: 8: fd: 13: net_cls,net_prio
Oct 25 23:05:12 linux-server lxcfs[2652]: 9: fd: 14: pids
Oct 25 23:05:12 linux-server lxcfs[2652]: 10: fd: 15: devices
Oct 25 23:05:12 linux-server lxcfs[2652]: 11: fd: 16: memory
Oct 25 23:05:12 linux-server lxcfs[2652]: 12: fd: 17: cpuset
Oct 25 23:05:12 linux-server lxcfs[2652]: Kernel supports pidfds
Oct 25 23:05:12 linux-server lxcfs[2652]: Kernel supports swap accounting
Oct 25 23:05:12 linux-server lxcfs[2652]: api_extensions:
Oct 25 23:05:12 linux-server lxcfs[2652]: - cgroups
Oct 25 23:05:12 linux-server lxcfs[2652]: - sys_cpu_online
Oct 25 23:05:12 linux-server lxcfs[2652]: - proc_cpuinfo
Oct 25 23:05:12 linux-server lxcfs[2652]: - proc_diskstats
Oct 25 23:05:12 linux-server lxcfs[2652]: - proc_loadavg
Oct 25 23:05:12 linux-server lxcfs[2652]: - proc_meminfo
Oct 25 23:05:12 linux-server lxcfs[2652]: - proc_stat
Oct 25 23:05:12 linux-server lxcfs[2652]: - proc_swaps
Oct 25 23:05:12 linux-server lxcfs[2652]: - proc_uptime
Oct 25 23:05:12 linux-server lxcfs[2652]: - shared_pidns
Oct 25 23:05:12 linux-server lxcfs[2652]: - cpuview_daemon
Oct 25 23:05:12 linux-server lxcfs[2652]: - loadavg_daemon
Oct 25 23:05:12 linux-server lxcfs[2652]: - pidfds
Oct 27 00:00:17 linux-server lxcfs[2652]: utils.c: 254: recv_creds: Timed out waiting for scm_cred: No such process


Nothing really happens apart from the single error in the last line. After 14/15 days (ALWAYS), we get tons of these errors while the backup is running.

As for the server: yes, it is a big one and has a lot of machines running on it.

What we have: 2 x Intel Xeon Gold, 96 cores + 768 GB RAM + 2 x NVMe for the SYSTEM (RAID-1) + 6 x SSD for VM data (RAID-5).

On it we are running 130 LXC containers + 12 KVM VMs.

CPU usage during the day is ~10%, 230 GB of the 768 GB RAM is in use, and the load average is 29.83, 34.69, 29.61 (it rises to 50-70 while the backup is running); the disks are 16-18% busy in atop.

I have no idea why this happens only every 14/15 days, while the rest of the time the server is OK.

Is there any solution to this? Any advice?

Thanks! ;)

I can't get any sleep because of this... the server goes down, thank God only every 2 weeks :D

see ya !
 
I'd suggest monitoring the number of processes to see if it's gradually rising (if so, find out why/what kind of processes) or making a big jump during the backup (again, you'd need to find out what all the processes are that cause the issue!).
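A minimal sketch of such monitoring, assuming Linux `/proc`. It counts threads (each thread consumes one PID) grouped by command name. Running it periodically, for example from cron with the output appended to a file, would show whether the total creeps up over the two weeks and which program accounts for the growth. The `proc_root` parameter exists only so the function can be pointed at a test directory. `tasks_by_command` is a hypothetical name.

```python
import os
from collections import Counter

def tasks_by_command(proc_root="/proc"):
    """Count threads per command name; every thread uses one PID."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "comm")) as f:
                name = f.read().strip()
            threads = len(os.listdir(os.path.join(base, "task")))
        except OSError:
            continue  # process exited mid-scan or is not readable
        counts[name] += threads
    return counts
```

For example, `sum(tasks_by_command().values())` gives the total, and `tasks_by_command().most_common(10)` lists the biggest consumers.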

possibly something keeps spawning processes but doesn't clean them up or they end up hanging forever in a blocking state, accumulating..
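To check for that kind of accumulation, the state of each process can be read from `/proc/<pid>/stat`. There, `D` marks uninterruptible sleep (typically blocked on I/O) and `Z` marks a zombie that its parent never reaped. A minimal sketch; the helper names are hypothetical. The state is taken from the field right after the *last* closing parenthesis, because the command name itself may contain spaces or parentheses.

```python
import os
from collections import Counter

def state_from_stat(stat_line):
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # Format: "pid (comm) state ppid ..."; comm may contain ')', so use the last one.
    return stat_line[stat_line.rindex(")") + 2]

def count_states(proc_root="/proc"):
    """Count processes per state letter (R, S, D, Z, ...)."""
    states = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                states[state_from_stat(f.read())] += 1
        except OSError:
            continue  # process exited during the scan
    return states
```

If the `D` or `Z` counts only ever grow between samples, that would support the accumulation theory.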
 
Could vzdump be causing this? Should we increase some limits for the backup process too, like we did for lxcfs earlier?
 
since multiple services are hitting the limit, my guess is that your whole system is running out of PIDs, but it's hard to tell from afar. it should be pretty obvious if you can monitor the number of PIDs (and maybe even the full process tree) for a while, and especially dump it before rebooting if the issue occurs.
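A minimal sketch for watching system-wide PID headroom, assuming Linux. The fourth field of `/proc/loadavg` has the form `runnable/total`, where total is the number of scheduling entities (processes plus threads) that currently exist. Creating new tasks fails with "Resource temporarily unavailable" as that total approaches `kernel.pid_max` or `kernel.threads-max`. Per-user rlimits and per-cgroup pids limits can trigger the same error earlier. The helper names are hypothetical. Logging `pid_headroom()` every minute, and saving `ps -eLf` or `ps auxf` output before rebooting, would give the data asked for here.

```python
def total_tasks(loadavg_text):
    """Existing scheduling entities (processes + threads) from /proc/loadavg."""
    # Format example: "0.50 0.40 0.30 2/812 12345"; field 4 is runnable/total.
    return int(loadavg_text.split()[3].split("/")[1])

def read_int(path):
    with open(path) as f:
        return int(f.read().strip())

def pid_headroom(proc_root="/proc"):
    """Return (tasks, limit, free), with limit = min(pid_max, threads-max)."""
    with open(f"{proc_root}/loadavg") as f:
        tasks = total_tasks(f.read())
    limit = min(read_int(f"{proc_root}/sys/kernel/pid_max"),
                read_int(f"{proc_root}/sys/kernel/threads-max"))
    return tasks, limit, limit - tasks
```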
 
