This problem is driving me nuts.
We have 3 new servers where we installed PVE 5.1-36.
I tried to start 32 LXC containers at a time through the API, and it seems there's a new parameter for creating containers: instead of passing --cpulimit X, I now have to pass --cores X. That's fine; we made the adjustments and everything looked OK. We were happy that PVE 5.1 was going to work out! Nope.
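For reference, this is roughly the shape of the create call with the new option (the VMID, template, and storage names below are just placeholders, not our actual setup):
Code:
# create a container with the new --cores option (PVE 5.x);
# VMID, template and storage names are illustrative only
pct create 101 local:vztmpl/debian-9.0-standard_9.0-2_amd64.tar.gz \
    --cores 2 \
    --memory 1024 \
    --rootfs local-lvm:8 \
    --net0 name=eth0,bridge=vmbr0,ip=dhcp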
Problem 1) After creating containers I encountered the following error:
Code:
Nov 8 02:55:00 marshall pvesr[28315]: Unable to create new inotify object: Too many open files at /usr/share/perl5/PVE/INotify.pm line 390.
I googled the shit out of this error and figured out that we should increase the following variables in /etc/sysctl.conf. For other people encountering the same problem, here are my two cents:
Code:
fs.inotify.max_user_watches=1048576
fs.inotify.max_user_instances=8192
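To apply them without a reboot, assuming the lines went into /etc/sysctl.conf as above:
Code:
# reload the settings and verify the new limits are active
sysctl -p /etc/sysctl.conf
sysctl fs.inotify.max_user_instances fs.inotify.max_user_watches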
Note: Proxmox Team, would it be possible to consider including these variables in /etc/sysctl.d/pve.conf by default?
Problem 2) The server CRASHED every time after a certain period (20-30 minutes)...
The last log entries I saw in /var/log/syslog around the reported times of the crashes were:
So once again I googled a lot and found someone on this forum talking about removing the swap partition from /etc/fstab. I did that, and of course after a reboot there was no mounted swap partition anymore. Is this good or bad? No idea. One thing is for sure: the warning about a timeout waiting on that device disappeared. Expected, since it's not mounted anymore.
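In case it helps anyone, this is roughly the procedure (double-check that the sed pattern actually matches the swap line in your fstab before running it):
Code:
# disable swap at runtime, then comment out the swap entry in fstab
swapoff -a
sed -i '/[[:space:]]swap[[:space:]]/ s/^/#/' /etc/fstab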
I was just about to get happy... If only...
Problem 3) The start log of one LXC container shows an EXT4-fs warning (device loopXX): ext4_multi_mount_protect.
Does this warning,
Code:
EXT4-fs warning: ext4_multi_mount_protect:324: MMP interval 42 higher than expected, please wait.
indicate an actual problem? After waiting a bit, the containers do start correctly.
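In case it's useful for debugging, the MMP settings of the ext4 filesystem backing a container can be inspected with tune2fs (/dev/loopX below is a placeholder for the loop device from the start log):
Code:
# show the multiple-mount-protection (MMP) fields of the filesystem
tune2fs -l /dev/loopX | grep -i mmp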
The real nightmare starts after the HOST has been running for about 30 minutes...
It crashes! No network, no SSH, no IPMI (KVM), no access at all! Only after a full reboot can I recover the server.
I've read a lot on this forum and seen people reporting their pveperf results, so I decided to try it myself. Here's what I got:
Code:
CPU BOGOMIPS: 38410.86
REGEX/SECOND: 2271408
HD SIZE: 1813.94 GB (/dev/root)
unable to open HD at /usr/bin/pveperf line 150.
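For what it's worth, pveperf also accepts a path argument, so it can be pointed at the storage that actually holds the containers (the mount point below is just the default local storage; adjust to your layout):
Code:
# run the benchmark against a specific mount point instead of /
pveperf /var/lib/vz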
I am worried that this "unable to open HD at /usr/bin/pveperf" error could mean that something is really wrong with my disks, or some other hardware-related problem that might cause trouble in the future.
Could you guys share your thoughts on whether this is something to be worried about? I am also still monitoring the application to check for CRASHES. Are there any known issues in the new PVE 5.1 kernel when running many containers?