HA cluster problem with more than 100 CTs per node

AhmedF

Hi,

I'm running an HA cluster with 8 nodes and a shared NAS device, totaling about 1000 CTs, and everything runs very smoothly. But when I need to reboot one of the nodes, I first stop rgmanager to relocate the HA CTs to other nodes and then reboot. Once the node is back up I see the errors below in the syslog, and the CTs are not coming back to this node even though they are set up in cluster.conf with nofailback=0 (that used to work fine before). A rough sketch of this setup follows the log.

Code:
Dec  7 12:21:16 clusterxxx rgmanager[41730]: [pvevm] got empty cluster VM list
Dec  7 12:21:16 cluster3b1 rgmanager[41732]: [pvevm] Unable to create new inotify object: Too many open files at /usr/share/perl5/PVE/INotify.pm line 388.
Dec  7 12:21:16 cluster3b1 rgmanager[41733]: [pvevm] CT xxxxx is already stopped
Dec  7 12:21:16 cluster3b1 rgmanager[41728]: [pvevm] Unable to create new inotify object: Too many open files at /usr/share/perl5/PVE/INotify.pm line 388.
Dec  7 12:21:16 cluster3b1 rgmanager[41750]: [pvevm] Unable to create new inotify object: Too many open files at /usr/share/perl5/PVE/INotify.pm line 388.
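
For reference, the setup described above looks roughly like the snippet below. This is only a sketch, not the poster's actual configuration: the node names, VMIDs and domain name are placeholders, the standard rgmanager failover-domain syntax is assumed, and whether a pvevm entry is bound to a domain via a domain attribute exactly like this may differ in your cluster.conf. The per-node reboot procedure then amounts to running service rgmanager stop (which relocates the HA services off the node) followed by a normal reboot.

Code:
<!-- Hypothetical /etc/pve/cluster.conf excerpt (placeholder names/VMIDs, not from this thread) -->
<rm>
  <failoverdomains>
    <!-- nofailback="0": services are expected to move back once their preferred node returns -->
    <failoverdomain name="prefer_node1" ordered="1" restricted="0" nofailback="0">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="2"/>
    </failoverdomain>
  </failoverdomains>
  <!-- HA-managed container; tying it to the domain with a "domain" attribute is an assumption -->
  <pvevm autostart="1" vmid="100" domain="prefer_node1"/>
</rm>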

I am running:

Code:
proxmox-ve-2.6.32: 3.3-139 (running kernel: 2.6.32-34-pve)
pve-manager: 3.3-5 (running version: 3.3-5/bfebec03)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-34-pve: 2.6.32-139
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.3-3
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-25
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Can you please advise?
 
Increase the fs.inotify.max_user_instances sysctl value. (Somewhere around twice the number of containers you want to run should do.)
(And add it to /etc/sysctl.conf to make it permanent.)
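
For example (2048 here is just an illustrative value, roughly twice 1000 containers; adjust it to your own container count):

Code:
# check the current per-user limit on inotify instances
sysctl fs.inotify.max_user_instances
# raise it for the running system (example value: ~2x the container count)
sysctl -w fs.inotify.max_user_instances=2048
# make the change persist across reboots
echo "fs.inotify.max_user_instances = 2048" >> /etc/sysctl.conf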
 
Thanks for your reply, I will give this a try.
 
That helped and fixed this error:
Code:
Dec  7 12:21:16 cluster3b1 rgmanager[41732]: [pvevm] Unable to create new inotify object: Too many open files at /usr/share/perl5/PVE/INotify.pm line 388.

but after rebooting the same node, I'm still getting these errors:

Code:
Dec  7 16:22:59 clusterxxx rgmanager[3273]: stop on pvevm "xxxxx" returned 2 (invalid argument(s))
Dec  7 16:22:59 clusterxxx rgmanager[40489]: [pvevm] got empty cluster VM list

Thanks in advance
 
