HA cluster problem with more than 100 CTs per node

AhmedF

Renowned Member
Dec 26, 2012
26
1
68
Hi,

I'm running an HA cluster with 8 nodes and a shared NAS device, totaling about 1000 CTs. Everything runs very smoothly, but when I need to reboot one of the nodes, I first stop rgmanager to relocate the HA CTs to the other nodes, then reboot. Once the node is back up, I see these errors in the syslog, and the CTs are not coming back to this node as configured in cluster.conf with nofailback=0 (that used to work fine before):

Code:
Dec  7 12:21:16 clusterxxx rgmanager[41730]: [pvevm] got empty cluster VM list
Dec  7 12:21:16 cluster3b1 rgmanager[41732]: [pvevm] Unable to create new inotify object: Too many open files at /usr/share/perl5/PVE/INotify.pm line 388.
Dec  7 12:21:16 cluster3b1 rgmanager[41733]: [pvevm] CT xxxxx is already stopped
Dec  7 12:21:16 cluster3b1 rgmanager[41728]: [pvevm] Unable to create new inotify object: Too many open files at /usr/share/perl5/PVE/INotify.pm line 388.
Dec  7 12:21:16 cluster3b1 rgmanager[41750]: [pvevm] Unable to create new inotify object: Too many open files at /usr/share/perl5/PVE/INotify.pm line 388.
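For reference, the drain procedure I use before rebooting is roughly this (assuming the stock init scripts on Proxmox VE 3.x / redhat-cluster):

```shell
# Stopping rgmanager relocates the HA-managed CTs to the other cluster nodes
service rgmanager stop

# Reboot once the services have been migrated away
reboot
```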

I am running:

Code:
proxmox-ve-2.6.32: 3.3-139 (running kernel: 2.6.32-34-pve)
pve-manager: 3.3-5 (running version: 3.3-5/bfebec03)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-34-pve: 2.6.32-139
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.3-3
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-25
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

Can you please advise?
 
Increase the fs.inotify.max_user_instances sysctl value (somewhere around twice the number of containers you want to run should do), and add it to /etc/sysctl.conf to make the change permanent.
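Concretely, something like the following; the 2048 value is just an assumption here, scale it to roughly twice the number of CTs per node (note these commands require root):

```shell
# Check the current limit (the kernel default is often 128)
cat /proc/sys/fs/inotify/max_user_instances

# Raise the limit at runtime, effective immediately
sysctl -w fs.inotify.max_user_instances=2048

# Persist the setting across reboots
echo "fs.inotify.max_user_instances = 2048" >> /etc/sysctl.conf
```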
 

Thanks for your reply, I will give this a try.
 
That helped and fixed this error:
Code:
Dec  7 12:21:16 cluster3b1 rgmanager[41732]: [pvevm] Unable to create new inotify object: Too many open files at /usr/share/perl5/PVE/INotify.pm line 388.

but after rebooting the same node, I am still getting these errors:

Code:
Dec  7 16:22:59 clusterxxx rgmanager[3273]: stop on pvevm "xxxxx" returned 2 (invalid argument(s))
Dec  7 16:22:59 clusterxxx rgmanager[40489]: [pvevm] got empty cluster VM list

Thanks in advance