PVE cluster shows unknown state for the VMs

Hi,
This sounds like you are experiencing connection issues with at least one node in your cluster. As a first step, check the cluster network. Check the journal for errors when this happens with journalctl -r -b, and check the status of the cluster with pvecm status.
 
Hi,

This is the result of journalctl -r -b


Code:
Jan 25 22:27:44 srv01.example.com pvestatd[45949]: status update time (10.005 seconds)
Jan 25 22:27:44 srv01.example.com pvestatd[45949]: status update error: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[45949]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:27:44 srv01.example.com pvestatd[44571]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[44571]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[46370]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[49961]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[44571]: ipcc_send_rec[1] failed: Too many open files

This is the result of pvecm status

Code:
Cluster information
-------------------
Name:             Cluster
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Jan 25 22:33:41 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.da
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.1 (local)
0x00000002          1 192.168.2.1

By the way, I created the cluster over the internet, using different public IPs.
 
By the way, I created the cluster over the internet, using different public IPs.
I don't understand what you mean by that. The cluster network uses the local IP addresses 192.168.1.1 and 192.168.2.1. Is this somehow connected via VPN?

In any case, the cluster requires a low-latency network in order to work correctly.
ipcc_send_rec[3] failed: Too many open files
Could you post the output of pveversion -v and journalctl -b -u pvestatd?
Also check sysctl fs.inotify, ulimit -n and lsof -np 45949.
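
As a cross-check for the lsof step, the open descriptors of a process can also be counted directly. This is a minimal sketch, assuming a Linux host with /proc mounted; the helper name count_open_fds is made up for illustration, and inspecting another user's process usually requires root.

```shell
# count_open_fds PID: print how many file descriptors the process has open.
# Hypothetical helper; relies on the Linux /proc/<pid>/fd directory.
count_open_fds() {
    pid="$1"
    if [ ! -d "/proc/$pid/fd" ]; then
        echo "no such process: $pid" >&2
        return 1
    fi
    # Each entry in /proc/<pid>/fd is one open descriptor.
    ls "/proc/$pid/fd" | wc -l | tr -d ' '
}

# Example: descriptors held by the current shell.
count_open_fds "$$"
```

A count close to the process's open-files limit would be consistent with the "Too many open files" errors. Pass the PID of a pvestatd process that is currently running, since the PIDs in an older log excerpt may no longer exist.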
 
Hi,
Sorry for the confusion. I mean that the cluster was created using public IPs and the nodes connect over those public IPs. Assume the IPs shown are public IPs in different network segments.

This is the result of pveversion -v
Code:
proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-5.15: 7.3-1
pve-kernel-helper: 7.3-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-2
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.7-pve2

This is the result of journalctl -b -u pvestatd

Code:
-- The journal begins on Wednesday, 2023-01-25, at 22:17:16 WIB and ends on Thursday, 2023-01-26, at 01:34:32 WIB. --
Jan 25 22:17:16 srv01.example.com pvestatd[54020]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54041]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: status update time (10.004 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54041]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: status update time (10.003 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: status update time (10.004 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: status update time (10.004 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: status update time (10.004 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: status update time (10.005 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: status update time (10.005 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: status update time (10.003 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: status update time (10.003 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[56426]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[56426]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[56426]: status update time (10.003 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[56426]: ipcc_send_rec[1] failed: Too many open files
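
The excerpt above contains the same errors from many different pvestatd PIDs. A small sketch for summarising such a log by PID, assuming the journal format shown in this thread; the function name summarize_pvestatd is hypothetical, and the three sample lines are copied from the first log excerpt in the thread.

```shell
# summarize_pvestatd: read journal lines on stdin and print "<pid> <count>"
# for every pvestatd PID that appears, most frequent first.
summarize_pvestatd() {
    sed -n 's/.*pvestatd\[\([0-9][0-9]*\)\]: .*/\1/p' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $2, $1 }'
}

# Demo on three lines taken from the journal output earlier in the thread.
summarize_pvestatd <<'EOF'
Jan 25 22:27:44 srv01.example.com pvestatd[45949]: status update error: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[44571]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[44571]: ipcc_send_rec[2] failed: Too many open files
EOF
# prints:
# 44571 2
# 45949 1
```

On a live host the same function could be fed with journalctl -b -u pvestatd --no-pager | summarize_pvestatd. The number of output lines is the number of distinct pvestatd PIDs that logged something.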

This is the result of sysctl fs.inotify


Code:
fs.inotify.max_queued_events = 8388608
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches = 4194304
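
To compare these limits with actual usage, one option is to count the inotify instances that are currently open. This is a rough sketch, assuming a Linux host with /proc mounted; count_inotify_instances is a made-up name. It counts every instance visible to the calling user across all processes, whereas max_user_instances applies per user, so treat the result as an upper bound.

```shell
# count_inotify_instances: print the number of open inotify instances that
# are visible under /proc. Run as root to see all processes.
count_inotify_instances() {
    n=0
    for fd in /proc/[0-9]*/fd/*; do
        # readlink fails for descriptors that vanished or are not readable.
        target=$(readlink "$fd" 2>/dev/null) || continue
        if [ "$target" = "anon_inode:inotify" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

count_inotify_instances
```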


This is the result of ulimit -n

Code:
1024
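
Note that ulimit -n reports the soft limit of the shell it is run in. A daemon may have been started with different limits, so it can help to read the limit of the running process itself. A minimal sketch, assuming Linux and /proc; max_open_files is a hypothetical helper name.

```shell
# max_open_files PID: print the soft and hard open-file limits of a process,
# taken from the "Max open files" row of /proc/<pid>/limits.
max_open_files() {
    # Row layout: Max open files <soft> <hard> files
    awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

# Example: limits of the current shell (the first value matches ulimit -n).
max_open_files "$$"
```

For the case in this thread, the PID of a running pvestatd process would be passed instead of "$$".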

lsof -n 45949 does not show anything.


If I restart the host, the cluster runs normally, but when I click on another tab, the error comes back.
 
Hi,

I don't have a metric server, and I have already set max_user_instances to 65536, but the state is still unknown. What must I do to get the server working properly?

Thanks

Attachment: 1675243876174.png