PVE cluster shows an unknown state for the VMs

Hi,
This sounds like you are experiencing connection issues with at least one node in your cluster. As a first step, check the cluster network. When this happens, check the journal for errors with journalctl -r -b and check the cluster status with pvecm status.
 
Hi,

This is the result of journalctl -r -b


Code:
Jan 25 22:27:44 srv01.example.com pvestatd[45949]: status update time (10.005 seconds)
Jan 25 22:27:44 srv01.example.com pvestatd[45949]: status update error: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[45949]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:27:44 srv01.example.com pvestatd[44571]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[44571]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[46370]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[49961]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:27:44 srv01.example.com pvestatd[44571]: ipcc_send_rec[1] failed: Too many open files

This is the result of pvecm status

Code:
Cluster information
-------------------
Name:             Cluster
Config Version:   2
Transport:        knet
Secure auth:      on

Quorum information
------------------
Date:             Wed Jan 25 22:33:41 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000001
Ring ID:          1.da
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   2
Highest expected: 2
Total votes:      2
Quorum:           2
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          1 192.168.1.1 (local)
0x00000002          1 192.168.2.1
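To check the quorum state from a script instead of reading it by eye, here is a minimal Python sketch that parses pvecm status output like the one above. It assumes the "Key: value" lines and the "Membership information" table keep the layout shown in this post; parse_pvecm_status is a hypothetical helper, not part of Proxmox VE.

```python
import re
import shutil
import subprocess

# Member rows, e.g. "0x00000001          1 192.168.1.1 (local)"
MEMBER_RE = re.compile(r"^(0x[0-9a-fA-F]+)\s+(\d+)\s+(\S+)(\s+\(local\))?$")


def parse_pvecm_status(text: str) -> dict:
    """Turn 'Key:   value' lines into a dict and collect the membership table."""
    info = {"members": []}
    in_members = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Membership information"):
            in_members = True
            continue
        if in_members:
            m = MEMBER_RE.match(line)
            if m:
                info["members"].append({
                    "nodeid": int(m.group(1), 16),
                    "votes": int(m.group(2)),
                    "name": m.group(3),
                    "local": m.group(4) is not None,
                })
            continue
        # Split on the first colon only, so values like the Date keep their times.
        key, sep, value = line.partition(":")
        if sep and value.strip():
            info[key.strip()] = value.strip()
    return info


# Only call pvecm when it is actually installed (i.e. on a PVE node).
if __name__ == "__main__" and shutil.which("pvecm"):
    out = subprocess.run(["pvecm", "status"], capture_output=True, text=True, check=True).stdout
    status = parse_pvecm_status(out)
    print("Quorate:", status.get("Quorate"))
    for member in status["members"]:
        print(member)
```

With the output posted above, Quorate is "Yes" and both nodes are listed, so quorum itself does not look like the problem at that moment.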

By the way, I created the cluster over the internet, using nodes with different public IPs.
 
I don't understand what you mean by that; the cluster network uses the private IP addresses 192.168.1.1 and 192.168.2.1. Are the nodes connected via a VPN?

In any case, the cluster requires a low-latency network to work correctly.
ipcc_send_rec[3] failed: Too many open files
Could you post the output of pveversion -v and journalctl -b -u pvestatd?
Also, please check sysctl fs.inotify, ulimit -n, and lsof -np 45949.
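As an alternative to lsof -np, the following minimal Python sketch (Linux only, run as root to inspect other users' processes) counts the open file descriptors of each pvestatd process via /proc/<pid>/fd and reads its limits from /proc/<pid>/limits. Note that ulimit -n reports the limit of the current shell; a daemon started by systemd may run with a different limit, which is why the sketch reads it per process. Looking processes up by name also avoids relying on a PID copied from an older log line, which may already have exited. It assumes the daemon's /proc/<pid>/comm reads "pvestatd" (if nothing is found, match /proc/<pid>/cmdline instead); all function names are hypothetical.

```python
import os


def _to_int(value: str):
    # /proc/<pid>/limits prints "unlimited" instead of a number when there is no limit.
    return None if value == "unlimited" else int(value)


def max_open_files(limits_text: str):
    """Return (soft, hard) from the 'Max open files' row of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            soft, hard = line[len("Max open files"):].split()[:2]
            return _to_int(soft), _to_int(hard)
    raise ValueError("no 'Max open files' row found")


def fd_usage(pid: int):
    """Return (open fd count, soft limit, hard limit) for one process."""
    count = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as f:
        soft, hard = max_open_files(f.read())
    return count, soft, hard


def pids_by_name(name: str):
    """Return PIDs whose /proc/<pid>/comm equals name (assumed to be 'pvestatd')."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited while scanning
    return pids


if __name__ == "__main__":
    for pid in pids_by_name("pvestatd"):
        try:
            count, soft, hard = fd_usage(pid)
        except OSError as exc:
            print(f"pvestatd[{pid}]: cannot inspect ({exc})")
            continue
        print(f"pvestatd[{pid}]: {count} open fds, soft limit {soft}, hard limit {hard}")
```

If the count is close to the soft limit, the "Too many open files" errors come from that process's own descriptor limit rather than from the inotify settings.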
 
Hi,
Sorry for the confusion. I mean that the cluster was created and is connected using public IPs; please assume the addresses shown are public IPs in different segments.

This is the result of pveversion -v
Code:
proxmox-ve: 7.3-1 (running kernel: 5.15.83-1-pve)
pve-manager: 7.3-4 (running version: 7.3-4/d69b70d4)
pve-kernel-5.15: 7.3-1
pve-kernel-helper: 7.3-1
pve-kernel-5.15.83-1-pve: 5.15.83-1
ceph-fuse: 14.2.21-1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.3
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-1
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.0-3
lxcfs: 4.0.12-pve1
novnc-pve: 1.3.0-3
proxmox-backup-client: 2.3.2-1
proxmox-backup-file-restore: 2.3.2-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.0-1
proxmox-widget-toolkit: 3.5.3
pve-cluster: 7.3-1
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-2
pve-ha-manager: 3.5.1
pve-i18n: 2.8-1
pve-qemu-kvm: 7.1.0-4
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-2
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.7-pve2

This is the result of journalctl -b -u pvestatd

Code:
-- The journal begins on Wednesday, 2023-01-25, at 22:17:16 WIB and ends on Thursday, 2023-01-26, at 01:34:32 WIB. --
Jan 25 22:17:16 srv01.example.com pvestatd[54020]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54041]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: status update time (10.004 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54041]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54062]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: status update time (10.003 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49522]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: status update time (10.004 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[49547]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: status update time (10.004 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: status update time (10.004 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[46245]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57085]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: status update time (10.005 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: status update time (10.005 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[48158]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[54597]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: status update time (10.003 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[55018]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: status update time (10.003 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: ipcc_send_rec[1] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: ipcc_send_rec[2] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[57066]: ipcc_send_rec[3] failed: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[56426]: can't lock file '/var/log/pve/tasks/.active.lock' - got timeout
Jan 25 22:17:16 srv01.example.com pvestatd[56426]: status update error: Too many open files
Jan 25 22:17:16 srv01.example.com pvestatd[56426]: status update time (10.003 seconds)
Jan 25 22:17:16 srv01.example.com pvestatd[56426]: ipcc_send_rec[1] failed: Too many open files
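The pasted log is long and repetitive. This minimal Python sketch groups journalctl lines by message and counts how many distinct PIDs logged each one, which makes it easier to see that the same few errors repeat across many pvestatd PIDs. It assumes the default short output format seen above (month, day, time, host, identifier[pid]: message); summarize is a hypothetical helper name.

```python
import re
import sys
from collections import defaultdict

# journalctl short format, e.g.
# Jan 25 22:17:16 srv01.example.com pvestatd[54062]: status update error: Too many open files
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+"
    r"(?P<ident>[^\[\s]+)\[(?P<pid>\d+)\]:\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Map (identifier, normalized message) to the set of PIDs that logged it."""
    pids_per_msg = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # e.g. the '-- The journal begins ... --' header
        msg = m.group("msg")
        # Treat ipcc_send_rec[1], [2] and [3] as the same message.
        msg = re.sub(r"ipcc_send_rec\[\d+\]", "ipcc_send_rec[N]", msg)
        # Ignore the varying duration in 'status update time (10.004 seconds)'.
        msg = re.sub(r"\(\d+(?:\.\d+)? seconds\)", "(N seconds)", msg)
        pids_per_msg[(m.group("ident"), msg)].add(int(m.group("pid")))
    return pids_per_msg


# Runs only when a log file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        summary = summarize(f)
    for (ident, msg), pids in sorted(summary.items(), key=lambda kv: -len(kv[1])):
        print(f"{len(pids):4d} PIDs  {ident}: {msg}")
```

For example, save the output with journalctl -b -u pvestatd --no-pager > pvestatd.log and pass that file as the script's argument (the file name is only an example).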

This is the result of sysctl fs.inotify


Code:
fs.inotify.max_queued_events = 8388608
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches = 4194304
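To see how close each user gets to fs.inotify.max_user_instances, this minimal Python sketch (Linux, run as root to see all processes) counts file descriptors that link to anon_inode:inotify, grouped by the UID that owns each process, and reads the current limit from /proc/sys. It is only an approximation: an inotify instance shared by several descriptors (for example after fork) is counted more than once. The function names are hypothetical.

```python
import os
from collections import Counter

LIMIT_PATH = "/proc/sys/fs/inotify/max_user_instances"


def read_sysctl(name: str) -> int:
    """Read an integer sysctl, e.g. 'fs.inotify.max_user_instances', via /proc/sys."""
    with open("/proc/sys/" + name.replace(".", "/")) as f:
        return int(f.read().strip())


def inotify_instances_per_uid() -> Counter:
    """Count fds that link to 'anon_inode:inotify', grouped by the owning process UID."""
    counts = Counter()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        fd_dir = f"/proc/{entry}/fd"
        try:
            uid = os.stat(f"/proc/{entry}").st_uid
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or its fds are unreadable without root
        for fd in fds:
            try:
                if os.readlink(f"{fd_dir}/{fd}") == "anon_inode:inotify":
                    counts[uid] += 1
            except OSError:
                continue  # fd was closed between listdir and readlink
    return counts


if __name__ == "__main__" and os.path.exists(LIMIT_PATH):
    limit = read_sysctl("fs.inotify.max_user_instances")
    for uid, n in inotify_instances_per_uid().most_common():
        print(f"uid {uid}: {n} inotify fds (max_user_instances = {limit})")
```

If no user comes near the limit, raising max_user_instances is unlikely to change anything, and the per-process descriptor limit is the more likely cause.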


This is the result of ulimit -n

Code:
1024

lsof -n 45949 does not show anything.


If I restart the host, the cluster runs normally, but when I click another tab, the error comes back.
 
Hi,

I don't have a metric server, and I have already set max_user_instances to 65536, but the VMs are still in an unknown state. What do I need to do to get the server working properly?

Thanks

 
