Cluster Greyed Out

grobs

Hi everyone,

My Proxmox 6 Cluster is in a bad state.

I wanted to add RAM to one of my containers (some of its processes had been killed by the OOM killer) using the GUI, and it displayed this error:

Parameter verification failed. (400)
memory: unable to hotplug memory: closing file '/sys/fs/cgroup/memory///lxc/215/memory.limit_in_bytes' failed - Invalid argument
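
For reference, the limit the error refers to can be inspected directly on the host. A minimal sketch, assuming cgroup v1 and the CTID 215 from the error message:

pct config 215 | grep memory
cat /sys/fs/cgroup/memory/lxc/215/memory.limit_in_bytes

The first command shows the memory configured for the container (in MiB), the second one the limit the kernel is currently enforcing (in bytes).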

A day after this issue, the cluster ended up in a very strange state:
The quorum is OK (maximum votes), but Proxmox seems unable to contact the containers: their names and info are greyed out in the GUI, and the CLI shows an error. The issue is the same on every node (even the node whose GUI I'm currently connected to).

(see attached screenshot: image.png)

Every pct command on every node is showing this error:

pm6-01:/etc/pve# pct list
short read on command socket (16 != 0)

I tried restarting pve-cluster, pvestatd, pvedaemon and pveproxy; /etc/pve is accessible and populated.
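
Beyond the service restarts, a few more things that could be checked on one of the nodes (just a rough sketch, none of this is a known fix):

lxc-ls --fancy
systemctl status pvedaemon pvestatd pve-cluster lxcfs
journalctl -u pvestatd -n 50 --no-pager

lxc-ls talks to liblxc directly, so if it also fails the problem is probably below the Proxmox API layer; the pvestatd journal should show whether it is the one hitting the "short read on command socket" error.

For completeness, here is the output of pveversion -v: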

proxmox-ve: 6.2-1 (running kernel: 5.3.13-1-pve)
pve-manager: 6.2-4 (running version: 6.2-4/9824574a)
pve-kernel-5.4: 6.2-2
pve-kernel-helper: 6.2-2
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.41-1-pve: 5.4.41-1
pve-kernel-5.4.34-1-pve: 5.4.34-2
pve-kernel-5.3.18-3-pve: 5.3.18-3
pve-kernel-5.3.18-2-pve: 5.3.18-2
pve-kernel-5.3.18-1-pve: 5.3.18-1
pve-kernel-5.3.13-3-pve: 5.3.13-3
pve-kernel-5.3.13-2-pve: 5.3.13-2
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.3.10-1-pve: 5.3.10-1
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.3-pve1
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.15-pve1
libproxmox-acme-perl: 1.0.4
libpve-access-control: 6.1-1
libpve-apiclient-perl: 3.0-3
libpve-common-perl: 6.1-2
libpve-guest-common-perl: 3.0-10
libpve-http-server-perl: 3.0-5
libpve-storage-perl: 6.1-8
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 4.0.2-1
lxcfs: 4.0.3-pve2
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.2-1
pve-cluster: 6.1-8
pve-container: 3.1-6
pve-docs: 6.2-4
pve-edk2-firmware: 2.20200229-1
pve-firewall: 4.1-2
pve-firmware: 3.1-1
pve-ha-manager: 3.0-9
pve-i18n: 2.1-2
pve-qemu-kvm: 5.0.0-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.2-2
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.4-pve1
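
One thing that stands out in this output: the running kernel is still 5.3.13-1-pve while much newer kernels (up to pve-kernel-5.4.41-1-pve) are installed, so the nodes have clearly been upgraded without a reboot since. A quick way to double-check that on each node (just a sketch):

uname -r
pveversion -v | grep running

I'm not saying this is the cause, but it may be relevant.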

Could you please help?

Possibly related to:



Regards
 

After upgrading today, I have the same problem :-( All LXC containers on all 3 nodes are greyed out. Only the VMs are green.
 
Even if it could solve the issue, a reboot isn't a suitable solution in production, and in my opinion this issue should be investigated in depth.
 
If you find the combination of CLI commands to resolve this without a reboot, I would be very happy to know. I also have one node which I do not want to reboot :) Perhaps it has something to do with the new kernel and new LXC; then a reboot would be necessary in any case.
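
One way to check whether a daemon is still running a binary that was replaced by the upgrade (just an idea, not a verified fix) would be to look at its /proc exe link, e.g. for lxcfs:

ls -l /proc/$(pidof lxcfs)/exe

If the link target ends in "(deleted)", the running process predates the package upgrade and won't pick up the new code until it is restarted.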
 
Even if it could solve the issue, a reboot isn't a suitable solution in production, and in my opinion this issue should be investigated in depth.


Can you do the following

systemctl restart pvestatd
systemctl restart pveproxy
 
I restarted both services on all 3 nodes - no success. The rebooted nodes have no problem, and the one I haven't rebooted yet still has the problem.
 
Perhaps it has something to do with the new kernel and new LXC; then a reboot would be necessary in any case.
=> You're right. In that case, if the solution we find is to reboot, the issue shouldn't come back. I'll let you know.


Can you do the following

systemctl restart pvestatd
systemctl restart pveproxy
=> I've already executed those commands on every node of the cluster, as described in the first post, but without any change.

For information, I wasn't able to add RAM to the original container (the one that seems to have triggered the issue) until I rebooted that LXC container.
The cluster issue is still there, though.
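
For reference, the change I was originally trying to make in the GUI corresponds to something like this on the CLI (the memory size here is just an example):

pct set 215 --memory 4096

After rebooting the container, adding the RAM worked without the cgroup error.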


I just tried to update a different cluster with the same result: I have to reboot each node.
=> This is interesting... Proxmox Team (@dietmar, @tom ...), are you able to reproduce this? Could you please help us debug this strange and blocking issue?
 
