Hi,
We switched on a member of our cluster that had been offline for a long time.
After this, we were no longer able to log in on any of the members via the web GUI.
We solved this by stopping corosync on all servers, then starting it on one pair of nodes with the expected votes set to 2, and bringing corosync up on the remaining nodes one after another.
This worked and we were able to log in again.
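For reference, this is roughly the sequence we used (from memory, so the exact order may differ slightly):
Code:
# on every node: stop corosync
systemctl stop corosync

# on the first pair of nodes: start corosync again and lower the expected votes
systemctl start corosync
pvecm expected 2

# then start corosync on the remaining nodes, one after another
systemctl start corosync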
But ...
one server shows a red cross in the web GUI and is only half working.
It has quorum:
Code:
Quorum information
------------------
Date: Thu Nov 2 09:42:00 2023
Quorum provider: corosync_votequorum
Nodes: 18
Node ID: 0x00000008
Ring ID: 2.6a29
Quorate: Yes
Votequorum information
----------------------
Expected votes: 18
Highest expected: 18
Total votes: 18
Quorum: 10
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000002 1 192.168.248.2
0x00000003 1 192.168.248.3
0x00000004 1 192.168.248.4
0x00000005 1 192.168.248.5
0x00000006 1 192.168.248.6
0x00000007 1 192.168.248.7
0x00000008 1 192.168.248.8 (local)
0x00000009 1 192.168.248.41
0x0000000a 1 192.168.248.42
0x0000000b 1 192.168.248.43
0x0000000c 1 192.168.248.44
0x0000000d 1 192.168.248.45
0x0000000e 1 192.168.248.46
0x0000000f 1 192.168.248.81
0x00000011 1 192.168.248.83
0x00000013 1 192.168.249.210
0x00000014 1 192.168.249.209
0x00000016 1 192.168.249.212
But if I try, for example, to unlock a VM with qm, I get:
Code:
root@pk-pm-cpu-08:~# qm unlock 116
unable to open file '/etc/pve/nodes/pk-pm-cpu-08/qemu-server/116.conf.tmp.3150701' - Permission denied
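Presumably any direct write against /etc/pve fails the same way; for example (the file name below is just a placeholder):
Code:
# simple write test against pmxcfs
touch /etc/pve/test.tmp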
Or when I do something in the web GUI (for example, try to migrate a running VM to another node), I get:
Code:
cluster not ready - no quorum? (500)
But I think this is a false message, since the node is quorate.
If I look at the file permissions, because of the permission denied message, I see the following:
Code:
drwxr-xr-x 2 root www-data 0 Jan 1 1970 .
drwxr-xr-x 100 root root 12288 Nov 2 09:42 ..
-r--r----- 1 root www-data 451 Nov 1 11:17 authkey.pub
-r--r----- 1 root www-data 451 Nov 1 11:17 authkey.pub.old
-r--r----- 1 root www-data 442 Apr 15 2020 ceph.conf
-r--r----- 1 root www-data 10685 Jan 1 1970 .clusterlog
-r--r----- 1 root www-data 2278 Feb 28 2023 corosync.conf
-r--r----- 1 root www-data 58 Mar 8 2021 datacenter.cfg
-rw-r----- 1 root www-data 2 Jan 1 1970 .debug
dr-xr-xr-x 2 root www-data 0 Jan 29 2020 firewall
dr-xr-xr-x 2 root www-data 0 Mar 13 2021 ha
-r--r----- 1 root www-data 159 Jul 19 09:25 jobs.cfg
lr-xr-xr-x 1 root www-data 0 Jan 1 1970 local -> nodes/pk-pm-cpu-08
lr-xr-xr-x 1 root www-data 0 Jan 1 1970 lxc -> nodes/pk-pm-cpu-08/lxc
-r--r----- 1 root www-data 1498 Jan 1 1970 .members
dr-xr-xr-x 2 root www-data 0 Nov 18 2019 nodes
lr-xr-xr-x 1 root www-data 0 Jan 1 1970 openvz -> nodes/pk-pm-cpu-08/openvz
dr-x------ 2 root www-data 0 Nov 18 2019 priv
-r--r----- 1 root www-data 2074 Nov 18 2019 pve-root-ca.pem
-r--r----- 1 root www-data 1679 Nov 18 2019 pve-www.key
lr-xr-xr-x 1 root www-data 0 Jan 1 1970 qemu-server -> nodes/pk-pm-cpu-08/qemu-server
-r--r----- 1 root www-data 71 Aug 3 16:29 replication.cfg
-r--r----- 1 root www-data 727 Jan 1 1970 .rrd
dr-xr-xr-x 2 root www-data 0 Mar 13 2021 sdn
-r--r----- 1 root www-data 1564 Jul 1 2022 storage.cfg
-r--r----- 1 root www-data 2155 Aug 3 16:29 user.cfg
-r--r----- 1 root www-data 5391 Jan 1 1970 .version
dr-xr-xr-x 2 root www-data 0 Mar 8 2021 virtual-guest
-r--r----- 1 root www-data 7117 Jan 1 1970 .vmlist
-r--r----- 1 root www-data 120 Aug 3 16:29 vzdump.cron
Normally root should have write permissions here, as on the other nodes.
Any idea how I can solve this (without rebooting this node)?
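In case it helps with the diagnosis, these are the checks I would look at next on the affected node (commands only, I have not pasted their output here):
Code:
# state and recent log of the pmxcfs service
systemctl status pve-cluster
journalctl -u pve-cluster -n 50

# what pmxcfs itself reports about quorum ("quorate" flag in the cluster section)
cat /etc/pve/.members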
We are running:
Code:
proxmox-ve: 7.1-1 (running kernel: 5.11.22-3-pve)
pve-manager: 7.1-11 (running version: 7.1-11/8d529482)
pve-kernel-5.15: 7.1-13
pve-kernel-helper: 7.1-13
pve-kernel-5.13: 7.1-9
pve-kernel-5.15.27-1-pve: 5.15.27-1
pve-kernel-5.15.19-2-pve: 5.15.19-3
pve-kernel-5.15.19-1-pve: 5.15.19-1
pve-kernel-5.13.19-6-pve: 5.13.19-14
pve-kernel-5.11.22-3-pve: 5.11.22-7
ceph: 16.2.7
ceph-fuse: 16.2.7
corosync: 3.1.5-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.22-pve2
libproxmox-acme-perl: 1.4.1
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.1-6
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.1-5
libpve-guest-common-perl: 4.1-1
libpve-http-server-perl: 4.1-1
libpve-storage-perl: 7.1-1
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.11-1
lxcfs: 4.0.11-pve1
novnc-pve: 1.3.0-2
openvswitch-switch: 2.15.0+ds1-2
proxmox-backup-client: 2.1.5-1
proxmox-backup-file-restore: 2.1.5-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.4-7
pve-cluster: 7.1-3
pve-container: 4.1-4
pve-docs: 7.1-2
pve-edk2-firmware: 3.20210831-2
pve-firewall: 4.2-5
pve-firmware: 3.3-6
pve-ha-manager: 3.3-3
pve-i18n: 2.6-2
pve-qemu-kvm: 6.1.1-2
pve-xtermjs: 4.16.0-1
qemu-server: 7.1-4
smartmontools: 7.2-pve2
spiceterm: 3.2-2
swtpm: 0.7.1~bpo11+1
vncterm: 1.7-1
zfsutils-linux: 2.1.2-pve1
Due to the problems with migrations we are still on kernel 5.11.22-3-pve and have not updated the cluster yet.

Best regards.