/etc/pve permissions

Marco Barbera

Jan 29, 2020
Hello,

I'm new to Proxmox and still testing this environment, which I like very much. I found something strange on a node that I upgraded from VE 5 to VE 6. When I try to join it to the VE 6 cluster with the other 2 nodes, there are some issues: the node seems to join but doesn't, the main cluster with nodes 1 and 2 becomes very slow, a red cross appears on node 3, and shell access from the GUI stops working on all nodes.

After investigating I found a difference between nodes 1, 2 and 3. On node 3 the permissions in the /etc/pve folder are different from the other two nodes, and I think that because of this it is not possible to add the authorized_keys to /etc/pve/priv to make the node join the cluster correctly.

These are the permissions in /etc/pve/priv on nodes 1 and 2:

-rw------- 1 root www-data 1.7K Jan 29 10:33 authkey.key
-rw------- 1 root www-data 1.4K Jan 28 19:12 authorized_keys
-rw------- 1 root www-data 2.0K Jan 4 18:59 known_hosts
drwx------ 2 root www-data 0 Jan 1 10:30 lock
-rw------- 1 root www-data 3.2K Jan 1 10:30 pve-root-ca.key
-rw------- 1 root www-data 3 Jan 4 18:59 pve-root-ca.srl


and these are the permissions on node 3:

-r-------- 1 root www-data 1,7K gen 28 20:02 authkey.key
-r-------- 1 root www-data 397 gen 28 20:02 authorized_keys
-r-------- 1 root www-data 788 gen 28 20:02 known_hosts
dr-x------ 2 root www-data 0 gen 28 20:02 lock
-r-------- 1 root www-data 3,2K gen 28 20:02 pve-root-ca.key
-r-------- 1 root www-data 3 gen 28 20:02 pve-root-ca.srl

How can I add the write permission back and add the keys to authorized_keys? Why are they different? Is this the issue that prevents node 3 from joining and working correctly in the cluster?

Thanks in advance for your help

Marco
 
After investigating I found a difference between nodes 1, 2 and 3. On node 3 the permissions in the /etc/pve folder are different from the other two nodes, and I think that because of this it is not possible to add the authorized_keys to /etc/pve/priv to make the node join the cluster correctly.

That means that node three is not quorate, i.e., its cluster communication with the other two nodes is failing. /etc/pve is the pmxcfs cluster filesystem, which becomes read-only while a node has no quorum, hence the missing write permissions on node 3.

How can I add the write permission back and add the keys to authorized_keys? Why are they different? Is this the issue that prevents node 3 from joining and working correctly in the cluster?

You need to ensure that the corosync (cluster communication daemon) configuration is correct on the third node, and that the corosync and pve-cluster services are correctly running on that node.

Code:
systemctl status corosync pve-cluster
# from node 1 or 2:
cat /etc/pve/corosync.conf
# from node 3 (note the different path):
cat /etc/corosync/corosync.conf
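
In addition, a quick way to see whether a node currently considers itself quorate (a minimal check, assuming only the stock Proxmox CLI tools):

Code:
# membership, vote counts and quorum state as this node sees them
pvecm status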
 
Hello,

here are the results:

NODE 3:
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-01-29 11:47:06 UTC; 4h 7min ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 23950 (corosync)
Tasks: 9 (limit: 4915)
Memory: 144.6M
CGroup: /system.slice/corosync.service
└─23950 /usr/sbin/corosync -f

gen 29 15:54:05 ns3162395 corosync[23950]: [QUORUM] Members[1]: 3
gen 29 15:54:05 ns3162395 corosync[23950]: [MAIN ] Completed service synchronization, ready to provide service.
gen 29 15:54:08 ns3162395 corosync[23950]: [TOTEM ] A new membership (3.6af60) was formed. Members
gen 29 15:54:08 ns3162395 corosync[23950]: [CPG ] downlist left_list: 0 received
gen 29 15:54:08 ns3162395 corosync[23950]: [QUORUM] Members[1]: 3
gen 29 15:54:08 ns3162395 corosync[23950]: [MAIN ] Completed service synchronization, ready to provide service.
gen 29 15:54:12 ns3162395 corosync[23950]: [TOTEM ] A new membership (3.6af74) was formed. Members
gen 29 15:54:12 ns3162395 corosync[23950]: [CPG ] downlist left_list: 0 received
gen 29 15:54:12 ns3162395 corosync[23950]: [QUORUM] Members[1]: 3
gen 29 15:54:12 ns3162395 corosync[23950]: [MAIN ] Completed service synchronization, ready to provide service.

● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2020-01-29 11:52:48 UTC; 4h 1min ago
Process: 25055 ExecStart=/usr/bin/pmxcfs (code=exited, status=0/SUCCESS)
Main PID: 25062 (pmxcfs)
Tasks: 6 (limit: 4915)
Memory: 10.7M
CGroup: /system.slice/pve-cluster.service
└─25062 /usr/bin/pmxcfs

gen 29 11:52:47 ns3162395 pmxcfs[25062]: [status] notice: update cluster info (cluster name SPWCluster, version = 3)
gen 29 11:52:48 ns3162395 systemd[1]: Started The Proxmox VE cluster filesystem.
gen 29 11:52:48 ns3162395 pmxcfs[25062]: [dcdb] notice: members: 3/25062
gen 29 11:52:48 ns3162395 pmxcfs[25062]: [dcdb] notice: all data is up to date
gen 29 11:52:48 ns3162395 pmxcfs[25062]: [status] notice: members: 3/25062
gen 29 11:52:48 ns3162395 pmxcfs[25062]: [status] notice: all data is up to date
gen 29 12:52:48 ns3162395 pmxcfs[25062]: [dcdb] notice: data verification successful
gen 29 13:52:50 ns3162395 pmxcfs[25062]: [dcdb] notice: data verification successful
gen 29 14:52:48 ns3162395 pmxcfs[25062]: [dcdb] notice: data verification successful
gen 29 15:52:47 ns3162395 pmxcfs[25062]: [dcdb] notice: data verification successful


NODE 2:
corosync.service - Corosync Cluster Engine
Loaded: loaded (/lib/systemd/system/corosync.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2020-01-04 18:59:10 UTC; 3 weeks 3 days ago
Docs: man:corosync
man:corosync.conf
man:corosync_overview
Main PID: 4027 (corosync)
Tasks: 9 (limit: 4915)
Memory: 5.5G
CGroup: /system.slice/corosync.service
└─4027 /usr/sbin/corosync -f

Jan 29 15:56:04 SPWeb02proxmox6 corosync[4027]: [TOTEM ] Token has not been received in 759 ms
Jan 29 15:56:05 SPWeb02proxmox6 corosync[4027]: [TOTEM ] Token has not been received in 1801 ms
Jan 29 15:56:07 SPWeb02proxmox6 corosync[4027]: [TOTEM ] A new membership (1.6b21c) was formed. Members
Jan 29 15:56:08 SPWeb02proxmox6 corosync[4027]: [TOTEM ] Token has not been received in 759 ms
Jan 29 15:56:09 SPWeb02proxmox6 corosync[4027]: [TOTEM ] Token has not been received in 1801 ms
Jan 29 15:56:10 SPWeb02proxmox6 corosync[4027]: [TOTEM ] A new membership (1.6b230) was formed. Members
Jan 29 15:56:11 SPWeb02proxmox6 corosync[4027]: [TOTEM ] Token has not been received in 759 ms
Jan 29 15:56:12 SPWeb02proxmox6 corosync[4027]: [TOTEM ] Token has not been received in 1801 ms
Jan 29 15:56:14 SPWeb02proxmox6 corosync[4027]: [TOTEM ] A new membership (1.6b244) was formed. Members
Jan 29 15:56:14 SPWeb02proxmox6 corosync[4027]: [TOTEM ] Token has not been received in 759 ms

● pve-cluster.service - The Proxmox VE cluster filesystem
Loaded: loaded (/lib/systemd/system/pve-cluster.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2020-01-04 18:59:10 UTC; 3 weeks 3 days ago
Main PID: 4035 (pmxcfs)
Tasks: 10 (limit: 4915)
Memory: 68.1M
CGroup: /system.slice/pve-cluster.service
└─4035 /usr/bin/pmxcfs

Jan 29 15:56:12 SPWeb02proxmox6 pmxcfs[4035]: [dcdb] notice: cpg_send_message retry 80
Jan 29 15:56:12 SPWeb02proxmox6 pmxcfs[4035]: [status] notice: cpg_send_message retry 80
Jan 29 15:56:13 SPWeb02proxmox6 pmxcfs[4035]: [dcdb] notice: cpg_send_message retry 90
Jan 29 15:56:13 SPWeb02proxmox6 pmxcfs[4035]: [status] notice: cpg_send_message retry 90
Jan 29 15:56:14 SPWeb02proxmox6 pmxcfs[4035]: [dcdb] notice: cpg_send_message retry 100
Jan 29 15:56:14 SPWeb02proxmox6 pmxcfs[4035]: [dcdb] notice: cpg_send_message retried 100 times
Jan 29 15:56:14 SPWeb02proxmox6 pmxcfs[4035]: [dcdb] crit: cpg_send_message failed: 6
Jan 29 15:56:14 SPWeb02proxmox6 pmxcfs[4035]: [status] notice: cpg_send_message retry 100
Jan 29 15:56:14 SPWeb02proxmox6 pmxcfs[4035]: [status] notice: cpg_send_message retried 100 times
Jan 29 15:56:14 SPWeb02proxmox6 pmxcfs[4035]: [status] crit: cpg_send_message failed: 6




NODE 3:
root@ns3162395:~# cat /etc/corosync/corosync.conf
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: SPWeb01proxmox6
nodeid: 1
quorum_votes: 1
ring0_addr: 188.165.223.130
}
node {
name: SPWeb02proxmox6
nodeid: 2
quorum_votes: 1
ring0_addr: 51.68.181.11
}
node {
name: ns3162395
nodeid: 3
quorum_votes: 1
ring0_addr: 188.165.233.174
}
}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: SPWCluster
config_version: 3
interface {
linknumber: 0
}
ip_version: ipv4-6
secauth: on
version: 2
}



NODE 2:
root@SPWeb02proxmox6:~# cat /etc/pve/corosync.conf
logging {
debug: off
to_syslog: yes
}

nodelist {
node {
name: SPWeb01proxmox6
nodeid: 1
quorum_votes: 1
ring0_addr: 188.165.223.130
}
node {
name: SPWeb02proxmox6
nodeid: 2
quorum_votes: 1
ring0_addr: 51.68.181.11
}
node {
name: ns3162395
nodeid: 3
quorum_votes: 1
ring0_addr: 188.165.233.174
}
}

quorum {
provider: corosync_votequorum
}

totem {
cluster_name: SPWCluster
config_version: 3
interface {
linknumber: 0
}
ip_version: ipv4-6
secauth: on
version: 2
}


Why are the permissions in /etc/pve/priv different on node 3 compared to nodes 1 and 2?

Thanks

Marco
 
Why are the permissions in /etc/pve/priv different on node 3 compared to nodes 1 and 2?
They are different because node three is not quorate, i.e., its cluster communication with the other two nodes is failing.

So, can you also post pveversion -v from all nodes, to ensure the upgrade went OK? Speaking of that, did you strictly follow the upgrade documentation at https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0#Actions_step-by-step ?

From the IPs (it could make sense to censor them a bit) I see that the nodes are not on a private LAN. Does UDP traffic on ports 5404 and 5405 get through between them?
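
One way to probe that (just a sketch, assuming nmap is installed on one of the nodes; UDP probes often come back as open|filtered, so treat the result as a hint rather than proof):

Code:
# from node 1 or 2, probe the corosync UDP ports on node 3
nmap -sU -p 5404-5405 188.165.233.174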

Jan 29 15:56:04 SPWeb02proxmox6 corosync[4027]: [TOTEM ] Token has not been received in 759 ms
Jan 29 15:56:05 SPWeb02proxmox6 corosync[4027]: [TOTEM ] Token has not been received in 1801 ms
Jan 29 15:56:07 SPWeb02proxmox6 corosync[4027]: [TOTEM ] A new membership (1.6b21c) was formed. Members

The above is not normal; something with the network is off.
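
It may also help to check the link status corosync itself reports (nothing beyond the corosync 3 tooling that ships with PVE 6):

Code:
# knet link status per node as corosync sees it
corosync-cfgtool -s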
 
They are different because node three is not quorate, i.e., its cluster communication with the other two nodes is failing.

Ok

So, can you also post pveversion -v from all nodes, to ensure the upgrade went OK? Speaking of that, did you strictly follow the upgrade documentation at https://pve.proxmox.com/wiki/Upgrade_from_5.x_to_6.0#Actions_step-by-step ?

I followed exactly that article. The output of pveversion -v is at the end of this message.


From the IPs (it could make sense to censor them a bit) I see that the nodes are not on a private LAN. Does UDP traffic on ports 5404 and 5405 get through between them?

I need to check

The above is not normal; something with the network is off.

How can I verify this? The servers can ping and connect to each other via SSH without any problem...


Output of pveversion -v:

NODE 1:
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-helper: 6.1-2
pve-kernel-5.3: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-10
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-2
pve-cluster: 6.1-3
pve-container: 3.0-18
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 4.3.0-1
pve-zsync: 2.0-1
qemu-server: 6.1-4
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

NODE 2:
proxmox-ve: 6.1-2 (running kernel: 5.3.13-1-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-helper: 6.1-2
pve-kernel-5.3: 6.1-1
pve-kernel-5.0: 6.0-11
pve-kernel-5.3.13-1-pve: 5.3.13-1
pve-kernel-5.0.21-5-pve: 5.0.21-10
ceph-fuse: 12.2.11+dfsg1-2.1+b1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-10
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve3
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-2
pve-cluster: 6.1-3
pve-container: 3.0-18
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-9
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-3
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 4.3.0-1
pve-zsync: 2.0-1
qemu-server: 6.1-4
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

NODE 3:
proxmox-ve: 6.1-2 (running kernel: 5.3.13-2-pve)
pve-manager: 6.1-5 (running version: 6.1-5/9bf06119)
pve-kernel-5.3: 6.1-2
pve-kernel-helper: 6.1-2
pve-kernel-4.15: 5.4-12
pve-kernel-5.3.13-2-pve: 5.3.13-2
pve-kernel-4.15.18-24-pve: 4.15.18-52
ceph-fuse: 12.2.12-pve1
corosync: 3.0.2-pve4
criu: 3.11-3
glusterfs-client: 5.5-3
ifupdown: 0.8.35+pve1
ksm-control-daemon: 1.3-1
libjs-extjs: 6.0.1-10
libknet1: 1.13-pve1
libpve-access-control: 6.0-5
libpve-apiclient-perl: 3.0-2
libpve-common-perl: 6.0-11
libpve-guest-common-perl: 3.0-3
libpve-http-server-perl: 3.0-3
libpve-storage-perl: 6.1-3
libqb0: 1.0.5-1
libspice-server1: 0.14.2-4~pve6+1
lvm2: 2.03.02-pve4
lxc-pve: 3.2.1-1
lxcfs: 3.0.3-pve60
novnc-pve: 1.1.0-1
proxmox-mini-journalreader: 1.1-1
proxmox-widget-toolkit: 2.1-2
pve-cluster: 6.1-3
pve-container: 3.0-19
pve-docs: 6.1-3
pve-edk2-firmware: 2.20191127-1
pve-firewall: 4.0-10
pve-firmware: 3.0-4
pve-ha-manager: 3.0-8
pve-i18n: 2.0-4
pve-qemu-kvm: 4.1.1-2
pve-xtermjs: 4.3.0-1
qemu-server: 6.1-4
smartmontools: 7.1-pve2
spiceterm: 3.1-1
vncterm: 1.6-1
zfsutils-linux: 0.8.3-pve1

How can I remove node 3 from the cluster to check whether it works fine again?

Thanks

Marco
 
Hello,

I was frustrated, so I updated everything from the shell and then rebooted all the nodes "Windows style"...

Now the cluster has recovered and all the nodes are shown correctly in the GUI.

The problem is on node 3 (the last one added): when I click on the node (from the GUI), an error occurs with the following message:

tls_process_server_certificate: certificate verify failed (596)

What's wrong? How can I remove node 3 from the cluster and then try to add it again?

Thanks

Marco
 
UPDATE:

I rebooted node 3, then updated the certs with "pvecm updatecerts", and it seems to work now...
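
For anyone hitting the same certificate error, a minimal sketch of the fix (the pveproxy restart is an assumption on my part, in case the API proxy does not pick up the regenerated certificate on its own):

Code:
# regenerate and redistribute node certificates from the cluster CA
pvecm updatecerts
# restart the web/API proxy so it serves the new certificate (assumption)
systemctl restart pveproxy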

Thanks for everything; I will continue setting up the environment.

Marco
 
