Node web interface not responding, 'pve-ssl.pem' - No such file or directory. Strange cluster behavior

We are going into production soon, so I have to figure this one out.
We have 3 nodes. All was working before.
The web interface of node 2 is not responding.
In the cluster information section of the other two nodes, the "Join cluster" button is greyed out.
But judging from the column on the left, everything seems fine.

Selection_006.png


root@proxmox1:/etc/pve/nodes/proxmox3# pvesh get /cluster/config/join
unable to read '/etc/pve/nodes/proxmox2/pve-ssl.pem' - No such file or directory


root@proxmox2:/etc/pve/nodes/proxmox1# ls -lah
total 1,5K
drwxr-xr-x 2 root www-data 0 april 8 09:11 .
drwxr-xr-x 2 root www-data 0 april 8 09:11 ..
-rw-r----- 1 root www-data 83 april 24 09:51 lrm_status
drwxr-xr-x 2 root www-data 0 april 8 09:11 lxc
drwxr-xr-x 2 root www-data 0 april 8 09:11 openvz
drwx------ 2 root www-data 0 april 8 09:11 priv
-rw-r----- 1 root www-data 1,7K april 8 09:11 pve-ssl.key
-rw-r----- 1 root www-data 1,7K april 8 09:11 pve-ssl.pem
drwxr-xr-x 2 root www-data 0 april 8 09:11 qemu-server


root@proxmox2:/etc/pve/nodes/proxmox2# ls -lah
total 512
drwxr-xr-x 2 root www-data 0 april 10 14:53 .
drwxr-xr-x 2 root www-data 0 april 8 09:11 ..
-rw-r----- 1 root www-data 83 april 24 10:02 lrm_status
drwxr-xr-x 2 root www-data 0 april 10 14:53 lxc
drwxr-xr-x 2 root www-data 0 april 10 14:53 qemu-server

root@proxmox1:/etc/pve/nodes/proxmox3# ls -lah
total 1,5K
drwxr-xr-x 2 root www-data 0 april 8 09:41 .
drwxr-xr-x 2 root www-data 0 april 8 09:11 ..
-rw-r----- 1 root www-data 83 april 24 09:51 lrm_status
drwxr-xr-x 2 root www-data 0 april 8 09:41 lxc
drwxr-xr-x 2 root www-data 0 april 8 09:41 openvz
drwx------ 2 root www-data 0 april 8 09:41 priv
-rw-r----- 1 root www-data 1,7K april 8 09:41 pve-ssl.key
-rw-r----- 1 root www-data 1,7K april 8 09:41 pve-ssl.pem
drwxr-xr-x 2 root www-data 0 april 8 09:41 qemu-server
 
Don't know if this is a coincidence but in the log folder:
-rw-r----- 1 root adm 1,3K april 10 14:53 user.log.1
-rw-r----- 1 root adm 12K april 10 14:53 mail.log.1
-rw-r----- 1 root adm 12K april 10 14:53 mail.info.1

These 3 log files carry the same date and time (April 10, 14:53) as the changes to the files in /etc/pve/nodes/proxmox2.

The only unusual entries may be these from user.log.1:
Apr 8 09:19:22 proxmox2 lvm[694]: Monitoring thin pool pve-data.
Apr 8 09:43:56 proxmox2 lvm[693]: Monitoring thin pool pve-data.
Apr 8 10:27:29 proxmox2 lvm[754]: Monitoring thin pool pve-data.
Apr 8 14:53:07 proxmox2 dmeventd[754]: No longer monitoring thin pool pve-data.
Apr 8 14:53:07 proxmox2 lvm[754]: Monitoring thin pool pve-data-tpool.
Apr 8 14:59:37 proxmox2 lvm[754]: WARNING: Thin pool pve-data-tpool data is now 82.01% full.
Apr 8 14:59:57 proxmox2 lvm[754]: WARNING: Thin pool pve-data-tpool data is now 86.26% full.
Apr 8 15:00:17 proxmox2 lvm[754]: WARNING: Thin pool pve-data-tpool data is now 90.53% full.
Apr 8 15:00:47 proxmox2 lvm[754]: WARNING: Thin pool pve-data-tpool data is now 96.80% full.
Apr 8 15:01:07 proxmox2 lvm[754]: WARNING: Thin pool pve-data-tpool data is now 100.00% full.
Apr 10 10:03:09 proxmox2 lvm[824]: Monitoring thin pool pve-data-tpool.
Apr 10 10:09:32 proxmox2 lvm[751]: Monitoring thin pool pve-data-tpool.
Apr 10 10:13:31 proxmox2 lvm[759]: Monitoring thin pool pve-data-tpool.
Apr 10 10:26:18 proxmox2 lvm[796]: Monitoring thin pool pve-data-tpool.
Apr 10 14:05:31 proxmox2 lvm[775]: Monitoring thin pool pve-data-tpool.
Apr 10 14:12:44 proxmox2 lvm[769]: Monitoring thin pool pve-data-tpool.
Apr 10 14:53:09 proxmox2 lvm[785]: Monitoring thin pool pve-data-tpool.
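
The warnings above show the thin pool climbing to 100.00% on Apr 8. To pull the peak value out of a longer log, something like the following might help. This is only a sketch: `thin_pool_peak` is a made-up helper name, and it assumes the log lines have the same "Thin pool NAME data is now N% full" wording as the excerpt above.

```shell
# Print the highest "data is now N% full" value logged for a thin pool.
# Arguments: log file, thin pool name as it appears in the log.
# (Hypothetical helper; it relies on the message wording seen in this thread.)
thin_pool_peak() {
    log="$1"
    pool="$2"
    awk -v pool="$pool" '
        # Only look at the fill-level warnings for the requested pool.
        index($0, "Thin pool " pool " data is now") > 0 {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) {
                    pct = $i
                    sub(/%$/, "", pct)
                    # Track the numeric maximum, keep the original text for output.
                    if (pct + 0 > max) { max = pct + 0; shown = pct }
                }
            }
        }
        END { if (shown != "") print shown }
    ' "$log"
}
```

For example, `thin_pool_peak /var/log/user.log.1 pve-data-tpool` would scan the rotated log mentioned above. For the current fill level rather than the logged history, `lvs` reports a Data% column for thin pools.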
 

Hi,
please try to regenerate your SSL certificates by running
Code:
pvecm updatecerts --force
systemctl restart pveproxy.service
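
To see whether the regeneration actually helped, it may be worth checking which node directories still lack a certificate, before and after running the commands. A minimal sketch, assuming the directory layout under /etc/pve/nodes shown in the listings above; `missing_certs` is a made-up helper name, not a Proxmox tool.

```shell
# Print the name of every node directory that has no pve-ssl.pem.
# nodes_dir defaults to /etc/pve/nodes, the path seen in this thread.
# (Hypothetical helper for illustration only.)
missing_certs() {
    nodes_dir="${1:-/etc/pve/nodes}"
    for dir in "$nodes_dir"/*/; do
        # Skip the unexpanded glob when the directory is empty or absent.
        [ -d "$dir" ] || continue
        node=$(basename "$dir")
        [ -f "$dir/pve-ssl.pem" ] || echo "$node"
    done
}
```

With the listings from the first post, `missing_certs` would be expected to print `proxmox2`; once the certificates are back it should print nothing.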
 
I overlooked that the status says 'standalone node'. What's the output of `pvecm status` and `cat /etc/pve/corosync.conf`?
 
First, this might be related to the fact that I unsuccessfully tried to remove the proxmox2 node using the NodeId "0x00000002" and then using the name "192.168.99.166". The commands shown below returned nothing.
https://forum.proxmox.com/threads/pvecm-nodes-does-not-show-correct-nodename.53265

pvecm delnode 192.168.99.166
pvecm delnode 192.168.99.166
pvecm delnode 0x00000002
pvecm delnode 0x00000002

I later found out in a virtual environment (on another test system) that I had to use the name shown in the web interface. So the node was not removed.
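
Since `pvecm status` lists the members by address here, a small lookup from address to node name may save some guessing. This is a rough sketch, assuming the web interface name matches the `name:` entry in the nodelist of /etc/pve/corosync.conf; `node_name_for_addr` is a made-up helper name.

```shell
# Look up the node name that belongs to a ring0_addr in a corosync.conf nodelist.
# Arguments: address, optional path to the config file.
# (Hypothetical helper; assumes one name: and one ring0_addr: per node block.)
node_name_for_addr() {
    addr="$1"
    conf="${2:-/etc/pve/corosync.conf}"
    awk -v addr="$addr" '
        $1 == "node" && $2 == "{" { name = ""; ring = "" }
        $1 == "name:"             { name = $2 }
        $1 == "ring0_addr:"       { ring = $2 }
        # On a closing brace, report the node if its address matched, then reset.
        $1 == "}" {
            if (ring == addr && name != "") print name
            name = ""; ring = ""
        }
    ' "$conf"
}
```

Calling `node_name_for_addr 192.168.99.166` on a cluster node would print the name to try instead of the IP or the node ID.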



root@proxmox2:/etc/pve/nodes/proxmox2# pvecm status
Quorum information
------------------
Date: Wed Apr 24 10:37:12 2019
Quorum provider: corosync_votequorum
Nodes: 3
Node ID: 0x00000002
Ring ID: 1/1164
Quorate: Yes

Votequorum information
----------------------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.99.118
0x00000002 1 192.168.99.166 (local)
0x00000003 1 192.168.99.167




root@proxmox3:~# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: proxmox1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.99.118
  }
  node {
    name: proxmox2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.99.166
  }
  node {
    name: proxmox3
    nodeid: 3
    quorum_votes: 1
    ring0_addr: 192.168.99.167
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: cluster0
  config_version: 3
  interface {
    bindnetaddr: 192.168.99.118
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}