[SOLVED] Web access to node is gone after cluster activation

sheshman

Member
Jan 16, 2023
Hi,

I activated a cluster between two Proxmox VE nodes (Linux 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z)), and the system said the node joined the cluster.

Cluster: 172.168.17.203 (not behind NAT, direct access to the internet)
Node: 172.168.11.202 (behind NAT; ports 8006 and 5900 to 5999 are forwarded to the node)

After joining the cluster I lost web access to the node: the web GUI loads, but it doesn't accept my password. I can SSH to the node with my usual password, but the GUI rejects it.

The node also shows a red X in the cluster view, and when I click Datacenter -> Cluster it returns "'/etc/pve/nodes/egeturk/pve-ssl.pem' does not exist! (500)". I found this topic about the problem: https://forum.proxmox.com/threads/after-cluster-activation-no-web-access-to-node.123823/ where the suggested fix was the commands below:
Code:
rm /etc/pve/priv/pve-root*
pvecm updatecerts --force
systemctl restart pveproxy.service

When I run "rm /etc/pve/priv/pve-root*" it returns "permission denied", and the other steps don't work because the first command fails.
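(If it is relevant: the usual reason rm fails there is that /etc/pve, the pmxcfs mount, goes read-only when the node has no quorum. A quick check would look like this; it's only a diagnostic sketch assuming standard PVE tooling:)
Code:
# quorum and membership as this node sees it
pvecm status
# /etc/pve goes read-only without quorum; this fails with "permission denied" in that case
touch /etc/pve/.writetest && rm /etc/pve/.writetest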

Screenshots are attached; any advice?
 

Attachments

  • 002.png (23.5 KB)
  • 003.png (38.4 KB)
  • 001.png (79.7 KB)
Hi,

Can you post the syslog, as an attachment, covering the period since the node joined the cluster?

Bash:
journalctl --since "2023-08-09 05:30" --until "2023-08-09 09:00" > /tmp/Syslog.txt

You may edit the time/date in the above command.
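If the full journal turns out to be large, a unit-filtered variant of the same command (optional; it only narrows the output to the cluster-related services) also works:
Bash:
journalctl -u pve-cluster -u pveproxy -u corosync --since "2023-08-09 05:30" --until "2023-08-09 09:00" > /tmp/Syslog.txt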
 
I had removed the node before your message; I've now re-added it and changed the log end time from 09:00 to 18:00. Hope this helps.
 

Attachments

  • Syslog.txt (101 KB)
Hello,

Can you also please provide us with the output of pveversion -v?

Have you restarted the pveproxy and pvedaemon services?

UPDATE: and the pve-cluster service as well :)
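For reference, restarting all three in one go would look like this (standard service names on PVE 8):
Bash:
systemctl restart pve-cluster pvedaemon pveproxy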
 
Hello,

The output of pveversion -v is below:
Code:
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

No, I didn't try restarting the pveproxy and pvedaemon services yet, but I'll try today and keep you posted.
 
UPDATE: The ISP says there are no restrictions on the internet line, but somehow we are unable to open port 22 to the outside world, so I installed Tailscale; now both servers can ping each other and reach each other's ports through the Tailscale VPN.

So I tried joining the cluster through the CLI instead of the GUI to see what the problem is; it gets stuck on "waiting for quorum..." every time.

What could be the reason, and which logs should I check to debug this?

Both servers can reach ports 8006, 22 and all other ports on each other; tested with "nc -zvw10 server_ip port".
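One caveat on the nc test: corosync/kronosnet traffic is UDP (port 5405 by default), so a TCP connect test does not confirm the cluster link itself. A debugging sketch, assuming default corosync settings:
Code:
pvecm status                               # quorum and membership as seen by this node
corosync-cfgtool -s                        # status of each knet link
journalctl -u corosync -u pve-cluster -b   # logs from the join/quorum attempts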
 
My corosync.conf is below:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: helsinki
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 100.126.153.77 // I change this to this IP before trying to add the node; by default it uses the server's WAN IP
  }
  node {
    name: istanbul
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.245 // this is my node's LAN IP; it gets added to the config when I run the add on the node.
                              // I think this is the problem, because it should be 100.114.70.125 instead of 192.168.1.245
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: mycluster
  config_version: 2
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
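Following the comment in the nodelist above, both ring0_addr entries would presumably need to be on the Tailscale network. Here is a sketch built only from the IPs already mentioned in this thread; note that when /etc/pve/corosync.conf is edited by hand, config_version in the totem section must also be incremented:
Code:
nodelist {
  node {
    name: helsinki
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 100.126.153.77
  }
  node {
    name: istanbul
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 100.114.70.125
  }
}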

I have a thought about the problem, but I'm not quite sure. I'm trying to add my node with:
Code:
pvecm add helsinki

helsinki is registered in /etc/hosts as 100.114.70.125, so when I ping helsinki it answers, but while adding the node to the cluster it says:
Code:
No cluster network links passed explicitly, fallback to local node IP '192.168.1.245'
It's trying to communicate through the node's LAN IP address. I think if I can force it to use the VPN IP instead of the LAN IP that will solve my problem, but I have no idea how to do it :)
 
UPDATE: I found this command to force the use of the VPN IP instead of the local IP:
Code:
pvecm add 100.126.153.77 -link0 100.114.70.125 --use_ssh
Still getting stuck on "waiting for quorum..." though.
 
Fixed :)

I just realised I left some information out; terribly sorry for that.

I was using the commands below to separate the node from the cluster while trying to fix the problem:
Code:
systemctl stop pve-cluster
systemctl stop corosync
pmxcfs -l
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster
pvecm delnode oldnode
pvecm expected 1
rm /var/lib/corosync/*
It turns out that "pvecm expected 1" was causing the "waiting for quorum" hang. I separated the node from the cluster with the commands below instead:
Code:
systemctl stop pve-cluster
systemctl stop corosync
pmxcfs -l
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster
pvecm delnode oldnode
rm /var/lib/corosync/*

Re-joined the cluster with:
Code:
pvecm add 100.126.153.77 -link0 100.114.70.125 --use_ssh
It joined and started working instantly.
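For completeness, the result can be sanity-checked with the standard pvecm subcommands (not part of the fix itself):
Code:
pvecm status   # should now report Quorate: Yes with both nodes listed
pvecm nodes    # current membership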

So, it is really important to provide all the steps you have taken when asking experts for help; I made a big mistake by forgetting to share those extra commands I was using. Sorry, my bad.

Thanks for your time and patience
 
