[SOLVED] Web access to node is gone after cluster activation

sheshman

Hi,

I activated a cluster between two Proxmox VE hosts (Linux 6.2.16-3-pve #1 SMP PREEMPT_DYNAMIC PVE 6.2.16-3 (2023-06-17T05:58Z)), and the system said the node joined the cluster.

Cluster: 172.168.17.203 (not behind NAT, direct access to the internet)
Node: 172.168.11.202 (behind NAT, ports 8006 and 5900-5999 forwarded to the node)

After joining the cluster I lost web access to the node: the web GUI loads, but it doesn't accept my password. I can SSH into the node with the same password, but the GUI rejects it.

It also shows a red X on the node in the cluster view, and when I click Datacenter -> Cluster it returns "'/etc/pve/nodes/egeturk/pve-ssl.pem' does not exist! (500)". I found this thread about the problem: https://forum.proxmox.com/threads/after-cluster-activation-no-web-access-to-node.123823/ where the suggested fix was to run the commands below:
Code:
rm /etc/pve/priv/pve-root*          # remove the old cluster root CA key/serial so they can be regenerated
pvecm updatecerts --force           # regenerate the cluster CA and node certificates
systemctl restart pveproxy.service  # restart the web GUI/API proxy so it picks up the new certificates

When I run "rm /etc/pve/priv/pve-root*" it returns "permission denied", and the other steps don't work because the first command fails.
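
For reference, one common cause of "permission denied" under /etc/pve is that the cluster filesystem (pmxcfs) goes read-only when the node has no quorum. A quick diagnostic sketch using the standard PVE tools (not a fix in itself):
Bash:
pvecm status                                          # check the "Quorate" line and the vote counts
findmnt /etc/pve                                      # confirm /etc/pve is the pmxcfs FUSE mount
journalctl -u pve-cluster -b --no-pager | tail -n 30  # recent pmxcfs messages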

Screenshots are attached. Any advice?
 

Attachments

  • 002.png (23.5 KB)
  • 003.png (38.4 KB)
  • 001.png (79.7 KB)
Hi,

Can you post the syslog (as an attachment) covering the period since the node joined the cluster?

Bash:
journalctl --since "2023-08-09 05:30" --until "2023-08-09 09:00" > /tmp/Syslog.txt

You may edit the time/date in the above command.
 
I had removed the node before your message; I've now re-added it and changed the log time range from 09:00 to 18:00. I hope this helps.
 

Attachments

  • Syslog.txt (101 KB)
Hello,

Can you also please provide us with the output of pveversion -v?

Have you restarted the pveproxy and pvedaemon services?

UPDATE: and the pve-cluster service as well :)
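
For reference, restarting those services from the shell would look like this (standard service names on a PVE 8 node):
Bash:
systemctl restart pve-cluster pvedaemon pveproxy
systemctl status pveproxy --no-pager   # confirm the web GUI service came back up cleanly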
 
Hello,

Output of pveversion -v is below:
Code:
proxmox-ve: 8.0.1 (running kernel: 6.2.16-3-pve)
pve-manager: 8.0.3 (running version: 8.0.3/bbf3993334bfa916)
pve-kernel-6.2: 8.0.2
pve-kernel-6.2.16-3-pve: 6.2.16-3
ceph-fuse: 17.2.6-pve1+3
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx2
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-3
libknet1: 1.25-pve1
libproxmox-acme-perl: 1.4.6
libproxmox-backup-qemu0: 1.4.0
libproxmox-rs-perl: 0.3.0
libpve-access-control: 8.0.3
libpve-apiclient-perl: 3.3.1
libpve-common-perl: 8.0.5
libpve-guest-common-perl: 5.0.3
libpve-http-server-perl: 5.0.3
libpve-rs-perl: 0.8.3
libpve-storage-perl: 8.0.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve3
novnc-pve: 1.4.0-2
proxmox-backup-client: 2.99.0-1
proxmox-backup-file-restore: 2.99.0-1
proxmox-kernel-helper: 8.0.2
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.0.5
pve-cluster: 8.0.1
pve-container: 5.0.3
pve-docs: 8.0.3
pve-edk2-firmware: 3.20230228-4
pve-firewall: 5.0.2
pve-firmware: 3.7-1
pve-ha-manager: 4.0.2
pve-i18n: 3.0.4
pve-qemu-kvm: 8.0.2-3
pve-xtermjs: 4.16.0-3
qemu-server: 8.0.6
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.1.12-pve1

No, I didn't try restarting the pveproxy and pvedaemon services, but I'll try today and keep you posted.
 
UPDATE:
The ISP says there are no restrictions on the internet line, but somehow we are unable to open port 22 to the outside world, so I installed Tailscale; now both servers can ping each other and reach each other's ports through the Tailscale VPN.

So I tried to join the cluster through the CLI instead of the GUI to see what the problem is, and it gets stuck on "waiting for quorum..." every time.

What could be the reason, and which logs should I check to debug this problem?

Both servers can reach each other on 8006, 22, and all other required ports, tested with the "nc -zvw10 server_ip port" command.
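
For reference, besides the nc port test, the quorum and corosync link state can be checked directly with the standard PVE/corosync tools (a diagnostic sketch, not a fix):
Bash:
pvecm status                                        # membership and quorum information
corosync-cfgtool -s                                 # status of each configured link
journalctl -u corosync -b --no-pager | tail -n 50   # recent corosync log entries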
 
My corosync.conf is as below:
Code:
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: helsinki
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 100.126.153.77 // I change this to this IP before trying to add the node to the cluster; by default it puts the server's WAN IP here
  }
  node {
    name: istanbul
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.1.245 // this is my node's LAN IP; it gets added to the config when I run the join on the node. I think this is the problem,
                              // because it should be 100.114.70.125 instead of 192.168.1.245
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: mycluster
  config_version: 2
  interface {
    linknumber: 0
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
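
For reference, if a ring0_addr in an existing cluster needs to be corrected to the VPN address, the usual approach is to edit the cluster-wide copy under /etc/pve and bump config_version so the change is picked up; a rough sketch (the IP shown is the Tailscale address discussed above):
Bash:
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
# set ring0_addr to 100.114.70.125 and increment config_version in the copy
nano /etc/pve/corosync.conf.new
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf   # pmxcfs distributes it to all nodes
systemctl restart corosync                             # only if the change is not applied automatically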

I have a thought about what the problem is, but I'm not quite sure. I'm trying to add my node with:
Code:
pvecm add helsinki

and helsinki is registered in /etc/hosts as 100.114.70.125, so when I ping helsinki it answers; but while adding the node to the cluster it says:
Code:
No cluster network links passed explicitly, fallback to local node IP '192.168.1.245'
It's trying to communicate through the node's LAN IP address. I think that if I can force it to use the VPN IP instead of the LAN IP, that will solve my problem, but I have no idea how to do it :)
 
UPDATE: I found this command to force use of the VPN IP instead of the local IP:
Code:
pvecm add 100.126.153.77 -link0 100.114.70.125 --use_ssh
Still getting stuck on "waiting for quorum..."
 
Fixed :)

I just realised I left some information out when posting, terribly sorry for that.

I was using the commands below to separate the node from the cluster while trying to fix the problem:
Code:
systemctl stop pve-cluster
systemctl stop corosync
pmxcfs -l                  # start the cluster filesystem in local mode so /etc/pve stays writable
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster
pvecm delnode oldnode      # remove the old node from the cluster membership
pvecm expected 1           # lower the expected vote count to 1
rm /var/lib/corosync/*
It turns out that "pvecm expected 1" was causing the "waiting for quorum" hang, so I separated the node from the cluster with the commands below instead (the same sequence, but without "pvecm expected 1"):
Code:
systemctl stop pve-cluster
systemctl stop corosync
pmxcfs -l
rm /etc/pve/corosync.conf
rm -r /etc/corosync/*
killall pmxcfs
systemctl start pve-cluster
pvecm delnode oldnode
rm /var/lib/corosync/*

Then I re-joined the cluster with:
Code:
pvecm add 100.126.153.77 -link0 100.114.70.125 --use_ssh
It joined and started working instantly.
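
For completeness, the join can be verified from either node with the standard cluster tools:
Bash:
pvecm status   # should now show "Quorate: Yes" and both nodes listed
pvecm nodes    # lists the cluster members and their votes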

So, it is really important to share every step you have taken when asking experts for help. I made a big mistake by forgetting to mention those extra commands I was using; sorry, my bad.

Thanks for your time and patience
 
