Proxmox Cluster not working

Janick

New Member
Jan 30, 2023
Dear Proxmox Forum

I am currently facing issues with my Proxmox cluster.
I have added SystemB to my cluster created on SystemA and since then I am getting the following error message:
"Connection error 596: tls_process_server_certificate: certificate verify failed"

I have tried unsuccessfully to debug the system with information from the following thread:
https://forum.proxmox.com/threads/t...tificate-certificate-verify-failed-596.76192/

Some information displayed in syslog:

SystemA:
Bash:
Jan 30 08:09:17 SystemA pveproxy[59222]: Cluster not quorate - extending auth key lifetime!
Jan 30 08:09:17 SystemA corosync[2850]:   [KNET  ] rx: Packet rejected from 192.168.191.1:2151
Jan 30 08:09:18 SystemA corosync[2850]:   [KNET  ] rx: Packet rejected from 192.168.191.1:2151

SystemB:
Bash:
Jan 30 08:07:03 SystemB pveproxy[3776398]: Could not verify remote node certificate 'BB:A4:2B:4E:AD:80:C1:D2:3E:14:77:76:88:57:A0:A8:63:54:8B:89:31:FB:0D:43:C1:13:80:35:52:E3:4E:3C' with list of pinned certificates, refreshing cache
Jan 30 08:07:03 SystemB corosync[3683]:   [KNET  ] rx: Packet rejected from 192.168.130.1:5405
Jan 30 08:07:06 SystemB corosync[3683]:   [KNET  ] rx: Packet rejected from 192.168.130.1:5405

- Both nodes have the same PVE Version 7.3-4
- The systems are connected via VPN
- UDP ports 5405-5412 for corosync are open for the VPN tunnel
- The system time is the same on both systems
- Node SystemB is marked as offline in the web interface of SystemA, but the shell is working (and vice versa).
- The journalctl output of "journalctl -u corosync -u pve-cluster -b" is attached

Any idea what could cause the issue?

Thank you in advance and best regards,
Janick
 

Attachments

  • SystemA.txt
    3.5 KB · Views: 5
  • SystemB.txt
    3.5 KB · Views: 5
Hi,
First of all, thank you for the speedy reply!
SSH is working on both systems.
 
Dear Forum members.

What could be causing this error?
I have not been able to fix the issue myself and would be grateful for any advice.

BR,
Janick
 
Hi,
please provide the /etc/corosync/corosync.conf of both nodes as well as the output of pvecm status. Note that running the cluster network over VPN is not recommended, as corosync requires a low latency network to work reliably.
 
Hi Chris

I attached the mentioned log files.
I understand that cluster network over VPN isn't best practice.

Could it be a NAT problem?
Are the packets perhaps rejected because the source IP is not the address of my Proxmox node?

Bash:
Packet rejected from 192.168.130.1:5405

Best regards,
Janick
 

Attachments

  • corosync_SystemA.txt
    538 bytes · Views: 16
  • corosync_SystemB.txt
    536 bytes · Views: 5
  • pvecm_status_SystemA.txt
    682 bytes · Views: 8
  • pvecm_status_SystemB.txt
    688 bytes · Views: 6
Hi,
Yes, the corosync packets are rejected because their source address does not match the IP address of any known node. Also, node SystemA seems to think it is quorate, as its expected votes value is set to 1. Did you set this manually?
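To make the mismatch visible, the rejected source addresses can be compared with the ring addresses in corosync.conf. The following is only a minimal sketch: the helper names are made up for illustration, and it assumes the usual `ringX_addr: <address>` lines in the nodelist and the "Packet rejected from IP:PORT" log format shown earlier in this thread.

```shell
#!/bin/sh
# Sketch: list rejected corosync source IPs that are not configured as a
# ring address. All function names here are hypothetical helpers.

# Print every ringX_addr value found in a corosync.conf-style file.
known_addrs() {
    awk '$1 ~ /^ring[0-9]+_addr:$/ { print $2 }' "$1" | sort -u
}

# Print the unique source IPs from "Packet rejected from IP:PORT" log lines.
rejected_sources() {
    sed -n 's/.*Packet rejected from \([0-9.]*\):[0-9]*.*/\1/p' "$1" | sort -u
}

# Print rejected sources that do not appear as a ring address.
# $1 = path to corosync.conf, $2 = path to a saved corosync log
unknown_sources() {
    conf=$1
    log=$2
    known=$(known_addrs "$conf")
    for ip in $(rejected_sources "$log"); do
        printf '%s\n' "$known" | grep -qxF "$ip" || printf '%s\n' "$ip"
    done
}
```

A possible way to use it: save the log with `journalctl -u corosync -b > /tmp/corosync.log` (the file name is arbitrary), then run `unknown_sources /etc/corosync/corosync.conf /tmp/corosync.log`. Any address it prints is one corosync sees on the wire but does not know as a node, which would be consistent with NAT rewriting the source.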

In any case, the setup will not work as it is. I would recommend putting the cluster network in its own subnet, with no NAT and ideally no VPN in between. Have a look at the corresponding section in the docs for a more detailed description [0].
Further, you will have to add an external vote device [1] to the cluster, in order to keep the cluster working if one of the nodes is not reachable.

[0] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_requirements
[1] https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_corosync_external_vote_support
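The vote arithmetic behind this advice can be sketched as follows, assuming the usual simple-majority rule (more than half of the expected votes) and one vote per node. The function name is purely illustrative.

```shell
#!/bin/sh
# Votes needed for quorum under a simple majority: floor(expected / 2) + 1.
# quorum_needed is a hypothetical helper, not a Proxmox or corosync command.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# Expected votes forced to 1: a single node is quorate on its own.
quorum_needed 1   # prints 1

# Two nodes, one vote each: both votes are required, so losing either
# node (or the link between them) drops the cluster below quorum.
quorum_needed 2   # prints 2

# Two nodes plus one external vote: two of three votes are enough,
# so one node may be unreachable.
quorum_needed 3   # prints 2
```

This is why a two-node cluster cannot tolerate the loss of a node, while the same cluster with a third, external vote can.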
 
Yes, I did set the expected votes manually on SystemA.
I think I know how to troubleshoot from here.

Thank you for your support.

BR,
Janick
 
Hello friend, I have the same problem. Were you able to solve it? Thank you.
 

Update: I did some more testing with another (third) Proxmox hypervisor and got it to work.
Site A is connected to Site B through a WireGuard site-to-site tunnel, without NAT over the tunnel.
(See https://youtu.be/2oe7rTMFmqc?t=1487)

Joining the cluster worked (almost) without any issues.
The one obstacle was that the third hypervisor already had VMs on it, so joining was not possible at first.

The following workaround worked for me:
https://forum.proxmox.com/threads/joining-a-cluster-with-already-created-guests-vm.81064/
On node1 (with guests):
Create a new cluster or get the join information.

On node2 (with guests):
scp -r /etc/pve/nodes/* node1:/etc/pve/nodes/
rm -r /etc/pve/nodes/*
Join the cluster.
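Since guest IDs have to be unique across a cluster, it may be worth checking for duplicates before copying anything. The sketch below is an addition to the workaround, not part of the linked thread. It assumes guest configs are stored as qemu-server/<vmid>.conf and lxc/<vmid>.conf below each node directory, and both function names are hypothetical.

```shell
#!/bin/sh
# Sketch: find guest IDs that exist in two /etc/pve/nodes-style trees.
# guest_ids and duplicate_guest_ids are hypothetical helper names.

# Print the IDs of all VM and container configs below the given directory.
guest_ids() {
    find "$1" -type f \( -path '*/qemu-server/*.conf' -o -path '*/lxc/*.conf' \) \
        -exec basename {} .conf \; | sort -u
}

# Print IDs that appear in both trees ($1 and $2); no output means no clash.
duplicate_guest_ids() {
    a=$(mktemp)
    b=$(mktemp)
    guest_ids "$1" > "$a"
    guest_ids "$2" > "$b"
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}
```

One way to use it is to copy node2's /etc/pve/nodes to a scratch directory on node1 first (for example /tmp/node2-nodes, an arbitrary path) and run `duplicate_guest_ids /etc/pve/nodes /tmp/node2-nodes`. Any ID it prints would need to be changed on one side before the configs are merged.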
 
