So here's a clean thread to tackle the issues mentioned in that post; I quote the relevant text below. You can see the whole thing, along with some rants regarding the lack of support for LDAP with authenticated bind and other topics, here.
I would also kindly ask user/Proxmox member @tom to stop blocking threads just because he doesn't have the quality or knowledge to give proper replies. If you are "out of here" then f*** off. I've had enough of your arrogance and ignorance, and I'm not talking to you. I understand that when you see long topics your head gets messed up, but that's your problem, not the community's problem, when we're talking about serious things. So drop the pigeon-playing-chess act of making a few comments and blocking the threads. That is just sad, but I imagine that's just who you are. Good luck with that.
As for what really matters, the issue is simple:
The PVE firewall retrieves the addresses via a getaddrinfo call (there are many posts on this forum where this continuously causes issues).
Given an example where localnet is identified as IPv6, this causes the following issues:
- Communication is lost between Proxmox nodes (API HTTPS traffic on port 8006 goes over IPv6 and no rules match it in the pve-firewall implementation);
- SSH communication between the nodes is also lost;
Corosync traffic is safe: I have three private links dedicated to corosync and they're all IPv4, so those rules have been created correctly.
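A quick way to see what a plain getaddrinfo() call hands back for the node name is getent ahosts, which resolves through getaddrinfo() as well; the first entry it prints is what a caller that only takes one address ends up with. The hostname below is just my node's name:
Code:
root@proxmox-01 ~ # getent ahosts proxmox-01   # same resolution path, first entry wins
root@proxmox-01 ~ # pve-firewall localnet      # compare with what the firewall actually detected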
Quoting the same interactions @tom was very eager to make vanish:
hmm not sure why it's picking up the ipv6 address.
temporarily commenting the ipv6 entry in /etc/hosts is also worth a shot (however i wasn't able to reproduce your problem)
if that doesn't change anything you can try to edit it manually to use the ipv4 after pasting the join info for now.
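For reference, the /etc/hosts part of that workaround boils down to something like this (the addresses and the domain are placeholders, not my real ones):
Code:
# /etc/hosts on the joining node: comment out the IPv6 entry so getaddrinfo
# can only return the IPv4 address for the node name
10.0.0.1                 proxmox-01.example.com proxmox-01
# 2a01:xxxx:xxxx:xxxx::1 proxmox-01.example.com proxmox-01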
"I'm not sure why it's picking the IPv6 address."
"temporarily commenting"
"edit it manually"
Oh, and the preference of IPv6 comes from the RFC 3484 which defines the default order for geaddrinfo calls.
You could also edit /etc/gai.conf
add (or uncomment) the line
precedence ::ffff:0:0/96 100
And restart pve-cluster afterwards, sorry for that confusion here, we mostly use getaddrinfo_all nowadays where this doesn't matters.
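Applied on a node, that suggestion looks roughly like this; the precedence line is the one from the quote above, and the getent check at the end is just my way of verifying it took effect:
Code:
root@proxmox-01 ~ # grep ^precedence /etc/gai.conf
precedence ::ffff:0:0/96  100
root@proxmox-01 ~ # systemctl restart pve-cluster
root@proxmox-01 ~ # getent ahosts proxmox-01   # the IPv4 address should now be sorted first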
As you can see from these answers and the related issues, getaddrinfo taking its preferences from RFC 3484 (as it is supposed to!) is fine for general-purpose apps, but not for an app like Proxmox that requires precise control over communication between its member hosts. The Proxmox team's solution is to hammer around it and push IPv6 communication into the background, but that isn't a proper solution, not for me.
For the Join the IP from a single getaddrinfo call is used, as that is the one the system admin prefers for public destination address for this node, and can be managed using gai.conf
But, this is only a recommendation to have one preselected without forcing the admin to always enter a specific IP, most of the time that works out - as it's really only used for doing the API call to exchange the info required for join.
The last statement here is wrong. It is not only used for the API call to exchange the info required to join; it is used for many, many things, like the nodes communicating among each other (using the API) once clustered.
If the address is picked from getaddrinfo calls, then it is not defined by the admin, and that makes for a volatile, poor implementation, which can be seen in the number of issues that come up due to THIS in particular. My first issue was due to an upstream IPv6 problem, where it would try to use the IPv6 address returned by the getaddrinfo call: no fallback mechanism, no admin control over which network to use for API traffic between hosts.
At the end of the day, what stands is: the supposedly automatic rules don't work out of the box and the documentation is also poor. You can have a properly working cluster, you add an IPSet called management (which, per the documentation, is for remote access), and the moment you enable the firewall the Proxmox nodes can't communicate with each other.
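For context, the IPSet in question is defined in /etc/pve/firewall/cluster.fw roughly like this, following the format from the firewall documentation, with 203.0.113.0/24 standing in for the remote-admin network:
Code:
[OPTIONS]
enable: 1

[IPSET management]
203.0.113.0/24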
local_network is only used for the intra-cluster API communication:
- pveproxy proxying from one node to another
- spiceproxy/vncproxy proxying from one node to another
- SSH tunneling from one node to another (e.g., for migration)
- plain-text migration tunnelling from one node to another
It does not work:
- it did not work for intra-cluster API communication,
- it did not work for proxying anything from one node to another,
- it did not work for SSH tunnelling from one node to another.
Code:
2586 if ($localnet && ($ipversion == $localnet_ver)) {
2587 ruleset_addrule($ruleset, $chain, "-d $localnet -p tcp --dport 8006", "-j $accept_action"); # PVE API
2588 ruleset_addrule($ruleset, $chain, "-d $localnet -p tcp --dport 22", "-j $accept_action"); # SSH
2589 ruleset_addrule($ruleset, $chain, "-d $localnet -p tcp --dport 5900:5999", "-j $accept_action"); # PVE VNC Console
2590 ruleset_addrule($ruleset, $chain, "-d $localnet -p tcp --dport 3128", "-j $accept_action"); # SPICE Proxy
2591 }
Code:
root@proxmox-01 ~ # iptables-save | grep 8006
-A PVEFW-HOST-IN -p tcp -m set --match-set PVEFW-0-management-v4 src -m tcp --dport 8006 -j RETURN
root@proxmox-01 ~ # iptables-save | grep 3128
-A PVEFW-HOST-IN -p tcp -m set --match-set PVEFW-0-management-v4 src -m tcp --dport 3128 -j RETURN
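Since the traffic that actually breaks is going over IPv6, the v6 rule set is the interesting one to compare against (same greps, different tool):
Code:
root@proxmox-01 ~ # ip6tables-save | grep 8006
root@proxmox-01 ~ # ip6tables-save | grep 3128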
- management is described in the documentation as being for remote-access hosts, not for Proxmox nodes talking to each other; yet when I activate the firewall the Proxmox API communication between nodes breaks, which seems like a clear issue to me right here;
- v4? But you use getaddrinfo, which goes by ai_family, and that can be AF_INET or AF_INET6.
Code:
root@proxmox-01 ~ # pve-firewall localnet
local hostname: proxmox-01
local IP address: 2a01:xxxx:xxxx:xxxx::1
network auto detect: 2a01:xxxx:xxxx:xxxx:0000:0000:0000:0000/64
using detected local_network: 2a01:xxxx:xxxx:xxxx:0000:0000:0000:0000/64
accepting corosync traffic from/to:
- proxmox-02: 10.xxx.xxx.2 (link: 0)
- proxmox-02: 10.xxx.xxx.2 (link: 1)
- proxmox-02: 10.xxx.xxx.2 (link: 2)
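For what it's worth, if I read the pve-firewall documentation correctly, the auto-detected local_network can be overridden with an alias of the same name in /etc/pve/firewall/cluster.fw, which would at least pin these host rules to the IPv4 network (the subnet below is a placeholder):
Code:
[ALIASES]
local_network 10.0.0.0/24
After that, pve-firewall localnet should report the alias instead of the detected IPv6 network.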
By the way, where are the Ceph rules?
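There is nothing for Ceph in the generated host rules as far as I can see, so presumably they have to be added by hand. A sketch of what I mean, using the standard Ceph ports (3300/6789 for the monitors, 6800-7300 for the OSD/MGR/MDS daemons) and a hypothetical IPSet called ceph-net for the storage network, in /etc/pve/firewall/cluster.fw:
Code:
[IPSET ceph-net]
10.0.0.0/24

[RULES]
IN ACCEPT -source +ceph-net -p tcp -dport 3300      # Ceph MON (msgr2)
IN ACCEPT -source +ceph-net -p tcp -dport 6789      # Ceph MON (legacy msgr1)
IN ACCEPT -source +ceph-net -p tcp -dport 6800:7300 # Ceph OSD, MGR, MDS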