Wrong Join Information IP

maverickws

Hi all,

I installed Proxmox and selected "Create Cluster".
I had selected 4 addresses: 1 public v4, 1 public v6, 2 private v4.

In the meantime I decided not to use the public IPv6 address, so I followed this comment by fabian:
Proxmox VE 6 - Removing cluster configuration

https://pve.proxmox.com/pve-docs/pve-admin-guide.html#pvecm_separate_node_without_reinstall minus the last few steps.

Code:
systemctl stop pve-cluster corosync
pmxcfs -l
rm /etc/corosync/*
rm /etc/pve/corosync.conf
killall pmxcfs
systemctl start pve-cluster

should do the trick.

This completed successfully.
I created the cluster again with the right links. No IPv6.
Yet now, when I go to "Join Information", the "IP Address" shown is always that IPv6.

Why? How can I remove the cluster and re-create it with new links and the correct IPs, and still end up with this IPv6?

I can see it in /etc/pve/.members, but I cannot edit that file.

How do I fix this? Thanks.
 
/etc/hosts has both
IPv4 hostname fqdn
IPv6 hostname fqdn

I have configured the links with IPs, not hostnames.
 
what about corosync config in /etc/pve/corosync.conf ? (remove/mask public IPs)
 
Hi oguz thanks for your replies.

Code:
# cat /etc/pve/corosync.conf
logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: proxmox-01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: public_ipv4
    ring1_addr: 10.11.49.1
    ring2_addr: 10.5.49.1
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: MyCluster
  config_version: 1
  interface {
    linknumber: 0
  }
  interface {
    linknumber: 1
  }
  interface {
    linknumber: 2
  }
  ip_version: ipv4-6
  link_mode: passive
  secauth: on
  version: 2
}
 
hmm not sure why it's picking up the ipv6 address.

temporarily commenting the ipv6 entry in /etc/hosts is also worth a shot (however i wasn't able to reproduce your problem)

if that doesn't change anything, you can edit the address manually to use the IPv4 after pasting the join info, for now.

you said you see the ipv6 in /etc/pve/.members ?

if you can reproduce this issue reliably, please create a bug report in https://bugzilla.proxmox.com
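
for reference, the test would look something like this (2001:db8::1 is a placeholder for your actual public IPv6 entry):

Code:
# comment out the IPv6 host entry
sed -i 's/^2001:db8::1/#&/' /etc/hosts
# the lookup happens when pmxcfs starts, so restart it and re-check
systemctl restart pve-cluster
cat /etc/pve/.members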
 
hmm not sure why it's picking up the ipv6 address.

temporarily commenting the ipv6 entry in /etc/hosts is also worth a shot (however i wasn't able to reproduce your problem)

So, just to put it out there, what I did the first time was:
On this server I started the Proxmox cluster config.
When it asked me to select links, I selected:
link0: public v4
link1: public v6
link2: private v4

Then, as the IPv6 link was causing issues (something I'll go through later, but not really of interest here), I thought of just removing the IPv6 link.
So I removed the second node normally (that second node has since been formatted) and used the instructions above to remove the cluster.

if that doesn't change anything, you can edit the address manually to use the IPv4 after pasting the join info, for now.

I've noticed that together with the IP and the Join Information string, there's also a Fingerprint. I was wondering: if I changed the IP used to join the cluster on the node, would that work, or would it break something?

you said you see the ipv6 in /etc/pve/.members ?

Code:
{
"nodename": "proxmox-01",
"version": 3,
"cluster": { "name": "MyCluster", "version": 1, "nodes": 1, "quorate": 1 },
"nodelist": {
  "pmx-01": { "id": 1, "online": 1, "ip": "public_ipv6"}
  }
}

if you can reproduce this issue reliably, please create a bug report in https://bugzilla.proxmox.com

To reproduce this issue I would have to get new machines and redo all of this. I am very sorry, but I do not have enough time on my hands right now, especially considering the possibility that there will be no solution and I'll have to reinstall this machine (as it seems reinstalling is the #1 fix around here), which would take even more time. I can't afford to repeat the wrong configuration only to lose more time still.

If anyone is available to follow the same steps - create a cluster config with link0: public_v4 and link1: public_ipv6, then destroy that config and redo it - they can see whether it's reproducible, I guess.

In the meanwhile, I am going to comment out the IPv6 entry in /etc/hosts on the server, reboot, and see if it keeps showing me that IPv6. I will provide feedback in a few minutes.
 
@oguz

I commented out all the IPv6 entries in /etc/hosts and rebooted.
The Join Information came up with the IPv4.

.members shows the IPv4 now.

Removed the comments and rebooted;
Join Information shows the IPv6 again !?!?

I don't understand this. So I have no control over the links, no way of changing them or choosing what I want?

Has no one here ever had IPv4 working fine and IPv6 with upstream problems?
Must my cluster fail and struggle with issues because Proxmox apparently uses links and connectivity as it pleases?!?
 
if that doesn't change anything, you can edit the address manually to use the IPv4 after pasting the join info, for now.

you said you see the ipv6 in /etc/pve/.members ?

Changing it manually does not work; actually, you can't change the IP manually at all.
I've added the node anyway.

All the private networks are connected via private vSwitch.

Frequently I get the warning "got timeout (500)" and a spinning wheel.
I started the Ceph cluster on the 1st node; it created the monitor and manager automatically without issues.
Then I tried to add Ceph monitors/managers on the second node:

ERROR: Got timeout

v2:
ERROR: monitor address '10.1.49.2' already in use (500)

The address is in use by a failed monitor. This was a brand-new install.
As I mentioned previously, I'm evaluating virtualisation software, and Proxmox was getting some hype, so I'm really trying to get to know it.

Code:
# systemctl status ceph-{mon,mgr}@proxmox-02
● ceph-mon@proxmox-02.service - Ceph cluster monitor daemon
   Loaded: loaded (/lib/systemd/system/ceph-mon@.service; disabled; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/ceph-mon@.service.d
           └─ceph-after-pve-cluster.conf
   Active: inactive (dead)

● ceph-mgr@proxmox-02.service - Ceph cluster manager daemon
   Loaded: loaded (/lib/systemd/system/ceph-mgr@.service; disabled; vendor preset: enabled)
  Drop-In: /usr/lib/systemd/system/ceph-mgr@.service.d
           └─ceph-after-pve-cluster.conf
   Active: inactive (dead)

I created an OSD on the second node. The OSD shows as down/out.
Node 1 says OSD type bluestore; node 2 says the old type, filestore?
By the way, this second node had its 2 drives shredded and was a clean, new install of Proxmox.

What's this problem with Ceph and monitors/managers not coming up?

Why can't an administrator change the links that Proxmox uses? I didn't even understand this part very well; all I really got was being forced to destroy a cluster and all its config files when I just wanted to replace one network with another. I honestly did not understand this step, and I am still a bit appalled that this was the only solution I found. Links can't be edited?
So if, let's say, I have 100 Proxmox servers and some technical or business decision means the IPs have to change, must the cluster be destroyed and rebuilt manually?

Ceph Log:

Code:
2020-06-15 20:11:37.944 7f2702727280  0 set uid:gid to 64045:64045 (ceph:ceph)
2020-06-15 20:11:37.944 7f2702727280  0 ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus (stable), process ceph-mon, pid 13366
2020-06-15 20:11:37.944 7f2702727280 -1 monitor data directory at '/var/lib/ceph/mon/ceph-proxmox-02' does not exist: have you run 'mkfs'?
2020-06-15 20:11:48.044 7f73fa13c280  0 set uid:gid to 64045:64045 (ceph:ceph)
2020-06-15 20:11:48.044 7f73fa13c280  0 ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus (stable), process ceph-mon, pid 13523
2020-06-15 20:11:48.044 7f73fa13c280 -1 monitor data directory at '/var/lib/ceph/mon/ceph-proxmox-02' does not exist: have you run 'mkfs'?
2020-06-15 20:11:58.296 7f04b4050280  0 set uid:gid to 64045:64045 (ceph:ceph)
2020-06-15 20:11:58.296 7f04b4050280  0 ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus (stable), process ceph-mon, pid 13676
2020-06-15 20:11:58.296 7f04b4050280 -1 monitor data directory at '/var/lib/ceph/mon/ceph-proxmox-02' does not exist: have you run 'mkfs'?
2020-06-15 20:12:08.536 7f0b330a0280  0 set uid:gid to 64045:64045 (ceph:ceph)
2020-06-15 20:12:08.536 7f0b330a0280  0 ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus (stable), process ceph-mon, pid 13772
2020-06-15 20:12:08.536 7f0b330a0280 -1 monitor data directory at '/var/lib/ceph/mon/ceph-proxmox-02' does not exist: have you run 'mkfs'?
2020-06-15 20:12:18.796 7f92fd4bc280  0 set uid:gid to 64045:64045 (ceph:ceph)
2020-06-15 20:12:18.796 7f92fd4bc280  0 ceph version 14.2.9 (bed944f8c45b9c98485e99b70e11bbcec6f6659a) nautilus (stable), process ceph-mon, pid 13899
2020-06-15 20:12:18.796 7f92fd4bc280 -1 monitor data directory at '/var/lib/ceph/mon/ceph-proxmox-02' does not exist: have you run 'mkfs'?

Run mkfs where? On what? What is this error?
 
/etc/hosts has both
IPv4 hostname fqdn
IPv6 hostname fqdn

I have configured the links with IPs, not hostnames.

in what order? The first one matters.

Removed the comments and rebooted;
Join Information shows the IPv6 again !?!?

I don't understand this. So I have no control over the links, no way of changing them or choosing what I want?
Changing it manually does not work; actually, you can't change the IP manually at all.

That's by design; it's a read-only, in-memory file.
It's assembled here:
https://git.proxmox.com/?p=pve-clus...d16f68141f091a7d457dc4b614f315cf;hb=HEAD#l301

The IP comes from an initial lookup from the hostname when pmxcfs (pve-cluster.service) starts up.
Again, the first one counts here.
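
You can check what that lookup returns, for example with:

Code:
# getent ahosts resolves via getaddrinfo(); the first address printed
# should be the one pmxcfs publishes in /etc/pve/.members
getent ahosts "$(hostname)"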

What's this problem with Ceph and monitors/managers not coming up?

That's for another thread; please do not mix topics.
 
in what order? The first one matters.

That's by design; it's a read-only, in-memory file.
It's assembled here:
https://git.proxmox.com/?p=pve-clus...d16f68141f091a7d457dc4b614f315cf;hb=HEAD#l301

The IP comes from an initial lookup from the hostname when pmxcfs (pve-cluster.service) starts up.
Again, the first one counts here.

The first one is, and always was, the IPv4.
Actually, the hosts file has comments; the top section is IPv4 and the bottom is IPv6.
All IPv4 first
All IPv6 after

About the "by design" "read only memory file": I don't see any advantage or interest in this design, which, from what Google tells me, causes more issues and reinstalls than Windows ME (which, iirc, was the last version I ever used).
A well-thought-out design would include some way of managing the links. I simply don't understand this "by design"...

But I would definitely like to see some kind of solution for, or interest in, this erratic behaviour of the application and the handicap on link management.

That's for another thread; please do not mix topics.

Very well, I will open a new topic for this subject then. Thanks.
 
"read only memory file"
Do you even know what this means? It's just showing status about how things are seen, makes no sense to have this writeable.
This is a general concept and just means the data isn't backed on the filesystem but in program memory, used in some way by almost all programs - whatever you searched on google and telling problems you will find as much "problems" for almost any other basic programming concept as people run into all sorts of problems when learning to use things..

Also, just because join info proposes the IPv6 one, you can still tick off that checkbox and edit it to the desired one anyway, I'd suggest that for now as quick way to avoid you issue.
As the IPv6 address proposed would only be used for the joining API calls, not really for the cluster network afterwards.

In the meantime I decided not to use the public IPv6 address, so I followed this comment by fabian:

For next time: splitting up the cluster for such a change isn't really required - you could just have removed the respective IPv6 "ringX_addr" entries from the node sections, and the respective link entry from the totem section, in the corosync configuration, and been done.
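
A rough sketch of that edit, assuming the IPv6 one had been link 1 (the config_version bump is what makes corosync apply the change):

Code:
# edit the cluster-wide copy managed by pmxcfs
nano /etc/pve/corosync.conf
#  - in each node { } section, remove the line:  ring1_addr: <the IPv6 address>
#  - in totem { }, remove the matching block:    interface { linknumber: 1 }
#  - increment the version, e.g.:                config_version: 2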

Side note, I actually plan to extend the ip info here a bit to have all non-loopback addresses in there, not just one - to avoid such and other issues and/or confusions.
 
Oh, and the preference for IPv6 comes from RFC 3484, which defines the default ordering for getaddrinfo calls.

You could also edit /etc/gai.conf and add (or uncomment) the line
precedence ::ffff:0:0/96 100

And restart pve-cluster afterwards. Sorry for the confusion here; we mostly use getaddrinfo_all nowadays, where this doesn't matter.
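
In practice that's just (a sketch; adjust if you already have a customized gai.conf):

Code:
# prefer IPv4 over IPv6 in getaddrinfo() result ordering (see gai.conf(5))
echo 'precedence ::ffff:0:0/96  100' >> /etc/gai.conf
systemctl restart pve-cluster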
 
Do you even know what this means?

Man, I don't mean to be rude, but sometimes some people make it a bit hard. :) So, taking a deep breath: no, I have no idea what that means, right?

It's just showing status about how things are seen; it makes no sense to have this writeable.

Of course, it makes much more sense having THESE definitions set in memory files generated by software, files that can't be edited, so that we are subject to erratic behaviours like the one we're having here, where something that imo should be easily managed by the admin isn't.

My /etc/hosts DOES NOT HAVE IPv6 above IPv4 (it mattered so much a couple of replies ago, but not really).

This is a general concept: it just means the data isn't backed by the filesystem but lives in program memory, and it is used in some way by almost all programs. Whatever you searched on Google that tells of problems - you will find as many "problems" for almost any other basic programming concept, as people run into all sorts of problems when learning to use things.

"sure" & "thanks for the elucidating lecture".

Also, even though the join info proposes the IPv6 one, you can still tick off that checkbox and edit it to the desired one anyway; I'd suggest that for now as a quick way to avoid your issue.
The proposed IPv6 address would only be used for the joining API calls, not for the cluster network afterwards.

For sure, and I bet this is why I am having all the timeouts from one node to another, with constant "got timeout (500)" errors: because the IPv6 is only for joining, not for the cluster network.

For next time: splitting up the cluster for such a change isn't really required - you could just have removed the respective IPv6 "ringX_addr" entries from the node sections, and the respective link entry from the totem section, in the corosync configuration, and been done.

I am sorry that you missed the posts before yours - for example, the post where I pasted my corosync.conf:
Code:
nodelist {
  node {
    name: proxmox-01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: public_ipv4
    ring1_addr: 10.11.49.1
    ring2_addr: 10.5.49.1
  }
}

No IPv6 "ring addr" anywhere. Proxmox determines it wants to use IPv6 and does. Period.

Side note, I actually plan to extend the ip info here a bit to have all non-loopback addresses in there, not just one - to avoid such and other issues and/or confusions.
Great; many features are needed regarding link management.



Honestly, what I find most interesting (in the negative sense) is all the wrong answers I get around here.
It's /etc/hosts, it's corosync.conf, it's by-design in-memory files. It's all and none of the above at the same time.
Proxmox gives an administrator no control over links and traffic flows, and this naturally causes issues. By design, sure, but no one ever said it was a good design.
 
Oh, and the preference for IPv6 comes from RFC 3484, which defines the default ordering for getaddrinfo calls.

You could also edit /etc/gai.conf and add (or uncomment) the line
precedence ::ffff:0:0/96 100

And restart pve-cluster afterwards. Sorry for the confusion here; we mostly use getaddrinfo_all nowadays, where this doesn't matter.
Ok. Anyway, this is a hack, not a fix.

Proxmox networking is like spending hours fully configuring an HPE or Cisco router with everything proper - networks, routes, interfaces, etc. - and then you connect it and the router says "fu, I'll use whatever connection I like, because I can."
Proper app networking is: you select a link, the app uses that link. The admin edits the link, the app uses the new link. It loses connectivity to the other nodes? Go set the proper link on the other nodes.
 
No IPv6 "ring addr" anywhere. Proxmox determines it wants to use IPv6 and does. Period.

I said that about your initial post, where you still had IPv6 - just check where I quoted from... If you had asked about your initial issue instead of just following some seemingly related solution, we could have directed you to drop the IPv6 instead of fully disassembling a cluster - and then complaining that the address proposed for joining over (which, as said, is not used for cluster traffic) is shown in a field which can be switched to editable..

If that is your corosync config, then it won't use IPv6 for cluster traffic, as there's no IPv6 in it...

Ok. Anyway this is a hack, not a fix.

No: you say you want to prefer IPv4, and this is the way to do that. https://manpages.debian.org/testing/manpages/gai.conf.5.en.html

Of course, it makes much more sense having definitions set on memory files that can't be edited,

As said, this is only showing some status in a read-only fashion; it isn't meant to be changed - for that, /etc/hosts and /etc/gai.conf are there. It's similar to status files in /proc like cpuinfo or meminfo, which are also in-memory files just showing status; it makes no sense for them to be writable.

Further, as also said:
Sorry for the confusion here; we mostly use getaddrinfo_all nowadays, where this doesn't matter.
And:
Side note, I actually plan to extend the ip info here a bit to have all non-loopback addresses in there, not just one - to avoid such and other issues and/or confusions.
So I acknowledged that limitation and told you how to address it for now - either by editing the address field or by switching the preference over to IPv4, with a correct way to do so.

For sure, and I bet this is why I am having all the timeouts from one node to another, with constant "got timeout (500)" errors: because the IPv6 is only for joining, not for the cluster network.

Those timeouts have nothing to do with the cluster network; they are at the API level. The cluster network backs corosync, which backs the pmxcfs distributed configuration filesystem. API calls go over HTTPS to the node you're connected to, and from that one they are proxied to the destination node, also over HTTPS, if the call is not determined to be for the local one. Two completely different things.
 
I said that about your initial post, where you still had IPv6 - just check where I quoted from... If you had asked about your initial issue instead of just following some seemingly related solution, we could have directed you to drop the IPv6 instead of fully disassembling a cluster - and then complaining that the address proposed for joining over (which, as said, is not used for cluster traffic) is shown in a field which can be switched to editable..

If that is your corosync config, then it won't use IPv6 for cluster traffic, as there's no IPv6 in it...

Not even destroying the cluster and redoing it solved this, yet just changing corosync would "for sure" have solved the issue, and "for sure" I wouldn't be here now with the same problem. But ok, editing corosync.conf would have had the same effect. Which is none, as it uses the IPv6 anyway.

No: you say you want to prefer IPv4, and this is the way to do that. https://manpages.debian.org/testing/manpages/gai.conf.5.en.html

wtf man. I want to use the links I selected. I don't want to use IPv6, BECAUSE I HAVEN'T CHOSEN ANY IPv6 LINK TO USE, SO THE APPLICATION SHOULD USE THE LINKS THAT HAVE BEEN SELECTED. THIS IS A FLAW IN THE APPLICATION.

As said, this is only showing some status in a read-only fashion; it isn't meant to be changed - for that, /etc/hosts and /etc/gai.conf are there. It's similar to status files in /proc like cpuinfo or meminfo, which are also in-memory files just showing status; it makes no sense for them to be writable.

Ok, you're right. The problem is not that information being in memory files; the problem is where that information comes from and what ends up there. The problem is upstream.


Further, as also said:
Sorry for the confusion here; we mostly use getaddrinfo_all nowadays, where this doesn't matter.
And:
Side note, I actually plan to extend the ip info here a bit to have all non-loopback addresses in there, not just one - to avoid such and other issues and/or confusions.
You are right; you posted that while I was typing the other reply.
About the first point and the method: it's an issue of conception and design. There's no problem with using getaddrinfo_all; the problem is that there are no rules regarding which links to use, and Proxmox takes over as it sees fit.
As I mentioned, the administrator should have full control over which links to use: choose a configuration, and that configuration should stick until someone decides to change it. The links chosen should be the links used, with the address families selected by the admin.

So I acknowledged that limitation and told you how to address it for now - either by editing the address field or by switiching preference over to IPv4, with a correct way to do so.
Thank you, and indeed it is a "correct" way of doing it; I just don't think it is the correct way from the perspective of a Proxmox admin, for whom the correct way would be using the app to select the links (as happens now in the GUI when one selects Create Cluster), choosing which addresses to use, and having the application comply.

Those timeouts have nothing to do with the cluster network, they are on API level. Cluster network backs corosync which backs pmxcfs distribute configuration filesystem. API calls go over HTTPS to the node your connected to and from that one will be proxied to the destination node also over HTTPS if it's not determined to be for the local one. Two completely different things.

Yeah, actually I have a question about this:
Say I select three links.
Let's imagine one of them is ONLY, and FULLY DEDICATED, for corosync traffic.
How do I set that link for corosync traffic?
And how do I select another specific link just for API calls? Why isn't this under the control of the admin?
 
wtf man. I want to use the links I selected. I don't want to use IPv6, BECAUSE I HAVEN'T CHOSEN ANY IPv6 LINK TO USE, SO THE APPLICATION SHOULD USE THE LINKS THAT HAVE BEEN SELECTED. THIS IS A FLAW IN THE APPLICATION.
Then select the links you want, that works.

How do I set that link for corosync traffic?

Just select it as link 0, you can select any network CIDR configured:
[screenshot: the Create Cluster dialog, with each link's network selectable from the configured CIDRs]
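
The CLI equivalent is roughly this (placeholder addresses; check the pvecm man page for your version):

Code:
pvecm create MyCluster --link0 203.0.113.10 --link1 10.11.49.1 --link2 10.5.49.1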

And how do I select another specific link just for API calls?

Proxmox VE normally listens on all addresses. For the node you're connected to, it obviously uses the address you used to open the web interface; for requests to other cluster nodes, the proxy uses the hostname of that node for the proxied HTTPS request.
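
You can see that for yourself; pveproxy serves the API/web interface on port 8006:

Code:
# show the API proxy's listening socket (bound on all addresses)
ss -tlnp | grep 8006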

For the join, the IP from a single getaddrinfo call is used, as that is the one the system prefers as a public destination address for this node; it can be managed using gai.conf.
But this is only a recommendation, to have one preselected without forcing the admin to always enter a specific IP; most of the time that works out, as it's really only used for the API call that exchanges the info required for the join.
And it can be changed: after pasting the join info, you can tick off the assisted-join checkbox at the top and edit the "Peer Address" field:
[screenshot: the Cluster Join dialog with the assisted-join checkbox unticked and the "Peer Address" field editable]

You can in general also specify where VM live-migration traffic is sent over:
[screenshot: the datacenter Options panel with the migration network setting]
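
That setting ends up in /etc/pve/datacenter.cfg; a sketch, assuming 10.5.49.0/24 were the wanted migration network:

Code:
# /etc/pve/datacenter.cfg
migration: secure,network=10.5.49.0/24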

Why isn't this under the control of the admin?

So an admin should be able to have that control just fine.
 
Then select the links you want, that works.

I selected link0: public_v4.
The address used to exchange information should be the one selected, not whatever Proxmox wants.
If that had happened - "select the links you want, that works" - then no IPv6 would ever have appeared or been used to exchange any information. But yes, I've understood where your getaddrinfo_all result comes from, and that it's a public IP for exchanging information, whatnot. Ok.
Not a fan, but ok.


Just select it as link 0, you can select any network CIDR configured:
[screenshot: the Create Cluster dialog, with each link's network selectable from the configured CIDRs]
As mentioned, I did.
Taking that example: I selected as link0 the vmbr0 with a public IPv4 associated.
If the application later uses the IPv6 on that interface, that is a flaw in the application. The admin selected an interface AND an IPv4-family address.
That's what must be used. If another address family is being used, that is wrong.

Proxmox VE normally listens on all addresses. For the node you're connected to, it obviously uses the address you used to open the web interface; for requests to other cluster nodes, the proxy uses the hostname of that node for the proxied HTTPS request.

For the join, the IP from a single getaddrinfo call is used, as that is the one the system prefers as a public destination address for this node; it can be managed using gai.conf.
But this is only a recommendation, to have one preselected without forcing the admin to always enter a specific IP; most of the time that works out, as it's really only used for the API call that exchanges the info required for the join.
Again, I'm really sorry, but this is flawed. I selected link0 with an IPv4 address; that's what must be used. I selected a public IPv4. Proxmox can listen on all interfaces, for sure, no problem. But to send traffic, it must use the selected interfaces and addresses. That does not happen.
If people select links and addresses, those are what should be used. I don't see how this is conceptually difficult to understand. If another public address is needed for the join-info exchange, ask which address to use, or make it configurable and write it to a conf somewhere.

And it can be changed: after pasting the join info, you can tick off the assisted-join checkbox at the top and edit the "Peer Address" field:
[screenshot: the Cluster Join dialog with the assisted-join checkbox unticked and the "Peer Address" field editable]

Disabling assisted join. Got that.

You can in general also specify where VM live-migration traffic is sent over:
[screenshot: the datacenter Options panel with the migration network setting]

Alright.
So ...
Just select it as link 0, you can select any network CIDR configured:

Link0 is for corosync traffic.
The Options allow selecting the network for live-migration traffic.

What are links 1, 2, etc. after link 0 used for, and how?
Are they not used for corosync?

Do I have to put any of the public IPs on these links, or can they all be private?
Will that interfere with anything?

So an admin should be able to have that control just fine.
You'd think, right? Hopefully in the future there will be more comprehensive options to manage the links.
In the meanwhile, thank you; your replies have definitely been helping me understand a few things about how Proxmox works.
 
