Missing Node IP address in /cluster/status API call

sin3vil · Jul 8, 2024

Hello all,

I'm managing a couple of separate 3-node clusters. I have a service that pulls /cluster/status via the API, but I've noticed that in some cases the IP for a node is missing, usually when that node is offline.

Problematic status:

Code:

root@pve2:~# pvesh get /cluster/status
┌───────────┬──────┬─────────┬───────────────┬───────┬───────┬────────┬───────┬────────┬─────────┬─────────┐
│ id        │ name │ type    │ ip            │ level │ local │ nodeid │ nodes │ online │ quorate │ version │
╞═══════════╪══════╪═════════╪═══════════════╪═══════╪═══════╪════════╪═══════╪════════╪═════════╪═════════╡
│ cluster   │ UNI  │ cluster │               │       │       │        │     3 │        │ 1       │       3 │
├───────────┼──────┼─────────┼───────────────┼───────┼───────┼────────┼───────┼────────┼─────────┼─────────┤
│ node/pve1 │ pve1 │ node    │               │       │ 0     │      1 │       │ 0      │         │         │
├───────────┼──────┼─────────┼───────────────┼───────┼───────┼────────┼───────┼────────┼─────────┼─────────┤
│ node/pve2 │ pve2 │ node    │ 169.254.0.102 │       │ 1     │      2 │       │ 1      │         │         │
├───────────┼──────┼─────────┼───────────────┼───────┼───────┼────────┼───────┼────────┼─────────┼─────────┤
│ node/pve3 │ pve3 │ node    │ 169.254.0.103 │       │ 0     │      3 │       │ 1      │         │         │
└───────────┴──────┴─────────┴───────────────┴───────┴───────┴────────┴───────┴────────┴─────────┴─────────┘

Compared with a working status (pve2 is actually offline in this case too):

Code:

root@pve1:~# pvesh get /cluster/status
┌───────────┬──────┬─────────┬───────────────┬───────┬───────┬────────┬───────┬────────┬─────────┬─────────┐
│ id        │ name │ type    │ ip            │ level │ local │ nodeid │ nodes │ online │ quorate │ version │
╞═══════════╪══════╪═════════╪═══════════════╪═══════╪═══════╪════════╪═══════╪════════╪═════════╪═════════╡
│ cluster   │ UNI  │ cluster │               │       │       │        │     3 │        │ 1       │       9 │
├───────────┼──────┼─────────┼───────────────┼───────┼───────┼────────┼───────┼────────┼─────────┼─────────┤
│ node/pve1 │ pve1 │ node    │ 169.254.0.101 │       │ 1     │      1 │       │ 1      │         │         │
├───────────┼──────┼─────────┼───────────────┼───────┼───────┼────────┼───────┼────────┼─────────┼─────────┤
│ node/pve2 │ pve2 │ node    │ 169.254.0.102 │       │ 0     │      2 │       │ 0      │         │         │
├───────────┼──────┼─────────┼───────────────┼───────┼───────┼────────┼───────┼────────┼─────────┼─────────┤
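│ node/pve3 │ pve3 │ node    │ 169.254.0.103 │       │ 0     │      3 │       │ 1      │         │         │
└───────────┴──────┴─────────┴───────────────┴───────┴───────┴────────┴───────┴────────┴─────────┴─────────┘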

These clusters are all set up the same way, so I'm unsure what could be causing this.
The manual for /cluster/status says that the IP is "[node] IP of the resolved nodename." The cluster manual says that getaddrinfo() is used to resolve the hostname. I've confirmed with getent that the hostname is resolvable.
For the record, on each cluster, all nodes are listed in /etc/hosts.

corosync.conf always contains all nodes with their IP addresses etc.

Can someone shed some light here?
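
For a service that consumes this endpoint, one defensive option is to treat the `ip` field as optional. Below is a minimal Python sketch, assuming the JSON comes from `pvesh get /cluster/status --output-format json` and that an unknown address appears as a missing or empty `ip` field. The sample payload is modelled on the problematic table above rather than captured from a real cluster, and `nodes_without_ip` is a hypothetical helper name.

```python
import json

# Sample payload modelled on the "problematic" table above (illustrative only).
SAMPLE = """
[
  {"id": "cluster", "name": "UNI", "type": "cluster", "nodes": 3, "quorate": 1, "version": 3},
  {"id": "node/pve1", "name": "pve1", "type": "node", "local": 0, "nodeid": 1, "online": 0},
  {"id": "node/pve2", "name": "pve2", "type": "node", "ip": "169.254.0.102", "local": 1, "nodeid": 2, "online": 1},
  {"id": "node/pve3", "name": "pve3", "type": "node", "ip": "169.254.0.103", "local": 0, "nodeid": 3, "online": 1}
]
"""


def nodes_without_ip(status):
    """Return the names of node entries whose 'ip' field is missing or empty."""
    return [
        entry["name"]
        for entry in status
        if entry.get("type") == "node" and not entry.get("ip")
    ]


status = json.loads(SAMPLE)
missing = nodes_without_ip(status)
print(missing)
```

Using `entry.get("ip")` instead of `entry["ip"]` keeps the consumer from failing when the field is absent, which is the situation described in this post.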

fabian · Jul 8, 2024

this information is only available if the node has been online and broadcast it, that is likely the difference between your two examples.

sin3vil · Jul 8, 2024

Hello Fabian,

This is probably the case. I can see that the "working" cluster has a much longer uptime, probably dating back to before the problematic node went offline.
Still, why this design? I'd expect this information to be read from corosync.conf or something.

fabian · Jul 8, 2024

it is broadcast when pmxcfs on that node starts up (and then re-broadcast on cluster membership changes).

the contents of the corosync config are queryable using other means, and might contain other IP addresses anyway (the corosync links aren't necessarily identical to the resolved addresses of the hostnames of the nodes after all)
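
As one illustration of reading the corosync side separately, here is a hedged Python sketch that extracts the `ringX_addr` entries per node from corosync.conf text. The embedded fragment is an assumed layout that reuses addresses from the tables above, not the poster's actual config, and `corosync_link_addresses` is a hypothetical helper. On a Proxmox VE node the file is typically found at /etc/pve/corosync.conf. As noted in this reply, these are corosync link addresses and need not match the address the node name resolves to.

```python
import re

# Illustrative corosync.conf fragment (assumed layout, not a real config).
SAMPLE_CONF = """
nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 169.254.0.101
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 169.254.0.102
  }
}
"""


def corosync_link_addresses(conf_text):
    """Map each node name to the list of ringX_addr values in its node block."""
    result = {}
    # Node blocks contain no nested braces, so a non-greedy match is enough.
    for block in re.findall(r"\bnode\s*\{(.*?)\}", conf_text, flags=re.DOTALL):
        # Collect "key: value" pairs, one per line.
        fields = dict(
            re.findall(r"^[ \t]*(\w+)[ \t]*:[ \t]*(\S+)[ \t]*$", block, flags=re.MULTILINE)
        )
        name = fields.get("name")
        if name:
            result[name] = [
                value
                for key, value in sorted(fields.items())
                if re.fullmatch(r"ring\d+_addr", key)
            ]
    return result


links = corosync_link_addresses(SAMPLE_CONF)
print(links)
```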

sin3vil · Jul 9, 2024

Fabian, I'm slightly confused.

On one hand you say that the information is broadcast by the node, on the other that hostnames are resolved.
I mean, the hostname is available, what info is needed to be broadcast by the node that then gets resolved into an IP?

Using the API, how would I be able to get the IP address for a node that isn't currently online (and hasn't broadcast back for a while)?

fabian · Jul 9, 2024

sin3vil said:
On one hand you say that the information is broadcast by the node, on the other that hostnames are resolved.
I mean, the hostname is available, what info is needed to be broadcast by the node that then gets resolved into an IP?

when the pmxcfs service is started, it will resolve the hostname/node name and broadcast the result (or fail if there is no non-loopback IP that it resolves to).
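
The "resolve, then reject loopback" behaviour described here can be sketched in Python as follows. This is only an illustration of the idea, not pmxcfs's actual implementation; the function names are hypothetical, and the choice to take the first non-loopback result is an assumption.

```python
import ipaddress
import socket


def first_non_loopback(addresses):
    """Return the first address that is not a loopback address, else None."""
    for addr in addresses:
        # Drop an IPv6 scope suffix such as "%eth0" before parsing.
        bare = addr.split("%", 1)[0]
        if not ipaddress.ip_address(bare).is_loopback:
            return addr
    return None


def resolve_node_ip(nodename):
    """Resolve nodename with getaddrinfo() and pick a non-loopback result."""
    infos = socket.getaddrinfo(nodename, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    candidates = [info[4][0] for info in infos]
    ip = first_non_loopback(candidates)
    if ip is None:
        raise RuntimeError(f"{nodename} resolves only to loopback addresses")
    return ip


print(first_non_loopback(["127.0.0.1", "169.254.0.101"]))
```

A node name that maps only to an address in 127.0.0.0/8 or to ::1 in /etc/hosts would make `resolve_node_ip` raise, which mirrors the failure case mentioned in this reply.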

sin3vil said:
Using the API, how would I be able to get the IP address for a node that isn't currently online (and hasn't broadcast back for a while)?

you can't if no node has this information. that information might not be accurate anyhow if the node is not online.
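
Since the API cannot supply an address that no node currently holds, a client that needs one anyway could keep its own record of the last address it saw. The following Python sketch assumes the client polls /cluster/status periodically and gets a list of dicts with `type`, `name`, and optionally `ip`; the `LastKnownIPs` class is hypothetical. As this reply points out, a remembered address may be stale, so the sketch stores the time it was last seen alongside it.

```python
import time


class LastKnownIPs:
    """Remember the most recent IP reported for each node, with a timestamp."""

    def __init__(self):
        self._seen = {}

    def update(self, status, now=None):
        """Record the IPs present in one /cluster/status snapshot."""
        now = time.time() if now is None else now
        for entry in status:
            if entry.get("type") == "node" and entry.get("ip"):
                self._seen[entry["name"]] = (entry["ip"], now)

    def lookup(self, name):
        """Return (ip, last_seen) for a node, or None if never reported."""
        return self._seen.get(name)


cache = LastKnownIPs()
# First poll: pve1 is online and reports its address.
cache.update([{"type": "node", "name": "pve1", "ip": "169.254.0.101", "online": 1}], now=100.0)
# Later poll: pve1 is offline and the ip field is missing; the old value is kept.
cache.update([{"type": "node", "name": "pve1", "online": 0}], now=200.0)
print(cache.lookup("pve1"))
```

The timestamp lets the caller decide how old an address may be before it is no longer trusted.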

sin3vil · Jul 9, 2024

Understood. Thanks Fabian for your time.
