Corosync authkey never changes?

esi_y

Nov 29, 2023
Why doesn't, e.g., pvecm delnode immediately change the corosync authkey across the rest of the cluster?

It would be a pretty basic precaution against a "rogue" node that one cannot prevent from trying to re-join, e.g. a machine that got split off due to a network issue, is not even reachable over IPMI, and needs replacing.
 
I wonder whether it makes sense to file this in Bugzilla, or whether there is some smart answer as to why a dangling, potentially zombie node can be safely ignored.
 
A removed node can't rejoin, because it has been removed from corosync.conf and the config_version in corosync.conf has been incremented.
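
To check both conditions on a remaining node, something like the following could be used. It is only a minimal sketch: it assumes the usual section layout of corosync.conf (a totem section holding config_version, and a nodelist section with one node block per member), and the sample content, node names and addresses are made up for illustration.

```python
# Minimal, line-based reader for the two values discussed above.
# SAMPLE_CONF is hypothetical; on a real node you would read the text
# of the cluster's corosync.conf instead.
SAMPLE_CONF = """
totem {
  cluster_name: demo
  config_version: 4
  version: 2
}

nodelist {
  node {
    name: pve1
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 10.0.0.1
  }
  node {
    name: pve2
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 10.0.0.2
  }
}
"""


def parse_corosync_conf(conf_text):
    """Return (config_version, [node names]) from corosync.conf text."""
    version = None
    names = []
    stack = []  # currently open sections, outermost first
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.endswith("{"):
            stack.append(line[:-1].strip())
        elif line == "}":
            if stack:
                stack.pop()
        elif stack == ["totem"] and line.startswith("config_version:"):
            version = int(line.split(":", 1)[1])
        elif stack == ["nodelist", "node"] and line.startswith("name:"):
            names.append(line.split(":", 1)[1].strip())
    if version is None:
        raise ValueError("no config_version found in totem section")
    return version, names


def is_member(node_name, conf_text):
    """True if node_name is still listed in the nodelist."""
    _, names = parse_corosync_conf(conf_text)
    return node_name in names


if __name__ == "__main__":
    version, names = parse_corosync_conf(SAMPLE_CONF)
    print("config_version:", version)
    print("nodes:", ", ".join(names))
```

After a node has been removed, is_member() should return False for its name on every remaining node, and config_version should be higher than the value the removed node still has on disk.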
 
What, then, is the reason for this passage in the official docs [1]?

As mentioned above, it is critical to power off the node before removal, and make sure that it will not power on again (in the existing cluster network) with its current configuration. If you power on the node as it is, the cluster could end up broken, and it could be difficult to restore it to a functioning state.

[1] https://pve.proxmox.com/wiki/Cluster_Manager#_remove_a_cluster_node
 
Hmm, I wonder whether the node would flood the cluster with auth requests.
(but it should be the same if you change the authkey on the running cluster)
I wonder too now. Or maybe the advice dates from the time when corosync used multicast?
 
because corosync doesn't support live key rotations, and taking the whole cluster offline and hoping it will come back up is not something you want to do automatically. but documenting how to do it manually is probably good.
 
Thanks for the reply. Do I understand correctly that leaving the API authkeys without regular rotation was considered a security risk [1], but corosync authkeys are not considered risky even after, e.g., hardware is decommissioned (just because rotation is not implemented upstream)?

I understand it might sound somewhat brave to rotate live, but doesn't it essentially mean distributing the new authkey and, only once it is present on all nodes and all nodes (minus the one being deleted) are online, triggering a restart of the service (worst case it would not restart)? And if those conditions are not met, logging an error that the key could not be rotated.

[1] https://bugzilla.proxmox.com/show_bug.cgi?id=2079
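
To make the proposed precondition concrete, here is a rough sketch of just the gating logic. Everything in it is hypothetical: the function names, the idea of comparing SHA-256 digests of the key, and the way node state is passed in are assumptions for illustration. It does not distribute a key or restart anything, and it is not an existing pvecm or corosync feature.

```python
import hashlib


def key_digest(key_bytes):
    """SHA-256 hex digest of an authkey, used only for comparison."""
    return hashlib.sha256(key_bytes).hexdigest()


def ready_to_restart(remaining_nodes, online_nodes, digest_by_node, new_key):
    """Check the proposed precondition for a rotation.

    remaining_nodes: names of all nodes that stay in the cluster
    online_nodes:    names of nodes currently reachable
    digest_by_node:  node name -> digest of the key staged on that node
    new_key:         the new authkey as bytes

    Returns (ok, problems). ok is True only if every remaining node is
    online and already holds the new key; otherwise problems lists what
    should be logged as an error instead of restarting the service.
    """
    wanted = key_digest(new_key)
    problems = []
    for node in sorted(remaining_nodes):
        if node not in online_nodes:
            problems.append(f"{node}: offline")
        elif digest_by_node.get(node) != wanted:
            problems.append(f"{node}: new authkey not present")
    return (not problems, problems)


if __name__ == "__main__":
    new_key = b"example-key-material"  # placeholder, not a real key
    ok, problems = ready_to_restart(
        remaining_nodes={"pve1", "pve2"},
        online_nodes={"pve1"},
        digest_by_node={"pve1": key_digest(new_key)},
        new_key=new_key,
    )
    if not ok:
        for problem in problems:
            print("rotation refused:", problem)
```

The check only covers the "works" case. It says nothing about what happens if the service fails to come back after the restart.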
 
it's not somewhat brave, it's very risky - if corosync doesn't come up again, the node will possibly fence itself, taking down all guests on it.

exploiting a leaked corosync auth key is also a lot harder:
- you need to know about the internals of pmxcfs and corosync to do anything with it
- you require access to the corosync network (which is usually internal)

compared to what you can do with a leaked API auth key...
 
it's not somewhat brave, it's very risky - if corosync doesn't come up again, the node will possibly fence itself, taking down all guests on it.

I will try this as a PoC myself. I am not arguing, but as for fencing, the rotation could be opt-in or require an explicit prompt (after pvecm) on HA clusters.

EDIT: If this is indeed an HA cluster and a node did fence itself, then, precisely because it is an HA setup, its guests would be restarted on a healthy node. There is no reason to believe corosync would fail to restart on a substantial number of nodes. If it did, restarting the service would be risky in itself, which would make for a terribly fragile setup.

exploiting a leaked corosync auth key is also a lot harder:

The API authkey is also quite secure without rotation, yet rotation was implemented.

- you need to know about the internals of pmxcfs and corosync to do anything with it

That would be considered a security-through-obscurity take on the comparison of the two.

- you require access to the corosync network (which is usually internal)

Well, the same argument could be made for the API: some will even have it on the very same network.

compared to what you can do with a leaked API auth key...

But rather than back and forth on how likely / dangerous this could be, I really wondered - is this (never rotated corosync auth key) the only reason for the docs remark quoted above?
 
there's nothing to PoC there - the problem is with the failure case and consequences, not with what is needed in the "works" case. I'm not going to argue this further, but if you don't see the difference between the two keys and their exploitability then I can't help you. there is a big one, both in the "how much work is it" (and yes, this matters when doing risk analysis, even if both can be exploited, it is not related to "obscurity" at all), and in terms of pre-conditions/requirements (because even if the IP addresses of the API and corosync links are identical, corosync will reject traffic from unknown sources by default for example). we will never have an automated corosync key rotation mechanism unless corosync itself gains support for hot reloading that works reliably.

and yes, restarting corosync after modifying its config is dangerous, there is a non-zero chance of the config somehow being invalid or inconsistent and the cluster breaking. we only do it ourselves when adding or removing nodes to a cluster for a reason, and limit ourselves to very basic changes.
 
there's nothing to PoC there - the problem is with the failure case and consequences, not with what is needed in the "works" case. I'm not going to argue this further [...]

I am okay with the reasoning, actually.

we will never have an automated corosync key rotation mechanism unless corosync itself gains support for hot reloading that works reliably.

Noted.
 
Hmm, I wonder whether the node would flood the cluster with auth requests.
(but it should be the same if you change the authkey on the running cluster)

I now think it's because, if one brings a new node with the same name into the cluster, the old one with a still-valid authkey can start causing quite some havoc. I am not sure whether having different IPs (be it the old or the new node) is sufficient to make the cluster immune to the old node's traffic.
 
it should be nowadays (unless you manually tweaked the corosync config). but that was not the case in earlier corosync versions.
 
that only affects the corosync authkey though, there might be other remnants that still cause problems (now, or because of changes in the future). it's simply not something we test or want to support.
 
Fair enough, but could the reason for that specific piece of advice (the other "remnants", I suppose replication jobs) be documented? Usually a requirement has an owner, and a technical writer keeps an eye on whether advice like this has become stale. It is easier for anyone to spot that later if the reasons are spelled out. It is also easier for someone taking the risk to understand exactly what risk they are taking.

I mean it totally sounds like the whole corosync would break apart due to that:

As mentioned above, it is critical to power off the node before removal, and make sure that it will not power on again (in the existing cluster network) with its current configuration. If you power on the node as it is, the cluster could end up broken, and it could be difficult to restore it to a functioning state.
 