Hi all,
I'm currently running a cluster with 15 nodes and I plan to add more in the near future. For Ceph, I have 5 monitors, 5 managers and 5 metadata servers, which currently manage 60+ OSDs.
Do you advise adding more monitors/managers/MDS? Should I stick with odd numbers because of quorum...
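To make the odd-number question concrete, here is a minimal sketch of plain majority-quorum arithmetic. It assumes a simple strict-majority rule; the function names are made up for illustration and are not part of Ceph or Proxmox.

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority among `members` voters (assumed majority rule)."""
    if members < 1:
        raise ValueError("need at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many voters can be lost while a strict majority still remains."""
    return members - quorum_size(members)


# Compare the current monitor count (5) with one and two extra monitors.
for n in (5, 6, 7):
    print(f"{n} voters: quorum={quorum_size(n)}, tolerated failures={tolerated_failures(n)}")
```

Under this rule, going from 5 to 6 voters does not raise the number of failures that can be tolerated, while going from 5 to 7 does, which is the usual argument for odd counts.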
Hi everyone,
sorry I'm a bit late to the party.
I'm running 7.2-11 and a few weeks ago I had to pin the kernel to version 5.13.19-6-pve because of some live-migration issues.
Do you know if I can safely upgrade to 7.3? Do you advise upgrading to the provided 5.15.74-1 kernel or better sticking...
@fabian just one last question, I promise! :)
I compared the old logs (from 10/19 and 10/20) with the most recent ones, and I'm pretty sure I can now add new nodes without problems, even without the extra precaution of disabling/re-enabling HA.
Can you confirm?
Thank you.
Hi @fabian ,
I successfully added the new node to the cluster with HA disarmed. I've attached the corosync logs from an existing node (proxnode01) and from the new node (proxnode18).
A few minutes have passed and I don't see any new retransmit list entries, so it looks OK to me. Do you notice any critical...
Ok, thanks. I'll do it by stopping the daemons just like yesterday (pve-ha-lrm first and pve-ha-crm after) on all nodes.
Once I'm sure corosync stays up, I can re-enable them.
Right?
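A rough sketch of the stop order described above, written as a plan that only prints commands and does not run anything. It assumes "pve-ha-lrm first and pve-ha-crm after" means stopping the LRM on every node before stopping any CRM. The helper name `ha_stop_plan` and the node name `proxnode02` are hypothetical.

```python
def ha_stop_plan(nodes):
    """Return (node, command) pairs: LRM on all nodes first, then CRM on all nodes.

    Assumption: the whole cluster finishes the pve-ha-lrm pass before the
    pve-ha-crm pass starts. Nothing is executed here.
    """
    services = ["pve-ha-lrm", "pve-ha-crm"]  # order taken from the post
    plan = []
    for service in services:
        for node in nodes:
            plan.append((node, f"systemctl stop {service}"))
    return plan


# proxnode02 is a made-up second node name for illustration.
for node, command in ha_stop_plan(["proxnode01", "proxnode02"]):
    print(f"{node}: {command}")
```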
Hi @fabian ,
just to be on the safe side, do you think I should disable the HA daemons on the existing nodes (or at least some of them) before trying to add the new node to the cluster? That way only some of them would reboot if things go wrong. What's your advice?
Thank you
Hi @fabian , thank you so much. I was able to bring it back online outside the cluster.
Anyway, because of several misaligned configurations involving Ceph (among other things), I prefer to reinstall the node from scratch, to be 100% sure there are no leftovers.
I'll keep you updated.
Thanks again
Hi @fabian,
I followed the procedure and successfully restarted corosync and the HA daemons on all nodes as stated. So far, so good.
I then tried to edit /etc/pve/corosync.conf on the offline node, but /etc/pve is in read-only mode.
I suppose that it will automatically update itself when I put...
Hi @fabian,
ok, understood. Thank you. So, while HA is disarmed and corosync is restarted, everything is supposed to stay up, except that if a node goes down the VMs will not auto-migrate to another node, right?
Also, I was considering removing the new node (which is still offline) from the cluster...
Hi @fabian, sorry for the delay but I was out of office for a few weeks.
I'm ready to apply the new configuration, my /etc/pve/corosync.conf.new looks like this (config_version increased by 1):
totem {
netmtu: 1397
config_version: 15
...
}
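The "config_version increased by 1" step can be sketched as a small text transformation. This is only an illustration working on a string, not the procedure from the thread: the function name `bump_config_version` is hypothetical, and the sample text is a trimmed-down stand-in, not a full corosync.conf.

```python
import re


def bump_config_version(conf_text: str) -> str:
    """Return conf_text with its single config_version value increased by 1."""
    pattern = re.compile(r"^([ \t]*config_version:[ \t]*)(\d+)[ \t]*$", re.MULTILINE)
    matches = pattern.findall(conf_text)
    if len(matches) != 1:
        # Refuse to guess if the key is missing or appears more than once.
        raise ValueError(f"expected exactly one config_version line, found {len(matches)}")
    return pattern.sub(lambda m: f"{m.group(1)}{int(m.group(2)) + 1}", conf_text, count=1)


# Trimmed-down sample; the previous version (14) is inferred from "increased by 1".
old = "totem {\nnetmtu: 1397\nconfig_version: 14\n}\n"
print(bump_config_version(old))
```

Raising an error when the key is absent or duplicated is a deliberate choice here, so a malformed file is never silently rewritten.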
So next steps are:
cp...
Hi,
OK, I understand, but how did it work for the first 13 nodes without any hassle? And how come I've been seeing missing heartbeat packets only in the last two days, with the exact same configuration I've had for the last two years?
By examining the logs we noticed that from time to time some corosync links go down and come back up after a few seconds, for no apparent reason. The network interfaces themselves never actually go down.
It looks like it started happening yesterday after the first incident.
I fear the problem is on the corosync side but I...
Thanks @fabian ! You can find all the logs attached for both corosync and pve-cluster on all nodes. Let me know if you need any further information.
@alexmolon yes, indeed! how strange o_O
Related to this other thread, tagging @fabian as requested.
I currently have a cluster with 13 nodes running. Everything is updated to the latest versions (except for the kernel, which is pinned to 5.13.19-6-pve on all nodes because of some issues with live-migration on different CPUs). All the...
The same thing just happened to me. Everything rebooted as soon as I added the new node to the cluster.
Now I'm afraid to restart the new node and I don't know how to remove it from the cluster configuration.
Edit: I should specify that I'm currently running a cluster with 13 nodes...