Search results

  1. G

    [SOLVED] Optimal number of Ceph monitor/manager/MDS

    Hi all, I'm currently running a cluster with 15 nodes and I plan to add more in the near future. As for Ceph, I have 5 monitors, 5 managers and 5 metadata servers, which currently manage 60+ OSDs. Do you advise adding more monitors/managers/MDS? Should I stick with odd numbers because of quorum...
  2. G

    Proxmox VE 7.3 released!

    Hi everyone, sorry I'm a bit late to the party. I'm running 7.2-11 and a few weeks ago I had to pin the kernel to version 5.13.19-6-pve because of some live-migration issues. Do you know if I can safely upgrade to 7.3? Do you advise upgrading to the provided 5.15.74-1 kernel or better sticking...
  3. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    @fabian just one last question, I promise! :) I compared the old logs (from 10/19 and 10/20) with the most recent ones, and I'm pretty sure I might now add new nodes without problems, and even without the extra precaution of disabling/reenabling HA. Can you confirm? Thank you.
  4. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    Hi @fabian, sorry for the late reply. I confirm everything went fine. I'm marking the thread as [SOLVED]. Thank you very much again and again! :)
  5. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    Hi @fabian, I successfully added the new node to the cluster with HA disarmed. Attached are the corosync logs from an existing node (proxnode01) and from the new node (proxnode18). A few minutes have passed and I don't see any new retransmit lists, so it looks OK to me. Can you notice any critical...
  6. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    Ok, thanks. I'll do it by stopping the daemons just like yesterday (pve-ha-lrm first and pve-ha-crm after) on all nodes. Once I'm sure corosync stays up, I can re-enable them. Right?
  7. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    Hi @fabian, just to be on the safe side, do you think I should disable the HA daemons on the existing nodes (or at least some of them) before trying to add the new node to the cluster, so that only some of them will reboot in case things go wrong? What's your advice? Thank you
  8. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    Hi @fabian, thank you so much. I was able to put it back online outside of the cluster. Anyway, because of several misaligned configurations involving Ceph (among other things), I prefer reinstalling the node from scratch, to be 100% sure there are no leftovers. I'll keep you updated. Thanks again
  9. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    Hi @fabian, I followed the procedure and successfully restarted corosync and the HA daemons on all nodes as stated. So far, so good. Now I tried to edit /etc/pve/corosync.conf on the offline node but /etc/pve is in read-only mode. I suppose that it will automatically update itself when I put...
  10. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    Ok, thank you again. I'll provide feedback as soon as possible.
  11. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    Hi @fabian, ok, understood. Thank you. So, while HA is disarmed and corosync is restarted, everything is supposed to stay up, except that if a node goes down the VMs will not auto-migrate to another node, right? Also, I was considering removing the new node (which is still offline) from the cluster...
  12. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    Hi @fabian, sorry for the delay but I was out of office for a few weeks. I'm ready to apply the new configuration, my /etc/pve/corosync.conf.new looks like this (config_version increased by 1): totem { netmtu: 1397 config_version: 15 ... } So next steps are: cp...
  13. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    thank you @fabian, I'll give it a try as soon as I can. I have to restart corosync on all nodes, am I right?
  14. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    hi, ok I understand but how did it work for the first 13 nodes without any hassle? and how come I'm having missing heartbeat packets only in the last two days, with the exact same configuration I had for the last two years?
  15. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    By examining the logs we noticed that from time to time some corosync links go down and come back up after a few seconds, for no apparent reason. Network interfaces never really go down. It looks like it started happening yesterday after the first incident. I fear the problem is on the corosync side but I...
  16. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    it looks like I can't attach more than 10 files to a single reply, so here are the remaining ones
  17. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    Thanks @fabian! You can find all the logs attached for both corosync and pve-cluster on all nodes. Let me know if you need any further information. @alexmolon yes, indeed! how strange o_O
  18. G

    Sudden reboot of multiple nodes while adding a new node

    thanks @fabian, I just created a new thread and tagged you as suggested
  19. G

    [SOLVED] Cluster with redundant Corosync networks reboots as soon as I join a new node

    Related to this other thread, tagging @fabian as requested. I currently have a cluster with 13 nodes running. Everything is updated to the latest versions (except for the kernel which is pinned to 5.13.19-6-pve on all nodes because of some issues with live-migration on different CPUs). All the...
  20. G

    Sudden reboot of multiple nodes while adding a new node

    The same thing just happened to me. Everything rebooted as soon as I added the new node to the cluster. Now I'm afraid to restart the new node and I don't know how to remove it from the cluster configuration. Edit: I have to specify that I'm currently running a cluster with 13 nodes...