I ran into the 15,000-character limit, so I'm attaching two log excerpts from 16:00 to 17:32, for pve-ha-lrm and pve-ha-crm respectively. Does that reach far enough into the past?
Oh, and this might be good to know: 16:25 is when proxmox3 came back online after its planned reboot, and 17:32 is when it came back online after its unexpected reboot.
Same time span, different node:
Oct 6 17:29:00 proxmox0 systemd[1]: Starting Proxmox VE replication runner...
Oct 6 17:29:01 proxmox0 systemd[1]: pvesr.service: Succeeded.
Oct 6 17:29:01 proxmox0 systemd[1]: Started Proxmox VE replication runner.
Oct 6 17:29:58 proxmox0 corosync[6718]...
If what you need is simply the journal from the unexpectedly rebooting node, covering the span between the end of the planned reboot and the end of the unexpected one, here it is:
Oct 6 17:26:52 proxmox3 pmxcfs[5842]: [status] notice: received log
Oct 6 17:27:00 proxmox3 systemd[1]: Starting Proxmox VE replication...
Absolutely. How do I generate the output you need?
From a node which got upgraded:
proxmox-ve: 6.4-1 (running kernel: 5.4.128-1-pve)
pve-manager: 6.4-13 (running version: 6.4-13/9f411e79)
pve-kernel-5.4: 6.4-5
pve-kernel-helper: 6.4-5
pve-kernel-5.3: 6.1-6
pve-kernel-5.4.128-1-pve: 5.4.128-2...
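(For anyone reading along later: the package listing above is what the standard version report command prints on a node.)

```shell
# Prints the full Proxmox VE package version listing shown above.
pveversion -v
```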
I'm asking because this old answer appears to be incorrect.
We have a cluster with nine nodes. HA is enabled, with three rings. When it's time to upgrade packages, we do the following:
Remove a node from all HA groups
Wait for it to be fully evacuated
apt update && apt upgrade
Reboot and wait...
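In shell terms, the procedure looks roughly like this. Group and node names are placeholders, and while `ha-manager groupset` and its `--nodes` option are the real CLI, double-check the exact invocation against your PVE version:

```shell
# On any cluster node: drop proxmox3 from the HA group by re-setting
# the group's node list without it ("mygroup" and node names are placeholders).
ha-manager groupset mygroup --nodes "proxmox0,proxmox1,proxmox2"

# Watch until no HA services remain on the node being upgraded.
ha-manager status

# On the (now evacuated) node itself:
apt update && apt upgrade
systemctl reboot
```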
Does "active service count" mean the number of VPSes/containers, or simply the number of active processes?
Would you mind pointing me in the right direction to find where in the code this check happens? I'd like to study it and see if we can find a workaround in-house.
I have been experimenting a bit with how a cluster behaves in terms of migrations when a node is removed from an HA group. As far as I can tell, when the removed node is evacuated by HA, the target node seems to be selected based on which remaining node has the lowest CPU usage at the time of...
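The selection heuristic I'm describing can be sketched like this; the node names and load figures are made up, and this is purely an illustration of the observed behaviour, not Proxmox's actual code:

```shell
# Made-up per-node CPU loads; pick the candidate with the lowest load,
# mimicking the "lowest CPU usage wins" behaviour described above.
printf '%s\n' 'node1 0.42' 'node2 0.13' 'node3 0.27' \
  | sort -k2 -n \
  | head -n 1 \
  | cut -d' ' -f1
# prints: node2
```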
The nodes have four SFP+ 10Gb links, arranged into two LACP bonds plugged into two switches, and they also have a Gigabit Ethernet link each which goes to a third switch. We have two corosync networks; ring0 goes over one of the 10Gb bonds and is not dedicated, but ring1 goes over the GbE link...
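In corosync 3 terms, that layout corresponds to a nodelist with two links per node, roughly like the fragment below (addresses are invented for illustration):

```
# /etc/corosync/corosync.conf (illustrative fragment)
nodelist {
  node {
    name: proxmox0
    nodeid: 1
    ring0_addr: 10.0.10.10   # shared 10Gb LACP bond
    ring1_addr: 10.0.20.10   # dedicated GbE link
  }
  # ...one node {} block per cluster member...
}
```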
Apr 22 23:32:25 node6 corosync[4057]: [MAIN ] Completed service synchronization, ready to provide service.
Apr 23 09:51:38 node6 corosync[4057]: [TOTEM ] A new membership (1.176) was formed. Members left: 4
Apr 23 09:51:38 node6 corosync[4057]: [TOTEM ] A new membership (1.176) was...
The cluster in question has seven nodes. Each node has one vote. They were all seemingly in quorum before one node was rebooted and the rest decided to take a dive. Here are the corosync logs for each of the nodes around the incident. That gap between 09:52 and 09:55 is when everything rebooted...
Very occasionally when we reboot a node, the whole cluster reboots. Is there a recommended shutdown procedure we're missing? Is there a way to tell corosync “this node will go offline for a while now, so please don't panic”?
Thank you.
To be honest, I sort of assumed that this was known and intentional for some reason. Yes, if this behaviour were changed it would change a lot.
Again, thank you.
About 500 or so. The cluster contains seven nodes, but most of the VMs are kept on five. We can go down to four in an emergency without disruption of service, if things are kept balanced; that's something I've spent quite a few code-hours trying to automate, and it mostly works fine, but there are a...
Is there a way to view this migration queue, then? I mean, besides fetching a list of all services in state migrate, which doesn't show any additional information such as the destination. Where is the queue stored internally?
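For reference, the "list of services with state migrate" I mentioned is the only view I know of: filtering `ha-manager status` (or the /cluster/ha/status/current API path) for that state. The sample lines below are made up to show the filter, not real command output:

```shell
# Illustrative sample in the rough shape of `ha-manager status` output;
# on a real node you would pipe the command itself into grep.
sample='service vm:101 (node2, started)
service vm:102 (node2, migrate)
service ct:110 (node3, migrate)'
printf '%s\n' "$sample" | grep 'migrate)'
# prints the two lines whose services are in state migrate
```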
Well … the last time it happened, we decided to do just that, and...
Let's say you have a cluster with HA enabled. If you have an “oh, crap” moment and accidentally trigger a mass migrate of a large number of virtual machines, placing them all in HA state migrate and queued for migration, is there a way to clear this queue so they don't get migrated?
If there's...