Thank you for the feedback above; it gave me the kick I needed to shut down the server and check the hardware cases you mentioned.
It turned out to be a motherboard issue (probably what I get for trying to run 128 GB of RAM on a consumer...
Hi!
A bad page with a non-zero reference count usually indicates faulty memory, a faulty disk (if swap is used, e.g., for hibernation), or a kernel bug. Does the system use swap?
Does this happen with other processes? In which kernel version did this...
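To answer the swap question on a Linux host, you can check `swapon --show` or read `/proc/swaps` directly. Below is a minimal Python sketch that parses `/proc/swaps`; the function names are hypothetical, and it only reports swap areas that are active right now, not ones that were active when the bad page occurred.

```python
from pathlib import Path


def active_swap_devices(proc_swaps_text: str) -> list[str]:
    """Return the swap device/file names listed in /proc/swaps content.

    The first line of /proc/swaps is a header
    ("Filename Type Size Used Priority"); every following
    non-empty line describes one active swap area.
    """
    lines = proc_swaps_text.strip().splitlines()
    return [line.split()[0] for line in lines[1:] if line.strip()]


def system_uses_swap(path: str = "/proc/swaps") -> bool:
    # Reads the live file, so this only works on a Linux host.
    return bool(active_swap_devices(Path(path).read_text()))


if __name__ == "__main__":
    sample = (
        "Filename\t\t\t\tType\t\tSize\t\tUsed\t\tPriority\n"
        "/dev/dm-1                               partition\t8388604\t\t0\t\t-2\n"
    )
    print(active_swap_devices(sample))  # ['/dev/dm-1']
```

An empty list (only the header line present) means no swap is currently in use.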
Hi!
Memory hot-unplugging (as well as any other kind of hot-plugging/hot-unplugging) needs support from the guest operating system, and it seems that the guest doesn't have that capability here. Does the VM have all the virtio-win drivers from the most...
Right, that could definitely be improved by using "Keep Together (positive)" and "Keep Separate (negative)" in the web interface or using only the "positive" and "negative" names there too.
I'm not sure about including the rule name/description...
From the old resources.cfg from Nov 12, it seems that the HA groups were never fully migrated. Was the ` at the end an artifact of embedding it as code in the forum, or was it part of the file?
Either way, great to hear that your problem has...
I'm sorry, but I cannot reproduce this issue with the information you gave me. There is a newer version of pve-ha-manager on trixie/pve-test (5.1.0), but the bug fix in that version shouldn't be related to this issue.
Can you share the...
If the HA status report from above is exactly the same as it was before setting pve1 in maintenance mode (the HA status report's timestamp is later than the syslog's), then I cannot reproduce it with this configuration either. Are you sure that the...
Thanks!
If possible, it would be great to have a more complete reproducer to investigate the issue. The names can be changed; the only important part is that the changed names keep the same alphabetical ordering (e.g. SN140 ->...
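One way to anonymize names while keeping their order is to sort the originals and hand out zero-padded placeholders in that order. A minimal Python sketch, assuming plain string (code point) comparison is the ordering that matters; `anonymize_preserving_order` and the `node` prefix are hypothetical, and the sample names in the usage are made up:

```python
def anonymize_preserving_order(names: list[str], prefix: str = "node") -> dict[str, str]:
    """Map each distinct name to a placeholder that sorts the same way.

    Placeholders are zero-padded so that plain string sorting of the
    new names matches the sorting of the original names
    (without padding, "node10" would sort before "node2").
    """
    unique_sorted = sorted(set(names))
    width = len(str(len(unique_sorted)))
    return {
        name: f"{prefix}{index:0{width}d}"
        for index, name in enumerate(unique_sorted, start=1)
    }


if __name__ == "__main__":
    mapping = anonymize_preserving_order(["SN140", "SN020", "SN099"])
    print(mapping)  # {'SN020': 'node1', 'SN099': 'node2', 'SN140': 'node3'}
```

The same mapping then has to be applied consistently everywhere the names appear (status output, rules config, syslog).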
Hi!
I have recreated your exact setup as described by the status output and rules config above and couldn't reproduce this either.
What should happen is that as soon as pve1 is put in maintenance mode, vm:102 selects a new node (it will...
See my reply above: the syslog on the current HA Manager (master) node should regularly show an error stating that the HA groups couldn't be migrated and why. This should point you toward what is missing (e.g. maintenance mode, not...
Hi!
The package versions on the node seem a bit out of date; pve-manager 9.0.5 is from mid-August. Could you try upgrading the nodes to see if this is fixed in a more current version?
The HA groups should have been migrated with the upgrade from PVE 8...
Was there any console output for the qmstart task for the VM 410? Is there anything shown at boot on the machine? Is the boot disk (scsi0) still intact (e.g., no fs corruption, system files are readable)?
Otherwise, I could not see anything off from...
From the error message, it seems that a random byte sequence was introduced... When you restart your machine or the pveproxy/pvedaemon services, the Perl files are recompiled. I have never seen anything similar before and can...
Hi!
Have you installed any additional software that actively uses the Landlock kernel module? This warning is only reported if one of the Landlock syscalls (landlock_create_ruleset, landlock_add_rule, landlock_restrict_self) is trapped...
Hello @dakralex, the nodes thinkserver and pvrserver were deleted many months ago and were never part of the v9 upgrade. When I started to troubleshoot this problem, I found that the folders in /etc/pve/nodes did in fact include thinkserver and...
Hi!
LLMs are inherently incapable of maintaining any semantic or logical relationships and will interpolate beyond facts. libpve-guest-common-perl 6.0.2 is the correct version for Proxmox VE 9.1.
Has the source file...
Hi!
The migration from HA groups to HA node affinity rules is attempted every 6 HA rounds, i.e., roughly once a minute, as an HA round lasts ~10 seconds. To be sure that everything is fine, it only performs the migration if the cluster is quorate, all...
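The timing works out as 6 rounds × ~10 s ≈ 60 s. The small Python model below makes that cadence and gating concrete. It is only an illustration: the names are hypothetical, counting rounds from 1 with every 6th round qualifying is an assumption, `other_preconditions_ok` stands in for the conditions after "quorate" that aren't reproduced here, and the real logic lives in pve-ha-manager (Perl).

```python
# Hypothetical model of the migration cadence; not pve-ha-manager's code.
HA_ROUND_SECONDS = 10          # an HA round lasts ~10 seconds
MIGRATION_EVERY_N_ROUNDS = 6   # migration is attempted every 6 HA rounds


def approx_migration_interval_seconds() -> int:
    # 6 rounds * ~10 s per round = ~60 s, i.e. roughly once a minute
    return MIGRATION_EVERY_N_ROUNDS * HA_ROUND_SECONDS


def should_attempt_migration(round_number: int, quorate: bool,
                             other_preconditions_ok: bool) -> bool:
    """Return True if a migration attempt would happen in this round.

    `other_preconditions_ok` is a placeholder for the remaining checks
    listed after "quorate" in the explanation above.
    """
    if round_number % MIGRATION_EVERY_N_ROUNDS != 0:
        return False
    return quorate and other_preconditions_ok


if __name__ == "__main__":
    attempts = [r for r in range(1, 19) if should_attempt_migration(r, True, True)]
    print(approx_migration_interval_seconds(), attempts)  # 60 [6, 12, 18]
```

In this model, a round where a precondition fails is simply skipped, and the next attempt comes about a minute later.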