Yes, but I have an HA cluster ... so they should simply move and then the machine should reboot ... :-)
So the question is how fencing works when only softdog is used. If fencing and moving VMs/CTs basically also works in this case where it "hangs", then I could work around it.
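For reference, this is roughly how I check that the softdog based watchdog is actually in use on a node (just a sketch, assuming the default Proxmox HA setup where no hardware watchdog module is configured):

# softdog module should be loaded once the HA stack has armed the watchdog
lsmod | grep softdog
# watchdog-mux is the service that feeds /dev/watchdog for the HA services
systemctl status watchdog-mux
# a hardware watchdog module would be configured here; nothing set means the softdog fallback is used
cat /etc/default/pve-ha-manager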
Yes and no ... I...
I have the same effect with containers on GlusterFS (directory storage on a mounted GlusterFS mountpoint). I also see this when starting containers: after some time the start fails and is retried, so it works in the end, but delayed. And yes, with Proxmox 6.
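In case it matters, the storage is just a plain directory storage on top of the GlusterFS mount, roughly like this in /etc/pve/storage.cfg (a sketch; storage ID and mountpoint are only placeholders):

dir: gluster-dir
        path /mnt/glusterfs
        content images,rootdir
        shared 1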
Basically the host has been running without a reboot since I entered the command ... How and where would I need to add it? Could you please tell me and I will try it.
Hey, and sorry for the somewhat off-topic post here. I use Proxmox successfully, have been on NUC5PPYH so far, and am happy. My HA setup is working great.
But the NUC5PPYH is limited to 8GB RAM and is in fact EOL, so I decided to start upgrading and ended up with the NUC8i5BEH2 ... but here the watchdog is not...
I'm now on the 5.3.10 kernel too, with PVE 6.1 ... before that I had another such case, with not only the messages but also an "ethernet restart" ... let's see if it is different now.
The bug seems fixed in 5.2.2 ... but PVE 6 is on 5.0.x ... maybe the Proxmox guys could patch it themselves in their kernel version? Maybe open an issue in their bug tracker?
Could it be that the updated QEMU, KVM and so on in PVE 6 react better to such cases than the versions included in PVE 5?
root@pm1:~# gluster volume info
Volume Name: gv0
Type: Distributed-Replicate
Volume ID: 64651501-6df2-4106-b330-fdb3e1fbcdf4
Status: Started
Snapshot Count: 0
Number of Bricks...
Sure: make sure the GlusterFS volume loses client quorum :-)
So when you have a 3-node cluster (set up as replica 2 or similar) and then turn off two machines ... then it loses client quorum and everything should get blocked. Then (in my cases it was just watchdog reboots with bad timing) the...
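The behaviour depends on the quorum options of the volume; roughly like this (a sketch using my volume name gv0, the values are only illustrative):

# show the currently effective quorum options
gluster volume get gv0 cluster.quorum-type
gluster volume get gv0 cluster.server-quorum-type
# with "auto", client writes get blocked once more than half of a replica set is down
gluster volume set gv0 cluster.quorum-type auto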
Yes, correct. I tried "Stop" ... also "Reset" after some time ... but the process did not end until I manually killed it (or 20-30 minutes was not enough time).
And BTW: I use the HA feature, so the HA manager is also involved when stopping the VM.
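For completeness, stopping it therefore goes through the HA stack and not plain qm; roughly like this (VMID 100 is just an example):

# ask the HA manager to stop the resource
ha-manager set vm:100 --state stopped
# see what state the HA stack reports for it
ha-manager status
# (as far as I understand, a "Stop" in the GUI on a HA resource ends up as such a request too)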
So I have 24h of stability so far ... and only 3 "Token Retransmit list" cases the whole day with the new version and settings (there were many more with the old config and 2.x).
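For reference, the kind of settings I mean are the totem timing values in /etc/pve/corosync.conf; a trimmed sketch (the values are only placeholders, not what I actually run, and config_version has to be bumped on every edit):

totem {
  cluster_name: mycluster
  config_version: 12
  version: 2
  token: 10000
  token_retransmits_before_loss_const: 10
}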