Hi guys. I need some help with Proxmox 4. I have an issue with the 4 nodes in my cluster. They all run the same version of PVE. Corosync has problems with retransmits:
[TOTEM ] Retransmit List: 298b1 298b2 298b3 298b4 298b5 298b6 298b7 298b8 298b9 298ba 298bb 298bc 298bd 298be 298bf 298c0 298c1
I see this in my logs before my servers go down. They just reboot. I started to dig and found that watchdog/softdog is the reason. I did not have any problems with version 3, but now it has become a real problem. Our servers reboot once every 5 days or more often, sometimes as much as twice a day. We tried moving our servers onto a separate switch to rule out network problems, but it did not help. We use HA for one purpose only: to have a common management interface for all the servers in the cluster. We do not use any other options. Is there any way to turn off the watchdog/softdog? Or maybe some other option could help us? These reboots are driving us crazy.
dpkg --list | grep pve-
ii libpve-access-control 4.0-9 amd64 Proxmox VE access control library
ii libpve-common-perl 4.0-36 all Proxmox VE base library
ii libpve-storage-perl 4.0-29 all Proxmox VE storage management library
ii pve-cluster 4.0-24 amd64 Cluster Infrastructure for Proxmox Virtual Environment
ii pve-container 1.0-21 all Proxmox VE Container management tool
ii pve-firewall 2.0-13 amd64 Proxmox VE Firewall
ii pve-firmware 1.1-7 all Binary firmware code for the pve-kernel
ii pve-ha-manager 1.0-13 amd64 Proxmox VE HA Manager
ii pve-kernel-4.2.3-2-pve 4.2.3-22 amd64 The Proxmox PVE Kernel Image
ii pve-libspice-server1 0.12.5-2 amd64 SPICE remote display system server library
ii pve-manager 4.0-57 amd64 The Proxmox Virtual Environment
ii pve-qemu-kvm 2.4-12
corosync-pve 2.3.5-1 amd64 Standards-based cluster framework (daemon and modules)
ii libcorosync4-pve 2.3.5-1 amd64 Standards-based cluster framework (libraries)
Thank you in advance