I use a Supermicro server with PVE 6.2. I set up the watchdog this way:
options ipmi_watchdog action=power_cycle panic_wdt_timeout=10
Describing a problem I encountered here. Noteworthy: hugepages were previously configured while the VM was running, and these were subsequently removed. Proxmox did not catch this, IMHO.
Booting a VM whose CPU configuration had the pdpe1gb flag enabled appeared to have resulted in initially...
I'm testing a new setup where I have 3x storage boxes (with multiple drives) and 6x compute boxes (with two drives + more RAM); all are part of the same cluster.
I installed Ceph on the 3x storage nodes and added all their free drives as OSDs. Ceph is up and running and I can use mounted...
I added a new node to my cluster today, then realized that the new node's network configuration might have an issue preventing it from communicating with the Ceph IP ranges. I restarted the network service using "systemctl restart networking".
After this, the disaster happened and I...
I'm running a PVE cluster on 6 nodes.
In total 2 different server models are used, but all are from Lenovo.
In the server configuration I can define 3 types of server timeouts:
Enable Power Off Delay
I read here that by default all hardware watchdog modules are...
I'm having some issues with my Proxmox host. Sometimes it will randomly lock up and freeze everything. The console will still appear but won't accept any keystrokes, and SSH and the GUI also don't work.
I am passing through an Nvidia GT710 to a Windows 10 client. I thought this was the...
I'm trying to force a Windows guest reset on BSOD using the QEMU watchdog functionality.
After adding a virtual watchdog device
to the Windows VM, it appears in Device Manager as
Intel(R) 6300ESB Watchdog timer - 25A...
I moved to Proxmox 5 in a dev environment and was wondering how to set up the hardware watchdog.
On the same hardware running Proxmox 4, the ipmi_watchdog kernel module was loaded.
Now I can only find the following modules.
lsmod |grep ipmi
ipmi_ssif 24576 0...
So I just upgraded to 5.0-23 tonight.
When I try to boot the system, I now receive this error:
[28.072000] NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [swapper/0:1]
The machine will keep spitting out variations of that error (the only thing that changes is the timing. Instead of 23s it...
In Proxmox 3.x I set up fencing using APC PDUs.
I did not have any HA VMs set up, but if one of the Proxmox nodes locked up or crashed, the node would be fenced.
Is it possible to replicate this behavior in 4.x and 5.x?
I'm fine with the watchdog as the method of fencing; I just don't see a way to make...
I have Proxmox 4.4 installed on a new Dell R630 with the watchdog configured as explained in the Proxmox wiki, which is just what the 6th comment in this other thread suggests:
At the end of that comment it says:
As we have already talked about reboots, here's a fresh one. From 4.4-5 to 4.4-13. The reboot is at the end.
Apr 12 14:55:47 srv-01-szd systemd: Stopped Corosync Cluster Engine.
Apr 12 14:55:47 srv-01-szd systemd: Starting Corosync Cluster Engine...
Apr 12 14:55:47 srv-01-szd corosync...
We have a three-node cluster and our datacenter has been having minor networking issues these days. This means that part of the traffic is lost (resulting in ping loss), and the watchdog on each node, which is the default software softdog, sees that the machine cannot contact the others and shuts down the...
There is a cluster of two nodes (Proxmox 4.3, HP DL380 G9 servers) with an HP MSA 2040 SAN connected via 8Gb FC. LVM is shared between both nodes. Online and offline migration work perfectly.
The problem occurs when we test HA and restart one node or cut the connection to the...
We created a VM for Windows 7 64-bit. We didn't notice that the VM was in the default kvm64 mode until everything was fully installed and running. (We usually use "host" mode, because 3 of our nodes have the same specifications.)
Now, if I shut down the Windows 7 guest, and configure it to...
A few days ago we had a strange case: all three nodes restarted at nearly the same time, just a few seconds apart.
Around that time our provider had a DDoS attack, which did not affect our cluster, but some servers in the same VLAN were affected.
The DDoS started at around 3:40...
I'm trying to set up our cluster of 3 Dell R730s and I'm following the wiki here: https://pve.proxmox.com/wiki/High_Availability_Cluster_4.x#Dell_IDrac_.28module_.22ipmi_watchdog.22.29
I do have OMSA installed, so I edited dcwddy64.ini: