[SOLVED] Watchdog fence for physical nodes

e100

Renowned Member
Nov 6, 2010
1,268
45
88
Columbus, Ohio
ulbuilder.wordpress.com
In Proxmox 3.x I set up fencing using APC PDUs.
I did not have any HA VMs configured, but if one of the Proxmox nodes locked up or crashed, the node would be fenced.

Is it possible to replicate this behavior in 4.x and 5.x?
I'm fine with the watchdog as the fencing method, I just don't see a way to make it work without having an HA VM set up on the node.

I tried setting up a group that contained only one Proxmox node and then assigned one VM on that node as an HA resource in that group. But when the node comes back up, the HA VM is in an error state requiring manual intervention.
 
I changed the restricted flag so my group looks like this:

Code:
group: vm16
        nodes vm16
        nofailback 0
        restricted 0
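If I remember the ha-manager CLI correctly, the same group stanza can be created from the command line instead of editing the config by hand; a sketch, using the group and node names from the config above:

```shell
# Sketch: create the single-node group via the HA manager CLI
# (names taken from the stanza above; verify against your cluster).
ha-manager groupadd vm16 --nodes vm16 --nofailback 0 --restricted 0

# The result should show up in the cluster-wide HA config:
cat /etc/pve/ha/groups.cfg
```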

This *might* work, except I only have local storage.
The VM16 node got fenced, and HA tried to start the VM on the VM17 node, but that failed because VM17 does not have the disks.
Then HA tried to migrate the VM to another node, which also failed because it wanted to clone local disks that don't exist:
Code:
Task started by HA resource agent
May 12 13:40:08 starting migration of VM 107 to node 'vm16' (x.x.x.x)
May 12 13:40:08 found local disk 'local-zfs:vm-107-disk-1' (in current VM config)
May 12 13:40:08 found local disk 'local-zfs:vm-107-disk-2' (in current VM config)
May 12 13:40:08 copying disk images
cannot open 'rpool/data/vm-107-disk-1': dataset does not exist
usage:
snapshot|snap [-r] [-o property=value] ... <filesystem|volume>@<snap> ...

For the property list, run: zfs set|get

For the delegated permission list, run: zfs allow|unallow
May 12 13:40:08 ERROR: Failed to sync data - command 'zfs snapshot rpool/data/vm-107-disk-1@__migration__' failed: exit code 2
May 12 13:40:08 aborting phase 1 - cleanup resources
May 12 13:40:08 ERROR: found stale volume copy 'local-zfs:vm-107-disk-1' on node 'vm16'
May 12 13:40:08 ERROR: migration aborted (duration 00:00:01): Failed to sync data - command 'zfs snapshot rpool/data/vm-107-disk-1@__migration__' failed: exit code 2
TASK ERROR: migration aborted

After repeated migration failures, the VM ends up in an error state.
 
Each node has a dedicated group that looks like this:
Code:
group: NodeName
        nodes Node_Name
        nofailback 0
        restricted 0
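Since every node needs an identical group, this can be scripted; a sketch, where the node names are placeholders that would need to match your cluster:

```shell
# Sketch: one non-restricted, no-failback group per physical node.
# Node names here are examples only.
for node in vm16 vm17 vm18; do
    ha-manager groupadd "$node" --nodes "$node" --nofailback 0 --restricted 0
done
```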

Each node has a diskless VM like this:
Code:
bootdisk: scsi0
cores: 1
freeze: 1
ide2: none,media=cdrom
memory: 1
name: NodeName-HA
numa: 0
ostype: l26
scsihw: virtio-scsi-pci
smbios1: uuid=6c1ab0d6-2ab3-46e4-9677-74f7e60894d8
sockets: 1
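If I have the qm options right, a VM like that can be created in one command; a sketch, where the VMID and name are examples. The `freeze: 1` setting is the trick: it halts the guest CPU at startup, so the placeholder VM never actually executes anything.

```shell
# Sketch: create the diskless placeholder VM from the CLI.
# VMID 916 and the name are examples; freeze=1 keeps the CPU halted.
qm create 916 --name NodeName-HA --memory 1 --cores 1 --sockets 1 \
    --freeze 1 --ide2 none,media=cdrom --ostype l26 \
    --scsihw virtio-scsi-pci --numa 0
```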

Then each diskless VM is set up as an HA resource like this:
Code:
vm: 916
        comment Server HA NodeName
        group NodeName
        state started
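That resource stanza should also be creatable via the CLI; a sketch, reusing the names from the config above:

```shell
# Sketch: register the placeholder VM as an HA resource pinned to
# its node's group (VMID and group name as in the stanza above).
ha-manager add vm:916 --group NodeName --state started \
    --comment "Server HA NodeName"
```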

Now the node gets fenced when it loses quorum, and when it starts back up the diskless VM is moved back to it.

While this works, it is not an ideal solution.
It would be nice if Proxmox had a simple way to set up physical server fencing without needing a diskless fake VM to do so.
 
It would be nice if Proxmox had a simple way to set up physical server fencing without needing a diskless fake VM to do so.

I guess this would be easy to implement, but I am quite unsure if many people want that feature ...
 
