PVE 4.2 2-node cluster fencing on one node restart

igorc

Hi all,

I have a test setup of the latest non-subscription PVE 4.2: a 2-node cluster with pve-kernel-4.4.8-1 installed.

root@proxmox01:~# pveversion -v
proxmox-ve: 4.2-49 (running kernel: 4.4.6-1-pve)
pve-manager: 4.2-4 (running version: 4.2-4/2660193c)
pve-kernel-4.4.6-1-pve: 4.4.6-48
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.4.8-1-pve: 4.4.8-49
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-39
qemu-server: 4.0-74
pve-firmware: 1.1-8
libpve-common-perl: 4.0-60
libpve-access-control: 4.0-16
libpve-storage-perl: 4.0-50
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-15
pve-container: 1.0-63
pve-firewall: 2.0-26
pve-ha-manager: 1.0-31
ksm-control-daemon: 1.2-1
glusterfs-client: 3.7.2-1
lxc-pve: 1.1.5-7
lxcfs: 2.0.0-pve2
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve9~jessie
fence-agents-pve: 4.0.20-1
openvswitch-switch: 2.3.2-3

root@proxmox01:~# pvecm status
Quorum information
------------------
Date: Tue May 10 10:15:44 2016
Quorum provider: corosync_votequorum
Nodes: 2
Node ID: 0x00000001
Ring ID: 388
Quorate: Yes

Votequorum information
----------------------
Expected votes: 2
Highest expected: 2
Total votes: 2
Quorum: 2
Flags: Quorate

Membership information
----------------------
Nodeid Votes Name
0x00000001 1 192.168.0.185 (local)
0x00000002 1 192.168.0.186

Now, what I see is that when I shut down one of the nodes, the second one gets fenced. The only way to prevent this is to execute "pvecm expected 1" on the surviving node when shutting down the other one, but that is not an option if the node actually crashes instead of being stopped/restarted. Is there any way to make this permanent, or to tell PVE that this is a 2-node cluster so that it is OK to continue with one vote? With pacemaker and corosync, for example, I would set:

quorum {
  provider: corosync_votequorum
  two_node: 1
}
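
For reference, this is the one-off workaround I use today (run on the surviving node; as far as I can tell it does not persist across corosync restarts):

root@proxmox01:~# pvecm expected 1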

Thanks,
Igor
 
Hi,

HA is not recommended on 2-node setups.
This is the reason why there is no permanent way to set this.

You can modify the corosync.conf, but I would really recommend that you disable HA on this setup.
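
If you do it anyway, the change would look roughly like this in /etc/pve/corosync.conf (remember to also increase config_version in the totem section, otherwise the change will not propagate):

quorum {
  provider: corosync_votequorum
  two_node: 1
}

Note that two_node implicitly enables wait_for_all, so after a full cluster shutdown both nodes must come up before the cluster becomes quorate.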
 
And never use two_node=1 with watchdog fencing - that is dangerous and simply makes no sense.
 
Hi Wolfgang,

Thanks for your reply. Unfortunately, a 3-node cluster is sometimes not possible from a financial point of view (let's say a third-party DC where the node types are too expensive to be used as an arbiter only), so we have to work with what we have available. I plan to keep HA active, since I need VM live migration and migration upon node failure. I have set up DRBD in dual-primary mode with 2 volumes for VM disk storage, which should provide "relatively" safe split-brain recovery (see the sketch below). I wish DRBD v9.1 were already out, since that's what I would like to settle on eventually. The failover and migration tests have been successful so far, but more are still to come.
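
Roughly, each of the two resources is defined along these lines (I replicate over the second network; the backing disk, device, and port here are illustrative, not my exact values):

resource r0 {
  protocol C;
  startup {
    # bring the resource up as primary on both nodes (dual-primary)
    become-primary-on both;
  }
  net {
    allow-two-primaries;
    # the split-brain policies behind my "relatively" safe recovery
    after-sb-0pri discard-zero-changes;
    after-sb-1pri discard-secondary;
    after-sb-2pri disconnect;
  }
  on proxmox01 {
    device /dev/drbd0;
    disk /dev/sdb;
    address 10.10.1.185:7788;
    meta-disk internal;
  }
  on proxmox02 {
    device /dev/drbd0;
    disk /dev/sdb;
    address 10.10.1.186:7788;
    meta-disk internal;
  }
}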

Cheers,
Igor
 
And never use two_node=1 with watchdog fencing - that is dangerous and simply makes no sense.

I agree, but I need this as a PoC. By the way, the test servers are VMs running on a nested-KVM Proxmox server. The production servers will have IPMI fencing.
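
For the IPMI part, I plan to sanity-check the agent from each node first, something like this (address and credentials are placeholders):

root@proxmox01:~# fence_ipmilan -a 192.168.0.190 -l fenceuser -p secret -o status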
 
Just for completeness, here is my current corosync config:

logging {
  debug: off
  to_syslog: yes
}

nodelist {
  node {
    name: proxmox02
    nodeid: 2
    quorum_votes: 1
    ring0_addr: 192.168.0.186
    ring1_addr: 10.10.1.186
  }

  node {
    name: proxmox01
    nodeid: 1
    quorum_votes: 1
    ring0_addr: 192.168.0.185
    ring1_addr: 10.10.1.185
  }
}

quorum {
  provider: corosync_votequorum
}

totem {
  cluster_name: proxmox
  config_version: 2
  ip_version: ipv4
  rrp_mode: passive
  secauth: on
  version: 2

  interface {
    bindnetaddr: 192.168.0.185
    ringnumber: 0
  }

  interface {
    bindnetaddr: 10.10.1.185
    ringnumber: 1
  }
}

I'm using dual rings for corosync on 2 separate networks, as I do with any 2-node cluster I've built in the past (not PVE though, this is my first time testing it). When this is deployed in a DC where the whole infrastructure is redundant (UPSes, power supplies, switches, etc.), a major disaster would really have to strike for the nodes to lose communication with each other :)
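
To check that both rings are actually healthy, corosync's own status tool can be run on either node:

root@proxmox01:~# corosync-cfgtool -s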
 
And never use two_node=1 with watchdog fencing - that is dangerous and simply makes no sense.

Hi Dietmar,

Please let me ask a question (maybe something silly):
With PVE 4.2, is it recommended to enable the "watchdog" in the server BIOS in production environments? That is, does PVE work in a very stable way with this hardware option enabled?

Best regards
Cesar
 
