Would the hardware watchdog work in conjunction with shared storage fencing, e.g. how Pacemaker uses sbd?

All new mainboards have a hardware watchdog, so there is normally no need for an extra device. If not, softdog is part of the kernel and works as well.
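For illustration only (this is not the actual Proxmox watchdog handling, just a minimal sketch of how any Linux watchdog device is driven): a daemon keeps writing to /dev/watchdog, and as soon as it stops, because the node hung, crashed, or decided to fence itself, the timer expires and the board resets the machine. When there is no hardware watchdog, `modprobe softdog` provides the same device interface in software. The health check below is a placeholder.

```python
import os
import time

WATCHDOG_DEV = "/dev/watchdog"   # hardware watchdog, or softdog when loaded

def node_is_healthy() -> bool:
    # Placeholder check; a real agent would verify cluster membership,
    # lock state, and so on.
    return True

# Opening the device arms the watchdog timer.
fd = os.open(WATCHDOG_DEV, os.O_WRONLY)
while node_is_healthy():
    os.write(fd, b"\0")          # "pet" the watchdog to rearm the timer
    time.sleep(5)                # interval must stay well below the timeout
# If the process hangs, crashes, or deliberately stops petting, the timer
# expires and the watchdog hard-resets the machine.
```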
In any of these cases, how do the other nodes know for certain that the rogue node has been killed?
We use a distributed locking mechanism, combined with the watchdog feature.
wolfgang said: It doesn't matter if the node is not reachable from the cluster or off.
Consider: if the rest of the cluster has quorum, then the cluster knows everything is OK and that the one missing node is not OK.
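To make the interplay concrete, here is a rough sketch of that idea, with assumed timeouts and placeholder functions rather than the actual Proxmox HA code: a node only keeps petting its watchdog while it is quorate and can renew its cluster-wide lock, so a node that drops out of the cluster is guaranteed to have reset itself before anyone else is allowed to take over its services.

```python
import time

WATCHDOG_TIMEOUT = 60    # assumed: node resets itself within 60 s of the last pet
LOCK_TIMEOUT = 120       # assumed: others may recover its services only after 120 s

def has_quorum() -> bool:
    # Placeholder: real code asks the cluster stack (e.g. corosync) for quorum.
    return True

def renew_cluster_lock() -> bool:
    # Placeholder: real code renews a lease on the shared cluster state.
    return True

def pet_watchdog() -> None:
    # Placeholder: write to /dev/watchdog as in the previous sketch.
    pass

while True:
    if has_quorum() and renew_cluster_lock():
        pet_watchdog()   # quorate and lock renewed -> allowed to stay alive
    # Otherwise we stop petting: the watchdog resets this node within
    # WATCHDOG_TIMEOUT seconds, long before the lock expires and another
    # node may start the same VMs.
    time.sleep(10)
```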
The concern I have is that the rogue node might "recover" from the failed state and then start writing bad data to the shared storage (e.g. DRBD, Ceph, NFS, etc.) at the same time as the new node that took over its VMs. If this were to occur, wouldn't it result in two nodes writing data to the same VM image files at once and thus corrupting them?
Is the mechanism that it uses to solve that problem generic, in that it can work with any shared storage backend (DRBD, NFS, Ceph, Gluster, iSCSI, etc.)? Does it intercept I/O requests to the shared storage, or how does it work?

Our software solves exactly that problem. Or what do you think the software is for?
Does it intercept I/O requests to the shared storage or how does it work?
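For what it's worth, fencing schemes of this kind do not need to intercept I/O at all; the guarantee comes purely from the relative timing of the lock and the watchdog, as in the sketch above. A worked example with those same assumed numbers:

```python
# Worked timing example using the assumed numbers from the sketch above.
WATCHDOG_TIMEOUT = 60    # rogue node has reset itself at most 60 s after its
                         # last successful lock renewal / watchdog pet
LOCK_TIMEOUT = 120       # surviving nodes may recover its VMs no earlier than
                         # 120 s after that same renewal

# Recovery can only begin strictly after the rogue node is already down, so
# the old and the new owner never write to the same VM images at the same
# time, regardless of the storage backend (DRBD, NFS, Ceph, ...).
assert LOCK_TIMEOUT > WATCHDOG_TIMEOUT
```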
Is it possible to place an LXC container on Ceph RBD storage?
I see in the docs that the minimum number of nodes for an HA configuration is 3. Is a 2-node cluster no longer supported?