PVE 4 HA and redundant ring protocol (RRP)

alitvak69 · Oct 16, 2015

I am testing on a real cluster so I decided to open a new thread to avoid the confusion.

I have RRP (two different networks) configured on corosync. After testing HA in case of network failure I wonder now if it makes sense at all.

When I stop one of the interfaces on node, corosync declares the ring as faulty which is correct.

I have vm running on the same node with a single Ethernet that was bridged to a now faulty ring. Whence I no longer have access to vm.

As result of this test nothing happens with vm even though it lost connection to the world.

What in my opinion should happen, vm needs to be fenced and started other node, since it is no longer accessible on the current one.

Am I wrong ?

adamb · Oct 16, 2015

alitvak69 said:
I am testing on a real cluster so I decided to open a new thread to avoid the confusion.

I have RRP (two different networks) configured on corosync. After testing HA in case of network failure I wonder now if it makes sense at all.

When I stop one of the interfaces on node, corosync declares the ring as faulty which is correct.

I have vm running on the same node with a single Ethernet that was bridged to a now faulty ring. Whence I no longer have access to vm.

As result of this test nothing happens with vm even though it lost connection to the world.

What in my opinion should happen, vm needs to be fenced and started other node, since it is no longer accessible on the current one.

Am I wrong ?

By the sounds of it you are using your cluster network as your LAN to. Is that correct? As far as I know, fencing takes place at the host level not the VM level.

alitvak69 · Oct 16, 2015

Is support for RRP not implemented at all?

It seems so. I cannot migrate vm:101 via second ring.

Issuing migrate to alternative node name fails with (node virt2n1-la-int is not online)

The only method forked for me is to login into the test node add static route to a primary ring network via secondary ring and then issue migrate command to a primary node host name of the target node (virt2n1-la) .

alitvak69 · Oct 16, 2015

By the sounds of it you are using your cluster network as your LAN to. Is that correct? As far as I know, fencing takes place at the host level not the VM level.

Adamb

It sounds correct. However when my physical network card on the node dies it affects VMs using it as well. I may be using completely different network card for cluster network and in that case VMs bridging to it will be affected too.

Wouldn't be migrating / relocating VMs to be a good idea in this case?

I may be wrong but pacemaker with corosync allow to do that.

Also the same issue would affect LXCs exposed to outside world.

adamb · Oct 16, 2015

alitvak69 said:
Adamb

It sounds correct. However when my physical network card on the node dies it affects VMs using it as well. I may be using completely different network card for cluster network and in that case VMs bridging to it will be affected too.

Wouldn't be migrating / relocating VMs to be a good idea in this case?

I may be wrong but pacemaker with corosync allow to do that.

Also the same issue would affect LXCs exposed to outside world.

Yea I don't think RRP is implemented. It is suggested to use a separate network for the cluster network. Im not a fan of fail over's based on network conditions on the LAN. There are just to many variables which could happen and cause fail overs.

dietmar · Oct 16, 2015

alitvak69 said:
It sounds correct. However when my physical network card on the node dies it affects VMs using it as well. I may be using completely different network card for cluster network and in that case VMs bridging to it will be affected too.

We always suggest to make all components redundant, i.e. use a bond for the VM network.

alitvak69 said:
I may be wrong but pacemaker with corosync allow to do that.

Yes, pacemaker is a different approach - more flexible, but really complex to configure.

dietmar · Oct 16, 2015

adamb said:
Im not a fan of fail over's based on network conditions on the LAN. There are just to many variables which could happen and cause fail overs.

I want to agree here.

alitvak69 · Oct 16, 2015

dietmar said:
I want to agree here.

I agree with your observation . However any interface may and will go down, i.e. bond can go down effectively cutting off vms. If there is a supported RRP feature, would not it make sense to have a migration option?

As far as complexity, a good documents would help to learn as long as solution works. I better take that then no solution.

Sent from my SM-G900V using Tapatalk

dietmar · Oct 16, 2015

alitvak69 said:
However any interface may and will go down, i.e. bond can go down effectively cutting off vms. If there is a supported RRP feature, would not it make sense to have a migration option?

Please note that we added the RRP feature 2 days ago. So yes, I would make sense, but it is simply not implemented.

alitvak69 · Oct 18, 2015

Is there a way to vote for migration over RRP ? Also do you have a plan for monitoring resources, i.e. VMs in general. It would be a great feature migrate or reboot VM if it is not accessible by monitor. It can be optional but nevertheless.

Romsch · May 21, 2019

Hi all!
I have nearly the same question like @alitvak69
I tested rrp in our test environment with ceph. Two physical separate network switches, one for cluster network (the nodes 10.10.10.0) and one for ceph network (172.16.0.0).
I had the same experiences like the others described; if cluster network on one node is broken or offline, the vm still are running (thats good) but not available at the moment. Also the HA configured VMs are not changed to another host - is this possible now to configure? It would be very good, if HA configured VMs automatically move to a node wich is available in the cluster network.

With the ceph network if one ceph connection is broken or offline, the running VMs on this node are available, but in read only mode - is here also maybe a way, that the configured HA VMs automatically change to another host?

Best regards,
Roman

Search

Search

PVE 4 HA and redundant ring protocol (RRP)

alitvak69

Renowned Member

adamb

Famous Member

alitvak69

Renowned Member

alitvak69

Renowned Member

adamb

Famous Member

dietmar

Proxmox Staff Member

dietmar

Proxmox Staff Member

alitvak69

Renowned Member

dietmar

Proxmox Staff Member

alitvak69

Renowned Member

Romsch

Well-Known Member

We value your privacy