Hi there,
Please forgive my ignorance, but I'm rather new to Proxmox. After much trial and error I would like to check something with you; maybe I am hoping for too much.
I've been tasked with setting up an HA cluster, where nodes can simply fail and the containers running on them get started automatically on the remaining nodes. In other words, High Availability.
So far, I've set up a 3-node cluster with the latest Proxmox version (pve-manager/3.2-4/e24a91c1 (running kernel: 2.6.32-29-pve)) and got fencing and HA working.
To test the clustering, I online-migrated (locally hosted) CTs between nodes with zero downtime; each migration takes about 40 seconds in total.
Next, I set up a fourth machine running a GlusterFS server (v3.4.2) and added one of its volumes as shared storage to my Proxmox cluster.
I then created another container, hosted on this shared storage. I can online-migrate this container as well, although it takes longer (and the downtime in particular is much longer). But that's a separate thing to sort out.
More crucially: if I pull the power cord on the node running that second container (the one living on the Gluster share), the resource group manager does not start the container anywhere else. Worse, I can't even manually "migrate" that container, or start it at all. If I try, I get something along the lines of "no route to host".
So what am I doing wrong? Of course, I would like the Proxmox cluster to recognise the node failure and start that container on one of the two remaining nodes ASAP!
Thanks for helping me out!