Sorry to revive a dead thread, but since no one else answered, and I had to figure this out for myself, maybe this will help someone else.
I just got this working, here's the process:
On the iSCSI target, make sure that iSER is set up.
Log in to each PVE node and run the following:
Then create...
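The actual commands got cut off above, so here's a rough sketch of the initiator side (the target IP and IQN are placeholders, and I'm assuming open-iscsi with the ib_iser module; adjust for your setup):

```shell
# Load the iSER transport module and make it persistent across reboots
modprobe ib_iser
echo ib_iser >> /etc/modules

# Discover the target, switch the node record to the iSER transport, then log in
iscsiadm -m discovery -t sendtargets -p 10.0.0.10
iscsiadm -m node -T iqn.2013-01.com.example:storage -p 10.0.0.10 \
    --op update -n iface.transport_name -v iser
iscsiadm -m node -T iqn.2013-01.com.example:storage -p 10.0.0.10 --login
```

After the login succeeds, the LUN shows up as a normal block device and can be added as storage in PVE.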
Well, I do see one issue with your NFS export. You aren't supposed to use fsid=0, as that has a special meaning in NFSv4 (it marks the pseudo-root export), and that special meaning is itself deprecated (the current recommendation is to just use NFSv3-style exports). Even if you use the deprecated semantics, you're...
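For illustration, a plain NFSv3-style export line without fsid=0 would look something like this (the path and subnet here are made up, substitute your own):

```
# /etc/exports — no fsid=0, just a normal export
/srv/export  10.0.0.0/24(rw,sync,no_subtree_check)
```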
I had all sorts of issues with NFSv4 until I upgraded the NFS server's kernel to > 3.6 (I'm running a 3.7 kernel now). I wouldn't expect to see such issues if you're mounting with vers=3, but it's worth a try.
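To force v3 while testing, something like this should do it (server name and paths are placeholders):

```shell
# Mount with NFSv3 explicitly to rule out the v4 code path
mount -t nfs -o vers=3 nfs-server:/srv/export /mnt/test
```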
From the "getting started" guide:
If this is done, is it going to break anything in Proxmox? I assume Proxmox uses the standard OVZ toolchain, so I would expect this to be transparent?
My nodes have two interfaces:
1) bonded GigE (public IP, PVE services)
2) 10Gb Infiniband with IPoIB (private subnet, used for accessing NFS storage)
When I take a node offline, it would seem more efficient to use the IPoIB address for migrating any running containers to another node. Is...
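For reference, each node's network config looks roughly like this (addresses are placeholders, and I'm assuming the Debian ifenslave-style bonding options):

```
# Sketch of one node's /etc/network/interfaces
auto bond0
iface bond0 inet static
    address 203.0.113.11
    netmask 255.255.255.0
    gateway 203.0.113.1
    bond-slaves eth0 eth1
    bond-mode active-backup

auto ib0
iface ib0 inet static
    address 192.168.100.11
    netmask 255.255.255.0
```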
My nodes consist of Supermicro dual-node servers (two servers in a 1U case with a shared PSU). Because two nodes share a PSU, it isn't practical to use a switched PDU for fencing (it would power off both nodes at once). That means I have to use IPMI for fencing, which will clearly fail if a node...
Thanks for the reply. I'm not seeing any important differences between your failover specification and mine (aside from the fact I only have one domain). I assume you are able to create a new CT and set it to HA from the web UI? That's what breaks for me. I suppose it doesn't matter much since...
I've read this till my eyes bleed. It answers none of my questions:
1) what is the behavior when there is no failover domain (e.g. the default)? I would assume it's the same as unordered+unrestricted, but it doesn't say.
2) what is the behavior with unordered+unrestricted with regards to node...
It doesn't help in any case. Once I added the above stanza, I can no longer add any new CTs with HA. The UI gives "unknown error 500" when I try to commit. Without the failover domains, it works.
There are no features in PVE for that. Either your application needs to be written to do so (e.g. using OpenMPI), or you want a platform such as OpenSSI, which makes the nodes appear to be one large system.
So it appears that manually adding the CTs to a failover domain changes the behavior:
<rm>
<pvevm autostart="1" vmid="100" domain="cluster1-default"/>
<pvevm autostart="1" vmid="101" domain="cluster1-default"/>
<pvevm autostart="1" vmid="102" domain="cluster1-default"/>...
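For context, the matching failover domain definition in cluster.conf would look roughly like this (node names are placeholders; ordered="0" restricted="0" is the unordered+unrestricted case):

```xml
<rm>
  <failoverdomains>
    <failoverdomain name="cluster1-default" ordered="0" restricted="0">
      <failoverdomainnode name="node1" priority="1"/>
      <failoverdomainnode name="node2" priority="1"/>
      <failoverdomainnode name="node3" priority="1"/>
    </failoverdomain>
  </failoverdomains>
  <pvevm autostart="1" vmid="100" domain="cluster1-default"/>
</rm>
```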
Also, if a solution is possible via event scripting, is this document accurate?
https://fedorahosted.org/cluster/wiki/EventScripting
It claims that central_processing = '1' needs to be set in cluster.conf and that the overall operation of rgmanager is affected.
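If that document is accurate, the change would look something like the following (I haven't verified this against rgmanager myself; the event script path is a placeholder):

```xml
<rm central_processing="1">
  <events>
    <event name="node-events" class="node" file="/etc/cluster/my-event-script.sl"/>
  </events>
</rm>
```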
I set up HA on a four-node cluster. On node1 I created 3 CTs and configured them for HA. I then stopped rgmanager on node1 to test relocation. All three CTs were migrated to node2. This seems suboptimal. Is there any way to have them migrated round-robin (first CT to node2, second to...
Interesting you mention this. For my storage I built a 2-node NFS cluster (corosync/pacemaker, failover mode) on top of a shared fibre channel loop, and I was considering the possibility of putting the NFS servers into the same cluster as the PVE nodes, since that would also provide quorum to...
If I need to move the machines to a different facility or power is lost, rebooting them one at a time isn't an option. Nevertheless, I think I've found my answer.
Actually, I'm realizing the problem (and I provided bad information in my post). I didn't actually bring up all 8 nodes, I only...
I set up an 8-node cluster (PVE 2.x) in the lab. At some point I needed to reboot all the machines, so I did (I was logged into all of them over ssh, so I just issued the reboot command).
After they all came back up, I'm hit with the loss of quorum. Each time I reboot a node, it informs me that...
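For anyone hitting the same thing: if the nodes that are up really are all you have, the standard workaround is to lower the expected vote count with pvecm (assuming PVE 2.x here; use with care, as reducing expected votes defeats quorum protection):

```shell
# Check quorum state first, then temporarily lower the expected votes
pvecm status
pvecm expected 1
```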