> Yes. Just use ZFS as storage backend. But that only works for local storage.
Exactly. Not an option for an HA setup.
Is that only because of the backup functionality (snapshots)? Too bad that wasn't described in the wiki, as this changes quite a lot.
https://pve.proxmox.com/wiki/High_Availability_Cluster_4.x#HA_Cluster_Maintenance_.28node_reboots.29 states:
If you need to reboot a node, e.g. because of a kernel update, you need to migrate all VM/CT to another node or disable them.
However, the "More" > "Migrate all VMs" tool seems to...
No answers?
Anyway, in case someone else stumbles upon this: it works. I just did it and could upgrade the cluster by re-installing each node.
I just had the problem that multicast didn't want to work at first, even though it worked before with the Proxmox 3.4 cluster.
With my Netgear 24 port...
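Since the post is cut off, an assumption on my part: the usual culprit with such switches is IGMP snooping without an active querier, which silently filters corosync's multicast traffic. If snooping can't be disabled on the switch, a querier can be enabled on the Proxmox bridge itself (vmbr0 is the default PVE bridge name):

    # Enable an IGMP querier on the Linux bridge; standard bridge sysfs
    # knob, but note it is not persistent across reboots.
    echo 1 > /sys/class/net/vmbr0/bridge/multicast_querier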
I have a 3-node Proxmox 3.4 HA cluster that has been running for a few months and need to upgrade to Proxmox 4.0.
I'm not very happy with upgrading Wheezy to Jessie in-place (bad experience), so I'd prefer to reinstall each server one by one using a clean Jessie installation (in order to sleep better...
How come nobody really cares about this problem? IMHO what I'm experiencing *could* indicate that there is a serious bug in Proxmox...
FYI, today the cluster crashed once again after rebooting two nodes (not at the same time). Once again rgmanager crashed (see below) and I had to...
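If anyone wants to dig in: right after such a crash I'd capture the HA service view from each node; clustat ships with the PVE 3.x cluster stack:

    # Show cluster membership and rgmanager service states (PVE 3.x stack).
    clustat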
To me, bonding in this case is just a workaround.
Bonding reduces the possibility of a total network outage, but the original problem still persists: The Proxmox cluster becomes unusable when the network is inoperable for a short amount of time. I still wish somebody could give me an answer how...
Okay, the network is working fine with bonding and also isn't affected by switch reboots.
I could not test single NIC failures, though (no physical access).
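For reference, an active-backup bond of the kind discussed here; a minimal /etc/network/interfaces sketch, with interface names and the address as example values rather than my exact config:

    # /etc/network/interfaces (excerpt) - active-backup bond, example values
    auto bond0
    iface bond0 inet manual
        bond-slaves eth0 eth1
        bond-mode active-backup
        bond-miimon 100
        bond-primary eth0

    auto vmbr0
    iface vmbr0 inet static
        address 192.168.1.11
        netmask 255.255.255.0
        bridge_ports bond0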
Hmmm, ok, it should work as long as the connection between the two switches is working.
Again the same example:
Node #2 can't reach the primary switch and will fail over to the backup switch.
Node #1 is not aware of that (since both links of node #1 are up) and will still try to reach node #2...
Please correct me if I'm wrong. But IMHO bonding won't help here.
Bonding just selects one of two interfaces as the active one. It can't understand what is happening behind the switches.
It will just help in case one of the switches completely fails (which, yes, matches the scenario when a...
Well, I just tried again and it is exactly reproducible.
After the network goes down I get a "#1: Quorum Dissolved" on all nodes and all VMs are shut down.
When the network comes back up after two minutes, the cluster is still inoperable.
None of the nodes is able to shut down properly (it...
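For what it's worth, this is what I'd check in that state; pvecm is standard, while forcing the expected vote count is a last-resort assumption of mine and risky with HA/fencing active:

    # Check quorum and membership on each node (standard PVE command).
    pvecm status

    # Last resort on a node stuck without quorum: override the expected
    # vote count so the node becomes quorate again. Dangerous with fencing.
    pvecm expected 1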
I tried only once.
As said, I would try it again (it's just about rebooting the master switch), but I'd like to have a plan for what to monitor/try during that process, besides syslog.
So, if you have any directions or ideas, please let me know and I'll try again, collecting all...
It was functional, after rebooting the nodes at least.
I didn't check right after the network failure since to me missing Quorum was the main problem.
Why?
Syslog of node #3 follows:
Aug 8 11:54:28 metal3 kernel: tg3 0000:01:00.1: eth1: Link is down <========= master switch reboots
Aug 8 11:54:28 metal3 kernel: vmbr0: port 1(eth1) entering disabled state
Aug 8 11:54:28 metal3 kernel: vmbr0: topology change detected...
Good point!
Syslog of node #1: link
Here is node #2:
Aug 8 11:54:28 metal2 kernel: tg3 0000:01:00.1: eth1: Link is down <========= master switch reboots
Aug 8 11:54:28 metal2 kernel: vmbr0: port 1(eth1) entering disabled state
Aug 8 11:54:28 metal2 kernel: vmbr0...
Everything in that directory is also included in the "syslog" file.
I could do another test (reboot that switch again) and see what happens.
In that case please let me know what I should watch (syslog? send ICMP PINGs between the nodes? some multicast test? ...?)
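For the multicast part, I guess omping run simultaneously on all nodes would be the thing to use (node names taken from the logs; metal1 is an assumption):

    # Run on all three nodes at the same time; each node sends/receives
    # multicast probes to/from the others. ~10 minutes at 1s intervals.
    omping -c 600 -i 1 -q metal1 metal2 metal3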
Note that each change of the root bridge causes a network outage due to how STP works. This means that my test (rebooting the master switch) caused the network to go down, up, down again and up again. Could that explain the "strange leave/join messages"?
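Two things that might soften this, as assumptions to verify against your setup: enable RSTP (or portfast/edge mode on the host-facing ports) on the switches, and keep the Linux bridges from adding their own forwarding delay on every link flap:

    # /etc/network/interfaces (excerpt): vmbr0 without STP participation,
    # ports go straight to forwarding (address is an example value).
    auto vmbr0
    iface vmbr0 inet static
        address 192.168.1.13
        netmask 255.255.255.0
        bridge_ports eth1
        bridge_stp off
        bridge_fd 0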
Fencing does not need the local LAN as...
I thought that Proxmox is (deliberately) sensitive to network outages (somewhere in the Proxmox docs I read that a highly reliable network is very important).
Isn't a node expected to stop all HA services when it is out of quorum?
Anyway, here is an annotated excerpt of syslog (node...