Went through, migrated everything over, deleted the second node, and now all appears to be working fine.
I guess I'll figure out if I'm going to buy another node or not and figure out where to go from there. Thanks guys, sorry for not reading well enough.
Obligatory H/W info for 2x identical boxes:
Dell R610 (Gen 1)
Proxmox 4.3-1/e7cdc165
8 x Intel(R) Xeon(R) CPU E5507 @ 2.27GHz (2 Sockets)
64GB DDR3 ECC RAM
Storage is NFS to a Synology DS1815+ w/ Read/Write SSD Caching
Units running in HA
Okay, so I'm well aware I'm crazy, but I cannot for...
Okay, so I've finally isolated it 100% to a multicast issue. I'm having the same issue across multiple switches, both managed and unmanaged, but what I've done is edit the corosync file (per https://pve.proxmox.com/wiki/Troubleshooting_multicast,_quorum_and_cluster_issues#PVE_.E2.89.A5_4.x) to...
Okay, so I've narrowed it down to this: when the HA fails, I can still navigate both boxes from the single GUI interface. If I restart the corosync service on both boxes within 10 seconds or so, the HA group goes back to normal for 3-8 minutes.
It definitely looks like a multicast issue, but...
Not a 'really stupid question' :) Yes, I'm mixing my cluster and service networks. It's in my homelab; fewer than 20 clients at any time.
I'm going to try this right now and see how it operates.
The LACP is to best communicate with my Synology DS1815+ running in 4Gb LACP at the moment as NFS...
Okay, I've got it all working if I take LACP out of the picture; single ethernet cable on each server going to the same switch in a standard network mode seems to keep the HA group online just fine.
Going to do my best to rule out these HP switches, but worst case scenario I put them out a...
I've gone through and verified the following:
Nodes are on the same subnet
Nodes can unicast each other just fine, with far below 1% loss
Nodes can resolve each other's hostnames just fine
For the life of me, I can't get omping working. I enter the node names and it tells me "omping: Can't find local...
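For the hostname check in the list above, one thing that might be worth ruling out is a node name that resolves to a loopback address instead of the LAN address. Here is a minimal Python sketch of that check. The node names `pve1` and `pve2` are hypothetical placeholders, and the helper names are made up for illustration. It is only a quick sanity check on name resolution and says nothing about multicast itself.

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the textual IP address is in a loopback range."""
    # Drop an IPv6 scope suffix such as "%eth0" before parsing.
    return ipaddress.ip_address(address.split("%")[0]).is_loopback


def resolve(hostname: str) -> list[str]:
    """Return the unique addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, None)
    return sorted({info[4][0] for info in infos})


def report(hostnames: list[str]) -> None:
    """Print each hostname with the addresses it resolves to."""
    for name in hostnames:
        try:
            addresses = resolve(name)
        except socket.gaierror as exc:
            print(f"{name}: does not resolve ({exc})")
            continue
        for addr in addresses:
            note = "loopback" if is_loopback(addr) else "ok"
            print(f"{name} -> {addr} ({note})")
```

Running `report(["pve1", "pve2"])` on each node (with the real node names substituted) shows whether both boxes see the same non-loopback address for each name.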
Not 3 minutes later, the HA failed.
I'm well aware that I'm crazy, but seriously...
EDIT: I have not changed anything digitally or physically from the last post until now (#9 to #10)
Okay, just rebooted both boxes and unplugged (not disabled) 3 of the 4 ethernet cables on each box.
Everything looks fine in this configuration, I figure I'm going to connect cables one by one and see when it breaks.
Sorry, mis-typed. From the first box, the GUI shows the second offline, and from the second box the GUI shows the first offline. The two units over CLI can get to each other (ping & ssh).
Yup, SSH works fine. Takes a second for the password field to appear, but works fine and fast after that.
No pings appear to drop. Out of nearly 100,000 pings I ran, I dropped 6 packets.
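To put that ping figure in perspective, here is a quick sketch of the arithmetic in Python. The function name is just illustrative, and 100,000 is the rounded count from above rather than an exact number.

```python
def loss_percent(sent: int, lost: int) -> float:
    """Packet loss as a percentage of packets sent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * lost / sent


# Figures from the post: roughly 100,000 pings, 6 dropped.
print(f"{loss_percent(100_000, 6):.3f}% loss")  # prints "0.006% loss"
```

So the unicast loss is on the order of a few thousandths of a percent.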
I have it set up like this:
Proxmox 1
bond0 | Linux Bond | Active: Yes | Autostart: Yes | Port/Slaves...
Morning everyone.
I've set up two Dell R610s with Proxmox running in HA and 4Gb LACP to HP v1910-16s (see image below).
When the Proxmox boxes turn on, they see each other, and they act as though they're in HA. The logs say otherwise. The two boxes never lose connection with each other...