This is my first post, since I tend to avoid asking questions when I can find the answer on my own. This time I haven't been able to find a solution, so here goes:
I was running a single Proxmox system and recently got some servers cheaply, so I decided to try out clustering and make my VMs HA-enabled.
I currently run FreeNAS with NFS shares as the storage for my backups and VM images. Each NFS share is on its own drive pool to avoid bottlenecking a single pool.
I have an R610, which I created the cluster on, and an R710, on which I mirrored my original Proxmox install to see whether it would have problems joining. It did (a 401 unauthorized ticket error after I entered the server's password, if I remember correctly), and I had to reset Corosync and the other cluster files just to be able to access the R610's fresh install. I couldn't salvage the R710, so I did a fresh install on it too.
I was then able to join the cluster, and I moved some backups of my VMs over. I set up HA on one of them and tested whether it would move the VM over if the node it was running on was shut down by mistake or for whatever reason. It worked, but then the VM got stuck in a "fenced" state and wouldn't start back up after being shut down. I figured out how to re-enable it, and it started again. I tested it again, and the VM moved back and forth between the nodes without issue whenever one was shut down, until today.
So, I was shutting down the R610 to install more memory, and after about a minute the R710 rebooted. I finished installing the memory and let them both come up. I shut down the R610 again, and the same thing happened to the R710. So I tried it the other way around to see whether shutting down the R710 had the same effect. The first two times it didn't, but now when I shut down, restart, or power-reset either machine, the other one reboots within a minute without a clean shutdown, and the active VM doesn't seem to attempt to move over at all before the system goes down. I'm getting more parts to upgrade the systems, and I'd like to be able to install them without bringing everything down during the day. It seems like any loss of communication between the two nodes kills the other one.
I've attached the dmesg output from each server as .txt files, along with pictures. Please let me know if there is anything else I can include to help someone point me in the right direction.
Also, I'm not sure whether it's relevant, but "ACPI: SPCR: Unexpected SPCR Access Width" and "FS-Cache Duplicate cookie detected" are new errors since I upgraded today. I wasn't able to find a fix for the SPCR one, and the FS-Cache one seems to come from remounting the NFS share. I was previously on 6.1-5, and the only configuration change I made after the update was adding the MAC address of each machine's primary network card so I can use the WOL feature from the web interface.
BTW, weirdly enough, when I just pull the network cable from one node, the other doesn't reboot. I can then shut that system down, and once it's off I can plug the cable back in and nothing changes. It's almost as if the machine being shut down tells the other one to hard reset after it powers off.
Thanks in advance for any help that anyone can give.