Hi
I'm trying to understand HA and if / how it fits my needs. To learn more, I set up a testbed consisting of 3 nodes. I don't have any real servers lying around, so for testing some SFF desktops have to suffice (HP ProDesk 400 G3 and a Dell OptiPlex with identical specs, each with two drives: one for Proxmox, one for Ceph). Unfortunately I can't fit additional network cards into these devices, but for testing the single 1Gb interface should work (at least I think so).
I successfully configured Ceph for shared storage and installed some VMs. Networking is done via SDN and works as expected. One of the VMs is an OPNsense for connectivity to the outside world; the others are a typical small Windows domain (2 DCs, 1 generic server, 1 client). I have not tested any Linux VMs yet, but I don't expect any issues / differences there.
I can migrate VMs without issue. Also, when I command a node to shut down, it migrates all VMs to other nodes as expected.
What I don't understand:
I simulated a node failing by unplugging the network cable. It quickly showed as offline in the Web-GUI.
Now either another node should simply take over (if RAM is synced over the net) or at least the VMs should have been restarted on another node.
But in fact: even after several minutes, nothing happened. I replugged the node and everything went back to normal. I could tolerate the VMs rebooting somewhere else, but several minutes of downtime and only coming back after manual intervention (= fixing the fault) is a bit too long for me.
I think I did something wrong, but neither the docs nor Google gave me a hint. Did I overlook some config, did I misunderstand Proxmox HA entirely, or is it something else?
Not sure which info about the system might help, so just let me know. I'll happily provide anything needed.
Greetings,
Dura