Deciding between Proxmox and VMware

I do want to just GENERALLY note that Proxmox (like other clustered software) does not treat individual nodes as important. The principle of "cattle, not pets" applies here: individual attributes such as names and IPs are not really important. The "proper" way to deal with any changes is to get rid of a node and add a new one, and I agree that means a lot of customization is left to user-provided automation. In that respect, other clustered options (Xen, VMware) that maintain a dedicated management instance that doesn't need changing have a benefit over managing your API heads separately; I also agree with you that using names/IPs as unique identifiers is not ideal.

While I am one of those who does like the management instance (or a few, put where suitable, not necessarily on the nodes at all), it would be perfectly fine to have all nodes equal in this, which is how the GUI with its API was built. I would even IP-range scan for open 8006s if need be. :D But I really have not seen it documented anywhere that it is VITAL to avoid ever reusing a node name/address. Even if it were documented, there are leftover SSH key remnants in the cluster after dead nodes. The direction is to move towards the SSL API and ditch SSH altogether, fine. Then again, the proxy serves both the REST API (no CSRF check) and requests coming from the JS SPA in one's browser (which needs to prevent CSRF). Odd. I was told it won't be split. That's a whole separate topic, though.

I would totally accept that a dead node can never get the same name/ID as a new node, but then the API should reject joining a new node with the same name, or auto-assign names. It has to be built in. It would be okay in the sense that at least historical syslog records would be attributable to machines past and present, but again, it should be for names/IDs, not IPs. For now, pvecm updatecerts is broken, which manifests when you introduce a new node with the same name/IP as an old one (even by accident).
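
To make those "remnants" concrete, a minimal sketch of where to look, assuming a node named node2 was removed with pvecm delnode (the node name is an example; entries may be hashed on some setups):

```
# Hedged sketch: on a surviving member, check whether the deleted
# node's host key still lingers in the cluster-wide known_hosts.
grep node2 /etc/pve/priv/known_hosts

# pvecm updatecerts is the tool meant to refresh these entries,
# but per the above it does not reliably purge stale ones.
pvecm updatecerts
```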

But having said all that, once you understand those design criteria, the system is completely workable and doesn't pose any ACTUAL limits. There isn't any NEED to have replacement nodes named the same as their predecessors: they're cattle, not pets.

Correct, but it's a bit like saying there's no need to put lids on manholes; pedestrians just need to stop sleepwalking while staring into their cellphones.

This is true FOR ANY CLUSTER. Why would you ever do that?! Don't move clusters. Stand them up anew.

Sometimes you do not move the cluster; sometimes it gets "moved" by your helpful network folks, because those are separate people from you. If it's an environment where you have full control and you needed to move it, you can even SNAT/DNAT the addresses. But if you are put in front of a situation where, from tomorrow, your network segment is to be a different prefix, and oh, by the way, no worries, the DHCP/RA will be dishing it out accordingly ... and then you realise yours is a PVE cluster... :)
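
To illustrate the SNAT/DNAT workaround I mean, a rough sketch with made-up addresses; whether corosync tolerates this in practice is a separate question, so treat it as a thought experiment rather than a recipe:

```
# Hedged sketch: 1:1 NAT so the cluster keeps "seeing" its old
# 10.0.0.0/24 addresses after the segment moved to 192.0.2.0/24.
iptables -t nat -A PREROUTING  -d 10.0.0.12  -j DNAT --to-destination 192.0.2.12
iptables -t nat -A POSTROUTING -s 192.0.2.12 -j SNAT --to-source 10.0.0.12
```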
 
Correct, but it's a bit like saying there's no need to put lids on manholes; pedestrians just need to stop sleepwalking while staring into their cellphones.
I would disagree with the characterization; reductio ad absurdum. Generally speaking, software development is the art of prioritizing what's necessary over what's nice to have. Perhaps if enough people with your use case kept falling into those manholes, the devs would be inclined to "patch them" (pun intended ;) I'm guessing my use case is different from yours.

Alternatively, since Proxmox is an open-source project, you can provide the patch we'd all enjoy. That is not an option with VMware...

Sometimes you do not move the cluster; sometimes it gets "moved" by your helpful network folks, because those are separate people from you.
This isn't a Proxmox issue; it's a control issue. No software can be designed to address that particular problem (at least none that I'm aware of).
 
I would disagree with the characterization; reductio ad absurdum.

On purpose. :)

Generally speaking, software development is the art of prioritizing what's necessary over what's nice to have. Perhaps if enough people with your use case kept falling into those manholes, the devs would be inclined to "patch them" (pun intended ;) I'm guessing my use case is different from yours.

The bug reports (3 of them) were all opened some months ago; the "solution" could also have been to put a big ALL-CAPS warning on any cluster join that your node must never have been "seen" under that name or IP.

Come to think of it, this is another fundamental problem: if the (quorum-approved, surviving rest of the) cluster fs can't cleanse itself after a node loss, to the point that a previously configured node can suddenly disrupt it, how does one handle situations where a rogue machine goes on a rampage of reboots? Historically, though that's years ago, there was even a bug dealt with where the size-limited cluster fs was getting overfilled. The first question asked was "why do you need so many entries."

Software development is also the art of catering for graceful failures because everything fails all of the time, which is the very reason one runs clusters ...

Alternatively, since Proxmox is an open-source project, you can provide the patch we'd all enjoy. That is not an option with VMware...

I don't think I will get it pulled in, based on the answers I have been getting so far. Sure, I can patch it for myself, but if you patch too much, you know you might as well get all new clothes. ;)

This isn't a Proxmox issue; it's a control issue. No software can be designed to address that particular problem (at least none that I'm aware of).

Well, that's what DHCP and SLAAC are for, or so I had thought. Even if they were shouting around via mDNS at the beginning ... anything, really anything. I have not HAD to assign static IPs to anything since the '90s ...

EDIT: I statically assign the leases.
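
(By "statically assign the leases" I mean a DHCP reservation rather than local configuration; e.g. with dnsmasq, MAC, address, and hostname being examples:)

```
# Hedged sketch: the node always receives the same address,
# but it is still handed out by DHCP, not configured on the host.
dhcp-host=52:54:00:12:34:56,192.0.2.11,node1
```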
 
This is not what I meant. I meant a node that actually dies (you can't plan for that), so you delete it from the (rest of the) cluster, successfully, then install a new node but give it the same name/IP, and everything gets messed up.
How does that happen? I've never experienced it; we changed EVERYTHING, including the mainboard, and never had a problem bringing the node back to life. Maybe if you're running on a single machine from your nearest discounter, with one crappy consumer SSD and no backups.

If you wish to have a singular endpoint for the management API, that is correct, but I would restate it: "Getting a single API endpoint is as simple as creating a one-to-many NAT. You can even load-balance API requests across multiple nodes."
Running everything behind a reverse proxy with sticky node cookies works like a charm. If one node goes down, you'll be redirected in the background and just need to authenticate again. Besides that, you will not have any problems running it.
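
To sketch what that looks like, a minimal HAProxy config for the sticky-cookie setup; the node addresses and certificate path are placeholders, not a tested production config:

```
frontend pve_gui
    mode http
    bind *:443 ssl crt /etc/haproxy/pve.pem   # placeholder cert bundle
    default_backend pve_nodes

backend pve_nodes
    mode http
    balance roundrobin
    # Sticky cookie: the browser keeps hitting the node it landed on;
    # if that node dies, HAProxy reroutes and you authenticate again.
    cookie PVENODE insert indirect nocache
    server node1 192.0.2.11:8006 ssl verify none check cookie node1
    server node2 192.0.2.12:8006 ssl verify none check cookie node2
    server node3 192.0.2.13:8006 ssl verify none check cookie node3
```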
 
How does that happen? I've never experienced it; we changed EVERYTHING, including the mainboard, and never had a problem bringing the node back to life. Maybe if you're running on a single machine from your nearest discounter, with one crappy consumer SSD and no backups.
This was doing it on purpose, as a test. Like when they turned off the power supply to the cooling pumps at Chernobyl to see if the backup system worked well ...

I don't know how it will happen when it happens. Not sure what the "single machine" meant; I was describing a cluster node going down.
 
I don't know how it will happen when it happens. Not sure what the "single machine" meant; I was describing a cluster node going down.
Error between brain and keyboard ... I meant "simple": a cheap machine from the discounter for which you cannot source any parts to repair it. Yet even there, you would get it up again if the disk is not corrupt. It's just Linux, so it's very change-forgiving.

Although technically possible, I find your "re-add a node and f*ck up everything" construct very unlikely, yet others have said so too.
 
Error between brain and keyboard ... I meant "simple": a cheap machine from the discounter for which you cannot source any parts to repair it. Yet even there, you would get it up again if the disk is not corrupt. It's just Linux, so it's very change-forgiving.
It's not really having a dead node, or even a dead drive, that's the problem; a missing node is not a problem. The problem is adding a node and then having to run ssh-keygen -R :D
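
For anyone landing here with the same issue, the manual workaround amounts to something like this (node name and address are examples; verify the path on your version):

```
# Hedged sketch: before re-adding a node named node2 / 192.0.2.12,
# purge its stale entries from the cluster-wide known_hosts on an
# existing member.
ssh-keygen -R node2 -f /etc/pve/priv/known_hosts
ssh-keygen -R 192.0.2.12 -f /etc/pve/priv/known_hosts
```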

Although technically possible, I find your "re-add a node and f*ck up everything" construct very unlikely, yet others have said so too.

https://bugzilla.proxmox.com/show_bug.cgi?id=4252#c11

Not gonna say any more on this ... ;)
 
It's not really having a dead node, or even a dead drive, that's the problem; a missing node is not a problem. The problem is adding a node and then having to run ssh-keygen -R :D

https://bugzilla.proxmox.com/show_bug.cgi?id=4252#c11

Not gonna say any more on this ... ;)
Yes, I read the bug report, and IMHO it is a rare corner case for people advanced enough to do things pvecm is not anticipating.
 
Yes, I read the bug report, and IMHO it is a rare corner case for people advanced enough to do things pvecm is not anticipating.
I think every newbie runs into this: they launch a cluster, then remove something from the cluster (mind you, they follow the docs and do not bring the node back to life), but then they add a fresh node with the same name/IP. I have to admit I was about to write a PXE deployment script that would take the first "unoccupied" name/IP and add itself to the cluster before I ran into this. I hope no one runs into it with a big cluster already in place.
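
For what it's worth, the guard I ended up wanting in that script looked roughly like this; it leans on the observation that directories for removed nodes tend to linger under /etc/pve/nodes/ until cleaned up by hand (hostname and seed address are examples):

```
#!/bin/sh
# Hedged sketch: refuse an automated join if the cluster has
# already seen this hostname.
NEW_NAME="$(hostname)"
SEED=192.0.2.10    # example address of an existing cluster member

if ssh root@"$SEED" ls /etc/pve/nodes/ | grep -qx "$NEW_NAME"; then
    echo "cluster already knows a node named $NEW_NAME, aborting" >&2
    exit 1
fi

pvecm add "$SEED"
```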
 
I think every newbie runs into this: they launch a cluster, then remove something from the cluster (mind you, they follow the docs and do not bring the node back to life), but then they add a fresh node with the same name/IP. I have to admit I was about to write a PXE deployment script that would take the first "unoccupied" name/IP and add itself to the cluster before I ran into this. I hope no one runs into it with a big cluster already in place.
Yes, maybe I forgot how one would explore this as a rookie.
 
Yes, maybe I forgot how one would explore this as a rookie.
I just do not like that it is undocumented. It is also a bug; maybe there's little ROI on it for folks who are looking to replace SSH completely later on, but it should be documented. There should also be a script for anyone who needs to refresh all their keys (because it really is not necessary to dismantle the cluster). It felt like no one cares; instead we go arguing about whether ssh-keygen is broken or not. I have seen at least one post on this pitfall every week since I came here, like ... 3 weeks ago. The good thing is it made me read through the code, and well, it's Perl ... ;)
 
Maybe it's exactly what the OP wanted? A vivid discussion about a problem, with one aspect that is not that smooth.
LOL, @LnxBil is right. I'm enjoying the open discussion but, admittedly, I'm hard pressed to keep up with the posts. You guys are quick! But I certainly have reading for tonight. So @Esiy, no apologies needed at all! :)

I also agree that searching VMware community posts about folks trying to move from Proxmox VE to VMware would give the flipside perspective. Credit to whoever mentioned that, and apologies, I lost it in the messages now.

I was about to start installing PVE (correct me if that's the wrong short acronym for Proxmox VE), but first I'm going to set up ZFS on the test AlmaLinux 9 OS I have there, to see how it performs against the tests I've run so far before installing PVE.
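
In case anyone wants to compare notes, this is the kind of fio run I'd repeat identically on the AlmaLinux+ZFS setup and later on the PVE install; the dataset path and sizes are just examples:

```
# Hedged sketch: 4k random writes against a test file on the
# ZFS dataset, run long enough to get past cache warm-up.
fio --name=randwrite --filename=/tank/test/fio.dat --size=4G \
    --rw=randwrite --bs=4k --iodepth=32 --ioengine=libaio \
    --runtime=60 --time_based --group_reporting
```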
 
So @Esiy, no apologies needed at all! :)
OK, I am relieved now. :D

I also agree that searching VMware community posts about folks trying to move from Proxmox VE to VMware would give the flipside perspective.

It's not like there are only these two. Someone else mentioned Xen generally before, so I would maybe wonder what the French have to say about that. It's really not like one is better than the other; it really depends on what one runs on it, and in what kind of environment.

EDIT: Yes, I know someone will rush to say the GUI needs compiling and the networking is just "different" there. ;)

I was about to start installing PVE (correct me if that's the wrong short acronym for Proxmox VE), but first I'm going to set up ZFS on the test AlmaLinux 9 OS I have there, to see how it performs against the tests I've run so far before installing PVE.

PVE it is! No more hijacking then. ;)
 
I also agree that searching VMware community posts about folks trying to move from Proxmox VE to VMware would give the flipside perspective. Credit to whoever mentioned that, and apologies, I lost it in the messages now.
If you discover show-stopper threads over there, feel free to discuss those posts here if you feel they might affect you.

It's not like there are only these two. Someone else mentioned Xen generally before, so I would maybe wonder what the French have to say about that. It's really not like one is better than the other; it really depends on what one runs on it, and in what kind of environment.
Exactly. I'm running VMware and Hyper-V inside of PVE, because some customers need a VM on those platforms. It's good to know them all and their limits. For me, PVE offers the low-level possibility to do what I want, the way I want it, all with Debian, which none of the others offers. Hyper-V is totally fine for just running a couple of Windows hosts, but if you want to run Linux over there, the I/O performance is not really great.
 
Exactly. I'm running VMware and Hyper-V inside of PVE, because some customers need a VM on those platforms. It's good to know them all and their limits. For me, PVE offers the low-level possibility to do what I want, the way I want it, all with Debian, which none of the others offers. Hyper-V is totally fine for just running a couple of Windows hosts, but if you want to run Linux over there, the I/O performance is not really great.
Or the other way around. Actually, I was getting at the *-ng product, not the dinosaurs. But I do like Debian as a base (for lots of things); it's why I gave PVE a try. LXD is also rushing ahead, maybe not with as much glitter, but it feels, e.g., well integrated with its ZFS backing store. It's really about what one prioritizes in each particular use case.
 
LXD is also rushing ahead, maybe not with as much glitter, but it feels, e.g., well integrated with its ZFS backing store. It's really about what one prioritizes in each particular use case.
Yes. I'm looking forward to ZFS dataset management inside of LX(C) containers with OpenZFS 2.2+. Live migration of containers is still a problem, which is why we almost exclusively use KVM. (Just to keep a little bit of the original topic: VMware does not have containers up to now ... so a big + for PVE.)
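
If I read the OpenZFS 2.2 additions right (the zoned property plus the new zfs zone subcommand), the delegation would look roughly like this; pool/dataset names and the PID are examples, and I have not tried it on PVE itself:

```
# Hedged sketch: hand a dataset to a container's user namespace
# so root inside can create and manage child datasets.
zfs create -o zoned=on tank/ct101
zfs zone /proc/12345/ns/user tank/ct101   # 12345 = container init PID
```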
 
Yes. I'm looking forward to ZFS dataset management inside of LX(C) containers with OpenZFS 2.2+. Live migration of containers is still a problem, which is why we almost exclusively use KVM. (Just to keep a little bit of the original topic: VMware does not have containers up to now ... so a big + for PVE.)
On the surface, yes. But given some of the security implications (and if you plan on other users), it might not be unwise to simply run a VM for (any particular set of) containers on a hypervisor that does not support them natively. You could even just systemd-nspawn your things (and if it's not that kind of thing, then there's k8s).
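
(And "just systemd-nspawn your things" can be as small as this, assuming a root filesystem was already prepared at the example path, e.g. with debootstrap:)

```
# Hedged sketch: boot the prepared directory tree as a container.
systemd-nspawn -D /var/lib/machines/ct1 --boot
```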
 
