New Cluster for VMware migration

admin55

New Member
Dec 29, 2025
We are all in on Proxmox and purchased subscriptions for a 3-host cluster. The question I have is: should we stick with iSCSI or go NFS?

Current VMware cluster:
3 Dell hosts with 25Gb NICs doing MPIO through two 25Gb switches
Doing iSCSI to an IBM 5200 SAN. This has been flawless for almost 3 years with amazing disk performance and reliability.

We have a new 3-node Proxmox cluster set up basically the same way as above.

The question is: do we stick with iSCSI, which we are leaning towards even without thin disks, or do we go NFS? Space on the SAN isn't much of a concern, even doing thick disks for our VMs.

Any insight would be appreciated.
 
Hi @admin55,

Congratulations on making the move. Either option is fine, frankly. Both protocols have been around for decades. It depends on your requirements, management familiarity, etc. Why not try both? Just create two storage pools and see which one works better for you.

Also, keep in mind that it's not about the protocol; it is about implementation and infrastructure. You could ask IBM for their recommendation.
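
If it helps to make "try both" concrete, here is a rough sketch of the two pools on the CLI. Storage names and addresses are placeholders, and it assumes the 5200 can actually serve NFS and that the iSCSI LUN is already visible as volume group vg_san on every node:

Code:
# NFS pool
pvesm add nfs san-nfs --server 10.10.10.50 --export /proxmox --path /mnt/pve/san-nfs --content images
# shared LVM pool on top of the iSCSI LUN
pvesm add lvm san-lvm --vgname vg_san --shared 1 --content images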


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
I've done this for one of our mid-sized clusters; you need to ask a few very important questions. Are you wanting to maintain the same configuration environment of SAN + shared LUN + VMs with snapshots, etc.? Are you wanting to migrate to a different architecture entirely? Are you willing to sacrifice snapshots and other abstraction support? Each of those answers requires a different solution, and those solutions have varying levels of effort and knowledge required.
 
Generally, yes, I am trying to maintain the same configuration. If I go iSCSI shared storage in Proxmox, am I losing snapshots? If that is the case, then how do I back up all my VMs with Veeam, which is how I do it now on my VMware cluster?
 
Usually you would use LVM (thick) on an iSCSI or Fibre Channel-attached SAN: https://pve.proxmox.com/wiki/Storage:_LVM
Before PVE 9 you couldn't use snapshots on it, though. Beginning with PVE 9, a new feature was introduced which allows qcow2-based snapshot chains as a "technology preview".

These have some caveats, though, and are (as said) still in preview status, so I wouldn't rely on them for production.

@bbgeek17 wrote some great writeups on using LVM and the snapshot feature, which you might want to read:
https://kb.blockbridge.com/technote/proxmox-lvm-shared-storage
https://kb.blockbridge.com/technote/proxmox-qcow-snapshots-on-lvm
https://kb.blockbridge.com/technote/proxmox-tuning-low-latency-storage
https://kb.blockbridge.com/technote/proxmox-qemu-cache-none-qcow2
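
For reference, the usual flow for the LVM-thick route looks roughly like this (IPs, names and device paths below are placeholders; with MPIO you would log in through both portals and layer LVM on the multipath device):

Code:
# on every node: discover and log in to the SAN (repeat for the second portal for MPIO)
iscsiadm -m discovery -t sendtargets -p 10.10.10.50
iscsiadm -m node --login
# on one node only: create the PV/VG on the multipath device
pvcreate /dev/mapper/mpatha
vgcreate vg_san /dev/mapper/mpatha
# register it cluster-wide as shared LVM storage
pvesm add lvm san-lvm --vgname vg_san --shared 1 --content images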

Basically it boils down to this: if you configure everything correctly, LVM thick (with raw instead of qcow2) will give you rock-solid storage, but without thin provisioning and without snapshots. With the new feature you will have snapshots, but they have some caveats (including a severe performance penalty) and still need to mature.
For backups, however, this doesn't matter: the native backup features of Proxmox VE don't need snapshot support on the storage, since the virtualization layer (QEMU/KVM) has its own internal mechanism to snapshot the VM state, which can be used for backups. Proxmox VE has two native backup options:
- Creating a full backup to an archive file, which can live on any supported file system. One popular approach is to use a NAS as the target. The advantage is that you don't need any software except Proxmox VE to restore them. The disadvantage is that they are always full backups, so they will eat up some space. It might still be a good idea to have some of them (maybe once a month or so) for your most important VMs, where even an older state is better than none.
- Creating a backup to a Proxmox Backup Server. PBS has a quite smart deduplication mechanism and offers several options for ransomware protection and offsite backup (tape, external USB storage ("removable datastore"), or syncing backups between Proxmox Backup Servers). The latest version (4.1) also introduced support for S3 cloud storage for offsite backup, but it's still a technology preview (good enough for testing, but not something to rely on as your sole offsite backup).
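
As a minimal sketch of both options (VM ID and storage names are placeholders):

Code:
# full backup of VM 100 to a file-based storage, e.g. an NFS share added as "backup-nas";
# --mode snapshot uses QEMU's internal mechanism, no storage snapshots needed
vzdump 100 --storage backup-nas --mode snapshot --compress zstd
# register a Proxmox Backup Server datastore as a backup target
# (plus the credentials for the PBS user)
pvesm add pbs pbs-main --server 10.10.10.60 --datastore main --username backup@pbs --fingerprint <fingerprint>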

Since both of them use QEMU's internal mechanism, they don't need storage support for backups. The same mechanism can also be used by other backup software to implement Proxmox VE support. In fact, Veeam did this in their Proxmox VE plugin, so the lack of storage snapshots is not a problem if you want to continue using Veeam. What would make me wary is that the reports in this forum (you can search for them) still seem to indicate that the plugin needs to mature, and that support for application-aware backups (like for domain controllers or SQL databases) still lags behind their VMware integration. Treating the VMs as bare-metal servers and installing the Veeam agent inside them seems to work, however.
So I would consider switching to Proxmox Backup Server for your regular VM backups, enjoy its good integration in the hypervisor UI, and just license the necessary minimum to run Veeam agents for application-specific stuff. Of course, depending on your budget and other constraints, that might turn out not to be a viable option. But if you need to do a migration for the hypervisor anyhow, why not also rethink your backup approach?
The Backup Server is (like PVE) completely open source and has no features locked away behind a paywall. You still need to pay to disable the nag screen, get access to the enterprise repo and paid support, but for an evaluation you don't need to pay a penny.
 
Generally, yes, I am trying to maintain the same configuration. If I go iSCSI shared storage in Proxmox, am I losing snapshots? If that is the case, then how do I back up all my VMs with Veeam, which is how I do it now on my VMware cluster?

So LVM is not a cluster-aware volume manager, and the way Proxmox gets around that is that only one host has write access to a given thick LVM volume shared over iSCSI. You end up giving up thin provisioning and snapshots: if you give a VM 250GB, then that VM is going to use 250GB of storage. To get VMware-like behavior out of shared storage you will need to configure and use a cluster-aware filesystem, which is what VMFS is. GFS2 and OCFS2 are what's available in the Linux space, with OCFS2 the easier of the two to set up. I've migrated many VMware shared-storage VMs to this exact setup, enough to have written in-house guides on how to do it. It works well, but it is a solution for legacy SAN-based infrastructure.
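
For context: once such a filesystem is mounted at the same path on every node, it is added to PVE as plain "directory" storage marked shared, and qcow2 images on it give you thin provisioning and snapshots back. A sketch (name and path are placeholders):

Code:
pvesm add dir san-ocfs2 --path /mnt/ocfs2 --shared 1 --content images
# consider also setting is_mountpoint so PVE won't write into the empty directory if the mount ever fails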
 
Hey, thanks for the reply. So I can do without snapshots, that's fine. But are you saying that I can't do LVM in a cluster with a shared iSCSI disk on my SAN? It's probably time I just put a ticket in with support.
 
Hey, thanks for the reply. So I can do without snapshots, that's fine. But are you saying that I can't do LVM in a cluster with a shared iSCSI disk on my SAN? It's probably time I just put a ticket in with support.

LVM + iSCSI means you must do thick provisioning. LVM is a volume manager, not a file system; LVM itself doesn't have a mechanism for multiple hosts to coordinate read/write locking and cache coherency, so concurrent access would result in data corruption. The way to get around that is that each VM gets a separate dedicated volume, and only the host the VM is running on will interact with that volume. It's very similar to Raw Device Mapping in VMware. When a VM migrates from one host to another, the first host gives up control of the LVM volume and the second host takes it over. Thick provisioning ensures that no hosts fight over writing to the same blocks. This is also why snapshots do not work.

This spells out the limits of LVM.

https://kb.blockbridge.com/technote...#key-limitations-of-lvm-shared-storage-in-pve
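
To make the handover model concrete, the resulting /etc/pve/storage.cfg entry is just (names are placeholders):

Code:
lvm: san-lvm
        vgname vg_san
        content images
        shared 1

Roughly speaking, "shared 1" only marks the VG as visible from every node; PVE's own cluster locking decides which node activates a given LV and hands it over on migration.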
 
LVM + iSCSI means you must do thick provisioning. LVM is a volume manager, not a file system; LVM itself doesn't have a mechanism for multiple hosts to coordinate read/write locking and cache coherency, so concurrent access would result in data corruption. The way to get around that is that each VM gets a separate dedicated volume, and only the host the VM is running on will interact with that volume. It's very similar to Raw Device Mapping in VMware. When a VM migrates from one host to another, the first host gives up control of the LVM volume and the second host takes it over. Thick provisioning ensures that no hosts fight over writing to the same blocks. This is also why snapshots do not work.

This spells out the limits of LVM.

https://kb.blockbridge.com/technote...#key-limitations-of-lvm-shared-storage-in-pve
OK, I got it now. Thank you so much for spelling this out for me. Very helpful.
 
If your storage supports NFS (which is a big if; see alexskysilk's question), you can of course use it. The benefit would be that you can then use snapshots and thin provisioning, and it will (mostly) feel like VMFS.

These benefits come with some costs, though:
- First, since you basically have another layer between the storage and the VM, this will incur some performance penalty. Block storage tends to perform better. This doesn't need to be a problem if it's still good enough for you, so prepare to do some benchmarking for iSCSI and NFS.
- Second, NFS isn't a secure protocol at all, but to be fair, this is true for iSCSI too. By default both have no encryption or other modern security features. This isn't a problem, however, if you run the connections between your storage and your Proxmox VE nodes on their own dedicated storage network that no other machine has access to.

In fact, I saw a talk by Alexander Wirt from the Proxmox partner company Credativ at FrOSCon (a German Linux and open source conference) where he mentioned that NFS is his go-to setup for smaller clusters in migration projects. On the other hand, pros in this forum (like @LnxBil from Inett, another Proxmox partner, or @Falk R., who does a lot of consulting gigs for companies migrating from VMware to Proxmox VE) mentioned that in their experience LVM thick is quite solid for existing SANs, that the lack of snapshots can be mitigated through backups, and that it can be more performant depending on the workload.

So as said: Be prepared to run some benchmarks ;)
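
If you do run benchmarks, fio from inside a test VM (with its disk placed on the iSCSI/LVM pool for one run and on NFS for the other) gives you comparable numbers. A rough starting point, to be tuned for your environment (test file path and sizes are just examples):

Code:
fio --name=randwrite --filename=/root/fiotest --size=8G --rw=randwrite --bs=4k \
    --iodepth=32 --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting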
 
These benefits come with some costs, though:
- First, since you basically have another layer between the storage and the VM, this will incur some performance penalty. Block storage tends to perform better. This doesn't need to be a problem if it's still good enough for you, so prepare to do some benchmarking for iSCSI and NFS.
- Second, NFS isn't a secure protocol at all, but to be fair, this is true for iSCSI too. By default both have no encryption or other modern security features. This isn't a problem, however, if you run the connections between your storage and your Proxmox VE nodes on their own dedicated storage network that no other machine has access to.

These are why we ended up just going with FCoE/iSCSI + OCFS2 and then adding the storage to Proxmox as type "directory". It let us reuse the very expensive Dell/EMC storage with UCS blades and get off VMware. I've been using OCFS for nearly two decades and it's very stable; it was designed for massive Oracle Database clusters, so it definitely meets a production-ready standard. The Linux kernel has built-in support for it, Debian ships with the required tools, and it's a grand total of two configuration files that must be identical on all systems (sounds perfect for Corosync).
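
For anyone curious, those two files are /etc/ocfs2/cluster.conf and /etc/default/o2cb. A minimal cluster.conf for a 3-node setup looks roughly like this (cluster name and IPs are placeholders; node names must match the hostnames, and the file must be identical on all nodes):

Code:
cluster:
        node_count = 3
        name = pvesan

node:
        ip_port = 7777
        ip_address = 10.10.10.11
        number = 0
        name = pve1
        cluster = pvesan

node:
        ip_port = 7777
        ip_address = 10.10.10.12
        number = 1
        name = pve2
        cluster = pvesan

node:
        ip_port = 7777
        ip_address = 10.10.10.13
        number = 2
        name = pve3
        cluster = pvesan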
 
OCFS2 is not officially supported by Proxmox Server Solutions though, which might be a problem if you ever need their support. And other people in this forum reported quite different experiences regarding its stability.
 
OCFS2 is not officially supported by Proxmox Server Solutions though, which might be a problem if you ever need their support.

It is supported by Debian and the Linux kernel, though.

https://packages.debian.org/stable/admin/ocfs2-tools
https://manpages.debian.org/trixie/ocfs2-tools/ocfs2.7.en.html

The Proxmox team doesn't test that configuration and therefore cannot validate it, which is a shame, because it's an easy drop-in replacement for VMware on the exact same expensive hardware people already purchased. Afterwards they can work on migrating to Ceph or other non-VMware-centric infrastructure.

And other people in this forum reported quite different experiences regarding its stability.

That is purely a documentation issue and a matter of not understanding how OCFS works. OCFS was made by Oracle as a way to support Oracle's super expensive RAC software on Linux-based clusters, instead of the customer having to use Solaris or other Unix-based OSes with built-in clustering support. It's why I wrote our own internal quick start, and it's really not hard. Most times I've fixed people's setups, it was due to either incorrect configuration (using disk block devices instead of mpath block devices) or trying to do it with commands instead of building the cluster.conf file and distributing it around. That, or a FUBARed iSCSI configuration.

OCFS2 is old, like really old by Linux standards, though just kinda normal by Unix standards. It achieved stability and production readiness almost two decades ago; ZFS is approximately the same age. Being old and stable means it doesn't natively support features like compression and deduplication, and instead expects the SAN to manage that. So this is really something you use when you have to deal with SANs, which is why I asked the OP about keeping the same setup or rearchitecting.
 
It is supported by Debian and the Linux kernel, though.

https://packages.debian.org/stable/admin/ocfs2-tools
https://manpages.debian.org/trixie/ocfs2-tools/ocfs2.7.en.html

The Proxmox team doesn't test that configuration and therefore cannot validate it, which is a shame, because it's an easy drop-in replacement for VMware on the exact same expensive hardware people already purchased. Afterwards they can work on migrating to Ceph or other non-VMware-centric infrastructure.



That is purely a documentation issue and a matter of not understanding how OCFS works. OCFS was made by Oracle as a way to support Oracle's super expensive RAC software on Linux-based clusters, instead of the customer having to use Solaris or other Unix-based OSes with built-in clustering support. It's why I wrote our own internal quick start, and it's really not hard. Most times I've fixed people's setups, it was due to either incorrect configuration (using disk block devices instead of mpath block devices) or trying to do it with commands instead of building the cluster.conf file and distributing it around. That, or a FUBARed iSCSI configuration.

OCFS2 is old, like really old by Linux standards, though just kinda normal by Unix standards. It achieved stability and production readiness almost two decades ago; ZFS is approximately the same age. Being old and stable means it doesn't natively support features like compression and deduplication, and instead expects the SAN to manage that. So this is really something you use when you have to deal with SANs, which is why I asked the OP about keeping the same setup or rearchitecting.
I tried OCFS2 about 10 years ago, for 1-2 years, and I had a lot of locking problems when one node of the cluster went down (like a full cluster lock for 1-2 minutes, until the lock was released). I did benchmark tests last year, and write performance was not super great with new block allocation (but the same goes for GFS2). I'm not sure thin provisioning can work fine with a shared FS, because of the locks during dynamic new block allocation.
 
I tried OCFS2 about 10 years ago, for 1-2 years, and I had a lot of locking problems when one node of the cluster went down (like a full cluster lock for 1-2 minutes, until the lock was released). I did benchmark tests last year, and write performance was not super great with new block allocation (but the same goes for GFS2). I'm not sure thin provisioning can work fine with a shared FS, because of the locks during dynamic new block allocation.


I have it working right now, thin-provisioned via qcow2, no LVM involved. Yes, cluster file systems have lower write performance than direct block access; it's the trade-off for shared storage, and you'll see the same with all of them, including VMFS. OCFS2 uses a slot design so that each member is guaranteed time if there is contention; good practice is to have twice as many slots as nodes. The deadlock behavior you described indicates a misconfiguration of o2cb.

https://docs.oracle.com/en/operating-systems/oracle-linux/6/adminsg/ol_locks_ocfs2.html
https://lwn.net/Articles/402287/
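
As a concrete example of the slot rule of thumb (label and device path are placeholders): for a 3-node cluster you would format with 6 node slots, run once from a single node against the multipath device:

Code:
mkfs.ocfs2 -L pve-shared -N 6 /dev/mapper/mpatha
# slots can be raised later with tunefs.ocfs2 -N if the cluster grows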

It's also crucial that you mount it with noatime, to ensure you're not modifying metadata with every read, and that you use sane values for O2CB_HEARTBEAT_THRESHOLD and O2CB_KEEPALIVE_DELAY_MS.

"_netdev,noatime,defaults 0 0"

If a system holds a lock on metadata and goes down suddenly (non-gracefully), then the others will wait roughly O2CB_KEEPALIVE_DELAY_MS * (O2CB_HEARTBEAT_THRESHOLD + 1) before declaring that node dead. Not specifying noatime will cause every read to also be a write to the metadata, which can impact performance and cause issues in larger configurations; NFS has this concern too, by the way. The O2CB defaults are from the age of LUNs running on spinning rust and might be too loose for modern flash storage over faster fabrics. VMFS on ESXi has this behavior as well, as does every clustered file system; VMware just has better GUIs/wizards with sane defaults for setting this all up.
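
Concretely, those knobs live in /etc/default/o2cb (which must match on all nodes) next to the fstab mount options. A sketch, using the typical shipped defaults purely for illustration rather than as a recommendation (cluster name, device and mountpoint are placeholders):

Code:
# /etc/default/o2cb
O2CB_ENABLED=true
O2CB_BOOTCLUSTER=pvesan
O2CB_HEARTBEAT_THRESHOLD=31
O2CB_IDLE_TIMEOUT_MS=30000
O2CB_KEEPALIVE_DELAY_MS=2000
O2CB_RECONNECT_DELAY_MS=2000

# /etc/fstab
/dev/mapper/mpatha  /mnt/ocfs2  ocfs2  _netdev,noatime,defaults  0  0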

This is common stuff for old-hat Unix admins, because Unix clustering with a SAN has always been this way. It's why I wish Proxmox had a GUI with sane defaults to configure this and manage the files for you, so as to prevent mistakes.
 