Hi everyone,
I currently have a Proxmox cluster with 3 hosts and a NAS acting as an iSCSI target.
The NAS is connected to the hosts through a dedicated switch.
Right now, the iSCSI target is only attached to one node, and the other nodes access it indirectly.
This creates a single point of failure: when that main node goes offline, the storage becomes unavailable for the rest of the cluster.
Current setup:
- 3 × Proxmox hosts
- NAS as iSCSI target
- Connected through a dedicated switch
- NAS file system: Btrfs
- Exported via iSCSI → the LUN appears in Proxmox as a raw block device
- Assigned as LVM storage in Proxmox
- VMs are stored on this LVM, shared across hosts (but dependent on the main host being online)
- Shared storage required for HA and live migration
What I want to achieve:
- Have all nodes independently connect to the NAS iSCSI target.
- Ensure that the shared storage remains available even if one node goes down.
- Maintain stable access for HA and VM migrations.
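As a rough sketch of how the first goal could be checked on each node, the snippet below parses the output of `iscsiadm -m session` and reports whether the node holds its own session to the target. The line format is assumed from typical open-iscsi output, and the portal address, IQN, and function names are placeholders for illustration, not values from the actual setup.

```python
import re

# Assumed line format of `iscsiadm -m session` (open-iscsi), for example:
#   tcp: [1] 10.0.0.50:3260,1 iqn.2000-01.com.example:nas.target-1 (non-flash)
SESSION_RE = re.compile(
    r"^(?P<transport>\w+):\s+\[(?P<sid>\d+)\]\s+"
    r"(?P<portal>\S+?),(?P<tpgt>\d+)\s+(?P<iqn>\S+)"
)


def parse_sessions(output):
    """Return a list of dicts, one per session line that matches the pattern."""
    sessions = []
    for line in output.splitlines():
        match = SESSION_RE.match(line.strip())
        if match:
            sessions.append(match.groupdict())
    return sessions


def has_session(output, target_iqn):
    """True if this node has at least one session to target_iqn."""
    return any(s["iqn"] == target_iqn for s in parse_sessions(output))


# Hypothetical sample output; on a real node this text would come from
# running `iscsiadm -m session` locally.
SAMPLE = "tcp: [1] 10.0.0.50:3260,1 iqn.2000-01.com.example:nas.target-1 (non-flash)\n"
TARGET = "iqn.2000-01.com.example:nas.target-1"  # placeholder IQN

print(has_session(SAMPLE, TARGET))
```

Once every node logs in to the target itself, the same check run on each of the three nodes should report a session on all of them, rather than only on the main node.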
Questions:
- What’s the recommended way to connect the iSCSI target to all nodes directly?
- Can a single iSCSI LUN safely be accessed by multiple initiators simultaneously, or would I need a clustered file system or a different storage layer (e.g., Ceph, ZFS, OCFS2)?
- Is using Btrfs over iSCSI with LVM safe for multiple hosts, or could it lead to corruption?
- Should I consider switching to ZFS for better HA and replication support?
- Should I configure multipath (MPIO) for redundancy, and if so, what’s the recommended approach?
- Are there any known caveats or gotchas with this type of setup?
Any real-world examples, configuration tips, or best practices would be highly appreciated.
I want to eliminate the single point of failure and make the storage setup more robust for HA workloads.
Thanks in advance!