Proxmox with StarWind VSA as HA iSCSI storage

Apart from achieving consistency of backups, the other scenario where we would use a snapshot is when making an update to a VM, so that we can quickly roll back if something goes awry. Can Proxmox handle this using the QEMU mechanism you describe above, without requiring the storage itself to be snapshot capable?
It looks similar on the surface, but the underlying technology is completely different. Specifically, ESXi has VMFS, which is a specialized cluster-aware filesystem. The data in ESXi (in 99% of cases) is stored as files (VMDK), which can be roughly compared to qcow. So the snapshot technologies of qcow and VMDK can be somewhat compared.

However, QEMU Fleecing is different. I don't think it's meant to be a long-term type of snapshot, nor does it have the ability to hold multiple snapshots. Admittedly, I have not studied the design docs in detail and could be mistaken here.
No, see above. The special backup integration is not meant to be a long-term, repeatable snapshot, nor does it have roll-back capability. The Fleecing tech has one very specific use case: backups.
Of course Proxmox can handle this using the QEMU mechanism without requiring the storage itself to be snapshot-capable. I've done this and rolled back a few times with a qcow2 VM image, but this will "explode" (grow) the qcow2 file and performance-wise it's not the best. Normally we take reflink copies via cron, which can then be backed up, so the qcow2 does not expand with snapshots... but snapshots still work if wanted.
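For illustration, a minimal sketch of that take-snapshot / update / roll-back-on-failure flow, using the standard qm CLI from a script. The VM ID 100, the snapshot name and the update_vm() placeholder are my own assumptions, not something from this thread; the same snapshot/rollback is also available from the GUI's Snapshots panel.

```python
#!/usr/bin/env python3
# Sketch: snapshot a qcow2-backed VM before an update, roll back if it fails.
# Assumes a hypothetical VM ID 100 whose disks are qcow2 images, so PVE can
# use the qcow2-internal snapshot mechanism (no snapshot-capable storage needed).
import subprocess
import sys

VMID = "100"        # hypothetical VM ID
SNAP = "pre-update" # snapshot name

def qm(*args):
    """Run a qm subcommand and raise if it fails."""
    subprocess.run(["qm", *args], check=True)

def update_vm():
    """Placeholder for whatever update you perform inside the guest."""
    return True  # pretend the update succeeded

qm("snapshot", VMID, SNAP, "--description", "before OS update")
try:
    if not update_vm():
        raise RuntimeError("update failed")
except Exception as exc:
    print(f"Update failed ({exc}), rolling back to {SNAP}", file=sys.stderr)
    qm("rollback", VMID, SNAP)
else:
    # Remove the snapshot once the update is verified, so the qcow2
    # does not keep accumulating snapshot data.
    qm("delsnapshot", VMID, SNAP)
```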
 
Of course Proxmox can handle this using the QEMU mechanism without requiring the storage itself to be snapshot-capable. I've done this and rolled back a few times with a qcow2 VM image, but this will "explode" (grow) the qcow2 file and performance-wise it's not the best.
I think there might have been a bit of a misunderstanding with the quoting in your previous post. It seems to take the 8-month-old conversation slightly out of context. Just to clarify, if your disks are stored as QCOW, then QCOW is the snapshot-supporting storage type. It doesn't matter whether the QCOW files are stored on ZFS, NFS, or FAT32 - PVE will rely on the QCOW snapshot mechanism.
The conversation back in April was specifically about the Fleecing storage mechanism.

Happy New Year


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Hello @bbgeek17,
Thanks for your answer.
  1. You mentioned using iSCSI to serve your client VMs, so it seems unclear where NFS VM snapshots fit into the workflow. Perhaps I missed this in your description?
--> I am planning to use PBS to request a VM snapshot to be stored on my witness server (also running Proxmox), which exposes an NFS share.

3/ Is there any mechanism in Proxmox to wait for a VM to be up before starting another one? Like the service dependencies we have with systemd on Linux?

4/ My setup is currently quite high-end for a 2-node + witness configuration, with dedicated 25Gbps network cards for the vSAN and shop-floor connectivity over a dedicated fiber channel for minimum latency.
Each of my 2 nodes has:
- 2x 800GB SSD SAS ISE, Mixed Use, up to 24Gbps, 512e, 2.5in, Hot-Plug, AG Drive
- 4x 3.84TB SSD SAS Read Intensive, 24Gbps, 512e, 2.5in, Hot-Plug, 1DWPD, AG Drive
- 2x Intel E810-XXV Dual Port 10/25GbE SFP28 Adapter
This was the standard setup for VMware vSAN.

I guess Proxmox could be run on a PREEMPT_RT-enabled kernel for Debian?

What is the minimum failover latency that we could expect? I am running Rockwell Automation software for a historian (a time-series database of the automation controllers' datapoints) and several dedicated batch-recipe VMs that need to query the underlying automation network every minute. In your experience, would it be possible to detect a failure and start the VM on the second server within 10 to 20 seconds?
Note that my VMs take up to 40 seconds to become operational again from a cold start, but a resume currently occurs within 5 seconds.

TIA for your help understanding what is really possible to achieve.
 
I am planning to use PBS to request a VM snapshot to be stored on my witness server (also running Proxmox), which exposes an NFS share.
Do you mean PBS doing a backup in snapshot mode while using Fleecing storage that points to NFS? If so, it may work, but you are likely to experience performance degradation during the backup.
Backup snapshots are not the same thing as storage snapshots. Backup snapshots are short-lived, existing only for the duration of the backup, and are handled by QEMU rather than by the storage.
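In case it helps, here is a rough sketch of what such a backup job would look like when the temporary fleecing images are pointed at a specific storage. The VM ID 100 and the storage names "pbs-witness" and "witness-nfs" are my own placeholders, and I am assuming a PVE release whose vzdump supports the fleecing property - treat the exact property string as an assumption to verify against your version's docs.

```python
#!/usr/bin/env python3
# Sketch: run a snapshot-mode backup of a VM to a PBS datastore while placing
# the temporary fleecing image on a chosen storage (e.g. NFS on the witness).
# Assumptions (not from the thread): vzdump with fleecing support, a PBS
# storage named "pbs-witness", and a storage named "witness-nfs" for fleecing.
import subprocess

VMID = "100"  # hypothetical VM ID

subprocess.run(
    [
        "vzdump", VMID,
        "--mode", "snapshot",                           # QEMU backup snapshot, not a storage snapshot
        "--storage", "pbs-witness",                     # backup target (PBS datastore)
        "--fleecing", "enabled=1,storage=witness-nfs",  # where temporary fleecing images go
    ],
    check=True,
)
```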

Is there any mechanism in Proxmox to wait for a VM to be up before starting another one? Like the service dependencies we have with systemd on Linux?
You can set a start priority via the PVE GUI/CLI. However, it does not guarantee that the higher-priority VM is fully booted before a lower-priority one is started. PVE has no view into the VM's boot state.

You can use HookScripts to try to achieve your goal: https://pve.proxmox.com/pve-docs/pve-admin-guide.html#_hookscripts
Possible phases and examples can be found here: /usr/share/pve-docs/examples/guest-example-hookscript.pl
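As a rough sketch of the idea (the stock example is Perl, but any executable placed on a storage with "snippets" content works), something like the following could delay a VM's start until another VM is at least running. The dependency VM ID 100, the timeouts and the file name wait-for-dep.py are my own assumptions; you would attach it with something like qm set <vmid> --hookscript local:snippets/wait-for-dep.py.

```python
#!/usr/bin/env python3
# Sketch of a PVE hookscript (Python instead of the stock Perl example).
# PVE calls it as: <script> <vmid> <phase>, where phase is pre-start,
# post-start, pre-stop or post-stop. A non-zero exit in pre-start aborts
# the VM start. Assumption: this VM depends on hypothetical VM 100.
import subprocess
import sys
import time

DEP_VMID = "100"   # VM that must be running first
TIMEOUT = 300      # give up after 5 minutes
POLL = 5           # seconds between checks

def dep_is_running():
    out = subprocess.run(
        ["qm", "status", DEP_VMID],
        capture_output=True, text=True,
    ).stdout
    return "status: running" in out

vmid, phase = sys.argv[1], sys.argv[2]

if phase == "pre-start":
    deadline = time.time() + TIMEOUT
    while not dep_is_running():
        if time.time() > deadline:
            print(f"VM {DEP_VMID} not running, aborting start of {vmid}", file=sys.stderr)
            sys.exit(1)  # abort this VM's start
        time.sleep(POLL)

sys.exit(0)
```

Note that "running" only means QEMU is up, not that the guest has finished booting; if you need actual guest readiness, the check could instead poll the QEMU guest agent (assuming it is installed in that VM).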

I guess Proxmox could be run on a PREEMPT_RT-enabled kernel for Debian?
PVE uses an Ubuntu-based kernel provided by the PVE repository. Anything else is at your own risk.

In your experience, would it be possible to detect a failure and start the VM on the second server within 10 to 20 seconds?
If this is a live VM transfer - yes. If it's a hypervisor node failure, you may be in trouble. There is no equivalent of ESXi Fault Tolerance in PVE today. You may be better off moving your HA into the application layer.
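For the node-failure case, what PVE HA gives you is a restart of the VM on a surviving node, and only for VMs registered as HA resources. A minimal sketch with the standard ha-manager CLI (VM ID 100 and the option values are my own placeholders):

```python
#!/usr/bin/env python3
# Sketch: register a VM as a PVE HA resource so the cluster restarts it on
# another node after a node failure. This is a restart (cold boot elsewhere),
# not an ESXi Fault Tolerance-style lockstep copy.
# Assumption (not from the thread): hypothetical VM ID 100.
import subprocess

subprocess.run(
    ["ha-manager", "add", "vm:100",
     "--state", "started",      # keep the VM running
     "--max_restart", "1",      # local restart attempts before relocating
     "--max_relocate", "1"],    # relocation attempts to another node
    check=True,
)

# Current HA state can be inspected with:
subprocess.run(["ha-manager", "status"], check=True)
```

Keep in mind that on a real node failure the cluster first has to fence the failed node before starting the VM elsewhere, which typically takes on the order of a couple of minutes, on top of the ~40 second cold boot you mentioned, so 10-20 seconds is not realistic for that scenario.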



Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
Thanks for your answer.

  • I had a better look at what seems possible, and it looks like my witness server should:
- run a QDevice for quorum
- run PBS to host my VM backups from PVE, as presented in this video for example: https://www.youtube.com/watch?v=KxPl8SHREcE
--> therefore I would probably set up PVE on this node and run PBS inside a VM or LXC. What do you think?

  • VM startup dependencies and a hookscript may be a solution for the initial boot phase. Once one of the nodes is booted, the exposed iSCSI drive should appear even if only one node is ready.

  • PREEMPT_RT is not enabled by default in the Ubuntu 6.8 kernel nor in the QEMU/KVM configuration, so we will probably test that on our own.

  • My industrial VMs' filesystems reside on a vSAN iSCSI disk exposed by the StarWind VSAN VMs, so I just need to have PVE HA well configured to trigger the VM start on the second node. I don't catch your point "If it's a hypervisor node failure you may be in trouble": if I still have one healthy node, it should not be a problem; the workload will just run on it until I fix the other one, won't it? Some of my VMs have HA at the application layer, but not all of them, and it would also require additional licenses for that.
 
