Proxmox and iSCSI storage issues

southwalesowl

Apr 4, 2026

I have 2 Proxmox hosts configured and an additional QDevice so that I can configure HA. Each host has 4 physical NICs. I have 2 of them configured with static IPs, on different subnets, for iSCSI, and I have multipathing configured. I have HA configured and 4 Windows VMs. One of the VMs was built directly on the Proxmox cluster and the other 3 were imported from a VMware ESXi host.
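
For anyone checking the quorum side of this setup, here is a small arithmetic sketch. It assumes the common layout of one vote per node plus one vote from the QDevice, with quorum being a strict majority of the expected votes (corosync votequorum's default behaviour). It shows why host 2 can keep running the HA guests when host 1 loses power, as long as it can still reach the QDevice.

```python
# Sketch: corosync votequorum arithmetic for a 2-node Proxmox cluster plus a QDevice.
# Assumption: one vote per node and one vote from the QDevice; quorum is a
# strict majority of the expected votes.

def quorum_threshold(expected_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    return expected_votes // 2 + 1


def is_quorate(votes_present: int, expected_votes: int) -> bool:
    return votes_present >= quorum_threshold(expected_votes)


NODE_VOTES = 1
QDEVICE_VOTES = 1
expected = 2 * NODE_VOTES + QDEVICE_VOTES  # host 1 + host 2 + QDevice = 3

scenarios = {
    "all up": 2 * NODE_VOTES + QDEVICE_VOTES,
    "host 1 powered off": NODE_VOTES + QDEVICE_VOTES,
    "host 1 off and QDevice unreachable": NODE_VOTES,
}

for name, votes in scenarios.items():
    print(f"{name}: {votes}/{expected} votes, quorate={is_quorate(votes, expected)}")
```

Quorum decides which node may run the HA guests; it does not by itself coordinate writes to the volume group on the shared LUN, so it does not protect the storage on its own.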

I am running through a test plan as this is a POC at the moment. I place all 4 of the Windows VMs on host 1 and then remove the power, simulating a hardware failure. HA kicks in and brings the VMs up on host 2, and they boot fine.

The moment I bring host 1 back up, I lose all access to the storage from both hosts, and the vmdata pool appears to become corrupt. If I try to run a check from either host, I get the following:
Check of pool solidfire-vg/vmdata failed (status:64). Manual repair required!
I have followed every repair guide that I can find, but all of them fail.

I have attached the IP and multipath outputs. I am sure that I have something configured wrong here. Any help would be greatly appreciated.
 

@j.theisen, I have just read that LVM-Thin is NOT supported or recommended on shared iSCSI storage in a cluster (especially with HA). The thin-pool metadata is not cluster-aware, so this kind of corruption is a known risk. I am running LVM-Thin in exactly this configuration. Should I just move to standard (thick) LVM?
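
To illustrate why non-cluster-aware metadata is a problem, here is a deliberately simplified toy model. It is NOT the real dm-thin on-disk format (the real metadata uses B-trees, space maps and transactions); it only assumes that each host caches the pool's metadata when it activates the pool, allocates blocks independently, and later writes its copy back. The VM disk names are hypothetical.

```python
# Toy model (NOT the real dm-thin format): two hosts activate the same thin pool,
# cache its metadata, allocate blocks independently, then write their copy back.
import copy
from dataclasses import dataclass, field


@dataclass
class ThinMetadata:
    free_blocks: list = field(default_factory=lambda: list(range(8)))
    # (volume, logical block) -> physical block on the shared LUN
    mappings: dict = field(default_factory=dict)

    def allocate(self, volume: str, logical_block: int) -> int:
        physical = self.free_blocks.pop(0)  # hand out the lowest free block
        self.mappings[(volume, logical_block)] = physical
        return physical


shared_lun = ThinMetadata()  # the metadata actually stored on the LUN

# Both hosts activate the pool and read the same on-disk metadata.
host1_cache = copy.deepcopy(shared_lun)
host2_cache = copy.deepcopy(shared_lun)

# Each host allocates space for a different (hypothetical) VM disk,
# unaware of the other host's allocation.
p1 = host1_cache.allocate("vm-101-disk-0", 0)
p2 = host2_cache.allocate("vm-102-disk-0", 0)
print("host 1 used physical block", p1, "| host 2 used physical block", p2)

# Host 1 writes its cached metadata back first, then host 2: last writer wins.
shared_lun = copy.deepcopy(host1_cache)
shared_lun = copy.deepcopy(host2_cache)
print("mappings on disk:", shared_lun.mappings)
```

In this model both hosts hand out physical block 0, and host 1's mapping for vm-101 disappears from the on-disk metadata once host 2 writes back. Standard (thick) LVM volumes have all their extents allocated when they are created, so there is no per-block allocation metadata to race on, which is one reason shared LVM is the usual recommendation for this layout.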
 
Thanks for sharing the outcome. This is a useful finding for anyone testing HA with shared iSCSI storage.

One additional recommendation for POC environments is to perform repeated failover and failback testing after changing the storage configuration. In some cases, the initial failover appears successful, but storage path recovery and metadata consistency issues only become visible when nodes rejoin the cluster.
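
Before repeating the failover and failback tests after a storage change, it may be worth confirming that no LVM-Thin storage still points at the shared volume group. A minimal sketch follows. It assumes the usual /etc/pve/storage.cfg layout (a `type: id` header followed by indented option lines) and that you supply the set of VG names that live on the SAN, because storage.cfg does not record that. The storage ID `vmdata` in the sample is hypothetical; the VG and thin pool names come from the error message in the original post.

```python
# Sketch: flag LVM-Thin storages whose volume group sits on a shared SAN LUN.
# Assumption: the caller supplies the set of shared VG names.

def parse_storage_cfg(text: str) -> list:
    """Return a list of dicts: {'type', 'id', <option>: <value>, ...}."""
    storages = []
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.lstrip().startswith("#"):
            continue
        if not line[0].isspace():
            # Section header, e.g. "lvmthin: vmdata"
            stype, _, sid = line.partition(":")
            current = {"type": stype.strip(), "id": sid.strip()}
            storages.append(current)
        elif current is not None:
            # Indented option line, e.g. "\tvgname solidfire-vg"
            key, _, value = line.strip().partition(" ")
            current[key] = value.strip()
    return storages


def risky_thin_storages(cfg_text: str, shared_vgs: set) -> list:
    """IDs of lvmthin storages whose vgname is in shared_vgs."""
    return [
        s["id"]
        for s in parse_storage_cfg(cfg_text)
        if s["type"] == "lvmthin" and s.get("vgname") in shared_vgs
    ]


# Hypothetical sample; "\t" inside the string is a real tab character.
EXAMPLE_CFG = """\
dir: local
\tpath /var/lib/vz
\tcontent iso,vztmpl,backup

lvmthin: vmdata
\tthinpool vmdata
\tvgname solidfire-vg
\tcontent images,rootdir
"""

print(risky_thin_storages(EXAMPLE_CFG, {"solidfire-vg"}))
```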

It may also be worth validating multipath status, storage locking behavior, and HA recovery under different failure scenarios (node reboot, network interruption, and storage path loss) before moving into production. Those tests can help identify edge cases early and provide greater confidence in the final design.
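
For the multipath part of that validation, a small parser can make each test run comparable. This sketch assumes the common `multipath -ll` layout, where each path line contains host:channel:target:lun, the device name and major:minor, followed by the dm state, checker state and online state. That layout varies between multipath-tools versions, so treat the regex as a starting point. The sample output, map name and WWID are made up.

```python
# Sketch: summarise path states from `multipath -ll` output.
# Assumption: the path-line layout below; adjust the regex to your own output.
import re

# e.g. "  |- 7:0:0:1 sdb 8:16 active ready running"
PATH_RE = re.compile(
    r"(\d+:\d+:\d+:\d+)\s+(\S+)\s+(\d+:\d+)\s+(\S+)\s+(\S+)\s+(\S+)"
)


def summarise_paths(output: str) -> dict:
    """Map each multipath device to a list of (device, dm_state, checker_state)."""
    summary = {}
    current = None
    for line in output.splitlines():
        if not line.strip():
            continue
        match = PATH_RE.search(line)
        if match:
            _hctl, dev, _majmin, dm_state, checker_state, _online = match.groups()
            summary.setdefault(current, []).append((dev, dm_state, checker_state))
        elif not line[0].isspace() and not line.startswith(("size=", "|", "`")):
            current = line.split()[0]  # map name (or WWID if no alias)
    return summary


def unhealthy(summary: dict) -> list:
    """(map, device) pairs whose path is not active/ready."""
    return [
        (mpath, dev)
        for mpath, paths in summary.items()
        for dev, dm_state, checker_state in paths
        if dm_state != "active" or checker_state != "ready"
    ]


# Made-up sample output with one failed path.
SAMPLE = """\
mpatha (36000000000000000000000000000abcd) dm-5 VENDOR,PRODUCT
size=500G features='0' hwhandler='1 alua' wp=rw
`-+- policy='service-time 0' prio=50 status=active
  |- 7:0:0:1 sdb 8:16 active ready running
  `- 8:0:0:1 sdc 8:32 failed faulty running
"""

print(unhealthy(summarise_paths(SAMPLE)))
```

Paths in other legitimate states, such as ALUA standby paths reported as `ghost`, would also be flagged here, so adjust `unhealthy` to match your array.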

Please consider posting your results after migrating to standard LVM, as they could be helpful for others evaluating shared SAN storage with Proxmox HA.