HA migration issue with Linux VMs on Proxmox 9.0.10 (FC LVM datastore)

el_vagokz

Sep 25, 2025
Hello,


We are testing Proxmox 9.0.10 and faced the following issue:


  • Datastore for VMs: FC-based LVM
  • All VMs have QEMU guest agents installed (Linux and Windows)
  • HA is enabled across the cluster

Behavior:


  • When a node is powered off, all VMs (Linux and Windows) migrate successfully to the remaining nodes.
  • However, when the FC connection is lost on a node:
    • Windows VMs migrate to other nodes as expected
    • Linux VMs do not migrate and show I/O errors in the console

Question:
Is it possible to configure Proxmox so that Linux VMs also migrate automatically when the node loses access to the FC datastore?


Thanks in advance for any advice.
 
Hi @el_vagokz , welcome to the forum.

There is no special sauce in PVE that triggers automatic HA migration of VMs when a node loses storage connectivity. Have you looked at the Windows logs during the failure window? Is it possible that Windows shuts down (powers off), which causes the HA system to start it elsewhere?

The hypervisor logs should contain information about the HA events, what initiated them, and so on. I'd recommend looking at that sequence; it may be helpful.
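As a starting point for that log review, here is a minimal Python sketch that pulls the HA manager journal for the failure window and narrows it to one VM. It assumes the standard `pve-ha-crm` and `pve-ha-lrm` systemd units and `journalctl` on the node; the helper names and the time window are illustrative, not part of PVE.

```python
import re
import subprocess

# HA manager services on a PVE node (cluster and local resource managers).
HA_UNITS = ("pve-ha-crm", "pve-ha-lrm")


def ha_journal_command(since, until=None):
    """Build a journalctl command for the HA units over a time window."""
    cmd = ["journalctl", "--no-pager", "-o", "short-iso"]
    for unit in HA_UNITS:
        cmd += ["-u", unit]
    cmd += ["--since", since]
    if until:
        cmd += ["--until", until]
    return cmd


def fetch_ha_journal(since, until=None):
    """Run journalctl on the node and return its output as a list of lines."""
    result = subprocess.run(
        ha_journal_command(since, until),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def lines_mentioning(lines, vmid):
    """Keep lines that name the HA resource vm:<vmid> (and not vm:<vmid>0, etc.)."""
    pattern = re.compile(rf"\bvm:{vmid}\b")
    return [line for line in lines if pattern.search(line)]
```

On the affected node, something like `lines_mentioning(fetch_ha_journal("2025-09-25 10:00", "2025-09-25 10:30"), 100)` would show what the HA stack did with VM 100 in that half hour. Comparing a Windows VM against a Linux VM over the same window should make the difference in the sequence visible.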


Blockbridge : Ultra low latency all-NVME shared storage for Proxmox - https://www.blockbridge.com/proxmox
 
PVE is not able to mitigate a storage failure by migrating the VMs; storage HA has to be provided by the storage layer.
I can only see this happening if the VM is powered off: PVE HA will then try to start it, potentially detect the storage outage (as you correctly noted, there seems to be no redundancy there), and start it on another host.

The Linux kernel will retry I/O indefinitely, and the VM is not shut down by any "smart" process.

I could be missing something, of course. OP's analysis of the VM and hypervisor logs should bring more clarity here.
Obviously, having redundancy for FC is the proper way to address this situation.


 
We have connections over multiple cards and switches,
but we want to simulate different failure scenarios.
So it seems that this case can only be handled with external tools such as a watchdog.
 
HA is a complex topic.

If you have a cluster of nodes and one node becomes isolated, the remaining nodes holding the majority quorum can vote the isolated node out and take over services. This is normal PVE HA operation.
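To make the majority rule concrete, here is a small sketch of the arithmetic. It assumes the default of one vote per node and no quorum device; the function names are illustrative.

```python
def quorum_threshold(expected_votes):
    """Smallest number of votes that is a strict majority of expected_votes."""
    return expected_votes // 2 + 1


def is_quorate(votes_present, expected_votes):
    """True when this partition holds a strict majority of the votes."""
    return votes_present >= quorum_threshold(expected_votes)


# Three-node cluster, one node cut off from the cluster network:
print(is_quorate(1, 3))  # False: the isolated node cannot keep its services
print(is_quorate(2, 3))  # True: the other two can take them over
```

Note that a node which loses only its FC path still answers on the cluster network, so it keeps its vote and none of this logic is triggered.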

If a single node loses storage connectivity, then for HA to act it would ideally need to determine that other nodes have healthier access to that storage. PVE does not provide that cross-node storage-health service. It also can’t reliably determine whether one node has better client network connectivity than the others (assuming your cluster network is on a dedicated healthy link). Determining whether an application inside a VM is actually alive is also outside PVE’s purview.

Even if PVE could detect a single storage link issue, you then face trade-offs: would issuing an HA operation be better than riding out a 1–2 second link flap that the kernel’s IO retry would recover from? What if the failure is rolling or asymmetric: Node B looks fine now but may fail in 5 seconds, while Node A could recover faster?
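The flap-versus-outage trade-off is usually handled by debouncing: only treat a failure as real once it has persisted for some hold time. Below is a minimal sketch of that idea. The class name and the hold time are assumptions for illustration; nothing like this ships with PVE, and it does not address the rolling or asymmetric cases.

```python
class FailureDebouncer:
    """Report a failure only after it has persisted for hold_seconds."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self._failing_since = None  # timestamp of the first failed probe in a row

    def update(self, healthy, now):
        """Feed one probe result; return True once the outage is old enough."""
        if healthy:
            # Any successful probe resets the clock, so short flaps are ignored.
            self._failing_since = None
            return False
        if self._failing_since is None:
            self._failing_since = now
        return now - self._failing_since >= self.hold_seconds
```

In a real loop, `now` would come from `time.monotonic()` and `healthy` from whatever storage probe you trust. With `hold_seconds=10`, a 2-second link flap never returns `True`, while a sustained outage does after 10 seconds. Picking that number is exactly the judgment call described above.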

The best place to handle application availability during underlying infrastructure outages is at the application layer. A load balancer can detect that an app on Node A is not responding but is OK on Node B, and redirect traffic more efficiently than anything inside PVE can.

Finally, if you have multiple switches and HBAs, deliberately reducing redundancy to a single path and then simulating a double failure by taking that path away is unusual. Your monitoring system should already notify you of a link down on the switch.

Regarding the watchdog, its normal use is to monitor whether the kernel is responsive. You can create a custom script, but be warned - there are many edge cases and potential unintended consequences.
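If you do go down the custom-script route, one possible building block is a probe that reads the FC port state from sysfs instead of issuing I/O that could hang. The sketch below assumes the Linux FC transport class exposes each HBA port under `/sys/class/fc_host/<host>/port_state` and reports `Online` for a healthy link. The function names are illustrative, and what to do when all paths are down is deliberately left out.

```python
from pathlib import Path


def fc_port_states(sysfs_root="/sys/class/fc_host"):
    """Return {host_name: port_state} for every FC HBA port found under sysfs_root."""
    states = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return states  # no FC HBAs, or the transport class is not loaded
    for host in sorted(root.iterdir()):
        try:
            states[host.name] = (host / "port_state").read_text().strip()
        except OSError:
            states[host.name] = "unknown"
    return states


def all_paths_down(states):
    """True only when at least one port exists and none reports Online."""
    return bool(states) and all(s != "Online" for s in states.values())
```

This already shows one of the edge cases: an `Online` port only means the link to the switch is up. It says nothing about zoning, the array, or whether the LUN behind the LVM volume group is actually reachable.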


 