Just wondering if anybody's got any ideas here. We've got a 3-node Proxmox cluster, with a Synology UC3200 as shared storage. The UC3200 has dual controllers in an active-active SAN configuration, so if one controller craps out, the workload shouldn't be interrupted. The nodes and the UC3200 are all connected via 10 gig (2x10 gig for storage, 2x10 gig for Corosync, 2x10 gig for VM traffic).
Everything's rock solid and stable as long as my VMs are all on one node. If I move a VM to another node, even if it's just sitting there idling, the UC3200 gets really unstable. With two idle VMs running on two different nodes, the UC3200 will crash every day to day and a half. If I'm doing something with even moderate writes, the UC3200 will crash within a couple of minutes of that activity. When the UC3200 crashes, both controllers go offline and reboot within 1 second of each other.
I reached out to Synology and they sent us another unit, which had the same issues. They escalated to their dev/engineering dept and sent us another unit, and it's exhibiting the same issues too, so it's pretty unlikely that the UC3200 hardware is at fault. That leaves me assuming either the UC3200 has a software issue, or Proxmox is somehow doing something to make the SAN lose its mind...
Anybody have any suggestions, theories, or wild speculation?