Configure Proxmox to allow for 2 minutes of shared storage downtime?

Oct 20, 2020
Our cluster uses a single TrueNAS as shared, primary storage for all Proxmox nodes, including 80+ Linux VMs which have to run 24/7. We also need to install several TrueNAS updates all at once. Although the NAS has dual storage controllers for failover, applying the updates will cause the NAS to be offline for one or more periods of about 90-100 seconds each. Our understanding is the momentary storage downtime would normally cause problems for the running VMs. But an engineer at iXsystems said he thinks there is a configuration setting we can change in Proxmox so when the NAS is down a couple minutes, Proxmox can handle it and the running VMs won't be corrupted or have other problems.

Is there anything we can configure in Proxmox to avoid problems with our running VMs while the shared primary NAS is down for a minute or two? Or will we have to shut down all VMs before the NAS becomes momentarily unavailable?

Thank you.
 
But an engineer at iXsystems said he thinks there is a configuration setting we can change in Proxmox so when the NAS is down a couple minutes, Proxmox can handle it and the running VMs won't be corrupted or have other problems.
No. If the backing storage disappears, your VMs will certainly experience I/O errors. There's no way around that: where would you expect the data to come from on reads, or go to on writes?

You can always live-migrate VMs to unaffected cluster nodes before the update, or do a "move-disk" to an unaffected storage; both options involve zero downtime. Afterwards, just migrate/move them back and do the same for the other storage.
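A minimal sketch of how the two options above could be scripted, assuming the standard `qm` CLI on a Proxmox node. The VM IDs, the node name `pve2`, the disk name `scsi0` and the storage name `local-zfs` are made-up placeholders, and the functions only build the commands rather than running them.

```python
import shlex


def migrate_cmd(vmid, target_node):
    """Build a `qm migrate` command for a live (online) migration."""
    return ["qm", "migrate", str(vmid), target_node, "--online"]


def move_disk_cmd(vmid, disk, target_storage):
    """Build a command that moves one virtual disk to another storage.

    Assumption: the subcommand is spelled `move_disk` (PVE 6.x naming);
    newer releases also offer `qm disk move`.
    """
    return ["qm", "move_disk", str(vmid), disk, target_storage]


def evacuation_plan(vmids, disk="scsi0", target_storage="local-zfs"):
    """Return one move-disk command per VM (placeholder disk/storage names)."""
    return [move_disk_cmd(vmid, disk, target_storage) for vmid in vmids]


# Print the plan for review instead of executing anything.
for cmd in evacuation_plan([101, 102]):
    print(shlex.join(cmd))
```

Printing the plan first makes it easy to check it by hand before feeding the commands to a shell, and the same helpers can be reused to move everything back afterwards.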

Considering you mention 24/7 uptime, you should have some form of HA set up anyway, so you should have the capacity for that - I'd hope ;)
 
Oh, and I had considered them a true HA storage provider, but if a software upgrade takes the whole HA cluster down, that's not really HA.
Even my home-brewed cman/pacemaker/corosync/DRBD/iSCSI storage can be upgraded without an outage. A few years back I even went from pacemaker to cman without a second of downtime, because EL decided to start using cman (f*****g technology preview). :-)

Please take the following with a grain of salt, but here it is anyway.
The TrueNAS engineers might be talking about soft vs. hard NFS mount types and the retrans parameter, I think. But no matter how you set these, the VMs will still run into problems: at the very least their load average will climb, the kernel will log a bunch of errors, and nothing will work while the storage is gone. If you are lucky, they will carry on after a while; if you are unlucky, they will remount their disks read-only while running.
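For reference, the knobs mentioned above are the `hard`/`soft`, `timeo` and `retrans` options described in nfs(5). Below is a small sketch of how such an option string could be assembled, for example for the `options` line of an NFS entry in `/etc/pve/storage.cfg`. The function names and the default values are illustrative assumptions, not recommendations.

```python
def nfs_mount_options(hard=True, timeo_deciseconds=600, retrans=2):
    """Assemble an NFS mount option string.

    `timeo` is expressed in tenths of a second, as in nfs(5).
    The defaults here are placeholders chosen for illustration.
    """
    if timeo_deciseconds <= 0 or retrans < 0:
        raise ValueError("timeo must be positive and retrans non-negative")
    mode = "hard" if hard else "soft"
    return f"{mode},timeo={timeo_deciseconds},retrans={retrans}"


def initial_timeout_seconds(timeo_deciseconds):
    """Convert `timeo` (tenths of a second) into seconds."""
    return timeo_deciseconds / 10


print(nfs_mount_options())           # option string for a hard mount
print(initial_timeout_seconds(600))  # first timeout, in seconds
```

With `hard`, the client keeps retrying, so I/O inside the guests stalls instead of failing; whether a guest rides out a 90-100 second stall still depends on the guest's own disk timeouts, which is the point made above.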
 
How are you connecting to the TrueNAS storage array?
iSCSI, NFS, ZFS over iSCSI?

With iSCSI the timeouts can normally be adjusted up or down; the default timeout on VMware is about 20 seconds.

Normally, with dual controller modules (active/passive or active/active), the failover between controllers takes approx. 20 seconds and isn't noticed by the VMs.

iSCSI handles this better than NFS; NFS tends to time out and not reconnect, hanging the host.
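On Linux initiators using open-iscsi, one of the adjustable timeouts is `node.session.timeo.replacement_timeout` in `/etc/iscsi/iscsid.conf`. Here is a rough sketch that rewrites that setting in a config text. The sample text and the values are illustrative assumptions, and the function works on a string rather than touching the real file.

```python
import re

KEY = "node.session.timeo.replacement_timeout"


def set_replacement_timeout(conf_text, seconds):
    """Return conf_text with the replacement timeout set to `seconds`.

    Replaces the first existing (possibly commented-out) line for the
    key, or appends a new line if the key is not present.
    """
    new_line = f"{KEY} = {seconds}"
    pattern = re.compile(
        rf"^[ \t]*#?[ \t]*{re.escape(KEY)}[ \t]*=.*$", re.MULTILINE
    )
    if pattern.search(conf_text):
        return pattern.sub(new_line, conf_text, count=1)
    separator = "" if (not conf_text or conf_text.endswith("\n")) else "\n"
    return f"{conf_text}{separator}{new_line}\n"


# Hypothetical sample config, for illustration only.
sample = "node.startup = manual\n" + KEY + " = 120\n"
print(set_replacement_timeout(sample, 150))
```

Working on a copy of the text keeps the change reviewable before anything is written back to a node.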

I've seen TrueNAS systems fail over between controllers; it's very quick and non-disruptive, and the two controllers are never done at the same time, so I'm not sure where this info is coming from.

Normally the passive controller is upgraded first and the system fails over to it; then the other controller is upgraded while the already-upgraded one keeps serving, so you don't need to wait for a reboot to complete.

How are you connecting to the TrueNAS?

""Cheers
G
 
How are you connecting to the TrueNAS storage array?
We're using NFS. Below, the first screenshot is from Proxmox > Datacenter > Storage. Second pic is from inside the TrueNAS Z20. Appreciate your note saying failover is usually smooth. Since we're using NFS, though, it sounds like the safest option is to shut down all our Proxmox VMs before upgrading the TrueNAS.

1606316458115.png

-----

1606316572783.png
 
Well, can't you test this by failing over a test NFS share? Or use a test device?
Upgrading HA storage controllers should not cause downtime for clients, unless it is not really HA. NFS state and everything else should be failed over.
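One way to run such a test is a small probe that keeps writing to a directory on the test share during a failover and records how long the slowest write took. This is only a sketch: the function name, the probe file name and the example mount path are assumptions for illustration.

```python
import os
import time


def probe_write_stalls(directory, duration_s=300.0, interval_s=1.0):
    """Repeatedly write and fsync a small file in `directory`.

    Returns (longest_write_seconds, error_count). On a hard mount a
    storage outage shows up as one very long write; on a soft mount it
    may show up as OSError instead, which is counted here.
    """
    path = os.path.join(directory, "stall_probe.tmp")
    worst = 0.0
    errors = 0
    deadline = time.monotonic() + duration_s
    try:
        while time.monotonic() < deadline:
            start = time.monotonic()
            try:
                with open(path, "w") as handle:
                    handle.write(f"{time.time()}\n")
                    handle.flush()
                    os.fsync(handle.fileno())
            except OSError:
                errors += 1
            worst = max(worst, time.monotonic() - start)
            time.sleep(interval_s)
    finally:
        try:
            os.remove(path)  # clean up the probe file if it exists
        except OSError:
            pass
    return worst, errors


# Example call (hypothetical mount point of a test share):
# worst, errors = probe_write_stalls("/mnt/pve/test-nfs", duration_s=600)
```

Start the probe, trigger the controller failover, and compare the longest write time with what the guests can tolerate.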
 
How about using the NFS hard mount option and pausing all VMs before the NFS server upgrade/reboot, then resuming all VMs once the NFS server has booted up again?
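A rough sketch of that pause/resume idea, assuming the `qm list`, `qm suspend` and `qm resume` commands on each node. The sample `qm list` output below is invented for illustration, and the helpers only build the commands; whether the guests cope with being paused for a couple of minutes is something to test first.

```python
# Invented sample of `qm list` output, for illustration only.
SAMPLE_QM_LIST = """\
      VMID NAME                 STATUS     MEM(MB)    BOOTDISK(GB) PID
       101 web01                running    4096              32.00 2143
       102 db01                 running    8192             100.00 2290
       103 build01              stopped    2048              20.00 0
"""


def running_vmids(qm_list_output):
    """Extract the IDs of running VMs from `qm list` style output."""
    vmids = []
    for line in qm_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[2] == "running":
            vmids.append(int(fields[0]))
    return vmids


def suspend_cmds(vmids):
    """Commands to pause each VM before the NAS goes away."""
    return [["qm", "suspend", str(vmid)] for vmid in vmids]


def resume_cmds(vmids):
    """Commands to resume the same VMs once the NAS is back."""
    return [["qm", "resume", str(vmid)] for vmid in vmids]


paused = running_vmids(SAMPLE_QM_LIST)
for cmd in suspend_cmds(paused) + resume_cmds(paused):
    print(" ".join(cmd))
```

Recording the list of paused VMs up front means only those VMs are resumed afterwards, leaving VMs that were already stopped alone.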
 
Not sure if you have the capacity or ability to test ZFS over iSCSI.
If you are not running LACP, you may be able to.

It will give you native ZFS features on shared storage:
COW, snapshots, replication, etc.

Just food for thought :)

Good luck with the upgrade.

""Cheers
G
 
