While testing Proxmox VE 4.2 as an alternative virtualization solution for larger installations with shared storage, we ran into several issues, including one severe problem:
When booting a server with NFSv3 shares, the start-up of KVM VM guests "on boot" fails inconsistently* with "{hostname} pve-manager[2915]: storage '{nfs share}' is not online". *Inconsistently, because from time to time the start-up of the KVM guests succeeds. When starting multiple KVM VM guests that use shared storage, the effect is completely random: it is unpredictable whether or when any VM guest will be able to connect to its shared storage.
In our test scenario, all NFS shares are reached over a LACP bond on a dedicated storage LAN, attached to an OVSwitch with one OVInterface using a tagged VLAN. The storage servers are amd64 FreeNAS 9.3 and 9.10 systems with ZFS RAID-Z2, using mirrored SLC SSDs for the ZIL log.
So far, manually starting any VM guest has always been successful, and all NFS connections are working after a successful boot of the Proxmox VE host.
It appears as if the "start VM guests at boot time" procedure is not in sync with the (remote?) storage handling. Specifying start-up delay timeouts for the VM guests also fails (unless the first started VM guest uses a local storage device? - this was not tested), since the functionality seems to apply the specified timeout to a VM guest only after another VM guest has previously started successfully.
Any ideas how to fix, or at least bypass, this critical problem (without creating a dummy VM guest or modifying the Perl source code)?
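As a stop-gap we have been considering a small boot helper along the following lines: poll `pvesm status` until the NFS storage is reported as active, then start the affected guests with `qm start` (with their "start at boot" flag disabled so pve-manager does not race with the script). This is only a rough sketch of the idea, not something we have in production; the storage name and VM IDs below are placeholders for our setup.

import subprocess
import sys
import time

STORAGE = "nfs-freenas"   # placeholder: storage ID as defined in storage.cfg
VMIDS = [101, 102, 103]   # placeholder: guests living on that NFS storage
TIMEOUT = 300             # give up after 5 minutes
POLL = 5                  # poll interval in seconds


def storage_active(storage):
    """Return True if `pvesm status` lists the given storage as active."""
    out = subprocess.run(
        ["pvesm", "status", "--storage", storage],
        capture_output=True, text=True,
    )
    if out.returncode != 0:
        return False
    for line in out.stdout.splitlines()[1:]:   # skip the header line
        fields = line.split()
        if len(fields) >= 3 and fields[0] == storage and fields[2] == "active":
            return True
    return False


def main():
    deadline = time.monotonic() + TIMEOUT
    while not storage_active(STORAGE):
        if time.monotonic() > deadline:
            print("storage '%s' still not online, giving up" % STORAGE,
                  file=sys.stderr)
            return 1
        time.sleep(POLL)

    # Storage is reachable; start the guests one after another.
    for vmid in VMIDS:
        subprocess.run(["qm", "start", str(vmid)], check=False)
    return 0


if __name__ == "__main__":
    sys.exit(main())

Even if this works, it feels like a workaround for something pve-manager should handle itself, hence the question above.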