Hi,
We have a really strange problem with our Proxmox 4.1 setup which we're hoping someone here will be able to help us debug further.
For the last few nights, at roughly the same time, half the VMs on one of our hypervisors have suddenly lost sight of their storage. The machines show as read-only. A reboot shows that the hard disk is not found, but a stop/start will bring the machine back. The hypervisor in question is running two ZFS arrays in raidz1, and we're mounting these over NFS (although all the storage is local). All the VMs running on one of these arrays seem to have this problem simultaneously.
We assumed this issue was because of a hardware problem or some weirdness with our ZFS settings (we are running with logs and cache on a PCIe SSD, which up until now has been superb), but when we moved one of the affected VMs over to another Proxmox 4.1 server, the issue started happening there instead.
The timing of the issue does coincide with some high I/O on this VM (it's the time the backups are running and also when the SQL Server that lives on this VM is optimising).
We've checked zpool, smartd and dmesg, but there is no sign of any issues server side apart from a few very minor temperature variations in smartd. Can anyone suggest where else we should look for the cause of an issue like this?
Morph42
				