Guest Power failure results in application corruption

adamb

Mar 1, 2012
Well I thought I pinned it down.

Looks like physical hardware is proving to be more resistant to power loss/crash than a VM running within proxmox on the same hardware.

We are in the process of moving to CentOS 7 and ZFS for our production guests. However, our CentOS 7 guest seems to see roughly 20% more corruption during a power loss or crash than a physical server does. This corruption is in our application data, not in the filesystem itself. I have tried all the cache= options, but none of them really seems to make a difference.

I can reproduce the issue on multiple different hardware setups. Here are the setups I am using:

HP Hardware
HP DL 380p gen9
HP MSA 2040

Basic Supermicro build
E5-1620
LSI 9271
6TB disks

Physical hardware has roughly a 15% chance of corruption, whereas the virtual environment is coming in at around 30%. I would really like to get the virtual environment on par with the physical one, but I am at a loss.

What's really interesting is that I can reproduce the issue just by using "stop" or "reset" in the Proxmox GUI, which isn't even a complete power failure. I'm hoping others have suggestions on what I could do to make the VM as resilient as a physical install.
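
One way to put numbers on this kind of application-level corruption is a small checksummed write log: a writer appends sequence-numbered records and fsyncs each one, and after the "stop", "reset" or power cut a verifier reports the last intact record and whether a torn or corrupt record follows it. The sketch below is only an illustration of that idea, not the application discussed in this thread; the record layout and the names append_record and verify_log are assumptions made up for the example.

```python
import os
import struct
import zlib

# Hypothetical record header: payload length, sequence number, CRC32 of payload.
HEADER = struct.Struct("<IQI")


def append_record(path, seq, payload):
    """Append one record and ask the OS to flush it to stable storage."""
    record = HEADER.pack(len(payload), seq, zlib.crc32(payload)) + payload
    with open(path, "ab") as f:
        f.write(record)
        f.flush()             # push Python's buffer to the kernel
        os.fsync(f.fileno())  # ask the kernel to push it to the device


def verify_log(path):
    """Return (last_good_seq, intact).

    intact is False if a truncated or checksum-mismatched record is found;
    last_good_seq is the sequence number of the last record that verified.
    """
    with open(path, "rb") as f:
        data = f.read()
    last_good = None
    offset = 0
    while offset < len(data):
        if offset + HEADER.size > len(data):
            return last_good, False  # header itself is torn
        length, seq, crc = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        payload = data[start:start + length]
        if len(payload) != length or zlib.crc32(payload) != crc:
            return last_good, False  # payload torn or corrupted
        last_good = seq
        offset = start + length
    return last_good, True
```

If the writer also reports each acknowledged sequence number to another machine, any acknowledged record that is missing or damaged after the crash counts as a real durability failure, which makes the guest and physical runs directly comparable.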
 
Wanted to add that I tried setting the RAID cards to write-through, but that didn't seem to help. That makes sense, as I'm seeing the issue without actually losing power to the hardware.

There has to be a way to make a guest as resilient as a physical install.
 
You said you have moved to CentOS 7. Are you aware that CentOS 7 (actually RHEL 7) changed the default file system from ext4 to XFS? Maybe this change is the root cause of your problems.
 
Yep, completely aware. XFS is terrible for us: we end up with corruption in our application 100% of the time in power loss/crash situations. With CentOS 6 we used ext4 with data=journal, and that proved to be quite resilient, but that is not looking to be the case for CentOS 7/RHEL 7. Not to mention they no longer even support data=journal for ext4.

So that is why we are running zfs within our guests. So far it has proven to be the most resilient. I just don't understand why I am seeing so much more corruption within a guest running C7 vs physical hardware.
 
I am pretty sure your problem boils down to triple caching: disk cache on the host, KVM cache, and ZFS cache. If my suspicion holds, sync=always should be a solution similar to ext4 with data=journal.
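
To make the KVM layer of that stack concrete, here is a small sketch of how QEMU's documented cache= modes map onto its three underlying settings (cache.writeback, cache.direct, cache.no-flush). The table follows the QEMU documentation as I understand it; the helper function names are hypothetical and exist only for this illustration.

```python
from collections import namedtuple

# One entry per cache= mode: the three QEMU settings it implies.
CacheMode = namedtuple("CacheMode", "writeback direct no_flush")

QEMU_CACHE_MODES = {
    "writeback":    CacheMode(writeback=True,  direct=False, no_flush=False),
    "none":         CacheMode(writeback=True,  direct=True,  no_flush=False),
    "writethrough": CacheMode(writeback=False, direct=False, no_flush=False),
    "directsync":   CacheMode(writeback=False, direct=True,  no_flush=False),
    "unsafe":       CacheMode(writeback=True,  direct=False, no_flush=True),
}


def uses_host_page_cache(mode):
    """True if guest I/O passes through the host page cache (no O_DIRECT)."""
    return not QEMU_CACHE_MODES[mode].direct


def honors_guest_flush(mode):
    """True if flush requests from the guest are passed on to the host."""
    return not QEMU_CACHE_MODES[mode].no_flush


def completes_only_when_flushed(mode):
    """True if a write is acknowledged to the guest only after it is flushed."""
    return not QEMU_CACHE_MODES[mode].writeback


for name in QEMU_CACHE_MODES:
    print(name, uses_host_page_cache(name), honors_guest_flush(name),
          completes_only_when_flushed(name))
```

Read this way, only "unsafe" drops guest flushes outright; in the other modes a guest that issues flushes (for example through sync=always) should have them forwarded, provided the layers below also honor them.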
 
Yep, I agree, and that's what I thought when I stumbled across sync=always. It has reduced corruption to an extent.

I have a pretty lengthy thread over on the zfs mailing list about the subject. There is some talk that sync=always doesn't behave exactly as what everyone thinks. Basically boiling down to the same thing you are saying.
 
What do you mean by "There is some talk that sync=always doesn't behave exactly as what everyone thinks."? Is this a specific ZOL issue?
On Solaris-based OSes, sync=always means that every write becomes a synchronous write to the ZIL, which prevents data loss on power failure. However, this assumes the sync is actually carried through to persistent storage.
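
For reference, this is roughly what "a sync carried through to persistent storage" looks like from the application side on a POSIX system. It is a minimal sketch, and the function name durable_write is hypothetical. The call returns once the kernel reports the fsync as complete; whether the data is truly on stable media still depends on every layer underneath (guest filesystem, hypervisor cache mode, controller and disk caches) honoring the flush.

```python
import os


def durable_write(path, data):
    """Write data to path and fsync both the file and its directory."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # flush file data and metadata to the device
    finally:
        os.close(fd)

    # Also fsync the containing directory so the new directory entry
    # itself survives a crash (assumes a POSIX system such as Linux).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```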
 
Yes, it's specific to ZOL. The mailing list is down for me at the moment, so I can't link to the thread.
 
So the images are provided by a local disk on Proxmox. What is the file system on the Proxmox host, and what are the mount options for this file system?
 
Well, the host is using ext4, but I'm pretty sure there is actually no filesystem on the disks for the VMs at the host level. It's just LVM, so they are raw disks.

physical disk -> LVM -> VM