In-VM memory corruption

Ced

Jun 9, 2012
The host doesn't seem to have memory problems (I ran memtest on it twice), but inside at least one VM I'm getting memory errors when I run memtest in the VM.
Screenshot with corruption after attempted boot:
VM memory corruption.png
Screenshot without corruption on cold boot:
VM memory corruption.png

It only happens after I try to start a Sabayon Hardened server install, which hangs, and then reboot or reset the VM. If I run memtest right after a cold VM boot, it works fine. If I get memory corruption, stop the VM, and then start it again, the corruption is also gone. But as soon as I try to start the OS again it hangs, and after a reboot the corruption is back.

Host setup:
CPU: AMD Phenom II X6 1090T
Memory: 4x4GB GEIL 1333MHZ
Motherboard: MSI 870A-G54
Storage: 3x 1.5TB; 2x Seagate somethingsomething and 1 Samsung spinpoint thingy.
The two Seagates are set up in RAID1 with mdadm. The VM is hosted on the RAID1, if that's relevant.

Does anyone know what's going on? I've been trying to figure it out for days :(

Edit: Made a new VM, got the same problem, screenshot here:
memory corruption new vm.png

Is it perhaps a bug? I can reproduce it easily:
make a VM => run memtest, it checks out fine => try to run the non-graphical Sabayon Hardened server install, it hangs before it even installs => reset the VM => run memtest from the Sabayon CD => memory corruption.
Note that no memory corruption occurs until the moment Sabayon Hardened server tries to boot.
 
Does anyone have any suggestions? I'm stuck at the very beginning here, as I intend to use Sabayon Hardened server.
I don't have anything running yet, so I could reinstall the whole thing to test whether that fixes it (but I'd prefer not to, because I just got things configured).

Could someone perhaps point me to a place to start debugging?
 
What happens if you run a memtest on the host?

The host checks out fine. I've let memtest run twice AND swapped the memory once. I don't think this is a hardware issue (unless it's a compatibility problem?).
The VM-side corruption doesn't occur until I've tried to boot either the non-graphical installation or the actual OS at least once in a session.
When the VM is stopped and started again (as in, hit 'stop' and then 'start', NOT reset) the memory corruption is gone. When the VM is 'reset' the corruption persists.
 
Okay since no new solutions really popped up, I decided to reinstall it and give the vanilla installation a try. Guess what? Same problem!

It's not a hardware memory issue. I tested the memory, and I tried with different memory.

So far the easiest way to reproduce it from scratch is: install Proxmox (it doesn't matter whether you apt-get upgrade it afterwards or not) => upload a Sabayon Hardened server disc => make a new VM (everything default, 2.6/3.0 Linux VM OS type) => run memtest, it will show no errors => reset and try to run the installation (not the graphical one) => it will hang, reset => memtest will show errors.
reinstall vm corruption test.png


So, does anyone have an idea how to fix this? Or where to start debugging? I'd try to ignore it and move on if it weren't for the fact that Sabayon refuses to install, and I'm guessing that has something to do with the memory corruption. And aside from the fact that I'd really like to use Sabayon, as svendsen mentioned it seems to happen on other distros too.

Edit: Just noticed I'm getting a "kvm: 1754: cpu0 unhandled rdmsr: 0xc0010001" on the attached screen/console. I don't think I've seen that before. Could it be related?

Edit2: Here, the output from dmesg if it's useful to anyone http://pastebin.com/pve10fX4
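For anyone who wants to pull those kvm messages out of a saved dmesg dump, here is a minimal Python sketch. It only counts lines shaped like the one quoted above; the function name `count_unhandled_msrs` and the sample text are made up for illustration, and reading the number after "kvm:" as the QEMU process ID is an assumption, not something confirmed in this thread.

```python
import re
from collections import Counter

# Matches lines such as: kvm: 1754: cpu0 unhandled rdmsr: 0xc0010001
# Assumption: the first number is the process ID of the VM's QEMU process.
MSR_LINE = re.compile(
    r"kvm: (\d+): cpu(\d+) unhandled (rdmsr|wrmsr): (0x[0-9a-fA-F]+)"
)


def count_unhandled_msrs(dmesg_text):
    """Count unhandled MSR accesses per (operation, MSR address)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = MSR_LINE.search(line)
        if match:
            _pid, _cpu, operation, msr = match.groups()
            counts[(operation, int(msr, 16))] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample; replace with the contents of a saved dmesg file.
    sample = "kvm: 1754: cpu0 unhandled rdmsr: 0xc0010001\n"
    for (operation, msr), n in count_unhandled_msrs(sample).items():
        print(f"{operation} {msr:#x}: seen {n} time(s)")
```

Comparing the counts before and after a hang would at least show whether the message lines up with the failed boot attempts or also appears when other guests start.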

Edit3: Updated the BIOS from http://nl.msi.com/product/mb/870A-G54.html#/?div=BIOS . Still no change. New dmesg output: http://pastebin.com/UMk1tf1i (note the 0.01 increment of the BIOS version, DMI: MSI MS-7599/870A-G54 (MS-7599), BIOS V17.19 11/01/2012 )
 
Is there anywhere else I can look to debug this? I'm about to just forget about Proxmox and try something else. I've had good experiences with it, but this is really a breaking issue for me, so if it has to go, it will go.
If not, does anyone know of a good alternative? Or perhaps a way to use a more bleeding-edge kernel than what Debian offers, with the comfort of the Proxmox web interface?
 
Created 4 VMs, walked through the reproduction steps, this is the result:
VM corruption test multiple VMs.jpg
With the new install it looks like all the corruption occurs at 494.0MB, but an earlier screenshot shows it may happen at other ranges too (see the first screenshot in this thread).
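To compare that 494.0MB offset with addresses in other tools, it can be converted to a guest-physical address and page number. This is a small arithmetic sketch only, assuming memtest's "MB" means 2^20 bytes and that the guest uses 4 KiB pages; `describe_offset` is a hypothetical helper name.

```python
MIB = 1024 * 1024   # assumption: memtest reports offsets in units of 2**20 bytes
PAGE_SIZE = 4096    # assumption: 4 KiB pages in the guest


def describe_offset(megabytes):
    """Return (byte address, page frame number) for a memtest MB offset."""
    address = int(round(megabytes * MIB))
    return address, address // PAGE_SIZE


if __name__ == "__main__":
    address, page = describe_offset(494.0)
    print(f"address {address:#x}, page frame {page:#x}")
```

Since memtest shows only one decimal place, the real failing address lies somewhere in the roughly 100 KB following that start address.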

Additionally, just resetting the VM doesn't cause the corruption. It only happens when I try to start the installation.
Also, when a VM installation starts, I see a message about AGP i915 or something similar before it hangs, see the screenshot below:
intel agp module not found error.PNG
And when left there for a while, it'll eventually show this:
corrupt desktop.jpg
And it hangs there, with 100% CPU usage.
 
Since this looks more like a KVM issue than something with Proxmox, I'm going to try installing Sabayon with the latest kernel and KVM from emerge and see whether the problem persists there. If it doesn't, I'll report it here so that whoever feels up to the task can update the Proxmox side of things. If it does persist, I'll contact the KVM devs to see if there's a fix for it.
 
