vm booted from 2-month-old vm state file (filesystem corruption, etc.)

Jesse Norell
Jun 19, 2019
Hello,

Yesterday we had a system issue causing a lot of things to restart; most things on proxmox came up fine, except one vm, which started from a vm state file from Mar 03 (2 months old). That led to filesystem corruption, data loss, etc. I'm trying to determine why that happened, and whether there's some issue/bug with cleaning up old vmstate images.

I was testing this vm in the past for system time behavior after resume, and likely did suspend and resume the vm in that Mar time frame. After my testing I was always able to resume the vm, and things were running fine. The one thing that comes to mind: I believe resuming the vm from the gui timed out (the vm state was on ceph storage) and I had to manually start it with --timeout 0 from the cli. There were no issues once booted, and it's been running fine since.
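For reference, the manual workaround looked roughly like this (the VMID 100 is a placeholder, and this is just a sketch of the commands, to be run on the Proxmox host itself):

```shell
# Placeholder VMID 100 -- substitute your own.
# The gui resume timed out because reading the vmstate image from ceph was
# slow, so clear the failed start and retry with the timeout disabled:
qm stop 100               # clear the hung/failed start attempt first
qm start 100 --timeout 0  # no timeout; resumes from the vmstate set in the config
```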

My assumption would be that once the vm has resumed from a state file, future reboots would be normal and not resume that same state again. Is that incorrect? Does the proxmox gui resume also remove the state from the vm config file and delete the vm state image file, such that manually resuming requires me to do the same by hand? Is this a bug? It kind of feels like a bug: if I "start" the vm manually and it resumes from a vm state file (per the config file) that I did not specify, how would I know I'm required to clean up the "resume" state?

If this isn't a bug and the behavior is "documented" (though I didn't see it when searching how to resolve the timeout), would a feature request to disable the timeout when resuming from the gui be accepted? Had that existed, it would have both made resuming from the gui possible and done the cleanup to avoid this situation.

Another potential rfe: when stopping/restarting/shutting down a vm via proxmox, it could check whether a resume state file is set and clean it up then. Or maybe give you the option to clean it up (or at least a warning that the vm won't boot normally next time).

Thanks,
Jesse
 
My assumption would be that once the vm has resumed from a state file, future reboots would be normal and not resume that same state again. Is that incorrect? Does the proxmox gui resume also remove the state from the vm config file and delete the vm state image file, such that manually resuming requires me to do the same by hand? Is this a bug? It kind of feels like a bug: if I "start" the vm manually and it resumes from a vm state file (per the config file) that I did not specify, how would I know I'm required to clean up the "resume" state?
That is the intended behaviour, regardless of whether you start from the gui or the cli (as long as you resume with 'qm'), but just to clarify, how exactly did you resume?

after resuming with 'qm', the state should get removed from the config and the statefile should be removed as well...
if you find a situation where you can reproduce this not working, could you please open a bug at https://bugzilla.proxmox.com ?

Another potential rfe: when stopping/restarting/shutting down a vm via proxmox, it could check whether a resume state file is set and clean it up then. Or maybe give you the option to clean it up (or at least a warning that the vm won't boot normally next time).
if there is a statefile in the config, you should see it in the 'hardware' panel of the vm and there you can remove it manually
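To illustrate checking this from the cli, here is a minimal sketch. It assumes the usual /etc/pve/qemu-server/<vmid>.conf layout with a "vmstate:" line; the VMID, storage name, and /tmp path in the demo are made up for illustration:

```shell
# check_vmstate: report whether a qemu-server config still references a
# suspend state (a leftover "vmstate:" line means the next start resumes it).
check_vmstate() {
    if grep -q '^vmstate:' "$1"; then
        echo "vmstate still set"
    else
        echo "clean"
    fi
}

# Demo against a made-up config snippet; on a real host the file would be
# /etc/pve/qemu-server/<vmid>.conf and the storage name would be your own.
cat > /tmp/100.conf <<'EOF'
bootdisk: scsi0
memory: 4096
vmstate: ceph-pool:vm-100-state-suspend
EOF
check_vmstate /tmp/100.conf   # prints: vmstate still set
```

If it reports a leftover state and you don't want it, removing it via the gui hardware panel (as described above) is the safe route.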
 
That is the intended behaviour, regardless of whether you start from the gui or the cli (as long as you resume with 'qm'), but just to clarify, how exactly did you resume?
After attempting to start via the gui failed, I started via 'qm' from the cli. I believe after the failed start (via gui, or cli without disabling the timeout) it was in a state where I had to 'qm stop ###' the vm first, then 'qm start ### --timeout 0'.

if you find a situation where you can reproduce this not working, could you please open a bug at https://bugzilla.proxmox.com ?
You bet. We have not been able to reproduce it yet, but will try some more scenarios, etc.
 
FWIW, we have not found a way to reproduce this. Hopefully it won't turn up again, but we'll post back here in the future if we ever catch it.

Thanks
 
We're in a situation where the vms went back to a two-year-old state. Tried 'qm start ### --timeout 0'. No luck. Urgent help needed.
 
