Hello,
Yesterday we had a system issue that caused a lot of things to restart; most things on Proxmox came up fine except one VM, which started from a VM state file from Mar 03 (two months old). That led to filesystem corruption, data loss, etc. I'm trying to determine why that happened, and whether there's some issue/bug with cleaning up old vmstate images.
I was testing this VM in the past for system-time behavior after resume, and likely suspended and resumed it in that March time frame. After my testing, I was always able to resume the VM, and things ran fine. The one thing that comes to mind: I believe resuming the VM from the GUI timed out (the VM state was on Ceph storage), and I had to start it manually with --timeout 0 from the CLI. There were no issues once it booted, and it's been running fine since.
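For reference, the manual start was essentially this (VMID shown as a placeholder, not my actual ID):

  qm start <vmid> --timeout 0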
My assumption was that once the VM has resumed from a state file, future reboots would be normal and would not resume that same state again. Is that incorrect? Does the Proxmox GUI resume also remove the state from the VM config file and delete the VM state image, such that resuming manually requires me to do the same by hand? Is this a bug? It feels like one: if I "start" the VM manually and it resumes from a VM state file (per the config file) that I did not specify, how would I know I'm required to clean up the "resume" state?
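For context, this is roughly what I understand the config to look like while a suspend state is pending; the storage name, VMID, and exact volume naming below are placeholders from my understanding, not copied from my actual config:

  # /etc/pve/qemu-server/<vmid>.conf (illustrative)
  vmstate: <storage>:vm-<vmid>-state-suspend-<date>

If that vmstate line is still present at the next start, the VM apparently resumes from that image instead of booting normally, which matches what I saw.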
If it's not a bug, and this behavior is documented (though I didn't find it when searching for how to resolve the timeout), would a feature request to disable the timeout when resuming from the GUI be accepted? Had that existed, it would have both made resuming from the GUI possible and done the cleanup that would have avoided this situation.
Another potential RFE: when stopping/restarting/shutting down a VM via Proxmox, check whether a resume state file is set and clean it up then. Or at least give the user the option to clean it up (or a warning that the VM won't boot normally next time).
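In the meantime, is something like the following the right way to check for and clear a stale state by hand? (VMID, storage, and volume ID are placeholders; I'm assuming qm set --delete and pvesm free are the appropriate tools here):

  # check whether a suspend state is still referenced in the config
  qm config <vmid> | grep vmstate

  # drop the reference from the config, then free the state volume
  qm set <vmid> --delete vmstate
  pvesm free <storage>:vm-<vmid>-state-suspend-<date>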
Thanks,
Jesse