Hello,
Yesterday we had a system issue that caused a lot of things to restart; most things on Proxmox came up fine except one VM, which started from a VM state file from Mar 03 (two months old). That led to filesystem corruption, data loss, etc. I'm trying to determine why that happened, and whether there's some issue/bug with cleaning up old vmstate images.
I was testing this VM in the past for system-time behavior after resume, and likely suspended and resumed it in that March time frame. After my testing, I was always able to resume the VM, and things ran fine. The one thing that comes to mind: I believe resuming the VM from the GUI timed out (the VM state was on Ceph storage), and I had to start it manually with --timeout 0 from the CLI. There were no issues once it booted, and it's been running fine since.
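For reference, the manual start was essentially this (VMID shown as a placeholder, not my actual ID):

  qm start <vmid> --timeout 0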
My assumption was that once the VM has resumed from a state file, future reboots would be normal and would not resume that same state again. Is that incorrect? Does the Proxmox GUI resume also remove the state from the VM config file and delete the VM state image, such that resuming manually requires me to do the same by hand? Is this a bug? It feels like one: if I "start" the VM manually and it resumes from a VM state file (per the config file) that I did not specify, how would I know I'm required to clean up the "resume" state?
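For context, this is roughly what I understand the config to look like while a suspend state is pending; the storage name, VMID, and exact volume naming below are placeholders from my understanding, not copied from my actual config:

  # /etc/pve/qemu-server/<vmid>.conf (illustrative)
  vmstate: <storage>:vm-<vmid>-state-suspend-<date>

If that vmstate line is still present at the next start, the VM apparently resumes from that image instead of booting normally, which matches what I saw.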
If it's not a bug, and this behavior is documented (though I didn't find it when searching for how to resolve the timeout), would a feature request to disable the timeout when resuming from the GUI be accepted? Had that existed, it would have both made resuming from the GUI possible and done the cleanup that would have avoided this situation.
Another potential RFE: when stopping/restarting/shutting down a VM via Proxmox, check whether a resume state file is set and clean it up then. Or at least give the user the option to clean it up (or a warning that the VM won't boot normally next time).
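In the meantime, is something like the following the right way to check for and clear a stale state by hand? (VMID, storage, and volume ID are placeholders; I'm assuming qm set --delete and pvesm free are the appropriate tools here):

  # check whether a suspend state is still referenced in the config
  qm config <vmid> | grep vmstate

  # drop the reference from the config, then free the state volume
  qm set <vmid> --delete vmstate
  pvesm free <storage>:vm-<vmid>-state-suspend-<date>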
Thanks,
Jesse