Problem with time within VMs

Complaining, "nagging", and statements such as "That should be easy for Proxmox GmbH to fix" don't usually accomplish anything. They mostly just stir up

https://bugzilla.proxmox.com/show_bug.cgi?id=5032

I wrote this many years ago - referenced this here - and have mitigation in the ticket. *shrug*

I wrote several features using the hook script: https://github.com/egandro/proxmox-cpu-affinity (to my knowledge, the hook script lacks states for suspend/resume).

NTP won't 100% help (nor the hook script hack I suggest). Why?

Assume you have a database with a transaction. Assume you do a suspend while it is in the transaction. If you resume after 100 days the transaction is broken. However - VMware is perfect and very often the transaction survives. Why? The clock is set in resume - before the OS starts. NTP can't mitigate this.

So yes. In this thread, since it started years ago, we have come up with working (semi-)solutions. But with all due respect, "Use NTP and configure it properly" is a red herring. It might help for your needs, but it is not a general solution to the problem.
 
I'm not denying that there are situations where this can be critical. Although in such cases, you might simply avoid making any changes to the VM until you have a maintenance window, put the application into maintenance mode, take the snapshot, perform the changes inside the VM, and then bring the application back online.

And for automated snapshots taken while the VM is running, you might just accept the risk of losing a few transactions or ending up with some inconsistent data, because in a disaster scenario that's still often better than not having a snapshot at all.

And yes, I'm sure there are applications where that is absolutely unacceptable and which need to run 24/7 to fulfill their purpose, while still requiring snapshots. But realistically, in those cases you can probably never roll back a snapshot anyway, because the data would be too outdated the moment you restore it. ;)

Seriously though, of course it would be nice if there were a proper solution, if only because a proper solution is always nicer than workarounds. But ideally, I'd guess, such a solution would come from upstream, i.e. the QEMU project itself?

Anyway, perhaps for now you could give the Bugzilla report a gentle push by politely asking whether this could be looked at again at some point. ;)
 
No worries, I don't need anybody to defend me. Plus, I didn't read the thread very carefully or I would have seen that the OP was using NTP but noted that his issue was the gap between VM startup and when NTP noticed the wrong time.

The hook script workaround, like the guest agent method suggested in the bug ticket, would narrow this gap, but the gap would still be there because the guest agent can't run until the OS is booted. It seems that this issue needs to be fixed at a lower level in the stack, in QEMU or in the PVE tooling. Hence, it may not be as simple to fix as it might seem.
 
In the Bugzilla ticket, I mentioned that NTP is very clumsy. The hwclock approach is much better.

But it's still not a 100% solution.
 
... i have no more words for you.
I mean, yeah, the discussion is kind of pointless as long as nobody comes up with a specific proposal for how to actually fix the problem, which you apparently don't have either. And no, "VMware does it", "NTP is bad", or "using the hardware clock is better, but still not good enough" doesn't really contribute to solving it.

I can think of two possible explanations for why nobody has solved it yet. Either it's not as easy to fix as you think, or the problem simply isn't as significant for most use cases as you're making it out to be.

I mean, Proxmox is fairly widely used these days, and large organizations run all kinds of production workloads on it without constantly losing data because of this issue. ;)
 
I mentioned it in the ticket and here multiple times. It is in the ticket.

Proxmox GmbH just needs to implement this. Red Hat has a "set-clock" command in the QEMU Guest Agent (guest-set-time).

That is not as awesome as the VMware solution, but we in this thread and in the bug report know about this.


 
Okay, okay, you got me with that. I should probably have read the Bugzilla report more carefully. Also someone already gently nudged it again yesterday. ;)

But one more perhaps stupid question before I disappear: what about that Red Hat guide? Couldn't you simply install and configure that in the guests and call it a day?

I just tried it in a Debian VM for testing, and it seems to be working...

Code:
chronyc sources
MS Name/IP address         Stratum Poll Reach LastRx Last sample          
===============================================================================
#* PHC0                          0   2   377     1     -0ns[   -1ns] +/-  122ns

Maybe I'm missing something, but wouldn't that effectively solve the problem?
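For anyone who wants to check the same thing from a script: here is a minimal sketch that parses `chronyc sources` output and reports whether the currently selected source is a local reference clock such as PHC0. The function names are made up for illustration, and the sample is simply the output pasted above.

```python
from typing import List, NamedTuple, Optional


class Source(NamedTuple):
    mode: str   # '^' server, '=' peer, '#' local reference clock
    state: str  # '*' currently selected, '+' combined, '-' not combined, ...
    name: str


def parse_sources(output: str) -> List[Source]:
    """Parse the data rows of `chronyc sources` output."""
    sources = []
    for line in output.splitlines():
        # Data rows start with a mode character followed by a state character.
        # The header and the '=====' separator do not match this pattern.
        if len(line) >= 2 and line[0] in "^=#" and line[1] in "*+-?x~":
            fields = line[2:].split()
            if fields:
                sources.append(Source(line[0], line[1], fields[0]))
    return sources


def selected_refclock(output: str) -> Optional[str]:
    """Return the selected source's name if it is a local reference clock."""
    for src in parse_sources(output):
        if src.state == "*" and src.mode == "#":
            return src.name
    return None


SAMPLE = """\
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#* PHC0                          0   2   377     1     -0ns[   -1ns] +/-  122ns
"""

if __name__ == "__main__":
    print(selected_refclock(SAMPLE))
```

This only tells you that chrony is following the PTP hardware clock at the moment you ask. It says nothing about how quickly the guest clock catches up right after a resume.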
 
But one more perhaps stupid question before I disappear: what about that Red Hat guide?

```txt
https://access.redhat.com/documenta...ration_guide/chap-kvm_guest_timing_management

> When a guest is resumed after a pause or a restoration process, a command to
> synchronize the guest clock to a specified value should be issued by the
> management software (such as virt-manager).
```

"The management software" - in our case Proxmox needs to implement certain things. Unfortunately Proxmox GmbH didn't do this.

That was clearly the #1 thing I did before I made the bug report:

1) I investigated the source code
2) I investigated Red Hat's instructions
3) I came to the conclusion "this is a bug"
4) I made an optimal mitigation that needs neither an NTP server nor an external network connection from the guest. As you can read in the ticket, QEMU / Proxmox passes the host's hwclock into the guest. This is much faster and more accurate, and it makes the Proxmox host the clock for all VMs.
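To illustrate the hwclock idea (this is not the exact mitigation from the ticket, just a rough sketch of the principle): inside a Linux guest you can compare the emulated RTC with the system clock and step the system clock from the RTC when they disagree. The sketch assumes the guest RTC is `rtc0`, that it is kept in UTC, and that the script runs as root. The threshold and function names are made up.

```python
import subprocess
import time
from pathlib import Path

# Assumption: the emulated RTC is rtc0 and holds UTC.
RTC_EPOCH = Path("/sys/class/rtc/rtc0/since_epoch")


def drift_seconds(rtc_epoch: int, system_epoch: float) -> float:
    """Positive result: the RTC is ahead of the system clock."""
    return rtc_epoch - system_epoch


def needs_step(rtc_epoch: int, system_epoch: float, threshold: float = 2.0) -> bool:
    """True if the two clocks differ by more than `threshold` seconds."""
    return abs(drift_seconds(rtc_epoch, system_epoch)) > threshold


def sync_from_rtc(threshold: float = 2.0) -> bool:
    """Step the system clock from the RTC if the drift is too large.

    Returns True if a step was performed. Requires root.
    """
    rtc = int(RTC_EPOCH.read_text().strip())
    if not needs_step(rtc, time.time(), threshold):
        return False
    # `hwclock --hctosys` sets the system time from the hardware clock.
    subprocess.run(["hwclock", "--hctosys"], check=True)
    return True
```

Something still has to trigger `sync_from_rtc()` after a resume, so this narrows the window but does not close it.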

But thank you for your time. I am out here!
 
OK, I see. It would certainly be useful if they implemented that command then.

I'm still wondering if that mysterious command could be sent manually. This could mean that, until this is implemented properly in Proxmox, you might at least be able to restore/resume those time-critical VMs with a custom script.

But never mind, I'll leave it now. ;-)
 
I'm still wondering if that mysterious command could be sent manually, which in turn could mean that you might at least be able to restore/resume those time critical VMs with a custom script.

Red Hat gave specific instructions on what needs to be sent to the guest agent.

The calls are missing in the resume part of the Proxmox repository: https://github.com/proxmox/qemu-server
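For reference, here is a rough sketch of what such a call looks like on the wire. The QEMU guest agent speaks JSON, and `guest-set-time` takes the time in nanoseconds since the Unix epoch. The helper names are invented for illustration. The socket path is an assumption that depends on how the host exposes the agent channel, and nothing else should be talking to the agent at the same moment.

```python
import json
import socket
import time
from typing import Optional


def build_set_time(ns_since_epoch: Optional[int] = None) -> bytes:
    """Build a QEMU guest agent `guest-set-time` request.

    With no argument, the current time of the machine running this
    script is used (nanoseconds since the Unix epoch, UTC).
    """
    if ns_since_epoch is None:
        ns_since_epoch = time.time_ns()
    request = {"execute": "guest-set-time", "arguments": {"time": ns_since_epoch}}
    return json.dumps(request).encode() + b"\n"


def send_to_agent(socket_path: str, payload: bytes, timeout: float = 5.0) -> dict:
    """Send one request to a guest agent Unix socket and return the reply.

    `socket_path` is hypothetical: it must point at the socket the host
    provides for this VM's agent channel.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        sock.sendall(payload)
        buf = b""
        # Read until the newline that ends the agent's reply.
        while not buf.endswith(b"\n"):
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
    return json.loads(buf)


if __name__ == "__main__":
    print(build_set_time().decode().strip())
```

The agent can only answer once the guest is running again, so a call like this after resume shortens the window with a wrong clock but cannot remove it.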

That is why I created the bug report.

I have repeated that five times now.

I also said three times that the other mechanism we have, the hook script, lacks suspend/resume pre/post phases (I could be wrong on this, but I am not aware that they exist).

So there is nothing that we can do. Proxmox GmbH needs to do it.