Thanks @fiona and @fweber for your efforts here, this seems to be a particularly nasty bug ...
I can confirm this, too: I could indeed revive such a stuck VM by hibernating and then resuming it:
# qm suspend 150 --todisk 1
Logical volume "vm-150-state-suspend-2023-08-03" created.
State...
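Resuming it afterwards is then just a matter of starting the VM again; it picks up the saved state automatically. A minimal sketch with the VMID from above:

# qm start 150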
I remember trying with KSM and/or ballooning disabled months ago, to no avail. Maybe it improves the situation somewhat, but at least for us it did not make a real difference.
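For reference, disabling KSM on a PVE host boils down to something like this (a sketch; the second command unmerges already shared pages and is optional):

# systemctl disable --now ksmtuned
# echo 2 > /sys/kernel/mm/ksm/run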
Coming back from vacation, I indeed found a stuck Win10 VM. Unfortunately I only noticed it now, and I cannot say how long it has been stuck, because this VM is unmonitored.
The VM doesn't have the proposed mitigations=off configured, because that suggestion only came up during my vacation...
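For anyone wanting to try it: mitigations=off is a plain Linux kernel command line parameter, so the usual route is to append it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and then (a sketch; same procedure whether you set it on the host or in a Linux guest):

# update-grub
# reboot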
ok, then I will open a support ticket the next time one of our VMs freezes. Not sure how soon that will happen, but it will, sooner or later ... I'll keep you posted.
Would it help if you got access to one of the nodes when the freeze occurs?
If so, I'd be happy to upgrade that node to Basic support and open an official support request. I guess others would be happy to do the same, because this is really growing into a showstopper.
that's exactly what we're seeing as well. The freezing guests can be tight on memory and we have tried with and without memory ballooning, all to no avail.
At least for us, changing it from "VirtIO SCSI" to "VirtIO SCSI single" has made the biggest difference. We've been seeing far fewer freezes than before ... but as we all know, only time will tell if this really improves things ...
And "VirtIO SCSI single" is actually also the default now.
while we have been "lucky" during the last days and haven't seen any recent freezes, what puzzles me is that even a watchdog is not able to interact with such a frozen VM. That is, no matter which action has been configured (e.g. reboot), the VM remains frozen.
As far as I understand how the kernel...
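For context, the watchdog in question was configured roughly along these lines (a sketch from memory; model and action are assumptions):

# qm set <vmid> --watchdog model=i6300esb,action=reset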
We're probably in the same boat as well: we're seeing some very high-load Debian 11 VMs (high load on CPU, RAM and I/O) freezing randomly. Sometimes they run without issues for weeks, and sometimes they crash within days of a reboot.
In our case, the common denominator is that they run on...
well, I dug a bit deeper as well, and it looks as if the version of the ureq HTTPS client being used (as indicated by /usr/share/cargo/registry/ureq-2.4.0/ in the exception) is outdated. There has been a fix for those "UnsupportedCriticalExtension" errors since ureq 2.6.0...
Alright, after temporarily removing the local CA certificates from /usr/local/share/ca-certificates and running "update-ca-certificates --fresh" again, pvesubscription update completes successfully.
So the issue must have something to do with those custom CA certificates, but as they have been...
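In case someone wants to reproduce the workaround, the steps boil down to this (a sketch; the backup directory is arbitrary):

# mkdir /root/ca-backup
# mv /usr/local/share/ca-certificates/* /root/ca-backup/
# update-ca-certificates --fresh
# pvesubscription update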
yes, there are some local CA certificates in that directory. They were put there in Nov 2019, so that's quite a while :)
I noticed, however, that one of the certificates had a .pem filename extension instead of .crt. After listing all known certificates from...
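Side note for anyone hitting this later: update-ca-certificates only picks up files with a .crt extension from /usr/local/share/ca-certificates, so a stray .pem is silently ignored. Renaming it should suffice (a sketch; the filename is made up):

# mv /usr/local/share/ca-certificates/internal-ca.pem /usr/local/share/ca-certificates/internal-ca.crt
# update-ca-certificates --fresh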
thanks.
I can ping & connect to shop.proxmox.com on the node's commandline, but "pvesubscription update" throws an ugly exception:
root@sisyphos:~# RUST_BACKTRACE=full pvesubscription update
thread '<unnamed>' panicked at 'Failed to add native certificate too root store...
Hi,
one of our nodes feels unhappy about its subscription state and displays "invalid: subscription information too old" in the node's subscription page:
When I click "Check", a popup opens and displays "Connection error 596: Broken pipe".
Nothing conclusive in the logs, maybe anyone else...
When it comes to storage, we have been using ZFS over iSCSI in our clusters for years.
Now, for a couple of new projects, we require S3-compatible storage, and I am unsure about the best way to handle this. I am tempted to use MinIO, but I've read mixed reviews about it, and Ceph seems...
alright, figured it out myself. The "problem" is that I had ticked "privilege separation" when I created the token. After unchecking privilege separation, everything now works as expected!
I am trying to use the terraform<->proxmox plugin and for that purpose, I have created a dedicated terraform provisioning user like this:
pveum role add TerraformProv -privs "VM.Allocate VM.Clone VM.Config.CDROM VM.Config.CPU VM.Config.Cloudinit VM.Config.Disk VM.Config.HWType VM.Config.Memory...
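The remaining steps then create the user, grant it the role, and issue an API token (a sketch; user and token names are just examples):

# pveum user add terraform-prov@pve --password <secret>
# pveum aclmod / -user terraform-prov@pve -role TerraformProv
# pveum user token add terraform-prov@pve terraform --privsep 0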
I'm in the process of migrating our infrastructure to a more declarative, GitOps/terraform style. As a part of this, a lot of VMs will be created using customized cloud-init images (debian, fwiw).
Now the only thing missing for my little puzzle is to get networking right. When creating VMs...
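What I'm experimenting with so far is setting the cloud-init network config per VM via qm (a sketch; addresses are just examples):

# qm set <vmid> --ipconfig0 ip=192.168.1.50/24,gw=192.168.1.1
# qm set <vmid> --nameserver 192.168.1.1 --searchdomain example.com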