Proxmox Virtual Environment 9.1 available!

Several VMs this test round; VM 120 is the example. I stopped the backup after it got stuck and the VM was frozen.
The output does not seem to hint at a QEMU-internal deadlock.

Code:
INFO:  26% (764.0 MiB of 2.8 GiB) in 6s, read: 60.0 MiB/s, write: 58.7 MiB/s
INFO:  29% (856.0 MiB of 2.8 GiB) in 1h 20m 55s, read: 19.4 KiB/s, write: 19.4 KiB/s
At this time it was already extremely slow. How much load was on the network or Ceph?
Do you run the backup at the same time for multiple nodes? If yes, scheduling at different times could help.

Is the network used for backup different than the one for Ceph? If not, it's recommended to separate it.

In general, you can try setting a bandwidth limit for the backup job and also enable backup fleecing, those might help already.
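For example (a CLI sketch only; the storage names are placeholders, and the same settings are also available in the backup job's advanced options), a 100 MiB/s limit plus fleecing could look like:
Code:
# bwlimit is in KiB/s; fleecing writes temporary copy-before-write data to the given storage
vzdump 120 --storage <your-backup-storage> --bwlimit 102400 --fleecing enabled=1,storage=local-lvm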

If nothing helps, what exactly are the symptoms of being "frozen" (console/network/...)? Is there anything in the guest's system logs from around the time of the issue?
 
Hi Fiona,

Same behavior on one node versus all 3. In the past all three backed up at once without issue.
Low network utilization; the backup server is on a 20 Gb LAGG as well. Graphs don't show anything above 4 Gbps.
Separate networks for cluster, frontend (VM traffic), and Ceph.
Error on any frozen VM:
VM 114 qmp command 'set_password' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
TASK ERROR: Failed to run vncproxy.
Virtual CPU is over 100% and/or guest RAM is pegged at 100%. The VM is not responsive at all via network or console (console error above).
Nothing in the guest system logs.
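In case it helps with debugging, this is roughly how such a frozen VM can be probed for QMP liveness (assuming the standard /var/run/qemu-server/<vmid>.qmp socket path; socat is not part of the default install; if QEMU's main loop is stuck, both commands hang or time out):
Code:
qm status 114 --verbose
echo '{"execute":"qmp_capabilities"}' | socat - UNIX-CONNECT:/var/run/qemu-server/114.qmp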

Note: this dev cluster is about 1.5 years old, with a 100% backup success rate until 9.1.1. Random VMs, every time. I'll turn fleecing on and see if it changes anything and report back!

Appreciate the help.
 
Same behavior on one node versus all 3. In the past all three backed up at once without issue.
Low network utilization; the backup server is on a 20 Gb LAGG as well. Graphs don't show anything above 4 Gbps.
Separate networks for cluster, frontend (VM traffic), and Ceph.
What about the load on the PBS side? Your log shows the backup was extremely slow at some point even before the freeze. I'd still recommend using fleecing and setting a bandwidth limit for the backup job.
Error on any frozen VM:
VM 114 qmp command 'set_password' failed - unable to connect to VM 114 qmp socket - timeout after 51 retries
TASK ERROR: Failed to run vncproxy.
Virtual CPU is over 100% and/or guest RAM is pegged at 100%. The VM is not responsive at all via network or console (console error above).
The debug info you posted for 120 showed the VM still responding to other QMP commands. Did you have these errors for VM 120 before obtaining the debug info this time?

Any slow ops messages or similar when you look at the Ceph logs?
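For example, something like this on the Ceph nodes (standard Ceph CLI; exact log locations can differ depending on your setup):
Code:
ceph -s
ceph health detail | grep -iE 'slow (ops|request)'
grep -iE 'slow (ops|request)' /var/log/ceph/ceph.log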
 
I wanted to ask because the information in the documentation is not clear, and I think it is a 9.1 feature. What does the host-managed option on the network tab do? Is it related to the Docker network?
 
Hi,

On a Dell PowerEdge R630, the PVE 9.1 installer dies at an early phase (when populating the /dev tree).

After installing PVE 8.4 and upgrading to 9.1, kernel 6.17 resets the machine just before leaving the kernel boot stage, but booting the system with the latest 6.8 and 6.14 kernels works just fine.

On a Dell PowerEdge R640, there is no problem upgrading to PVE 9.1 with kernel 6.17.
 
I wanted to ask because the information in the documentation is not clear, and I think it is a 9.1 feature. What does the host-managed option on the network tab do? Is it related to the Docker network?
When enabled, Proxmox VE sets up networking on CT start, as opposed to having that done by the CT's network management stack.
It's indeed most useful for OCI images that do not have their own network management stack, i.e. application container ("docker") images, but it can also be useful for system containers.
 
Another question, related to Docker storage this time. Storage-backed mount points: there is an option to set the size to 0, which will create a directory, however neither the GUI nor the CLI (I tried a few things) allows me to do this. Plus, bind mounts I guess need to be done from the CLI only, right?
My use case is that I want to have the main directory of the container available to pass configuration files (since something like docker exec is... not available for now)
 
I am in the same boat using the Frigate container. I noticed that if I change it to an unprivileged LXC, it will create the container. Privileged fails for me.

Edit: I just confirmed it's the privilege argument:
Code:
# pct create 122 /data1/images/template/cache/frigate_stable.tar --hostname Frigate1 --storage data1 --password 'password' --rootfs data1:8 --memory 1024 --swap 512 --net0 name=eth0,bridge=vmbr0,ip=dhcp --features nesting=1 --unprivileged 0
Detected OCI archive
unable to create CT 122 - Invalid argument

It works if I change unprivileged to 1.

I was having the same issue, but I was able to work around it, in theory (the container shows unprivileged=false now, and stat /proc shows it owned by root), by creating the container as unprivileged and then performing the usual unprivileged -> privileged conversion (i.e. doing a backup and changing the privilege level on restore). I have not yet tried doing anything else that would actually exercise the privilege level.
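Roughly, the conversion looks like this (names, IDs and paths are examples only, not my exact commands):
Code:
# create the CT as unprivileged first (same pct create command as above, but with --unprivileged 1),
# then back it up and restore it as privileged
vzdump 122 --mode stop --storage local
pct restore 123 /var/lib/vz/dump/vzdump-lxc-122-....tar.zst --storage data1 --unprivileged 0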
 
Hi,
Another question, related to Docker storage this time. Storage-backed mount points: there is an option to set the size to 0, which will create a directory, however neither the GUI nor the CLI (I tried a few things) allows me to do this.
What error did you get? Did you use a directory type storage? For example:
Code:
pct set 107 --mp0 dir:0,mp=/example/path

Plus, bind mounts I guess need to be done from the CLI only, right?
Yes, this can be done via CLI similarly, specifying a full path on the host:
Code:
pct set 107 --mp0 /path/on/host/,mp=/example/path
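For reference, the resulting entry in the container config (e.g. /etc/pve/lxc/107.conf) then looks like:
Code:
mp0: /path/on/host/,mp=/example/path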

My use case is that I want to have the main directory of the container available to pass configuration files (since something like docker exec is... not available for now)
Feel free to open a feature request (after checking if one already exists): https://bugzilla.proxmox.com/