QEMU 7.2 available on pvetest as of now

Status
Not open for further replies.

t.lamprecht

Proxmox Staff Member
FYI: The next Proxmox VE point release 7.4 (2023/H1) will default to QEMU 7.2. Internal testing for that version started in December, and today it has been made available on the pvetest repository. Besides some small initial bug and regression fixes, we saw no actual new issues coming up.

Note: While our internal workloads run stable, we do not recommend upgrading production setups just yet; that's why we initially made it available on the pvetest repository.
But if you have test instances, face issues with older QEMU or QEMU/kernel combinations that 7.2 might fix, or are just interested in evaluating it early, this is a great chance to upgrade and provide feedback here.


To upgrade, ensure you have the Proxmox VE Test repository configured (it can be added through the web UI's Repositories panel), and then use the standard:
Bash:
apt update
apt full-upgrade
or use the upgrade functionality of the web-interface.
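
For those who prefer the CLI over the Repositories panel, here is a minimal sketch of what the test repository entry looks like. It assumes Proxmox VE 7.x on Debian 11 (codename bullseye); the helper name and the file name pvetest.list in the usage comment are only illustrative choices, not something the post prescribes.

```bash
# Print the APT source line for the Proxmox VE test repository.
# Assumption: the Debian codename matches your release (bullseye for PVE 7.x).
pvetest_repo_line() {
    codename="$1"
    printf 'deb http://download.proxmox.com/debian/pve %s pvetest\n' "$codename"
}

# Possible usage as root (file name is a hypothetical choice):
#   pvetest_repo_line bullseye > /etc/apt/sources.list.d/pvetest.list
#   apt update && apt full-upgrade
```

Removing that file again and running apt update returns the host to the previously configured repositories, although already upgraded packages stay installed.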

pveversion -v (or the web-interface's Node Summary -> Packages versions) should then include something like pve-qemu-kvm: 7.2.0-5
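
To check this in a script rather than by eye, something like the following could work. It is only a sketch: the function name is made up for illustration, and it simply filters the "name: version" lines that pveversion -v prints.

```bash
# Extract the pve-qemu-kvm version from `pveversion -v` style output on stdin.
# Each line of that output has the form "package: version".
qemu_pkg_version() {
    awk -F': ' '$1 == "pve-qemu-kvm" { print $2 }'
}

# Usage on a Proxmox VE host:
#   pveversion -v | qemu_pkg_version
```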

Note, as with all QEMU updates: A VM needs to be either fully restarted (shutdown/start or using restart via the CLI or web-interface) or, to avoid downtime, live-migrated to an already upgraded host to actually run with the newer QEMU version.
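
To see which QEMU version a given VM is actually running after the package upgrade, a rough sketch follows. It assumes that the verbose status output of qm contains a "running-qemu" line, which is how it looks on the setups I know; treat the field name and the helper name as assumptions and verify on your host.

```bash
# Print the QEMU version of the running VM process, read from
# `qm status <vmid> --verbose` style output on stdin.
# Assumption: the output has a line of the form "running-qemu: <version>".
running_qemu_version() {
    awk -F': ' '$1 == "running-qemu" { print $2 }'
}

# Usage on a Proxmox VE host (110 is just an example VMID):
#   qm status 110 --verbose | running_qemu_version
```

If this still prints the old version after the upgrade, the VM has not been restarted or migrated yet.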


While we have successfully run our production and lots of testing loads on this version for a while, no software is bug-free, and such issues are often related to the specific setup. So, if you run into regressions that are definitely caused by installing the new QEMU version (and not some other change), please always include the affected VM config and some basic HW and storage details.
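
As a starting point for such a report, a small helper like the one below could gather the basics in one go. This is a sketch under assumptions: the function name is hypothetical, the list of commands is my own selection (package versions, VM config, Ceph versions), and tools that are missing on the host are skipped with a note instead of failing.

```bash
# Collect basic details for a regression report about one VM.
# Commands that are not installed on this host are reported as missing.
collect_report() {
    vmid="$1"
    for cmd in "pveversion -v" "qm config $vmid" "ceph versions"; do
        tool=${cmd%% *}          # first word of the command line
        echo "### $cmd"
        if command -v "$tool" >/dev/null 2>&1; then
            $cmd 2>&1            # intentionally unquoted: split into words
        else
            echo "($tool not found on this host)"
        fi
    done
}

# Usage (110 is just an example VMID):
#   collect_report 110 > report.txt
```

Hardware and storage details (disk models, storage type, cache settings) still need to be added by hand.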

We'll update this thread once this version has been moved to no-subscription, which might already happen next week, depending on feedback.
 
I receive the following error when starting a VM.

Code:
task started by HA resource agent
kvm: rbd request failed: cmd 0 offset 0 bytes 540672 flags 0 task.ret -2 (No such file or directory)
kvm: can't read block backend: No such file or directory
TASK ERROR: start failed: QEMU exited with code 1
 
It's worth mentioning that QEMU 7.2 seems to resolve the problem with hanging network adapters (yellow exclamation mark / error 56) on German Windows 10/20xx installations.
 
Regarding the rbd error above, can you please provide what the first post asks for: the affected VM config and some basic HW and storage details?

The Ceph version and whether KRBD is in use would be interesting too. FWIW, we run most of our internal production services on a Ceph Quincy cluster and have some Pacific clusters for test and QA; none of them showed such issues in combination with QEMU 7.2, so extra details are definitely required to find out more here.

Is the ceph pool healthy? Is the image the VM uses listed in the storage's content tab?
 
The error above seems to come from the QEMU rbd block driver (so librbd, no krbd involved here).

What is your librbd version? (dpkg -l | grep librbd)
 
Ceph is healthy. CephRBD is Replica 3 on 3x nodes with 7x Samsung sm863 OSD per node (21 total). WAL/DB is Optane 900p. RBD persistent write-back cache is also on Optane. This only happens on virtio block. It does not happen on virtio-scsi. I am using librbd because krbd does not support persistent write-back cache.

proxmox-ve: 7.3-1 (running kernel: 6.1.10-1-pve)
pve-manager: 7.3-6 (running version: 7.3-6/723bb6ec)
pve-kernel-6.1: 7.3-4
pve-kernel-helper: 7.3-4
pve-kernel-5.15: 7.3-2
pve-kernel-6.1.10-1-pve: 6.1.10-1
pve-kernel-6.1.6-1-pve: 6.1.6-1
pve-kernel-6.1.2-1-pve: 6.1.2-1
pve-kernel-5.15.85-1-pve: 5.15.85-1
ceph: 17.2.5-pve1
ceph-fuse: 17.2.5-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: not correctly installed
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libpve-access-control: 7.3-1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.3-2
libpve-guest-common-perl: 4.2-3
libpve-http-server-perl: 4.1-5
libpve-storage-perl: 7.3-2
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-1
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.3.3-1
proxmox-backup-file-restore: 2.3.3-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.1-1
proxmox-widget-toolkit: 3.5.5
pve-cluster: 7.3-2
pve-container: 4.4-2
pve-docs: 7.3-1
pve-edk2-firmware: 3.20220526-1
pve-firewall: 4.2-7
pve-firmware: 3.6-3
pve-ha-manager: 3.5.1
pve-i18n: 2.8-2
pve-qemu-kvm: 7.2.0-5
pve-xtermjs: 4.16.0-1
qemu-server: 7.3-3
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+2
vncterm: 1.7-1
zfsutils-linux: 2.1.9-pve1

Tested kernels 6.1.2 and 6.1.10.

agent: 0,type=virtio
args: -device isa-applesmc,osk="ourhardworkbythesewordsguardedpleasedontsteal(c)AppleComputerInc" -smbios type=2 -smp 4,sockets=1,cores=2,threads=2,maxcpus=4 -cpu Skylake-Server-IBRS,vendor=GenuineIntel,+avx2,+avx512f,+avx512dq,+avx512cd,+avx512bw,+avx512vl,+vmx,+pclmulqdq,+pdcm,+bmi1,+hle,+smep,+bmi2,+erms,+xsaveopt,+xsavec,+xsaves,+xgetbv1,+smap,+rtm,+mpx,+rdseed,+adx,+clflushopt,+clwb,+pku,+stibp,+aes
balloon: 0
bios: ovmf
boot: order=virtio0
cores: 4
cpu: Skylake-Server-IBRS
cpuunits: 102
efidisk0: CephRBD:vm-110-disk-0,efitype=4m,pre-enrolled-keys=1,size=528K
machine: pc-q35-7.2
memory: 4096
name: macOS-caching
net0: virtio=62:65:47:7B:BF:F6,bridge=vmbr1,firewall=1
numa: 0
ostype: other
scsihw: virtio-scsi-single
smbios1: uuid=28a447b1-bc85-4e0b-8319-2415292e33e6
sockets: 1
tablet: 1
vga: vmware,memory=32
virtio0: CephRBD:vm-110-disk-1,aio=io_uring,cache=none,discard=on,iothread=1,size=76295M
vmgenid: 14235fd0-aef4-4e83-bc55-8f7dfadf2d59
vmstatestorage: CephRBD

Code:
rbd ls CephRBD | grep 110
vm-110-disk-0
vm-110-disk-1

Code:
dpkg -l|grep librbd
ii  librbd1                              17.2.5-pve1                    amd64        RADOS block device client library
ii  python3-rbd                          17.2.5-pve1                    amd64        Python 3 libraries for the Ceph librbd library

root@viper:~# rbd status CephRBD/vm-110-disk-0
Watchers: none
Persistent cache state:
host: maverick
path: /mnt/optane/rbd_pwl/rbd-pwl.CephRBD.b61c6812d2d87c.pool
size: 4 GiB
mode: ssd
stats_timestamp: Wed Feb 22 00:30:38 2023
present: true empty: false clean: false
allocated: 3.6 GiB
cached: 226 MiB
dirty: 226 MiB
free: 459 MiB
hits_full: 0 / 0%
hits_partial: 0 / 0%
misses: 0
hit_bytes: 0 B / 0%
miss_bytes: 0 B
root@viper:~# rbd status CephRBD/vm-110-disk-1
Watchers: none
Persistent cache state:
host: maverick
path: /mnt/optane/rbd_pwl/rbd-pwl.CephRBD.b61c712396831f.pool
size: 4 GiB
mode: ssd
stats_timestamp: Fri Feb 10 03:03:55 2023
present: true empty: false clean: true
allocated: 2.0 GiB
cached: 1.9 GiB
dirty: 0 B
free: 2.0 GiB
hits_full: 1146993 / 17%
hits_partial: 14611 / 0%
misses: 5370009
hit_bytes: 38 GiB / 15%
miss_bytes: 202 GiB
 
RBD persistent write-back cache is also on Optane.
Can you disable that and see how it goes?

That is the only difference to my test setup that I can see right now. And it works fine in my test setup.
 
Works when disabling persistent cache on the rbd images. Curiously, it only affects virtio-blk, not virtio-scsi.
 
Off-topic, but does live migration work with the persistent write-back cache? How does it work in case of a node failure? Are you able to restart the VM on another node (maybe losing the last writes, but without corruption)?
 
Live migration works fine. I haven’t had an unsafe shutdown to test but writeback cache is safe and there are commands to flush or invalidate cache in the event of such a crash.
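
For reference, the commands meant here are, as far as I know from the Ceph documentation, the rbd persistent-cache subcommands. The sketch below only prints them for a given image instead of running them, since invalidating a dirty cache discards data; the helper name is made up, and you should check the exact syntax against your Ceph release before using any of it.

```bash
# Dry run: print the commands one would use to inspect, flush, or
# discard the persistent write-back cache of an RBD image.
# Nothing is executed; review the lines before running them by hand.
pwl_cache_cmds() {
    image_spec="$1"   # e.g. CephRBD/vm-110-disk-1
    echo "rbd status $image_spec"
    echo "rbd persistent-cache flush $image_spec"
    echo "rbd persistent-cache invalidate $image_spec"
}

# Usage:
#   pwl_cache_cmds CephRBD/vm-110-disk-1
```

Flushing writes the dirty cache contents back to the cluster, while invalidating throws them away, so the latter is only for the case where the cache file itself is lost or unusable.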
 
Hi,
I tried to reproduce the issue on my virtual test cluster, but the VM booted fine for me. But I haven't ever used the persistent write cache before, so I do have two questions:

Code:
root@viper:~# rbd status CephRBD/vm-110-disk-0
Watchers: none
Persistent cache state:
When I shut down the VM, so that there are no watchers anymore, I no longer get the persistent cache state information in the output. Did I maybe misconfigure something, or are there certain conditions that need to be met for that?

Code:
allocated: 3.6 GiB
        cached: 226 MiB
        dirty: 226 MiB
This is the cache for the EFI disk, so how are the values so high?
 
Good question. If you enable persistent cache at the ceph.conf level, it will by default apply to all disks. I have subsequently disabled it on the TPM and EFI disks. It is now enabled only on VM data disks.
Does it work then with QEMU 7.2 and virtio-blk?

I receive the following error when starting a VM.

Code:
task started by HA resource agent
kvm: rbd request failed: cmd 0 offset 0 bytes 540672 flags 0 task.ret -2 (No such file or directory)
kvm: can't read block backend: No such file or directory
TASK ERROR: start failed: QEMU exited with code 1
Hmm, the size here is actually 528 KiB, which matches the EFI disk. So if it's actually about the EFI disk, it'd be very strange that switching the other disk from virtio-blk to virtio-scsi makes a difference.
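
The arithmetic behind that observation can be checked quickly in the shell. This is just a sketch; the function name is illustrative, and the number is the "bytes" value from the error message, compared against the size=528K of efidisk0 in the VM config.

```bash
# Convert a byte count to KiB using integer shell arithmetic.
bytes_to_kib() {
    echo "$(( $1 / 1024 ))"
}

# The failed rbd request covered 540672 bytes:
bytes_to_kib 540672   # prints 528, matching size=528K of efidisk0
```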
 
Thanks for the feedback here and on other channels.
Besides the odd possible regression with the niche RBD persistent write-back cache (which we cannot reproduce but are still investigating - @jasonsansone: please open a new thread to post any more info w.r.t. Fiona's question), it seems that 7.2 addressed some odd bugs. Thus, due to popular demand on some Bugzilla entries and forum posts, we'll move that release ahead to no-subscription.

I'm closing this thread, please either check out the new one, which has been created for better visibility, or open a separate one for specific questions or issues.
 