I noticed that VMs using OVMF with an EFI disk on RBD are very slow to boot. Since I have a test cluster with hyper-converged Ceph that is fairly idle, I watched the Ceph dashboard while starting a fresh VM and saw ~80 write I/Os while OVMF ran (from the console being initialized through to the Proxmox splash screen). Looking at the kvm command line generated by qm showcmd <VID>, I noticed that no cache setting is specified for efidisk0:
Code:
/usr/bin/kvm -id 104 -name test -no-shutdown \
-chardev 'socket,id=qmp,path=/var/run/qemu-server/104.qmp,server,nowait' \
-mon 'chardev=qmp,mode=control' \
-chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
-mon 'chardev=qmp-event,mode=control' \
-pidfile /var/run/qemu-server/104.pid \
-daemonize -smbios 'type=1,uuid=79ebeace-02e2-4aec-8314-c2ba9e258a82' \
-drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' \
-drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=rbd:vm-disks/vm-104-disk-1:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/vm-disks.keyring' \
-smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults \
-boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
-vnc unix:/var/run/qemu-server/104.vnc,password \
-cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 2048 \
-readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
-device 'vmgenid,guid=040ad59b-8c9e-4f92-b513-5aa78e0d7263' \
-device 'qxl-vga,id=vga,max_outputs=4,bus=pcie.0,addr=0x1' \
-device 'virtio-serial,id=spice,bus=pci.0,addr=0x9' \
-chardev 'spicevmc,id=vdagent,name=vdagent' \
-device 'virtserialport,chardev=vdagent,name=com.redhat.spice.0' \
-spice 'tls-port=61001,addr=127.0.0.1,tls-ciphers=HIGH,seamless-migration=on' \
-device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
-iscsi 'initiator-name=iqn.1993-08.org.debian:01:55f7b7f2a174' \
-drive 'file=/mnt/pve/cephfs/template/iso/debian-10.9.0-amd64-DVD-1.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' \
-device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' \
-drive 'file=rbd:vm-disks/vm-104-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/vm-disks.keyring,if=none,id=drive-virtio0,discard=on,format=raw,cache=none,aio=native,detect-zeroes=unmap' \
-device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' \
-netdev 'type=tap,id=net0,ifname=tap104i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
-device 'virtio-net-pci,mac=5E:71:F1:44:F9:0F,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' \
-machine 'type=q35+pve0'
With no cache setting, QEMU defaults to writethrough (see the -drive section of https://linux.die.net/man/1/qemu-kvm). For RBD pools, QEMU uses librbd, which manages caching in userspace rather than in the kernel (see https://docs.ceph.com/en/latest/rbd/qemu-rbd/). QEMU's -drive cache option is mapped to librbd's cache settings as described at https://docs.ceph.com/en/latest/rbd/qemu-rbd/#qemu-cache-options. Any remaining librbd options fall back to the defaults listed at https://docs.ceph.com/en/latest/rbd/rbd-config-ref/#rbd-cache-config-settings, notably rbd_cache_policy=writearound.
Putting that all together, efidisk0 configures librbd with:
Code:
rbd_cache = true
rbd_cache_max_dirty = 0
rbd_cache_policy = writearound
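As a sanity check, these effective values can be confirmed at runtime by enabling a client admin socket for librbd and querying the running QEMU process. This is just a sketch, assuming you're happy to enable the socket for all librbd clients in ceph.conf; the socket path placeholders need to be filled in from whatever file actually gets created for the kvm process:
Code:
# added to the [client] section of /etc/pve/ceph.conf
[client]
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok

# after restarting the VM, inspect what librbd actually ended up with
# (replace <pid> and <cctid> with the values from the socket created for the kvm process)
ceph --admin-daemon /var/run/ceph/ceph-client.admin.<pid>.<cctid>.asok config show | grep rbd_cache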
So reads will be cached, but writes only return once they have been written out to the Ceph cluster and are not cached by librbd. That makes for terrible performance, because OVMF issues a large number of small writes.
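If you want to quantify that on your own pool, a small-block, queue-depth-1 random write test against a throwaway RBD image gives a rough feel for it. A minimal sketch using fio's rbd engine, assuming fio was built with rbd support and that vm-disks/fio-test is a scratch image you can destroy afterwards; toggling rbd_cache_max_dirty = 0 in the [client] section of ceph.conf between runs approximates the writethrough mapping QEMU uses for the EFI disk:
Code:
# create a scratch image first (remove it again afterwards)
rbd -p vm-disks create fio-test --size 1024

# crude stand-in for OVMF's many small writes: 4k random writes at queue depth 1
fio --name=efi-sim --ioengine=rbd --clientname=admin --pool=vm-disks \
    --rbdname=fio-test --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
    --time_based --runtime=30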
Looking at https://pve.proxmox.com/wiki/Manual:_qm.conf, efidisk0 doesn't accept any cache-related options, so I experimented by modifying the kvm command line generated by /usr/share/perl5/PVE/QemuServer.pm. Specifically, I changed line 3298 from:
Code:
push @$cmd, '-drive', "if=pflash,unit=1,format=$format,id=drive-efidisk0$size_str,file=$path";
to
Code:
push @$cmd, '-drive', "if=pflash,unit=1,format=$format,id=drive-efidisk0$size_str,file=$path:rbd_cache_policy=writeback,cache=writeback";
This greatly improves OVMF boot performance, as all the writes are now cached by librbd. Unfortunately, it also means those writes can be lost if the QEMU process crashes or similar. Given the writes only happen during VM boot, that's probably an acceptable tradeoff. Regardless, the change to QemuServer.pm needs to be reworked, since rbd_cache_policy can only be specified when the file is an RBD volume.
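For reference, one way to make the hack slightly less fragile is to only append the librbd option when the path is actually an RBD spec. This is an untested sketch of the idea, not a proper patch:
Code:
# only append rbd_cache_policy when the EFI disk actually lives on RBD;
# other storage types keep the original -drive string
if ($path =~ m/^rbd:/) {
    push @$cmd, '-drive', "if=pflash,unit=1,format=$format,id=drive-efidisk0$size_str,file=$path:rbd_cache_policy=writeback,cache=writeback";
} else {
    push @$cmd, '-drive', "if=pflash,unit=1,format=$format,id=drive-efidisk0$size_str,file=$path";
}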
Anyway, hopefully this answers _why_ OVMF with an EFI disk on RBD is slow. The proper fix is probably to add a cache option for efidisk0 and to allow RBD paths in qm.conf to carry additional RBD options, perhaps with RBD-backed EFI disks defaulting to writeback.
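Something along these lines is what I'm imagining; to be clear, the cache property on efidisk0 below is hypothetical and isn't accepted by qm.conf today:
Code:
# hypothetical qm.conf entry -- a cache property on efidisk0 does not exist yet
efidisk0: vm-disks:vm-104-disk-1,size=128K,cache=writeback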