Tracking down boot slowness with OVMF and EFI disk on RBD

kc8apf
May 28, 2021
I noticed that VMs using OVMF with an EFI disk on RBD are very slow to boot. Since I have a test cluster with hyper-converged Ceph that is fairly idle, I watched the Ceph dashboard while starting a fresh VM and saw ~80 write I/Os while OVMF ran (from the console being initialized through to the Proxmox splash screen). Looking at the kvm command line generated by qm showcmd <VID>, I noticed that no cache setting is applied to efidisk0:

Code:
/usr/bin/kvm -id 104 -name test -no-shutdown \
  -chardev 'socket,id=qmp,path=/var/run/qemu-server/104.qmp,server,nowait' \
  -mon 'chardev=qmp,mode=control' \
  -chardev 'socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5' \
  -mon 'chardev=qmp-event,mode=control' \
  -pidfile /var/run/qemu-server/104.pid \
  -daemonize -smbios 'type=1,uuid=79ebeace-02e2-4aec-8314-c2ba9e258a82' \
  -drive 'if=pflash,unit=0,format=raw,readonly,file=/usr/share/pve-edk2-firmware//OVMF_CODE.fd' \
  -drive 'if=pflash,unit=1,format=raw,id=drive-efidisk0,size=131072,file=rbd:vm-disks/vm-104-disk-1:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/vm-disks.keyring' \
  -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults \
  -boot 'menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg' \
  -vnc unix:/var/run/qemu-server/104.vnc,password \
  -cpu kvm64,enforce,+kvm_pv_eoi,+kvm_pv_unhalt,+lahf_lm,+sep -m 2048 \
  -readconfig /usr/share/qemu-server/pve-q35-4.0.cfg \
  -device 'vmgenid,guid=040ad59b-8c9e-4f92-b513-5aa78e0d7263' \
  -device 'qxl-vga,id=vga,max_outputs=4,bus=pcie.0,addr=0x1' \
  -device 'virtio-serial,id=spice,bus=pci.0,addr=0x9' \
  -chardev 'spicevmc,id=vdagent,name=vdagent' \
  -device 'virtserialport,chardev=vdagent,name=com.redhat.spice.0' \
  -spice 'tls-port=61001,addr=127.0.0.1,tls-ciphers=HIGH,seamless-migration=on' \
  -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' \
  -iscsi 'initiator-name=iqn.1993-08.org.debian:01:55f7b7f2a174' \
  -drive 'file=/mnt/pve/cephfs/template/iso/debian-10.9.0-amd64-DVD-1.iso,if=none,id=drive-ide2,media=cdrom,aio=threads' \
  -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=101' \
  -drive 'file=rbd:vm-disks/vm-104-disk-0:conf=/etc/pve/ceph.conf:id=admin:keyring=/etc/pve/priv/ceph/vm-disks.keyring,if=none,id=drive-virtio0,discard=on,format=raw,cache=none,aio=native,detect-zeroes=unmap' \
  -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' \
  -netdev 'type=tap,id=net0,ifname=tap104i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' \
  -device 'virtio-net-pci,mac=5E:71:F1:44:F9:0F,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=102' \
  -machine 'type=q35+pve0'

With no cache setting, QEMU defaults to writethrough (see the -drive section of https://linux.die.net/man/1/qemu-kvm). For RBD pools, QEMU uses librbd, which manages caching in userspace rather than in the kernel (see https://docs.ceph.com/en/latest/rbd/qemu-rbd/). QEMU's -drive cache option is mapped to librbd's cache settings as described in https://docs.ceph.com/en/latest/rbd/qemu-rbd/#qemu-cache-options. Any librbd options not covered by that mapping use the defaults listed in https://docs.ceph.com/en/latest/rbd/rbd-config-ref/#rbd-cache-config-settings, notably rbd_cache_policy=writearound.

Putting that all together, efidisk0 configures librbd with:
Code:
rbd_cache = true
rbd_cache_max_dirty = 0
rbd_cache_policy = writearound

So reads are cached, but writes complete only once they have been written out to the Ceph cluster and are never held dirty in librbd's cache. Combined with OVMF issuing a large number of small writes, this makes for terrible boot performance.
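If you want to watch this happen outside the dashboard, something like the following should show the per-image write traffic while OVMF runs. This assumes a Ceph release new enough to provide rbd perf image iostat (Nautilus or later, with the rbd_support mgr module enabled); the pool name is the one from the command line above:

Code:
# Live per-image IOPS for the pool holding the EFI disk;
# watch vm-104-disk-1 while the VM is booting.
rbd perf image iostat vm-disks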

Looking at https://pve.proxmox.com/wiki/Manual:_qm.conf, efidisk0 doesn't accept any cache options, so I experimented by modifying the kvm command generated by /usr/share/perl5/PVE/QemuServer.pm. Specifically, I changed line 3298 from:

Code:
push @$cmd, '-drive', "if=pflash,unit=1,format=$format,id=drive-efidisk0$size_str,file=$path";

to

Code:
push @$cmd, '-drive', "if=pflash,unit=1,format=$format,id=drive-efidisk0$size_str,file=$path:rbd_cache_policy=writeback,cache=writeback";

This greatly improves OVMF boot performance, as the writes are now absorbed by librbd's cache. Unfortunately, it also means those writes can be lost if the QEMU process crashes (or similar) before they are flushed. Given that the writes only happen during VM boot, that's probably an acceptable tradeoff. Regardless, the change to QemuServer.pm would need to be reworked, since rbd_cache_policy can only be appended when the file is an RBD volume.
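For reference, here's a rough sketch of how that hunk could be made conditional. This is untested, the variable names are simply taken from the surrounding QemuServer.pm code, and matching on the rbd: prefix is only a stand-in for a proper storage-type check:

Code:
# Sketch only: append librbd cache options solely when the EFI disk path is an
# RBD volume spec; every other storage type keeps the plain file= argument.
my $efidisk_opts = "if=pflash,unit=1,format=$format,id=drive-efidisk0$size_str";
if ($path =~ m/^rbd:/) {
    # librbd-backed EFI vars: let librbd batch OVMF's many small writes
    # instead of flushing each one straight to the cluster.
    $efidisk_opts .= ",file=$path:rbd_cache_policy=writeback,cache=writeback";
} else {
    $efidisk_opts .= ",file=$path";
}
push @$cmd, '-drive', $efidisk_opts;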

Anyway, hopefully this answers _why_ OVMF with an EFI disk on RBD is slow to boot. Adding a cache option for efidisk0, and allowing RBD paths in qm.conf to carry additional RBD options, is probably the correct fix, perhaps with RBD-backed EFI disks defaulting to writeback.
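Purely to illustrate what that could look like, a hypothetical qm.conf line (the cache= property on efidisk0 does not exist today; storage and volume names are taken from my test VM):

Code:
# Hypothetical syntax -- efidisk0 currently has no cache property:
efidisk0: vm-disks:vm-104-disk-1,size=128K,cache=writeback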
 