EFI VMs won't start under 7 Beta with Writeback Cache

jasonsansone

Active Member
May 17, 2021
121
28
28
Oklahoma City, OK
www.sansonehowell.com
Original Post Here

I can't get any virtual machines to start on 7.0-5 Beta when using writeback cache with Ceph-backed raw virtual disks. The console hangs before the EFI stage begins. See attached image. On the advice of @t.lamprecht, I narrowed the problem down to the cache setting.

Won't Work:
scsi0: CephRBD:vm-111-disk-1,cache=writeback,discard=on,iothread=1,size=153601M,ssd=1

Works:
scsi0: CephRBD:vm-111-disk-1,cache=none,discard=on,iothread=1,size=153601M,ssd=1
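
For anyone who wants to reproduce the comparison from the command line, here is a minimal sketch that rewrites the cache mode in the drive string and prints the matching qm set command. The helper name swap_cache is hypothetical (it is not part of the qm CLI), and VMID 111 is an assumption inferred from the disk name vm-111-disk-1.

```shell
#!/usr/bin/env bash
# Assumption: VMID 111, inferred from the disk name vm-111-disk-1.
vmid=111
drive='CephRBD:vm-111-disk-1,cache=writeback,discard=on,iothread=1,size=153601M,ssd=1'

# swap_cache is a hypothetical helper, not part of the qm CLI: it rewrites
# the cache= value of a drive string, or appends cache= if none is set.
swap_cache() {
  local drive="$1" mode="$2"
  if [[ "$drive" == *,cache=* ]]; then
    printf '%s\n' "$drive" | sed -E "s/,cache=[^,]*/,cache=$mode/"
  else
    printf '%s,cache=%s\n' "$drive" "$mode"
  fi
}

# Print the command rather than running it, so it can be reviewed first.
echo "qm set $vmid --scsi0 \"$(swap_cache "$drive" none)\""
```

The script only prints the command; paste it into a shell on the node once the drive string looks right.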

Bug Report - https://bugzilla.proxmox.com/show_bug.cgi?id=3498
 

Attachment: Screen Shot 2021-06-29 at 7.13.09 AM.png (88.5 KB)

t.lamprecht

Proxmox Staff Member
Staff member
Jul 28, 2015
5,511
1,760
164
South Tyrol/Italy
shop.proxmox.com
Hmm, the cache setting could plausibly affect some behavior, but a complete hang seems a bit weird.

What was the longest time you waited to see if it wasn't fully frozen but real slow?

Also, did you try to use another display option than explicitly setting stdvga?

Just trying to see whether some other factors are (additionally) playing into this. Thanks for your reports in any case!
 

jasonsansone

What was the longest time you waited to see if it wasn't fully frozen but real slow?

Also, did you try to use another display option than explicitly setting stdvga?

I have waited several minutes between each iterative test. Tested using virtio-gpu, VMware, and SPICE. No change; the results are the same. Graphics seems unrelated.

I also turned off SSD emulation, IO thread, and discard to test those settings. No effect.

What did work was changing the SCSI controller. The LSI and MegaRAID controllers allow me to advance past the EFI stage. The VirtIO and VMware controllers hang with writeback cache. They work with cache=none.
 

Attachment: Screen Shot 2021-06-29 at 10.35.35 AM.png (84.9 KB)

t.lamprecht

That option needs to be a drive option (I adapted the release notes), e.g., in your case you'd do something like:

Bash:
qm set VMID --scsi0 "CephRBD:vm-111-disk-1,cache=writeback,discard=on,iothread=1,size=153601M,ssd=1,aio=native"

jasonsansone

Boots:
scsi0: CephRBD:vm-111-disk-1,cache=none,discard=on,iothread=1,size=153601M,ssd=1,aio=io_uring
scsi0: CephRBD:vm-111-disk-1,cache=none,discard=on,iothread=1,size=153601M,ssd=1,aio=native
scsi0: CephRBD:vm-111-disk-1,cache=writeback,discard=on,iothread=1,size=153601M,ssd=1,aio=threads

Won't Boot to EFI Init:
scsi0: CephRBD:vm-111-disk-1,cache=writeback,discard=on,iothread=1,size=153601M,ssd=1,aio=io_uring
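
Of the combinations listed above, aio=threads is the only one reported to boot with writeback cache. As a rough sketch of that workaround, the snippet below swaps the aio= option in the failing drive string and prints the resulting qm set command. The helper with_aio is hypothetical (not part of the qm CLI), and VMID 111 is an assumption inferred from the disk name.

```shell
#!/usr/bin/env bash
# with_aio is a hypothetical helper, not part of the qm CLI: it drops any
# existing aio= option from a drive string and appends the requested one.
with_aio() {
  local stripped
  stripped="$(printf '%s\n' "$1" | sed -E 's/,aio=[^,]*//')"
  printf '%s,aio=%s\n' "$stripped" "$2"
}

failing='CephRBD:vm-111-disk-1,cache=writeback,discard=on,iothread=1,size=153601M,ssd=1,aio=io_uring'

# Assumption: VMID 111, inferred from the disk name vm-111-disk-1.
# Print the command rather than running it, so it can be reviewed first.
echo "qm set 111 --scsi0 \"$(with_aio "$failing" threads)\""
```

The printed drive string matches the third entry under "Boots" above; only the combinations listed in this post were actually tested.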
 

Stefan_R

Proxmox Retired Staff
Retired Staff
Jun 4, 2019
1,300
280
88
Vienna
Can you post your storage config (/etc/pve/storage.cfg)? Most importantly from that, do you use krbd for your "CephRBD" storage, and does it make a difference if you change it?
 

jasonsansone

Can you post your storage config (/etc/pve/storage.cfg)? Most importantly from that, do you use krbd for your "CephRBD" storage, and does it make a difference if you change it?

Yes, I typically use the kernel RBD module for performance. If I use librbd instead (shut down the VM, uncheck KRBD, change the cache setting from none to writeback, boot the VM), everything works as expected. The problem does not occur with librbd; it appears to be isolated to the kernel RBD module when using the new io_uring. As tested above, the problem also did not occur on KRBD when using aio=threads with writeback cache, or when cache is set to none.

dir: local
        disable
        path /var/lib/vz
        content backup
        prune-backups keep-all=1
        shared 0

cephfs: CephFS
        path /mnt/pve/CephFS
        content vztmpl,iso
        prune-backups keep-daily=7

rbd: CephRBD
        content images,rootdir
        krbd 1
        pool CephRBD

pbs: PBS
        datastore Home
        server 192.168.2.222
        content backup
        fingerprint 0d:a2:fc:17:03:cc:50:83:00:cc:56:3d:24:69:b3:de:6d:ca:95:10:a1:9a:01:72:54:b0:09:e1:09:e3:c8:5d
        prune-backups keep-all=1
        username root@pam

proxmox-ve: 7.0-2 (running kernel: 5.11.22-1-pve)
pve-manager: 7.0-5 (running version: 7.0-5/cce9b25f)
pve-kernel-5.11: 7.0-3
pve-kernel-helper: 7.0-3
pve-kernel-5.11.22-1-pve: 5.11.22-1
pve-kernel-5.11.21-1-pve: 5.11.21-1
ceph: 16.2.4-pve1
ceph-fuse: 16.2.4-pve1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: not correctly installed
ifupdown2: 3.0.0-1+pve5
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.1.0
libproxmox-backup-qemu0: 1.0.3-1
libpve-access-control: 7.0-3
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-4
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-6
libqb0: 1.0.5-1
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-1
lxcfs: 4.0.8-pve1
novnc-pve: 1.2.0-3
proxmox-backup-client: 1.1.10-1
proxmox-backup-file-restore: 1.1.10-1
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.1-4
pve-cluster: 7.0-2
pve-container: 4.0-3
pve-docs: 7.0-3
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.2-2
pve-i18n: 2.3-1
pve-qemu-kvm: 6.0.0-2
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-5
smartmontools: 7.2-pve2
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.4-pve1

agent: 1,fstrim_cloned_disks=1
balloon: 0
bios: ovmf
boot:
cores: 28
cpu: host
cpuunits: 101
efidisk0: CephRBD:vm-111-disk-2,size=1M
machine: pc-q35-6.0
memory: 24576
name: encoder
net0: virtio=00:a0:98:7f:85:c8,bridge=vmbr1,firewall=1
numa: 1
ostype: win10
scsi0: CephRBD:vm-111-disk-1,cache=none,discard=on,iothread=1,size=153601M,ssd=1
scsihw: virtio-scsi-single
smbios1: uuid=3fe0d48d-0709-4e0c-99b6-8f8bcfb658f0
sockets: 2
tablet: 0
vga: std,memory=32
vmgenid: 1acdd3ec-cc43-4bcd-b739-05202257a5b2
vmstatestorage: CephRBD
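
To check other VMs for the same combination, a sketch along these lines might help. It flags a drive as affected when its storage uses krbd, cache is writeback, and aio is io_uring. All function names are hypothetical helpers, not Proxmox tools. It also assumes that a drive with no aio= option behaves like aio=io_uring, which is what the results in this thread suggest, and it ignores the SCSI controller type even though that also made a difference here.

```shell
#!/usr/bin/env bash
# All function names here are hypothetical helpers, not Proxmox tools.

# Succeeds if the rbd storage named $1 has "krbd 1" in the storage.cfg
# text supplied on stdin.
storage_uses_krbd() {
  awk -v name="$1" '
    /^[a-z]+:/ { in_section = ($1 == "rbd:" && $2 == name); next }
    in_section && $1 == "krbd" && $2 == "1" { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Prints the value of option $2 from drive string $1 (empty if unset).
drive_opt() {
  printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2=//p"
}

# Succeeds if drive string $1 matches the hanging combination reported in
# this thread, given storage.cfg text in $2.
# Assumption: an unset aio= behaves like aio=io_uring.
is_affected() {
  local storage="${1%%:*}" cache aio
  cache="$(drive_opt "$1" cache)"
  aio="$(drive_opt "$1" aio)"
  storage_uses_krbd "$storage" <<< "$2" &&
    [ "$cache" = writeback ] &&
    [ "${aio:-io_uring}" = io_uring ]
}

cfg='rbd: CephRBD
        content images,rootdir
        krbd 1
        pool CephRBD'

drive='CephRBD:vm-111-disk-1,cache=writeback,discard=on,iothread=1,size=153601M,ssd=1'
if is_affected "$drive" "$cfg"; then
  echo "affected: consider cache=none or aio=threads"
else
  echo "not affected"
fi
```

On a real node the storage.cfg text could come from /etc/pve/storage.cfg and the drive string from the VM config; both are inlined here so the sketch runs anywhere.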