Hi everyone,
I think I found an issue with cloud-init volumes. I'm wondering if anyone else has run into this and I'm also wondering if anyone else can replicate what I'm seeing.
It seems that when cold-starting a VM with a cloud-init volume attached to IDE2, the OS (an Alma 9 cloud image in this case) cannot see the volume. After a warm restart (VM reset, or ctrl-alt-delete inside the VM), the volume is readable.
I'm wondering if anyone else has run into this and/or can find something silly I'm doing. I've looked to see if it's a bug and I haven't found anything specific to this. If this does wind up being a bug, I'm wondering if anyone has advice on things I can provide to help triage/get it addressed.
For the time-being, I found that attaching the cloud-init drive via SCSI works without issue.
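In case it helps anyone else, here is a sketch of that workaround applied to an already-created VM (the VMID `1061` and the storage name `local-nvme2-blk` are just from my setup; adjust to taste):

```shell
# Remove the IDE-attached cloud-init drive...
qm set 1061 --delete ide2
# ...and re-add it on an unused SCSI slot instead (the drive image itself
# is regenerated by Proxmox at VM start, so nothing is lost)
qm set 1061 --scsi2 local-nvme2-blk:cloudinit
```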
Reproducing the issue
I did this all via `pvesh`, however I think it would work just as well via `qm create` or via the WebUI.
Bash:
pvesh create /nodes/vdev-1/qemu \
--name=kdevdev-compute-1 \
--agent=1 \
--boot=c \
--bootdisk=scsi0 \
--scsihw=virtio-scsi-pci \
--bios=ovmf \
--onboot=1 \
--serial0=socket \
--net0=virtio,bridge=vmbr301,firewall=1 \
--ostype=l26 \
--citype=nocloud \
--ciuser=root \
--cipassword=$6$--REDACTED-- \
--sshkeys=ssh-rsa%20--REDACTED-- \
--cpu=host \
--memory=65536 \
--cores=8 \
--efidisk0=local-nvme2-blk:0,efitype=4m,pre-enrolled-keys=1 \
--scsi0=local-nvme2-blk:0,discard=on,size=100G,import-from=/mnt/pve/ceph-fs/ci-images/AlmaLinux-9-GenericCloud-latest.x86_64.qcow2 \
--scsi1=local-nvme2-blk:300,discard=on,size=300G \
--ide2=local-nvme2-blk:cloudinit \
--vmid=1061 \
--ipconfig0=ip=--REDACTED--/26,gw=--REDACTED--,ip6=--REDACTED--/64,gw6=--REDACTED-- \
--output-format=json-pretty
pvesh set /nodes/vdev-1/qemu/1061/firewall/options \
--enable=1 \
--output-format=json-pretty
pvesh create /nodes/vdev-1/qemu/1061/firewall/rules \
--action=kdevdev-compute \
--enable=1 \
--pos=0 \
--type=group
pvesh create /nodes/vdev-1/qemu/1061/status/start
# output
generating cloud-init ISO
UPID:vdev-1:002209BE:105C8F9C:635FD5DC:qmstart:1061:root@pam:
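For reference, the resulting drive attachment can be double-checked before first boot (VMID 1061 and node vdev-1 are from the `pvesh` call above):

```shell
# Show how the cloud-init and system disks ended up attached
qm config 1061 | grep -E '^(ide2|scsi[0-9]):'
# Equivalent through the API:
pvesh get /nodes/vdev-1/qemu/1061/config --output-format=json-pretty
```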
At the GRUB boot menu, edit the default entry and append `init=/bin/bash` to the kernel command line:
Bash:
linux ... init=/bin/bash
After the kernel has booted up:
Bash:
bash-5.1# lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sda
|-sda1
|-sda2
|-sda3
`-sda4 8.6G 8% /
sdb
### Rescan scsi devices just to be sure...
bash-5.1# for h in /sys/class/scsi_host/host*; do echo "- - -" > $h/scan; done
bash-5.1# lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sda
|-sda1
|-sda2
|-sda3
`-sda4 8.6G 8% /
sdb
### Look for any sr0 related messages in dmesg
bash-5.1# dmesg | grep sr0
[nothing]
After a system reset, sr0 device will be present
Bash:
echo "b" > /proc/sysrq-trigger
### At grub, add `init=/bin/bash`
bash-5.1# lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sda
|-sda1
|-sda2
|-sda3
`-sda4 8.6G 8% /
sdb
sr0
### Check for sr0 related messages in dmesg
bash-5.1# dmesg | grep sr0
[ 1.810820] sr 2:0:0:0: [sr0] scsi3-mmc drive: 4x/4x cd/rw xa/form2 tray
[ 1.833423] sr 2:0:0:0: Attached scsi CD-ROM sr0
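For context (and in case the trigger does nothing on someone else's image): sysrq `b` asks the kernel for an immediate reboot, with no sync and no unmount, which is what makes it behave like a cold reset for this test. Whether the trigger is honored depends on the kernel's sysrq bitmask:

```shell
# Show the current sysrq bitmask (1 means all sysrq functions are enabled)
cat /proc/sys/kernel/sysrq
# If the trigger does nothing, enable all functions first (requires root):
# echo 1 > /proc/sys/kernel/sysrq
```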
Things I've done to narrow down the issue
- Almost universally, resetting or rebooting after that initial startup will clear the issue. Tested via:
  - `reset` via the WebUI
  - Guest VM self-reset via `echo b > /proc/sysrq-trigger`
  - Guest VM reboot via ctrl-alt-del. I suspect a `reboot` would work too, except I'm not able to log in to the cloud image.
- Creating the VM via `pvesh`, but booting via the WebUI. Result: no block device present on sr0
- Creating the VM via `pvesh`, clicking "Regenerate Image" under Cloud-Init in the WebUI, then booting via the WebUI. Result: no block device present on sr0
- Creating the VM via `pvesh`, but removing/re-adding the cloud-init drive prior to first boot. Result: no block device present on sr0
- Creating the VM via `pvesh`, but changing cloud-init settings prior to first boot. Result: no block device present on sr0
- Creating the VM using SCSI to attach the cloud-init volume. Result: this seems to work consistently
Other observations
These are things I noted, but they may or may not be relevant to the problem.

The QEMU `info block` command doesn't seem to show any difference before/after the reset, apart from locked/not-locked. The only odd part to me is that the cloud-init drive is attached to ide2 instead of a SCSI drive, but that might be irrelevant.
Bash:
# info block
pflash0 (#block197): /usr/share/pve-edk2-firmware//OVMF_CODE_4M.fd (raw, read-only)
Attached to: /machine/system.flash0
Cache mode: writeback
drive-efidisk0 (#block368): json:{"driver": "raw", "size": "540672", "file": {"driver": "host_device", "filename": "/dev/nvme2-lvm/vm-1062-disk-0"}} (raw)
Attached to: /machine/system.flash1
Cache mode: writeback
drive-ide2 (#block521): /dev/nvme2-lvm/vm-1062-cloudinit (raw, read-only)
Attached to: ide2
Removable device: not locked, tray closed
Cache mode: writeback
drive-scsi0 (#block794): /dev/nvme2-lvm/vm-1062-disk-1 (raw)
Attached to: scsi0
Cache mode: writeback, direct
Detect zeroes: unmap
drive-scsi1 (#block988): /dev/nvme2-lvm/vm-1062-disk-2 (raw)
Attached to: scsi1
Cache mode: writeback, direct
Detect zeroes: unmap
What's curious is that after I do the reset/reboot to get the cloud-init drive working, the UUID on the cloud-init partition (as seen by `lsblk -f`) shows the same timestamp as the VM start task...
Bash:
[root@localhost ~]# lsblk -f
NAME FSTYPE FSVER LABEL UUID FSAVAIL FSUSE% MOUNTPOINTS
sda
├─sda1
│
├─sda2
│ vfat FAT16 57E9-4595 192.8M 3% /boot/efi
├─sda3
│ xfs abd5d231-f8cb-41d6-9b8a-528febbfb19e 395.3M 20% /boot
└─sda4
xfs 953ee9e0-c4d0-4f57-8255-d356fc915215 8.4G 10% /
sdb
sr0 iso966 cidata 2022-10-31-15-59-32-00
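As far as I can tell, that UUID is just the ISO 9660 volume creation timestamp, which libblkid formats as `YYYY-MM-DD-HH-MM-SS-cc` (the trailing field being hundredths of a second), so it can be rewritten for easy comparison against the task log:

```shell
# UUID as reported by lsblk -f for the cidata ISO
iso_uuid="2022-10-31-15-59-32-00"
# Drop the hundredths field and rebuild a plain date/time string
ts=$(echo "$iso_uuid" | awk -F- '{printf "%s-%s-%s %s:%s:%s", $1, $2, $3, $4, $5, $6}')
echo "$ts"   # -> 2022-10-31 15:59:32
```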
Looking at the WebUI task log, the 'VM ### - Start' task was created on 2022-10-31 at 15:59:32, which was the first time the VM tried to boot. So I think the disk image itself is fine; it's just not getting mapped into the VM somehow.
I couldn't figure out how to get the disk UUID when booting with `init=/bin/bash`, and I suspect that's because udev (or some other userland subsystem) hasn't started yet.

I also looked to see if there was any difference in the qemu process flags, but there really wasn't anything different that I could notice:
Bash:
root@vdev-3:~# ps auxw | grep 1063
root 1567387 75.7 0.0 68336012 121156 ? Sl 18:33 0:09 /usr/bin/kvm -id 1063 -name kdevdev-compute-3,debug-threads=on -no-shutdown -chardev socket,id=qmp,path=/var/run/qemu-server/1063.qmp,server=on,wait=off -mon chardev=qmp,mode=control -chardev socket,id=qmp-event,path=/var/run/qmeventd.sock,reconnect=5 -mon chardev=qmp-event,mode=control -pidfile /var/run/qemu-server/1063.pid -daemonize -smbios type=1,uuid=b0e59e01-b833-4da7-ab0a-7cd319a33015 -drive if=pflash,unit=0,format=raw,readonly=on,file=/usr/share/pve-edk2-firmware//OVMF_CODE_4M.fd -drive if=pflash,unit=1,format=raw,id=drive-efidisk0,size=540672,file=/dev/nvme2-lvm/vm-1063-disk-0 -smp 8,sockets=1,cores=8,maxcpus=8 -nodefaults -boot menu=on,strict=on,reboot-timeout=1000,splash=/usr/share/qemu-server/bootsplash.jpg -vnc unix:/var/run/qemu-server/1063.vnc,password=on -cpu host,+kvm_pv_eoi,+kvm_pv_unhalt -m 65536 -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e -device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device vmgenid,guid=dd06270a-db04-40c3-809e-23e340a0ca36 -device piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -device usb-tablet,id=tablet,bus=uhci.0,port=1 -chardev socket,id=serial0,path=/var/run/qemu-server/1063.serial0,server=on,wait=off -device isa-serial,chardev=serial0 -device VGA,id=vga,bus=pci.0,addr=0x2 -chardev socket,path=/var/run/qemu-server/1063.qga,server=on,wait=off,id=qga0 -device virtio-serial,id=qga0,bus=pci.0,addr=0x8 -device virtserialport,chardev=qga0,name=org.qemu.guest_agent.0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3,free-page-reporting=on -iscsi initiator-name=iqn.1993-08.org.debian:01:5398f791fb65 -drive file=/dev/nvme2-lvm/vm-1063-cloudinit,if=none,id=drive-ide2,media=cdrom,aio=io_uring -device ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2 -device virtio-scsi-pci,id=scsihw0,bus=pci.0,addr=0x5 -drive file=/dev/nvme2-lvm/vm-1063-disk-1,if=none,id=drive-scsi0,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap -device 
scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0,bootindex=101 -drive file=/dev/nvme2-lvm/vm-1063-disk-2,if=none,id=drive-scsi1,discard=on,format=raw,cache=none,aio=io_uring,detect-zeroes=unmap -device scsi-hd,bus=scsihw0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,id=scsi1 -netdev type=tap,id=net0,ifname=tap1063i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on -device virtio-net-pci,mac=8A:8C:A1:DD:E4:6A,netdev=net0,bus=pci.0,addr=0x12,id=net0 -machine type=pc+pve0
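Going back to the UUID question under `init=/bin/bash`: as I understand it, `blkid` probes the filesystem superblock directly off the device instead of asking udev, so it might be worth trying from that shell (assuming `/dev/sr0` exists at that point, which it doesn't before the reset):

```shell
# Reads the ISO 9660 metadata straight from the device; no udev needed
blkid /dev/sr0
# On recent util-linux, lsblk can also probe directly when run as root:
lsblk -no UUID /dev/sr0
```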