KVM guest is booting from host disk

kobuki

Renowned Member
Dec 30, 2008
I've encountered a very strange and potentially very dangerous issue. I'm using virtio-scsi on top of a local LVM volume for a guest, and instead of booting from the virtual disk or CD image, the guest tries to boot from the host drive. I can see the host's Grub menu on the guest's console. When I saw it, I quickly stopped the VM so it wouldn't do anything funky to the host. If I start the VM again, it sometimes boots from the virtual CD, but if I wait about 5 to 10 minutes, it reliably tries to boot from the host drive again. I'm quite baffled by this. The server is an older Sun server with an Adaptec-based RAID (Sun STK, a 3805 with Sun firmware + BBWC).

Is there a fix for this, or has anyone seen this before? My config is below.

# pveversion -v
proxmox-ve-2.6.32: 3.3-139 (running kernel: 3.10.0-5-pve)
pve-manager: 3.3-5 (running version: 3.3-5/bfebec03)
pve-kernel-3.10.0-5-pve: 3.10.0-19
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.3-3
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-25
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1

VM config:
# cat /etc/pve/qemu-server/105.conf
bootdisk: scsi0
cores: 4
ide2: local:iso/debian-8.0.0-amd64-i386-netinst.iso,media=cdrom
memory: 8192
name: x
net0: virtio=0E:5F:32:08:92:01,bridge=vmbr0
ostype: l26
scsi0: local-raw:vm-105-disk-1,cache=writeback,size=12G
scsi1: local-raw:vm-105-disk-2,cache=writeback,size=200G
scsihw: virtio-scsi-pci
smbios1: uuid=9d131852-2ebe-4346-8fc9-9470d839a72d
sockets: 1
 
this is weird...

have you done tests like
- swapping scsi for ide or another controller type?
- are you sure it's really the host disk?
- do other cluster nodes behave the same with this VM?
- does the same thing happen with a newly created VM using the same setup?
- are you sure nothing is wrong with your local LVM setup? (I'm not an expert, but that could lead to something like this... maybe checking your storage.cfg and LVM config would help; see the commands sketched below)
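e.g. something like this on the host (just generic checks, adjust the names to your setup):

# cat /etc/pve/storage.cfg    (the storage definitions PVE uses for "local" and "local-raw")
# pvs                         (physical volumes)
# vgs                         (volume groups)
# lvs                         (the vm-105-disk-* volumes should be listed with the expected sizes)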

Marco
 
An update: after some reading around, I tried loading the vhost-scsi module with "modprobe vhost_scsi". It loaded, and I haven't been able to reproduce the error since. Before that I had upgraded the HN to the latest PVE, but it exhibited the exact same problem with kernel 3.10.0-10-pve. One VM went as far as to completely boot the host's PVE OS! I was frightened to death of losing the node and having to reinstall everything on it, but apparently it didn't cause any problems. I've checked the host root fs and found no issues. The disk might have been read-only for the VM, and that might have saved me. Very strange.
 

I've tried and checked all of this before reporting the problem, but here goes. I'll answer everything in order.

1. Only virtio-scsi causes issues.
2. Yes, absolutely. See my previous post, too. I was greeted with the host's login prompt in the NoVNC console... (see the check sketched right after this list)
3. No cluster here yet.
4. On this node I haven't tried to reproduce it in another VM, but I've seen it once in another VM on another HN. I dismissed it quickly since I was in a hurry at a DC, and it was OK after a VM stop/start cycle, so I haven't given it much thought since. That node exhibits the infamous "Booting from hard disk..." freeze on ALL of its VMs on their first start (again, a stop/start cycle helps). All of them use the same setup: virtio-scsi over LVM, with the disks on an Adaptec HW RAID. No other virtual SCSI HW causes issues of this kind. It might be this specific combination on older Adaptec RAID cards, who knows at this point.
5. No apparent problems with LVM of any kind.
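For anyone wanting to double-check what the guest is actually handed, this is roughly how I verified it (the VMID and pidfile path are from my setup):

# qm showcmd 105
(prints the full kvm command line PVE generates; the -drive arguments should point at the vm-105-disk-* LVM volumes and the ISO, not at any raw host disk)
# ls -l /proc/$(cat /var/run/qemu-server/105.pid)/fd | grep /dev/
(shows which block devices the running kvm process actually has open)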

I'm thinking it's a virtio-scsi problem with either some specific combination of HW or just a general KVM bug. A VM stop/start cycle always seems to resolve these strange boot issues, so it might be a race condition or an obscure timeout somewhere. I wonder if I should file a bug report with the KVM devs. The whole issue is very strange...

I'll add vhost_scsi to /etc/modules, reboot the node, and see how it goes (see my previous post about it).
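For reference, the plain commands on the host node (nothing PVE-specific, just standard module handling):

# modprobe vhost_scsi              (load the module right away)
# lsmod | grep vhost               (vhost_scsi and vhost should show up)
# echo vhost_scsi >> /etc/modules  (so it gets loaded automatically on every boot)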
 
Well, I solved it by running the latest 2.6.32 PVE kernel. Not a single issue since then. This host doesn't run OpenVZ guests, so I thought I'd give the 3.10 series a go. It might be fine on newer HW, I don't know, but for production use on this server (and probably in general) it's not usable yet. I will continue to test it from time to time, though, maybe on brand-new hardware.
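In case someone wants to do the same, this is roughly how the default kernel is selected on the host; the exact GRUB_DEFAULT value depends on your menu layout, so check grub.cfg first:

# dpkg -l | grep pve-kernel           (lists the installed PVE kernels)
# grep menuentry /boot/grub/grub.cfg  (find the exact title of the 2.6.32-xx-pve entry)
# nano /etc/default/grub              (point GRUB_DEFAULT at that entry, by title or index)
# update-grub
# reboot
# uname -r                            (should now report 2.6.32-xx-pve)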
 
