Unable to start VMs

hacman

Hi all,

After a little help here.

Suddenly and seemingly without cause, we can't start any VMs on either of our cluster nodes.

We get the message:

Code:
root@[REDACTED]:~# qm start 100
ipcc_send_rec failed: File too large
Running as unit 100.scope.
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 0]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 1]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 2]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 3]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 4]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 5]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 6]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 7]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 8]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 9]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 12]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 13]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 14]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 15]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 16]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 17]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 23]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 24]
kvm: -vnc unix:/var/run/qemu-server/100.vnc,x509,password: Failed to start VNC server: Our own certificate /etc/pve/local/pve-ssl.pem failed validation against /etc/pve/pve-root-ca.pem: The certificate hasn't got a known issuer
start failed: command '/usr/bin/systemd-run --scope --slice qemu --unit 100 -p 'KillMode=none' -p 'CPUShares=1000' /usr/bin/kvm -id 100 -chardev 'socket,id=qmp,path=/var/run/qemu-server/100.qmp,server,nowait' -mon 'chardev=qmp,mode=control' -vnc unix:/var/run/qemu-server/100.vnc,x509,password -pidfile /var/run/qemu-server/100.pid -daemonize -smbios 'type=1,uuid=9e87c234-47e1-4ab5-bec3-f9071133d2e2' -name [REDACTED] -smp '1,sockets=1,cores=1,maxcpus=1' -nodefaults -boot 'menu=on,strict=on,reboot-timeout=1000' -vga cirrus -cpu host,+kvm_pv_unhalt,+kvm_pv_eoi -m 2048 -object 'memory-backend-ram,size=2048M,id=ram-node0' -numa 'node,nodeid=0,cpus=0,memdev=ram-node0' -k en-gb -device 'pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e' -device 'pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f' -device 'piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2' -device 'usb-tablet,id=tablet,bus=uhci.0,port=1' -device 'virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3' -iscsi 'initiator-name=iqn.1993-08.org.debian:01:d11eb7d96739' -drive 'file=/var/lib/vz/images/100/vm-100-disk-1.qcow2,if=none,id=drive-virtio0,format=qcow2,cache=none,aio=native,detect-zeroes=on' -device 'virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100' -drive 'if=none,id=drive-ide2,media=cdrom,aio=threads' -device 'ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200' -netdev 'type=tap,id=net0,ifname=tap100i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on' -device 'virtio-net-pci,mac=32:63:66:31:31:66,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300'' failed: exit code 1
ipcc_send_rec failed: File too large

This happens on all VMs, and we've rebooted the host to rule out anything weird there. The same error is given when starting the VMs from the GUI or from the CLI using "qm start".

We've also re-installed the SSL certificates on the machine, and they are fine (newly issued today actually).
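
The VNC failure in the output above is a chain validation error, so it can be checked without starting a VM. A minimal sketch, assuming `openssl` is installed on the node; `check_chain` is a hypothetical helper name, and the paths in the example call are the two files named in the error message. QEMU may apply slightly different rules, so treat this as a quick approximation rather than the exact check the VNC server performs.

```shell
#!/bin/sh
# check_chain CA_FILE CERT_FILE
# Hypothetical helper (not a Proxmox tool): prints OK when CERT_FILE
# validates against CA_FILE, FAILED otherwise.
check_chain() {
    # "openssl verify" prints "<file>: OK" on success; match on that
    # text rather than relying on the exit status alone.
    if openssl verify -CAfile "$1" "$2" 2>&1 | grep -q ': OK$'; then
        echo "OK"
    else
        echo "FAILED"
    fi
}

# Example call with the paths from the error message:
# check_chain /etc/pve/pve-root-ca.pem /etc/pve/local/pve-ssl.pem
```

Note that `openssl verify` only examines the first certificate in CERT_FILE, so any intermediates appended to that file are not used to build the chain here.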

Our version info is below:

Code:
proxmox-ve: 4.1-37 (running kernel: 4.2.8-1-pve)
pve-manager: 4.1-13 (running version: 4.1-13/cfb599fb)
pve-kernel-4.2.6-1-pve: 4.2.6-36
pve-kernel-4.2.8-1-pve: 4.2.8-37
lvm2: 2.02.116-pve2
corosync-pve: 2.3.5-2
libqb0: 1.0-1
pve-cluster: 4.0-32
qemu-server: 4.0-55
pve-firmware: 1.1-7
libpve-common-perl: 4.0-48
libpve-access-control: 4.0-11
libpve-storage-perl: 4.0-40
pve-libspice-server1: 0.12.5-2
vncterm: 1.2-1
pve-qemu-kvm: 2.5-5
pve-container: 1.0-44
pve-firewall: 2.0-17
pve-ha-manager: 1.0-21
ksm-control-daemon: 1.2-1
glusterfs-client: 3.5.2-2+deb8u1
lxc-pve: 1.1.5-7
lxcfs: 0.13-pve3
cgmanager: 0.39-pve1
criu: 1.6.0-1
zfsutils: 0.6.5-pve7~jessie

Does anyone have any ideas on what could be the cause of this?

Any help much appreciated!

Thanks,

Jon
 
Just a quick update;

Changing the VM CPU type removes all the unsupported feature errors, but the rest remains.

Very odd since these VMs have been running for months without issue.

Jon
 
Thanks for the reply!

Running systemctl stop 100.scope has no effect sadly, as the system states that the scope is not yet loaded.

I'd come across that thread you mention in my initial research of the issue, and whilst the symptoms seem related that sadly has no effect either.

The cluster seems to be working as expected - the web UI is all good too which is what is so strange.

I think the issue here is just that the certificates have become badly messed up somehow.
As this is not a live cluster, our next step is to tear it down and rebuild.

I'd still be interested to hear any theories anyone else has though - as this one is kind of bugging me!

Jon
 
Interestingly on a new cluster we're seeing the same:

Code:
Running as unit 100.scope.
kvm: -vnc unix:/var/run/qemu-server/100.vnc,x509,password: Failed to start VNC server: Our own certificate /etc/pve/local/pve-ssl.pem failed validation against /etc/pve/pve-root-ca.pem: The certificate hasn't got a known issuer

So I think the certificates are at fault.

One for the Proxmox team - are there any plans to bring certificate installation into the GUI?
 
Ok - got full confirmation it's the certificates now. The new node does the same.

This is very odd as the certs on the old cluster were not changed at any point!

Seems we have an issue with our chain or CA - we're using Comodo PositiveSSL.

I'll update if we find out where I'm going wrong.
 
Progress :)

Got things working by changing the order in which the certificates are listed in the PEM files.

Despite the instructions on the Wiki, I added the intermediate certificate only to the server cert file.

Seems to all be working now :)

Thanks for all the help folks!
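
Since the fix here came down to the order of certificates inside the PEM files, it can help to print what each file actually contains. A minimal sketch, assuming `openssl` is available; `list_chain` is a hypothetical helper name, and the path in the example call is the node certificate file from the error message.

```shell
#!/bin/sh
# list_chain PEM_FILE
# Hypothetical helper: prints the subject and issuer of every
# certificate in PEM_FILE, in file order.
list_chain() {
    openssl crl2pkcs7 -nocrl -certfile "$1" |
        openssl pkcs7 -print_certs -noout
}

# Example call:
# list_chain /etc/pve/local/pve-ssl.pem
```

In a conventionally ordered file the server certificate comes first, and each following certificate's subject matches the issuer of the one before it.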
 
Despite the instructions on the Wiki, I added the intermediate certificate only to the server cert file.

Note: this will work for the web GUI (and probably noVNC?), but will break Spice (at least using remote-viewer/virt-viewer, not sure about other clients). Just a heads up.
 

Interesting to know! Thanks!

Are you able to advise on the correct method for installing certificates? The instructions in the wiki don't seem to work "as is", and I had to resort to the above just to get something that works. We're now seeing other things broken though.

As I say, an area in the web UI to manage and validate certificates would be most welcome!

Thanks,

Jon
 
There will probably be some changes coming up in this area when Let's Encrypt support is enabled as an experimental feature, but it's not completely finalized yet. For the moment, replacing the cluster CA with a non-self-signed one is dangerous because it breaks adding nodes to the cluster and Spice. Replacing just the node certificate and key should "only" break Spice.

If you experience issues other than those two known ones, please report them and we will try to fix them in the upcoming SSL-related changes. The more information you give, the higher the chance that we are able to reproduce the issues and find solutions.
 
I'll have to check again, but I'm sure replacing the server certificate and the key resulted in VMs not booting, as the system was unhappy that the CA could not allow validation of the server cert against it.

What has actually changed to break all this? It was working perfectly in 3.x, and even early on in 4.x.

The method by which SSL is enabled seems to need quite a bit of work, as I'm sure we're not the only ones who want to put our own certificate onto the cluster in place of the self-signed one. We're using a wildcard certificate, so in our use case putting the same cert on many nodes is not an issue.

Let me know if there is any information that would be especially useful in working this out - we're happy to try and help with bug fixing and the ongoing development.

Thanks,

Jon
 
You are right, if VNC is enabled for Qemu it will also try to validate the node certificate against the cluster CA. So you either need to disable the vnc console, or replace the cluster CA with your commercial CA certificates and be careful about adding nodes to the cluster (i.e., whenever you add a node you need to restore the cluster CA temporarily and replace it again afterwards).
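
The temporary swap described above amounts to keeping copies of both versions of the CA file and copying the right one into place at the right time. A minimal sketch under stated assumptions: `save_copy` and `restore_copy` are hypothetical helper names, the backup directories are whatever you choose, and the only file path taken from this thread is /etc/pve/pve-root-ca.pem from the error message. Whether other files need the same treatment is not covered here.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the backup directories are
# assumptions, not part of Proxmox.

# save_copy FILE DIR: store a copy of FILE under DIR.
save_copy() {
    mkdir -p "$2" && cp "$1" "$2/$(basename "$1")"
}

# restore_copy FILE DIR: overwrite FILE with the copy stored under DIR.
restore_copy() {
    cp "$2/$(basename "$1")" "$1"
}
```

One possible sequence: run `save_copy` on the original CA file into one directory before installing the commercial CA, run it again into a second directory afterwards, then use `restore_copy` from the first directory before adding a node and from the second once the node has joined.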
 
So, to check that I understand: we're OK to use commercial certs, so long as whenever we add or remove a node from the cluster we temporarily put the self-signed certs back in place?

And then aside from that all things should work as expected?

We only really use the NoVNC terminal functionality, so SPICE is not an issue.

What actually brought about this issue? As I'm sure this worked previously.

Thanks again!

Jon
 
Yes, as far as I can tell. Like I said, there will probably be some changes in this area in the next weeks which should make all of this easier to set up and use.
 
