Kernel panic - not syncing: IO-APIC + timer doesn't work!

mangoo

During host bootup, I believe all guests are started at the same time.

On a busy host with multiple guests, this can result in some guests not starting properly, i.e.:

Kernel panic - not syncing: IO-APIC + timer doesn't work!

Is it possible, on host bootup, not to start all the guests at the same time, but instead with, for example, a 15-second interval between them?

See http://marc.info/?l=kvm&m=124333983824075 for reference.
 
Maybe you can try this workaround:

- do not auto-start any of your VMs
- then, in rc.local or a cron job, start your VMs by invoking "/usr/sbin/qm start $vmid" ... where $vmid is the ID of your VM

so, in rc.local, you can have something like:

/usr/sbin/qm start 100
sleep 60
/usr/sbin/qm start 200
sleep 60

and so on ....
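
If you have many VMs, you could also script the same thing instead of listing each one by hand. Here is a minimal sketch (the VM IDs and the 60-second delay are just examples, adjust them to your setup) that calls "qm start" for each ID with a pause in between:

Code:
#!/usr/bin/perl -w
use strict;

# Sketch only: staggered VM start, e.g. called from rc.local.
# The VM IDs and the delay below are examples - adjust them to your setup.
my @vmids = (100, 200, 300);   # IDs of the VMs to start
my $delay = 60;                # seconds to wait between starts

foreach my $vmid (@vmids) {
    print "Starting VM $vmid\n";
    system('/usr/sbin/qm', 'start', $vmid);   # same as "qm start $vmid" on the command line
    sleep $delay;
}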
 
That's not the problem here.

The problem is timing issues that occur when lots of guests are started concurrently, as pointed out by Avi Kivity.

I am curious. If there is no problem with ACPI, then why does the kernel hang with a kernel panic? Did none of the kernel parameters like acpi=off, pci=noacpi, noapic or nolapic work for you? Do the guests still fail to start with the same error even with the noapic option? If starting multiple guests at the same time leads to timeouts in the kernel, and the message says it is APIC-related, then disabling it might make things better. It is not a final solution, but maybe it can help.

Anyway, you can try modifying the vz startup script and adding some delay to the VM startup section. Look at /etc/init.d/vz and search for the start_ves() function, where there is a for loop that starts all the VMs. You can add a sleep 60 into that loop.
 
Avi wrote:


So that bug only occurs when you use virtio?


Avi's comment and the whole thread on kvm-devel are about something else.

When using the virtio_net driver, after some time, on some guests, the guest network becomes slow (i.e. where the ping time is normally below 1 ms, the delay is around or exactly 1000 ms).

Additionally, virtio_console is slow (i.e. if you type over a serial virtio_console connection, characters appear after several seconds).

Other drivers (e1000) are not affected when a guest goes into that weird state (i.e. if you start a guest with virtio and e1000 interfaces, only virtio will be slow).

Also, restarting the guest doesn't help here - the guest has to be stopped and started again to be "cured" (a new kvm process has to start).


There was a longer discussion about it on the kvm-devel list over the past couple of months (search for messages with "slowness" in the subject). Other people were affected, there is no easy way to reproduce it, and there is no fix ;)
 
I am curious. If there is no problem with ACPI, then why does the kernel hang with a kernel panic? Did none of the kernel parameters like acpi=off, pci=noacpi, noapic or nolapic work for you? Do the guests still fail to start with the same error even with the noapic option? If starting multiple guests at the same time leads to timeouts in the kernel, and the message says it is APIC-related, then disabling it might make things better. It is not a final solution, but maybe it can help.

Anyway, you can try modifying the vz startup script and adding some delay to the VM startup section. Look at /etc/init.d/vz and search for the start_ves() function, where there is a for loop that starts all the VMs. You can add a sleep 60 into that loop.

1) they are kvm guests
2) I didn't try to change the guests' kernel parameters, as virtualization should be precise and fault-free
 
Maybe you can try this workaround:

- do not auto-start any of your VMs
- then, in rc.local or a cron job, start your VMs by invoking "/usr/sbin/qm start $vmid" ... where $vmid is the ID of your VM

so, in rc.local, you can have something like:

/usr/sbin/qm start 100
sleep 60
/usr/sbin/qm start 200
sleep 60

and so on ....


Yes, it should work.

How about this change:

Code:
--- QemuServer.pm.usbdevice     2009-05-27 11:23:21.000000000 +0200
+++ QemuServer.pm       2009-05-27 12:44:32.000000000 +0200
@@ -1746,6 +1746,7 @@
            if ($conf->{onboot}) {
                print STDERR "Starting Qemu VM $vmid\n";
                vm_start ($vmid);
+               sleep $startall_sleep;
            }
        };
        print STDERR $@ if $@;
$startall_sleep could either be defined in, e.g., /etc/pve/qemu-server.cfg (additional changes would be needed to read the variable from there), or simply hardcoded to, say, 15 seconds.
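
If it should be read from a config file, here is a minimal sketch of how that could look; the key name "startall_sleep" and the simple "key: value" line format are assumptions on my part, not the actual qemu-server.cfg syntax:

Code:
# Sketch only: assumes a "startall_sleep: <seconds>" line in
# /etc/pve/qemu-server.cfg - the real file format and key name may differ.
sub read_startall_sleep {
    my $default = 15;                        # fallback when nothing is configured
    my $cfg = '/etc/pve/qemu-server.cfg';

    open(my $fh, '<', $cfg) or return $default;
    while (my $line = <$fh>) {
        if ($line =~ /^\s*startall_sleep\s*[:=]\s*(\d+)\s*$/) {
            close($fh);
            return $1;                       # configured delay in seconds
        }
    }
    close($fh);
    return $default;
}

my $startall_sleep = read_startall_sleep();  # used by the "sleep $startall_sleep;" line above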
 
$startall_sleep could either be defined in, e.g., /etc/pve/qemu-server.cfg (additional changes would be needed to read the variable from there), or simply hardcoded to, say, 15 seconds.

Please make it configurable, default to 15 seconds - or should the default be 0?

- Dietmar
 
Please make it configurable, default to 15 seconds - or should the default be 0?

- Dietmar

OK, I'll make it configurable.

I think it should default to 15 seconds? Users running a few kvm guests (4-6) will most likely not notice the delay as the server boots; users with more guests (15-20 or more) will be happy that all guests start in a usable state.


I'll make some more tests and send you a patch, perhaps tomorrow.
 
