KVM box needs reset

hk@

Hi
on the host we have:
pveversion -v
pve-manager: 1.6-5 (pve-manager/1.6/5261)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.6-25
pve-kernel-2.6.32-4-pve: 2.6.32-25
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-22
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-14
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-8
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.12.5-2
ksm-control-daemon: 1.0-4

The KVM instance needs a reset, and it seems we hit a kernel bug on the host, because inside the KVM instance we get this:
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] BUG: soft lockup - CPU#0
stuck for 4096s! [swapper:0]
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] Modules linked in: ipv6
nfs lockd nfs_acl sunrpc ipt_LOG xt_limit nf_conntrack_ipv4 xt_state
nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables loop snd_pcm
snd_timer snd soundcore snd_page_alloc pcspkr serio_raw psmouse button
i2c_piix4 i2c_core joydev evdev ext3 jbd mbcache ide_cd_mod cdrom
ata_generic libata scsi_mod dock usbhid hid ff_memless virtio_blk piix
e1000 floppy virtio_pci virtio_ring virtio ide_pci_generic ide_core
uhci_hcd thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] CPU 0:
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] Modules linked in: ipv6
nfs lockd nfs_acl sunrpc ipt_LOG xt_limit nf_conntrack_ipv4 xt_state
nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables loop snd_pcm
snd_timer snd soundcore snd_page_alloc pcspkr serio_raw psmouse button
i2c_piix4 i2c_core joydev evdev ext3 jbd mbcache ide_cd_mod cdrom
ata_generic libata scsi_mod dock usbhid hid ff_memless virtio_blk piix
e1000 floppy virtio_pci virtio_ring virtio ide_pci_generic ide_core
uhci_hcd thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] Pid: 0, comm: swapper Not
tainted 2.6.26-2-amd64 #1
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] RIP:
0010:[<ffffffff8021eb64>] [<ffffffff8021eb64>] native_safe_halt+0x2/0x3
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] RSP: 0000:ffffffff80575f38
EFLAGS: 00000246
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] RAX: ffffffff80575fd8 RBX:
0000000000000000 RCX: 0000000000000000
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] RDX: 0000000000000000 RSI:
0000000000000001 RDI: ffffffff804fce70
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] RBP: 0000000012e0d0a8 R08:
ffffffff8021eb64 R09: ffff81011d0624c0
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] R10: ffff810100f7d938 R11:
ffff81011d0624c0 R12: ffffffff80575ed8
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] R13: 0000000000000000 R14:
ffffffff8023ce76 R15: 00035f9663eb9df5
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] FS:
0000000000000000(0000) GS:ffffffff8053d000(0000) knlGS:0000000000000000
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] CS: 0010 DS: 0018 ES:
0018 CR0: 000000008005003b
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] CR2: 00000000e78eb730 CR3:
000000011086c000 CR4: 00000000000006e0
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Nov 18 22:16:23 fin01tn kernel: [1082291.622189]
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] Call Trace:
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] [<ffffffff8020b0d8>] ?
default_idle+0x2a/0x46
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] [<ffffffff8020ad04>] ?
cpu_idle+0x8e/0xb8
Nov 18 22:16:23 fin01tn kernel: [1082291.622189]
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] BUG: soft lockup - CPU#1
stuck for 4096s! [swapper:0]
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] Modules linked in: ipv6
nfs lockd nfs_acl sunrpc ipt_LOG xt_limit nf_conntrack_ipv4 xt_state
nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables loop snd_pcm
snd_timer snd soundcore snd_page_alloc pcspkr serio_raw psmouse button
i2c_piix4 i2c_core joydev evdev ext3 jbd mbcache ide_cd_mod cdrom
ata_generic libata scsi_mod dock usbhid hid ff_memless virtio_blk piix
e1000 floppy virtio_pci virtio_ring virtio ide_pci_generic ide_core
uhci_hcd thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] CPU 1:
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] Modules linked in: ipv6
nfs lockd nfs_acl sunrpc ipt_LOG xt_limit nf_conntrack_ipv4 xt_state
nf_conntrack xt_tcpudp iptable_filter ip_tables x_tables loop snd_pcm
snd_timer snd soundcore snd_page_alloc pcspkr serio_raw psmouse button
i2c_piix4 i2c_core joydev evdev ext3 jbd mbcache ide_cd_mod cdrom
ata_generic libata scsi_mod dock usbhid hid ff_memless virtio_blk piix
e1000 floppy virtio_pci virtio_ring virtio ide_pci_generic ide_core
uhci_hcd thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] Pid: 0, comm: swapper Not
tainted 2.6.26-2-amd64 #1
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] RIP:
0010:[<ffffffff8021eb64>] [<ffffffff8021eb64>] native_safe_halt+0x2/0x3
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] RSP: 0000:ffff81011faa5f38
EFLAGS: 00000246
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] RAX: ffff81011faa5fd8 RBX:
0000000000000000 RCX: 0000000000000000
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] RDX: 0000000000000000 RSI:
0000000000000001 RDI: ffffffff804fce70
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] RBP: 0000000012e0d0aa R08:
ffffffff8021eb64 R09: ffff81010417b260
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] R10: ffff810100f7dc48 R11:
ffff81011d0c8790 R12: ffff81011faa5ed8
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] R13: 0000000000000000 R14:
ffffffff8023ce76 R15: 00035f9663f2dbe7
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] FS:
0000000000000000(0000) GS:ffff81011fa738c0(0000) knlGS:0000000000000000
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] CS: 0010 DS: 0018 ES:
0018 CR0: 000000008005003b
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] CR2: 00007f26a377cba0 CR3:
000000011086c000 CR4: 00000000000006e0
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] DR0: 0000000000000000 DR1:
0000000000000000 DR2: 0000000000000000
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] DR3: 0000000000000000 DR6:
00000000ffff0ff0 DR7: 0000000000000400
Nov 18 22:16:23 fin01tn kernel: [1082291.622189]
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] Call Trace:
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] [<ffffffff8020b0d8>] ?
default_idle+0x2a/0x46
Nov 18 22:16:23 fin01tn kernel: [1082291.622189] [<ffffffff8020ad04>] ?
cpu_idle+0x8e/0xb8
Nov 18 22:16:23 fin01tn kernel: [1082291.622189]
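
To see at a glance which CPUs reported a lockup in a trace like the one above, a small filter over the guest's kernel log can help. This is only a minimal sketch: `summarize_lockups` is a hypothetical helper name (not a Proxmox or kernel tool), and the log path in the usage comment is an assumption that depends on the guest's syslog setup.

```shell
#!/bin/sh
# summarize_lockups: read a kernel log on stdin and print one line per CPU
# with the number of "BUG: soft lockup" reports seen for it.
# Hypothetical helper for illustration, not part of any package.
summarize_lockups() {
    # Keep only the soft lockup report lines, extract their "CPU#N" token,
    # then count how often each CPU appears.
    grep 'BUG: soft lockup' \
        | grep -o 'CPU#[0-9]*' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Usage (log path is an assumption; it varies by distribution):
#   summarize_lockups < /var/log/kern.log
```

If every vCPU of the guest shows up at the same timestamp, as in the trace above, that is consistent with the whole guest having been stalled rather than a single CPU spinning.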
 
Try the latest 2.6.32 kernel/KVM from pvetest (we plan to release these packages to stable anyway).
 
Hi
as this is a six server cluster I'd appreciate your releasing it to stable before we do upgrades to test-kernels.

Except - is there a clean way to get only this very kernel from pvetest?

Thank you in advance
hk
 
> Hi
> as this is a six server cluster I'd appreciate your releasing it to stable before we do upgrades to test-kernels.

Whichever you prefer. The only reason we did not release it this week was the team's limited time.

> Except - is there a clean way to get only this very kernel from pvetest?
>
> Thank you in advance
> hk

Just download the kernel package with wget, install it with dpkg -i, and reboot.
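
As a rough sketch of those three steps: the function name `install_kernel_deb` and the `DRY_RUN` switch are hypothetical, and the URL has to be the real location of the pve-kernel `.deb` in pvetest, which is not given in this thread. Only `wget -O` and `dpkg -i` are real commands here.

```shell
#!/bin/sh
# install_kernel_deb URL
# Download a single kernel .deb and install it with dpkg.
# Hypothetical helper; the caller must supply the actual package URL.
install_kernel_deb() {
    url=$1
    [ -n "$url" ] || { echo "usage: install_kernel_deb URL" >&2; return 1; }
    deb=$(basename "$url")

    # With DRY_RUN=1 the commands are only printed, so they can be reviewed
    # before anything is changed on the host.
    run() {
        if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
    }

    run wget -O "$deb" "$url" || return 1
    run dpkg -i "$deb" || return 1
    echo "Now reboot the host to boot the new kernel."
}

# Usage (replace the placeholder with the real package URL):
#   DRY_RUN=1 install_kernel_deb "<URL of the pve-kernel .deb>"
```

The sketch deliberately does not reboot by itself, so running guests can be shut down or migrated first.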

btw, the plan is to release 1.7 next week.
 
Sorry, but just to be sure: is only the kernel needed, with no further packages, to get a clean upgrade without further trouble?

Thank you
hk
 
If you want just the kernel, install it as described; if you want all the new packages (which I recommend), just upgrade everything. I am not sure whether the new kernel solves your problem, but it makes sense to try.
 
Tom,

Sorry for hijacking the thread, but you're breaking the news on 1.7 for next week above ... :)

Do you have a list of release notes and/or features for 1.7 you can share with the community?
 
> Sorry for hijacking the thread, but you're breaking the news on 1.7 for next week above ... :)

Sorry - there was a (very) serious bug in the latest 2.6.18 kernel, so we decided to wait until that is fixed.

> Do you have a list of release notes and/or features for 1.7 you can share with the community?

Just small fixes (KVM 0.13) and kernel updates - the release will have release notes :-)
 
Hi,

I have exactly the same bug:

Code:
Dec  1 01:30:34 debian06 kernel: [154553.100143] BUG: soft lockup - CPU#1 stuck for 4096s! [swapper:0]
Dec  1 01:30:34 debian06 kernel: [154553.100143] Modules linked in: iptable_filter ip_tables x_tables ipv6 loop snd_pcsp virtio_net snd_pcm snd_timer snd soundcore snd_page_alloc psmouse button i2c_piix4 i2c_core serio_raw joydev evdev ext3 jbd mbcache ide_cd_mod cdrom usbhid hid ff_memless virtio_blk piix ide_pci_generic ide_core floppy ata_generic virtio_pci libata scsi_mod dock virtio_ring virtio uhci_hcd thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Dec  1 01:30:34 debian06 kernel: [154553.100143] CPU 1:
Dec  1 01:30:34 debian06 kernel: [154553.100143] Modules linked in: iptable_filter ip_tables x_tables ipv6 loop snd_pcsp virtio_net snd_pcm snd_timer snd soundcore snd_page_alloc psmouse button i2c_piix4 i2c_core serio_raw joydev evdev ext3 jbd mbcache ide_cd_mod cdrom usbhid hid ff_memless virtio_blk piix ide_pci_generic ide_core floppy ata_generic virtio_pci libata scsi_mod dock virtio_ring virtio uhci_hcd thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Dec  1 01:30:34 debian06 kernel: [154553.100143] Pid: 0, comm: swapper Not tainted 2.6.26-2-amd64 #1
Dec  1 01:30:34 debian06 kernel: [154553.100143] RIP: 0010:[<ffffffff8021eb64>]  [<ffffffff8021eb64>] native_safe_halt+0x2/0x3
Dec  1 01:30:34 debian06 kernel: [154553.100143] RSP: 0018:ffff81031e4b1f38  EFLAGS: 00000246
Dec  1 01:30:34 debian06 kernel: [154553.100143] RAX: ffff81031e4b1fd8 RBX: 0000000000000000 RCX: 0000000000000000
Dec  1 01:30:34 debian06 kernel: [154553.100143] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff804fce70
Dec  1 01:30:34 debian06 kernel: [154553.100143] RBP: 0000000004f472c6 R08: ffffffff8021eb64 R09: ffff81031be94d60
Dec  1 08:13:10 debian06 kernel: imklog 3.18.6, log source = /proc/kmsg started.
on this host:

pveversion -v
pve-manager: 1.6-5 (pve-manager/1.6/5261)
running kernel: 2.6.32-3-pve
pve-kernel-2.6.32-3-pve: 2.6.32-18
qemu-server: 1.1-22
pve-firmware: 1.0-9
libpve-storage-perl: 1.0-14
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-8
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1dso1

The host is http://www.ovh.co.uk/products/mg_hybrid.xml
with hardware RAID; my guest is on the SSD RAID 1 (formatted as ext3).

with this guest:

debian06:~# uname -a
Linux debian06.localdomain 2.6.26-2-amd64 #1 SMP Thu Sep 16 15:56:38 UTC 2010 x86_64 GNU/Linux

ostype: l26
memory: 12288
sockets: 4
vlan2: virtio=06:67:60:33:97:8E
name: VM-XXXXXXXXXXXXXX
ide2: none,media=cdrom
bootdisk: virtio0
virtio0: local:122/vm-122-disk-1.raw
virtio1: local:122/vm-122-disk-2.raw
boot: cad
freeze: 0
cpuunits: 1000
acpi: 1
kvm: 1
onboot: 0
cores: 1
description: IP PRIVE 192.168.115.20

I have two other guests on this host, each with only 1 core and 512 MB RAM, and they never freeze, so it looks like a problem with multiple vCPUs. When the freeze happens there is no load on the server; it is really "random".
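
For comparing guests on this point, the vCPU count can be read from each config as sockets times cores (4 x 1 = 4 for the config above). A minimal sketch, assuming the `key: value` layout shown above; `vcpu_count` is a hypothetical helper name, treating a missing key as 1 is an assumption of the sketch, and the config path in the usage comment is also an assumption.

```shell
#!/bin/sh
# vcpu_count: read a "key: value" VM config on stdin and print sockets * cores.
# Hypothetical helper; a missing "sockets" or "cores" key is treated as 1.
vcpu_count() {
    awk -F': *' '
        $1 == "sockets" { s = $2 }
        $1 == "cores"   { c = $2 }
        END {
            if (s == "") s = 1
            if (c == "") c = 1
            print s * c
        }
    '
}

# Usage (the path is an assumption; adjust to where the VMID.conf file lives):
#   vcpu_count < /etc/qemu-server/122.conf
```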

I haven't tried 1.7 yet...
 
try the latest 2.6.32 from 1.7.

I will ... but I have to fix my Windows 2003 problem with 1.7 first :(

To be clear:
- on my production server I get random freezes in Debian but have no problem with 2003
- on my test environment with 1.7 I get freezes with 2003

A bit hard to fix, no? :)
 
Any details about the Win2003 problem? Post the output of 'pveversion -v' and the guest's VMID.conf file.