Hi.
We have been running several machines (Linux and Windows) on Proxmox for some time now. All of them are KVM machines. After upgrading the host system (vserver), all machines came up, but the ones running Debian lenny started showing soft lockups after some time (several minutes). There doesn't seem to be a problem on the Proxmox host itself. Other machines (Debian squeeze and Windows) seem to run without problems.
The kernel on the affected machine is a vanilla Debian kernel.
Excerpt from its syslog:
Sep 15 14:57:50 groupware kernel: [ 9414.142420] Clocksource tsc unstable (delta = -4398041517969 ns)
Sep 15 14:57:50 groupware kernel: [ 9414.142420] BUG: soft lockup - CPU#1 stuck for 246s! [swapper:0]
Sep 15 14:57:50 groupware kernel: [ 9414.142420] Modules linked in: ipv6 loop snd_pcsp virtio_net snd_pcm snd_timer snd psmouse soundcore snd_page_alloc serio_raw i2c_piix4 i2c_core button evdev joydev ext3 jbd mbcache usbhid hid ff_memless virtio_blk floppy piix ide_pci_generic ide_core virtio_pci virtio_ring virtio ata_generic uhci_hcd libata scsi_mod dock thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Sep 15 14:57:50 groupware kernel: [ 9414.142420] CPU 1:
Sep 15 14:57:50 groupware kernel: [ 9414.142420] Modules linked in: ipv6 loop snd_pcsp virtio_net snd_pcm snd_timer snd psmouse soundcore snd_page_alloc serio_raw i2c_piix4 i2c_core button evdev joydev ext3 jbd mbcache usbhid hid ff_memless virtio_blk floppy piix ide_pci_generic ide_core virtio_pci virtio_ring virtio ata_generic uhci_hcd libata scsi_mod dock thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Sep 15 14:57:50 groupware kernel: [ 9414.142420] Pid: 0, comm: swapper Not tainted 2.6.26-2-amd64 #1
Sep 15 14:57:50 groupware kernel: [ 9414.142420] RIP: 0010:[<ffffffff8021eb64>] [<ffffffff8021eb64>] native_safe_halt+0x2/0x3
Sep 15 14:57:50 groupware kernel: [ 9414.142420] RSP: 0018:ffff81007fb9ff38 EFLAGS: 00000246
Sep 15 14:57:50 groupware kernel: [ 9414.142420] RAX: ffff81007fb9ffd8 RBX: 0000000000000000 RCX: 0000000000000000
Sep 15 14:57:50 groupware kernel: [ 9414.142420] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff804fce70
Sep 15 14:57:50 groupware kernel: [ 9414.142420] RBP: 0000000000144448 R08: ffffffff8021eb64 R09: 0000000000000000
Sep 15 14:57:50 groupware kernel: [ 9414.142420] R10: ffff81007e15bdc8 R11: ffff81007fb9fef8 R12: ffff81007fb9fed8
Sep 15 14:57:50 groupware kernel: [ 9414.142420] R13: 0000000000000000 R14: ffffffff8023cd72 R15: 0000000000000000
Sep 15 14:57:50 groupware kernel: [ 9414.142420] FS: 0000000000000000(0000) GS:ffff81007fb719c0(0000) knlGS:0000000000000000
Sep 15 14:57:50 groupware kernel: [ 9414.142420] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 15 14:57:50 groupware kernel: [ 9414.142420] CR2: 00007f61feb9d56c CR3: 000000007e44d000 CR4: 00000000000006e0
Sep 15 14:57:50 groupware kernel: [ 9414.142420] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 14:57:50 groupware kernel: [ 9414.142420] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 15 14:57:50 groupware kernel: [ 9414.142420]
Sep 15 14:57:50 groupware kernel: [ 9414.142420] Call Trace:
Sep 15 14:57:50 groupware kernel: [ 9414.142420] [<ffffffff8020b0d8>] ? default_idle+0x2a/0x46
Sep 15 14:57:50 groupware kernel: [ 9414.142420] [<ffffffff8020ad04>] ? cpu_idle+0x8e/0xb8
Sep 15 14:57:50 groupware kernel: [ 9414.142420]
...
Sep 15 15:21:50 groupware kernel: [11527.482897] BUG: soft lockup - CPU#0 stuck for 183s! [swapper:0]
Sep 15 15:21:50 groupware kernel: [11527.486467] Modules linked in: ipv6 loop snd_pcsp virtio_net snd_pcm snd_timer snd psmouse soundcore snd_page_alloc serio_raw i2c_piix4 i2c_core button evdev joydev ext3 jbd mbcache usbhid hid ff_memless virtio_blk floppy piix ide_pci_generic ide_core virtio_pci virtio_ring virtio ata_generic uhci_hcd libata scsi_mod dock thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Sep 15 15:21:50 groupware kernel: [11527.486467] CPU 0:
Sep 15 15:21:50 groupware kernel: [11527.486467] Modules linked in: ipv6 loop snd_pcsp virtio_net snd_pcm snd_timer snd psmouse soundcore snd_page_alloc serio_raw i2c_piix4 i2c_core button evdev joydev ext3 jbd mbcache usbhid hid ff_memless virtio_blk floppy piix ide_pci_generic ide_core virtio_pci virtio_ring virtio ata_generic uhci_hcd libata scsi_mod dock thermal processor fan thermal_sys [last unloaded: scsi_wait_scan]
Sep 15 15:21:50 groupware kernel: [11527.486467] Pid: 0, comm: swapper Not tainted 2.6.26-2-amd64 #1
Sep 15 15:21:50 groupware kernel: [11527.486467] RIP: 0010:[<ffffffff8023c9ec>] [<ffffffff8023c9ec>] run_timer_softirq+0x155/0x1e2
Sep 15 15:21:50 groupware kernel: [11527.486467] RSP: 0018:ffffffff805e4ef0 EFLAGS: 00000206
Sep 15 15:21:50 groupware kernel: [11527.486467] RAX: ffffffff805e4ef0 RBX: ffff810067c89e88 RCX: ffffffff8023cc34
Sep 15 15:21:50 groupware kernel: [11527.486467] RDX: ffffffff805e4ef0 RSI: 0000000000001e39 RDI: ffff810067c89e88
Sep 15 15:21:50 groupware kernel: [11527.486467] RBP: ffffffff805e4e70 R08: ffff810067c89ec0 R09: 0000000000000000
Sep 15 15:21:50 groupware kernel: [11527.486467] R10: 0000000000000009 R11: ffffffffa008d33f R12: ffffffff8020cd02
Sep 15 15:21:50 groupware kernel: [11527.486467] R13: ffffffff805e4e70 R14: ffff81006610a140 R15: 0000000000000286
Sep 15 15:21:50 groupware kernel: [11527.486467] FS: 0000000000000000(0000) GS:ffffffff8053d000(0000) knlGS:0000000000000000
Sep 15 15:21:50 groupware kernel: [11527.486467] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Sep 15 15:21:50 groupware kernel: [11527.486467] CR2: 00007feb57d0bde0 CR3: 000000007e107000 CR4: 00000000000006e0
Sep 15 15:21:50 groupware kernel: [11527.486467] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Sep 15 15:21:50 groupware kernel: [11527.486467] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Sep 15 15:21:50 groupware kernel: [11527.486467]
Sep 15 15:21:50 groupware kernel: [11527.486467] Call Trace:
Sep 15 15:21:50 groupware kernel: [11527.486467] <IRQ> [<ffffffff802393cd>] ? __do_softirq+0x5c/0xd1
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020d2dc>] ? call_softirq+0x1c/0x28
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020f3e8>] ? do_softirq+0x3c/0x81
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8023932b>] ? irq_exit+0x3f/0x85
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8021aaab>] ? smp_apic_timer_interrupt+0x8c/0xa4
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020b0ae>] ? default_idle+0x0/0x46
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020cd02>] ? apic_timer_interrupt+0x72/0x80
Sep 15 15:21:50 groupware kernel: [11527.486467] <EOI> [<ffffffff8021eb64>] ? native_safe_halt+0x2/0x3
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8021eb64>] ? native_safe_halt+0x2/0x3
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020b0d8>] ? default_idle+0x2a/0x46
Sep 15 15:21:50 groupware kernel: [11527.486467] [<ffffffff8020ad04>] ? cpu_idle+0x8e/0xb8
Sep 15 15:21:50 groupware kernel: [11527.486467]
...
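To see how often each CPU is hit and for how long, the lockup lines can be tallied with a small script. This is only a rough sketch: the names `summarize_lockups` and `SAMPLE` are made up for illustration, and the regular expression is based solely on the two "BUG: soft lockup" messages quoted above, so other kernels may format the line differently.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   ... kernel: [ 9414.142420] BUG: soft lockup - CPU#1 stuck for 246s! [swapper:0]
LOCKUP_RE = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[([^:\]]+):(\d+)\]"
)

# The two lockup headers from the syslog excerpt above (illustrative input).
SAMPLE = [
    "Sep 15 14:57:50 groupware kernel: [ 9414.142420] "
    "BUG: soft lockup - CPU#1 stuck for 246s! [swapper:0]",
    "Sep 15 15:21:50 groupware kernel: [11527.482897] "
    "BUG: soft lockup - CPU#0 stuck for 183s! [swapper:0]",
]


def summarize_lockups(lines):
    """Return {cpu: [(seconds_stuck, task_name), ...]} for each lockup line."""
    by_cpu = defaultdict(list)
    for line in lines:
        match = LOCKUP_RE.search(line)
        if match:
            cpu, seconds, task, _pid = match.groups()
            by_cpu[int(cpu)].append((int(seconds), task))
    return dict(by_cpu)


if __name__ == "__main__":
    for cpu, events in sorted(summarize_lockups(SAMPLE).items()):
        longest = max(seconds for seconds, _task in events)
        print(f"CPU#{cpu}: {len(events)} lockup(s), longest {longest}s")
```

For a real log, the list could be replaced by the lines of the syslog file, for example `summarize_lockups(open("/var/log/syslog"))`, assuming that is where the guest writes its kernel messages.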
groupware:~# uname -a
Linux groupware 2.6.26-2-amd64 #1 SMP Mon Jun 13 16:29:33 UTC 2011 x86_64 GNU/Linux
The machine is not usable, since these problems occur very frequently. It has some network activity, but not much load (nagios, ldap, apache).
The Proxmox host is a Debian lenny system with the Proxmox sources added:
vserver:~# pveversion -v
pve-manager: 1.9-24 (pve-manager/1.9/6542)
running kernel: 2.6.32-6-pve
proxmox-ve-2.6.32: 1.9-43
pve-kernel-2.6.32-4-pve: 2.6.32-33
pve-kernel-2.6.32-6-pve: 2.6.32-43
qemu-server: 1.1-32
pve-firmware: 1.0-13
libpve-storage-perl: 1.0-19
vncterm: 0.9-2
vzctl: 3.0.28-1pve5
vzdump: 1.2-15
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.15.0-1
ksm-control-daemon: 1.0-6
The host was running the new pve kernel at that time. Excerpt from its dmesg log:
vserver:~# cat /var/log/dmesg.0
...
total RAM covered: 12287M
...
gran_size: 128K chunk_size: 256M num_reg: 8 lose cover RAM: 0G
*BAD*gran_size: 128K chunk_size: 512M num_reg: 8 lose cover RAM: -256M
gran_size: 128K chunk_size: 1G num_reg: 8 lose cover RAM: 0G
*BAD*gran_size: 128K chunk_size: 2G num_reg: 8 lose cover RAM: -1G
gran_size: 256K chunk_size: 256K num_reg: 8 lose cover RAM: 9178624K
...
gran_size: 256K chunk_size: 256M num_reg: 8 lose cover RAM: 0G
*BAD*gran_size: 256K chunk_size: 512M num_reg: 8 lose cover RAM: -256M
gran_size: 256K chunk_size: 1G num_reg: 8 lose cover RAM: 0G
...
This error occurs several times with the new kernel. I restarted vserver with the old 1.8 kernel (2.6.32-4-pve) and had neither these errors nor a problem with any of the virtual machines.
I doubt that this is a Proxmox issue as such; it looks more like a general problem with my hardware/(KVM?) kernel combination.
Proxmox runs on an Intel Xeon E5310 @ 1.60GHz and an Intel SE7320VP2 Server Board, 12GB RAM
Regards, C. Doering