Debian-based KVM VM freezes in Proxmox 2.1

webservio

We experienced this problem with Proxmox 1.x and decided to give Proxmox 2.1 a try. However, we see the same issue: the KVM VPS freezes and has to be stopped and started again. I believe the reason could be that, being an anti-spam filtering system, it is very CPU-intensive. We allocated 2 CPUs and 2 cores, but got the same result. This same system worked under experimental (open-source) Xen as well as the free VMware. I suspect there are parameters (in Proxmox) that are not available in the GUI. Can someone give us some ideas, please? This is getting to be very important, as we prefer Proxmox, but this particular problem has not gone away.
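For reference, here is how we have been inspecting the VM's full settings from the shell, since the config file lists options even when the GUI does not (our VM is VMID 103):

Code:
cat /etc/pve/qemu-server/103.conf
# qm should show the same information:
qm config 103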
 
Hi Dietmar,

Well, I would be more than happy to share all the information needed, as we truly love Proxmox :). It normally happens when this VPS-based filtering system is running CPU-intensive applications (MailScanner/SpamAssassin). In Proxmox 1.x, when the VPS froze we would see "CPU stuck" messages in the VNC console. In version 1.x we increased the CPU units to 500000, which was the maximum, and it helped a little, but in 2.1 there is no such parameter. What I suspect is that the VPS needs a bigger slice of CPU resources. If so, how do we increase that in 2.1?
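In case it helps someone following this thread: as far as I can tell, cpuunits still exists in 2.1 as a VM config option, it just is not exposed in the GUI, so it should be settable from the shell. A sketch, assuming the option kept its 1.x meaning as a CPU scheduling weight:

Code:
# give VM 103 a larger CPU weight (higher = bigger share under contention)
qm set 103 -cpuunits 4000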

Regards,
 
Upon further investigation, we found similar entries (see below) in /var/log/kern.log inside the faulty VPS (KVM). It seems the MailScanner application is demanding a lot of CPU.


A sample message from /var/log/kern.log:

Sep 5 01:28:05 localhost kernel: [68163.499139] Code: dc c6 ff 66 90 f6 c7 02 75 0e 48 83 3d 36 68 1e 00 00 75 12 0f 0b eb fe 48 83 3d 28 68 1e 00 00 75 04 0f 0b eb fe 48 89 df 57 9d <0f> 1f 44 00 00 5e 5b c3 83 c8 ff f0 0f c1 07 ff c8 ba 01 00 00
Sep 5 01:28:05 localhost kernel: [68163.499139] Call Trace:
Sep 5 01:28:05 localhost kernel: [68163.499139] [<ffffffff81049b95>] ? do_wait+0x1e7/0x225
Sep 5 01:28:05 localhost kernel: [68163.499139] [<ffffffff81049c68>] ? sys_wait4+0x95/0xb0
Sep 5 01:28:05 localhost kernel: [68163.499139] [<ffffffff8104830c>] ? child_wait_callback+0x0/0x48
Sep 5 01:28:05 localhost kernel: [68163.499139] [<ffffffff81009c82>] ? system_call_fastpath+0x16/0x1b
Sep 5 01:30:34 localhost kernel: [68312.258989] BUG: soft lockup - CPU#0 stuck for 138s! [MailScanner:3728]
 
Hi,
which OS does the guest run?
Can you also post the VMID.conf?
When the VM freezes, are there any hints in syslog/messages on the host?

Udo

VMID.conf:

#IP address is 192.168.17.104
bootdisk: ide0
cores: 4
ide0: local:103/vm-103-disk-1.qcow2
ide2: none,media=cdrom
memory: 4096
name: node5.spameater.com
net0: rtl8139=32:98:8C:9F:5F:85,bridge=vmbr0
onboot: 1
ostype: l26
sockets: 2


Here is a snapshot of syslog:

Sep 4 12:42:07 vpshost25 kernel: vmbr0: port 3(tap103i0) entering disabled state
Sep 4 12:42:07 vpshost25 kernel: vmbr0: port 3(tap103i0) entering disabled state
Sep 4 12:42:08 vpshost25 pvedaemon[51303]: <root@pam> end task UPID:vpshost25:0000CAA4:135B4D59:50462F5F:qmstop:103:root@pam: OK
Sep 4 12:42:17 vpshost25 pvedaemon[51930]: start VM 103: UPID:vpshost25:0000CADA:135B5123:50462F69:qmstart:103:root@pam:
Sep 4 12:42:17 vpshost25 pvedaemon[51875]: <root@pam> starting task UPID:vpshost25:0000CADA:135B5123:50462F69:qmstart:103:root@pam:
Sep 4 12:42:17 vpshost25 kernel: device tap103i0 entered promiscuous mode
Sep 4 12:42:17 vpshost25 kernel: vmbr0: port 3(tap103i0) entering forwarding state
Sep 4 12:42:17 vpshost25 pvedaemon[51875]: <root@pam> end task UPID:vpshost25:0000CADA:135B5123:50462F69:qmstart:103:root@pam: OK
Sep 4 12:42:20 vpshost25 pvedaemon[51963]: starting vnc proxy UPID:vpshost25:0000CAFB:135B524D:50462F6C:vncproxy:103:root@pam:
Sep 4 12:42:20 vpshost25 pvedaemon[51303]: <root@pam> starting task UPID:vpshost25:0000CAFB:135B524D:50462F6C:vncproxy:103:root@pam:
Sep 4 12:42:23 vpshost25 ntpd[2045]: Listen normally on 107 tap103i0 fe80::ac2d:3aff:fef1:7e37 UDP 123
Sep 4 12:42:23 vpshost25 ntpd[2045]: Deleting interface #106 tap103i0, fe80::a89e:46ff:fefc:1827#123, interface stats: received=0, sent=0, dropped=0, active_time=17700 secs
Sep 4 12:42:28 vpshost25 kernel: tap103i0: no IPv6 routers present
Sep 4 12:42:28 vpshost25 pvedaemon[51875]: <root@pam> successful auth for user 'root@pam'
Sep 4 12:50:14 vpshost25 pvedaemon[54609]: starting vnc proxy UPID:vpshost25:0000D551:135C0BB4:50463146:vncproxy:103:root@pam:
Sep 4 12:50:14 vpshost25 pvedaemon[51875]: <root@pam> starting task UPID:vpshost25:0000D551:135C0BB4:50463146:vncproxy:103:root@pam:
Sep 4 12:50:14 vpshost25 pvedaemon[51303]: <root@pam> end task UPID:vpshost25:0000CAFB:135B524D:50462F6C:vncproxy:103:root@pam: OK
Sep 4 12:50:21 vpshost25 pvedaemon[50866]: <root@pam> successful auth for user 'root@pam'
Sep 4 12:51:06 vpshost25 pvedaemon[51875]: <root@pam> successful auth for user 'root@pam'
Sep 4 12:51:15 vpshost25 pvedaemon[50866]: <root@pam> successful auth for user 'root@pam'
Sep 4 13:00:56 vpshost25 pvedaemon[2276]: worker 50866 finished
Sep 4 13:00:56 vpshost25 pvedaemon[2276]: starting 1 worker(s)
Sep 4 13:00:56 vpshost25 pvedaemon[2276]: worker 58179 started
Sep 4 13:01:24 vpshost25 pvedaemon[51875]: <root@pam> end task UPID:vpshost25:0000D551:135C0BB4:50463146:vncproxy:103:root@pam: OK
Sep 4 13:03:20 vpshost25 pvedaemon[2276]: worker 51303 finished
Sep 4 13:03:20 vpshost25 pvedaemon[2276]: starting 1 worker(s)
Sep 4 13:03:20 vpshost25 pvedaemon[2276]: worker 58944 started
Sep 4 13:04:59 vpshost25 pvedaemon[2276]: worker 51875 finished
Sep 4 13:04:59 vpshost25 pvedaemon[2276]: starting 1 worker(s)
Sep 4 13:04:59 vpshost25 pvedaemon[2276]: worker 59496 started
Sep 4 13:06:06 vpshost25 pvedaemon[58944]: <root@pam> successful auth for user 'root@pam'
Sep 4 13:17:01 vpshost25 /USR/SBIN/CRON[63460]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Sep 4 13:21:06 vpshost25 pvedaemon[59496]: <root@pam> successful auth for user 'root@pam'
Sep 4 13:24:26 vpshost25 pvedaemon[2276]: worker 58179 finished
Sep 4 13:24:26 vpshost25 pvedaemon[2276]: starting 1 worker(s)
Sep 4 13:24:26 vpshost25 pvedaemon[2276]: worker 65910 started
 
Hi,
your issue looks CPU-related, but perhaps it is also IO-related.
You could try converting the VM disk from qcow2 to raw.
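For example, something like this with the VM powered off (paths assume the default "local" storage under /var/lib/vz):

Code:
qemu-img convert -O raw /var/lib/vz/images/103/vm-103-disk-1.qcow2 /var/lib/vz/images/103/vm-103-disk-1.raw
# then point ide0 in /etc/pve/qemu-server/103.conf at the new image:
#   ide0: local:103/vm-103-disk-1.raw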

What is the output of
Code:
pveperf /var/lib/vz
?

You didn't answer the question about the OS version of the client!

BTW, with Linux clients you should use virtio for the NIC instead of rtl8139 (IMHO rtl8139 only makes sense with BSD).
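e.g. like this, keeping your existing MAC address (the 2.6.33 guest kernel should already ship the virtio_net driver):

Code:
qm set 103 -net0 virtio=32:98:8C:9F:5F:85,bridge=vmbr0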

Udo
 

You can also try a virtio disk.
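A rough sketch of that change (with the VM stopped; the guest kernel needs the virtio_blk driver): in /etc/pve/qemu-server/103.conf, replace the ide0 and bootdisk lines with:

Code:
virtio0: local:103/vm-103-disk-1.qcow2
bootdisk: virtio0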
 
Here we go:

pveperf /var/lib/vz
CPU BOGOMIPS: 31920.61
REGEX/SECOND: 682297
HD SIZE: 1707.78 GB (/dev/mapper/pve-data)
BUFFERED READS: 103.19 MB/sec
AVERAGE SEEK TIME: 24.41 ms
FSYNCS/SECOND: 1618.57
DNS EXT: 61.89 ms
DNS INT: 26.46 ms

pveversion -v
pve-manager: 2.1-12 (pve-manager/2.1/be112d89)
running kernel: 2.6.32-13-pve
proxmox-ve-2.6.32: 2.1-72
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-13-pve: 2.6.32-72
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-45
pve-firmware: 1.0-17
libpve-common-perl: 1.0-28
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-27
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-6
ksm-control-daemon: 1.1-1


OS version of the client:

uname -a
Linux mailcleaner 2.6.33 #1 SMP Fri Feb 26 14:34:37 CET 2010 x86_64 GNU/Linux
 
I have done more investigation on this, and I hope I have finally found the cause. I believe the issue was the kernel version. I upgraded to the following (see below), and it seems a lot of bugs have been fixed:

pveversion -v
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-14-pve
proxmox-ve-2.6.32: 2.1-74
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-14-pve: 2.6.32-74
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-49
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-31
vncterm: 1.0-3
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-8
ksm-control-daemon: 1.1-1
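
For anyone wanting to do the same: the upgrade was roughly the standard apt procedure, followed by a reboot into the new kernel:

Code:
apt-get update
apt-get dist-upgrade
# reboot so the 2.6.32-14-pve kernel is actually the running one
reboot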

As I remarked before, our team has been using Proxmox successfully since the very first version, and this was one of the rare issues we have had. In our technical opinion, Proxmox is a superior environment. Thanks so much for all the great work.

Regards,

Mishi
 
I am also seeing an identical issue with a Debian Lenny VM. It locks up at certain points. Just like the OP's, this VM is used for scanning network traffic, so it is very CPU-intensive. I am also on the same kernel as the OP. There has to be something going on.
 
There were problems with the Lenny 2.6.26 kernel and virtio - solved by using a 2.6.32 kernel from lenny-backports.
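e.g. something along these lines inside the Lenny guest (a sketch; the exact kernel package name depends on the current backports revision):

Code:
echo "deb http://backports.debian.org/debian-backports lenny-backports main" >> /etc/apt/sources.list
apt-get update
# pull the backported 2.6.32 kernel via the metapackage
apt-get -t lenny-backports install linux-image-2.6-amd64
reboot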
 
