Debian-based KVM VM freezes in Proxmox 2.1

webservio

We experienced this problem with Proxmox 1.x and decided to give Proxmox 2.1 a try. However, we see the same issue: the KVM VPS freezes and has to be stopped and started again. I believe the reason could be that, being an anti-spam filtering system, it is very CPU-intensive. We allocated 2 CPUs and 2 cores, but got the same result. This same system worked under experimental (open-source) Xen as well as the free VMware. I suspect there are parameters (in Proxmox) that are not available in the GUI. Can someone give us some ideas, please? This is getting to be very important, as we prefer Proxmox, but this particular problem has not gone away.
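For reference, here is how we have been inspecting the VM's full settings from the shell, since the config file lists options even when the GUI does not (our VM is VMID 103):

Code:
cat /etc/pve/qemu-server/103.conf
# qm should show the same information:
qm config 103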
 
Hi Dietmar,

Well, I would be more than happy to share all the information needed, as we truly love Proxmox :). It normally happens when this VPS-based filtering system is running CPU-intensive applications (MailScanner/SpamAssassin). In Proxmox 1.x, when the VPS froze we would see "CPU stuck" messages in the VNC console. In version 1.x we increased the CPU units to 500000, which was the maximum, and it helped a little, but in 2.1 there is no such parameter. What I suspect is that the VPS needs a bigger slice of CPU resources. If so, how do we increase that in 2.1?
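In case it helps someone following this thread: as far as I can tell, cpuunits still exists in 2.1 as a VM config option, it just is not exposed in the GUI, so it should be settable from the shell. A sketch, assuming the option kept its 1.x meaning as a CPU scheduling weight:

Code:
# give VM 103 a larger CPU weight (higher = bigger share under contention)
qm set 103 -cpuunits 4000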

Regards,
 
Upon further investigation, we found similar entries (see below) in /var/log/kern.log inside the faulty VPS (KVM). It seems the MailScanner application is demanding a lot of CPU.


A sample message from /var/log/kern.log:

Sep 5 01:28:05 localhost kernel: [68163.499139] Code: dc c6 ff 66 90 f6 c7 02 75 0e 48 83 3d 36 68 1e 00 00 75 12 0f 0b eb fe 48 83 3d 28 68 1e 00 00 75 04 0f 0b eb fe 48 89 df 57 9d <0f> 1f 44 00 00 5e 5b c3 83 c8 ff f0 0f c1 07 ff c8 ba 01 00 00
Sep 5 01:28:05 localhost kernel: [68163.499139] Call Trace:
Sep 5 01:28:05 localhost kernel: [68163.499139] [<ffffffff81049b95>] ? do_wait+0x1e7/0x225
Sep 5 01:28:05 localhost kernel: [68163.499139] [<ffffffff81049c68>] ? sys_wait4+0x95/0xb0
Sep 5 01:28:05 localhost kernel: [68163.499139] [<ffffffff8104830c>] ? child_wait_callback+0x0/0x48
Sep 5 01:28:05 localhost kernel: [68163.499139] [<ffffffff81009c82>] ? system_call_fastpath+0x16/0x1b
Sep 5 01:30:34 localhost kernel: [68312.258989] BUG: soft lockup - CPU#0 stuck for 138s! [MailScanner:3728]
 
Hi,
which OS does the guest run?
Can you also post the VMID.conf?
When the VM freezes, are there any hints in syslog/messages on the host?

Udo

VMID.conf:

#IP address is 192.168.17.104
bootdisk: ide0
cores: 4
ide0: local:103/vm-103-disk-1.qcow2
ide2: none,media=cdrom
memory: 4096
name: node5.spameater.com
net0: rtl8139=32:98:8C:9F:5F:85,bridge=vmbr0
onboot: 1
ostype: l26
sockets: 2


Here is a snapshot of syslog:

Sep 4 12:42:07 vpshost25 kernel: vmbr0: port 3(tap103i0) entering disabled state
Sep 4 12:42:07 vpshost25 kernel: vmbr0: port 3(tap103i0) entering disabled state
Sep 4 12:42:08 vpshost25 pvedaemon[51303]: <root@pam> end task UPID:vpshost25:0000CAA4:135B4D59:50462F5F:qmstop:103:root@pam: OK
Sep 4 12:42:17 vpshost25 pvedaemon[51930]: start VM 103: UPID:vpshost25:0000CADA:135B5123:50462F69:qmstart:103:root@pam:
Sep 4 12:42:17 vpshost25 pvedaemon[51875]: <root@pam> starting task UPID:vpshost25:0000CADA:135B5123:50462F69:qmstart:103:root@pam:
Sep 4 12:42:17 vpshost25 kernel: device tap103i0 entered promiscuous mode
Sep 4 12:42:17 vpshost25 kernel: vmbr0: port 3(tap103i0) entering forwarding state
Sep 4 12:42:17 vpshost25 pvedaemon[51875]: <root@pam> end task UPID:vpshost25:0000CADA:135B5123:50462F69:qmstart:103:root@pam: OK
Sep 4 12:42:20 vpshost25 pvedaemon[51963]: starting vnc proxy UPID:vpshost25:0000CAFB:135B524D:50462F6C:vncproxy:103:root@pam:
Sep 4 12:42:20 vpshost25 pvedaemon[51303]: <root@pam> starting task UPID:vpshost25:0000CAFB:135B524D:50462F6C:vncproxy:103:root@pam:
Sep 4 12:42:23 vpshost25 ntpd[2045]: Listen normally on 107 tap103i0 fe80::ac2d:3aff:fef1:7e37 UDP 123
Sep 4 12:42:23 vpshost25 ntpd[2045]: Deleting interface #106 tap103i0, fe80::a89e:46ff:fefc:1827#123, interface stats: received=0, sent=0, dropped=0, active_time=17700 secs
Sep 4 12:42:28 vpshost25 kernel: tap103i0: no IPv6 routers present
Sep 4 12:42:28 vpshost25 pvedaemon[51875]: <root@pam> successful auth for user 'root@pam'
Sep 4 12:50:14 vpshost25 pvedaemon[54609]: starting vnc proxy UPID:vpshost25:0000D551:135C0BB4:50463146:vncproxy:103:root@pam:
Sep 4 12:50:14 vpshost25 pvedaemon[51875]: <root@pam> starting task UPID:vpshost25:0000D551:135C0BB4:50463146:vncproxy:103:root@pam:
Sep 4 12:50:14 vpshost25 pvedaemon[51303]: <root@pam> end task UPID:vpshost25:0000CAFB:135B524D:50462F6C:vncproxy:103:root@pam: OK
Sep 4 12:50:21 vpshost25 pvedaemon[50866]: <root@pam> successful auth for user 'root@pam'
Sep 4 12:51:06 vpshost25 pvedaemon[51875]: <root@pam> successful auth for user 'root@pam'
Sep 4 12:51:15 vpshost25 pvedaemon[50866]: <root@pam> successful auth for user 'root@pam'
Sep 4 13:00:56 vpshost25 pvedaemon[2276]: worker 50866 finished
Sep 4 13:00:56 vpshost25 pvedaemon[2276]: starting 1 worker(s)
Sep 4 13:00:56 vpshost25 pvedaemon[2276]: worker 58179 started
Sep 4 13:01:24 vpshost25 pvedaemon[51875]: <root@pam> end task UPID:vpshost25:0000D551:135C0BB4:50463146:vncproxy:103:root@pam: OK
Sep 4 13:03:20 vpshost25 pvedaemon[2276]: worker 51303 finished
Sep 4 13:03:20 vpshost25 pvedaemon[2276]: starting 1 worker(s)
Sep 4 13:03:20 vpshost25 pvedaemon[2276]: worker 58944 started
Sep 4 13:04:59 vpshost25 pvedaemon[2276]: worker 51875 finished
Sep 4 13:04:59 vpshost25 pvedaemon[2276]: starting 1 worker(s)
Sep 4 13:04:59 vpshost25 pvedaemon[2276]: worker 59496 started
Sep 4 13:06:06 vpshost25 pvedaemon[58944]: <root@pam> successful auth for user 'root@pam'
Sep 4 13:17:01 vpshost25 /USR/SBIN/CRON[63460]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Sep 4 13:21:06 vpshost25 pvedaemon[59496]: <root@pam> successful auth for user 'root@pam'
Sep 4 13:24:26 vpshost25 pvedaemon[2276]: worker 58179 finished
Sep 4 13:24:26 vpshost25 pvedaemon[2276]: starting 1 worker(s)
Sep 4 13:24:26 vpshost25 pvedaemon[2276]: worker 65910 started
 
Hi,
your issue looks CPU-related, but perhaps it is also IO-related.
You could try converting the VM disk from qcow2 to raw.
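For example, something like this with the VM powered off (paths assume the default "local" storage under /var/lib/vz):

Code:
qemu-img convert -O raw /var/lib/vz/images/103/vm-103-disk-1.qcow2 /var/lib/vz/images/103/vm-103-disk-1.raw
# then point ide0 in /etc/pve/qemu-server/103.conf at the new image:
#   ide0: local:103/vm-103-disk-1.raw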

What is the output of
Code:
pveperf /var/lib/vz
?

You didn't answer the question about the OS version of the client!

BTW, with Linux clients you should use virtio for the NIC instead of rtl8139 (IMHO rtl8139 only makes sense with BSD).
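e.g. like this, keeping your existing MAC address (the 2.6.33 guest kernel should already ship the virtio_net driver):

Code:
qm set 103 -net0 virtio=32:98:8C:9F:5F:85,bridge=vmbr0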

Udo
 

You can also try a virtio disk.
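A rough sketch of that change (with the VM stopped; the guest kernel needs the virtio_blk driver): in /etc/pve/qemu-server/103.conf, replace the ide0 and bootdisk lines with:

Code:
virtio0: local:103/vm-103-disk-1.qcow2
bootdisk: virtio0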
 
Here we go:

pveperf /var/lib/vz
CPU BOGOMIPS: 31920.61
REGEX/SECOND: 682297
HD SIZE: 1707.78 GB (/dev/mapper/pve-data)
BUFFERED READS: 103.19 MB/sec
AVERAGE SEEK TIME: 24.41 ms
FSYNCS/SECOND: 1618.57
DNS EXT: 61.89 ms
DNS INT: 26.46 ms

pveversion -v
pve-manager: 2.1-12 (pve-manager/2.1/be112d89)
running kernel: 2.6.32-13-pve
proxmox-ve-2.6.32: 2.1-72
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-13-pve: 2.6.32-72
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-2
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-45
pve-firmware: 1.0-17
libpve-common-perl: 1.0-28
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-27
vncterm: 1.0-2
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-6
ksm-control-daemon: 1.1-1


OS version of the client:

uname -a
Linux mailcleaner 2.6.33 #1 SMP Fri Feb 26 14:34:37 CET 2010 x86_64 GNU/Linux
 
I have done more investigation on this, and I hope I have finally found the cause. I believe the issue was the kernel version. I upgraded to the following (see below), and it seems a lot of bugs have been fixed:

pveversion -v
pve-manager: 2.1-14 (pve-manager/2.1/f32f3f46)
running kernel: 2.6.32-14-pve
proxmox-ve-2.6.32: 2.1-74
pve-kernel-2.6.32-11-pve: 2.6.32-66
pve-kernel-2.6.32-14-pve: 2.6.32-74
lvm2: 2.02.95-1pve2
clvm: 2.02.95-1pve2
corosync-pve: 1.4.3-1
openais-pve: 1.1.4-2
libqb: 0.10.1-2
redhat-cluster-pve: 3.1.92-3
resource-agents-pve: 3.9.2-3
fence-agents-pve: 3.1.8-1
pve-cluster: 1.0-27
qemu-server: 2.0-49
pve-firmware: 1.0-18
libpve-common-perl: 1.0-30
libpve-access-control: 1.0-24
libpve-storage-perl: 2.0-31
vncterm: 1.0-3
vzctl: 3.0.30-2pve5
vzprocps: 2.0.11-2
vzquota: 3.0.12-3
pve-qemu-kvm: 1.1-8
ksm-control-daemon: 1.1-1
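
For anyone wanting to do the same: the upgrade was roughly the standard apt procedure, followed by a reboot into the new kernel:

Code:
apt-get update
apt-get dist-upgrade
# reboot so the 2.6.32-14-pve kernel is actually the running one
reboot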

As I remarked before, our team has been using Proxmox successfully since the very first version, and this was one of the rare issues we have had. In our technical opinion, Proxmox is a superior environment. Thanks so much for all the great work.

Regards,

Mishi
 
I am also seeing an identical issue with a Debian Lenny VM. It locks up at certain points. Just like the OP's, this VM is used for scanning network traffic, so it is very CPU-intensive. I am also on the same kernel as the OP. There has to be something going on.
 
There were problems with the Lenny 2.6.26 kernel and virtio - solved by using a 2.6.32 kernel from lenny-backports.
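e.g. something along these lines inside the Lenny guest (a sketch; the exact kernel package name depends on the current backports revision):

Code:
echo "deb http://backports.debian.org/debian-backports lenny-backports main" >> /etc/apt/sources.list
apt-get update
# pull the backported 2.6.32 kernel via the metapackage
apt-get -t lenny-backports install linux-image-2.6-amd64
reboot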
 
