KVM: virtio network stability with Linux guests

hverbeek
Feb 14, 2011
I am running a PVE 1.8 cluster of three hosts; only Linux KVM guests are deployed. The guests are running Debian 5 Lenny 64-bit with virtio network cards. The hosts run kernel 2.6.32-4-pve, the guests run kernel 2.6.32-bpo.5-amd64.
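For reference, the kernel and the network driver a guest is actually using can be confirmed from inside the guest; a minimal sketch (eth0 is an assumption, substitute the guest's interface name):
Code:
# inside a guest: confirm the running kernel and the bound NIC driver
uname -r              # should report 2.6.32-bpo.5-amd64 here
ethtool -i eth0       # reports "driver: virtio_net" when the virtio model is active
lsmod | grep virtio   # virtio_net, virtio_pci etc. should be loaded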

Three of the guests are configured as PHP5 application servers; they receive requests on :80 from reverse proxies (which are also KVM guests on the same cluster). The application is pretty bandwidth-heavy, as we're running a file-transfer solution; sustained data rates of 30-60 Mbit/s for long parts of the day are normal.
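To put a number on that load, the interface byte counters can be sampled from inside a guest; a minimal sketch (eth0 and the 10-second window are assumptions):
Code:
# sample RX/TX byte counters over 10 seconds and print the rate in Mbit/s
R1=$(cat /sys/class/net/eth0/statistics/rx_bytes); T1=$(cat /sys/class/net/eth0/statistics/tx_bytes)
sleep 10
R2=$(cat /sys/class/net/eth0/statistics/rx_bytes); T2=$(cat /sys/class/net/eth0/statistics/tx_bytes)
echo "rx: $(( (R2-R1)*8/10/1000000 )) Mbit/s  tx: $(( (T2-T1)*8/10/1000000 )) Mbit/s"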

Everything works great, but about once a week, one of the appservers becomes unresponsive (a different one each time):
  • PING works
  • Telnet to port 80 works (the connection is established), but there is no response
  • SSH login takes forever, but eventually succeeds
  • Console login takes very, very long
From the viewpoint of the end-user, the system is dead.

Once logged in, the shell is pretty responsive. No IO wait, load is low. It is possible to ping out, resolve DNS, etc. I notice that nothing new is written into /var/log/syslog from the moment the system "hangs".
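For the next occurrence, a quick state capture from inside the affected guest should help separate a network-level problem from everything else; a minimal sketch (the output path is an assumption, the commands are standard Lenny tools):
Code:
# run inside the hung guest, keep the output for comparison with a healthy guest
LOG=/root/hang-$(date +%Y%m%d-%H%M%S).txt
{
  uptime                     # load average at the time of the hang
  dmesg | tail -n 50         # recent kernel messages (segfaults, NIC errors)
  cat /proc/net/dev          # per-interface packet/error/drop counters
  ip -s link show eth0       # the same counters via iproute2
  netstat -s | head -n 80    # protocol statistics: retransmits, listen overflows
  netstat -ant | head -n 80  # state of the port-80 connections from the proxy
} > "$LOG" 2>&1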

dmesg shows a couple of segfaults:
Code:
[457508.135737] php5[16098] general protection ip:7fd417ba019f sp:7fff8e4c40f0 error:0 in memcache.so[7fd417b99000+14000]
(The appserver probably tries to connect to remote memcache instances, but dies in the process.)
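If that suspicion is right, it should show up as the remote memcache instances being unreachable from the hung guest; a minimal sketch (memcache-host and the default port 11211 are assumptions):
Code:
# ask a remote memcached for its stats; a healthy, reachable instance answers immediately
# (-q 2 closes the connection after 2 seconds; supported by Debian's netcat variants)
printf 'stats\r\nquit\r\n' | nc -q 2 memcache-host 11211 | head -n 5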

After a reboot, all is good again (sigh...).

At the moment, it feels to me like a network driver issue, not a user-land issue. I have replaced virtio with e1000 for now and will observe for a week or so, but that's just fishing in the dark... It kills me that I can't really see what's actually going wrong! Also, I cannot (yet) reliably reproduce the issue.
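After switching the NIC model and rebooting the guest, it is worth confirming which hardware the guest actually sees; a minimal sketch (the device strings in the comments are typical, not guaranteed):
Code:
# inside the guest, after the reboot
lspci | grep -i ethernet   # virtio shows up as a "Virtio network device",
                           # the e1000 model typically as an Intel 82540EM
ethtool -i eth0            # "driver: e1000" confirms the emulated Intel NIC is in use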

Does anyone have similar experiences? Any suggestions where to look next time?
 
post the output of pveversion -v

virtio on Lenny KVM guests is known to be unstable with the default 2.6.26 kernel, but you are already using a newer one.
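For anyone still on the stock Lenny kernel, moving the guest to the 2.6.32 backports kernel the original poster reports (2.6.32-bpo.5-amd64) is one way out; a minimal sketch, assuming the lenny-backports repository is reachable (exact mirror and package name may differ):
Code:
# inside the guest: add lenny-backports and install the newer kernel
echo 'deb http://backports.debian.org/debian-backports lenny-backports main' >> /etc/apt/sources.list
apt-get update
apt-get -t lenny-backports install linux-image-2.6.32-bpo.5-amd64   # use the -686 flavour for 32-bit guests
# reboot the guest, then verify with: uname -r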
 
post the output of pveversion -v.

Sorry, forgot that...
Code:
pve-manager: 1.8-15 (pve-manager/1.8/5754)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.8-32
pve-kernel-2.6.32-4-pve: 2.6.32-32
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-11
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.0-3
ksm-control-daemon: 1.0-5
 
This is fine; you are using the latest stable release.
 
Hi

I'm running Debian 5 (kernel 2.6.26.2-686) as a guest on a Proxmox 1.8 box.

I have this setup:

# pveversion -v
pve-manager: 1.8-15 (pve-manager/1.8/5754)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.8-32
pve-kernel-2.6.32-4-pve: 2.6.32-32
pve-kernel-2.6.18-2-pve: 2.6.18-5
qemu-server: 1.1-30
pve-firmware: 1.0-11
libpve-storage-perl: 1.0-17
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-11
vzprocps: 2.0.11-2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.14.0-3
ksm-control-daemon: 1.0-5

According to the post above, this guest kernel has some known KVM virtio network instabilities?

Apart from switching to the e1000 network driver, are there any other ways I can avoid these network issues?