Re: New Proxmox VE Kernels (2.6.18 - 2.6.24 - 2.6.32)
I'm facing a strange issue with the 2.6.32 kernel.
I have two network interfaces,
one bridged to the public net (eth3)
and the other one to the private net (eth2).
Code:
pve:~# mii-tool
eth2: negotiated 1000baseT-FD flow-control, link ok
eth3: negotiated 100baseTx-FD, link ok
I've deployed several VMs (Windows, RHEL/Fedora, Vyatta and so on).
When the network traffic is generated by a single VM, everything works well.
The problem appears when the network traffic involves
two or three VMs concurrently.
After a while, at random, traffic stops flowing and
a transceiver seems to "disappear":
Code:
pve:~# mii-tool
eth2: negotiated 1000baseT-FD flow-control, link ok
No MII transceiver present!.
and I find this in dmesg:
Code:
------------[ cut here ]------------
WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x297/0x2b0()
Hardware name: Unknow
NETDEV WATCHDOG: eth3 (r8169): transmit queue 0 timed out
Modules linked in: tun kvm_amd kvm ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi bridge stp snd_pcm snd_timer snd soundcore snd_page_alloc serio_raw amd64_edac_mod edac_core edac_mce_amd pcspkr k8temp i2c_piix4 shpchp raid10 raid456 async_raid6_recov async_pq raid6_pq async_xor xor async_memcpy async_tx raid1 raid0 multipath linear usbhid r8169 mii [last unloaded: scsi_wait_scan]
Pid: 2547, comm: kvm Not tainted 2.6.32-1-pve #1
Call Trace:
<IRQ> [<ffffffff814848e7>] ? dev_watchdog+0x297/0x2b0
[<ffffffff81060fa8>] warn_slowpath_common+0x78/0xd0
[<ffffffff81061084>] warn_slowpath_fmt+0x64/0x70
[<ffffffff8107efa1>] ? autoremove_wake_function+0x11/0x40
[<ffffffff81046e4a>] ? __wake_up_common+0x5a/0x90
[<ffffffff812a197a>] ? strlcpy+0x4a/0x60
[<ffffffff81469a83>] ? netdev_drivername+0x43/0x50
[<ffffffff814848e7>] dev_watchdog+0x297/0x2b0
[<ffffffff8107aa18>] ? insert_work+0x98/0xb0
[<ffffffff81484650>] ? dev_watchdog+0x0/0x2b0
[<ffffffff810715f1>] run_timer_softirq+0x191/0x310
[<ffffffff810685fa>] __do_softirq+0xfa/0x1d0
[<ffffffff810131ac>] call_softirq+0x1c/0x30
[<ffffffff81014be5>] do_softirq+0x65/0xa0
[<ffffffff81068395>] irq_exit+0x75/0xa0
[<ffffffff81014153>] do_IRQ+0x73/0xf0
[<ffffffff810129d3>] ret_from_intr+0x0/0x11
<EOI>
---[ end trace 304a9c063950565b ]---
r8169: eth3: link up
r8169: eth3: link up
r8169: eth3: link up
After that, the only way to make eth3 work again is to reboot the system.
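For anyone who wants to catch the failure automatically instead of spotting it by hand, here is a minimal Python sketch. It only parses text in the same format as the mii-tool and dmesg excerpts above. The function names (find_tx_timeouts, interfaces_with_link, missing_interfaces) are made up for illustration and are not part of any Proxmox or kernel tooling, and how the text is captured (for example from saved output files) is left as an assumption.

```python
import re

# Matches the watchdog line in the form shown in the dmesg excerpt above.
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)

# Matches healthy mii-tool lines such as
# "eth2: negotiated 1000baseT-FD flow-control, link ok".
MII_OK_RE = re.compile(r"^(?P<iface>\S+): .*\blink ok\s*$")


def find_tx_timeouts(dmesg_text):
    """Return (interface, driver, queue) for every watchdog timeout in the text."""
    return [
        (m.group("iface"), m.group("driver"), int(m.group("queue")))
        for m in WATCHDOG_RE.finditer(dmesg_text)
    ]


def interfaces_with_link(mii_tool_text):
    """Return the set of interfaces that mii-tool reports as 'link ok'."""
    found = set()
    for line in mii_tool_text.splitlines():
        m = MII_OK_RE.match(line.strip())
        if m:
            found.add(m.group("iface"))
    return found


def missing_interfaces(expected, mii_tool_text):
    """Return the expected interfaces that no longer show 'link ok', sorted."""
    return sorted(set(expected) - interfaces_with_link(mii_tool_text))


if __name__ == "__main__":
    # Sample text copied from the excerpts in this post.
    dmesg_sample = "NETDEV WATCHDOG: eth3 (r8169): transmit queue 0 timed out\n"
    mii_sample = (
        "eth2: negotiated 1000baseT-FD flow-control, link ok\n"
        "No MII transceiver present!.\n"
    )
    print(find_tx_timeouts(dmesg_sample))
    print(missing_interfaces(["eth2", "eth3"], mii_sample))
```

Fed with periodically saved mii-tool and dmesg output, something like this could at least timestamp the moment eth3 drops, which may help when correlating the hang with VM traffic.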
My setup:
Code:
pve:~# pveversion -v
pve-manager: 1.5-1 (pve-manager/1.5/4561)
running kernel: 2.6.32-1-pve
proxmox-ve-2.6.32: 1.5-2
pve-kernel-2.6.32-1-pve: 2.6.32-2
pve-kernel-2.6.24-8-pve: 2.6.24-16
qemu-server: 1.1-10
pve-firmware: 1.0-3
libpve-storage-perl: 1.0-6
vncterm: 0.9-2
vzctl: 3.0.23-1pve4
vzdump: 1.2-5
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.11.1-1
ksm-control-daemon: 1.0-2
Code:
AMD Athlon(tm) Dual Core Processor 5050e
...
02:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 02)
With 2.6.24 the situation is similar, but I don't get any trace in dmesg.
I've tried several BIOS settings, but nothing changes, except when enabling
Cool&Quiet together with powernowd:
this combination makes the machine crash very soon after startup,
so I've disabled C&Q.
I don't think this is a hardware-related problem, because with Windows installed the system is rock solid...
I know these Ethernet controllers are not excellent,
but this is a test environment for evaluation; everything is integrated on the board
and cannot be changed.
Maybe a kernel bug?
Should I open a bug upstream, or not?
Any advice is appreciated...
Thanks in advance