e1000e driver update - query on recommended route?

fortechitsolutions

Hi,

I'm hoping to get a bit of guidance on the best / recommended way to achieve this goal (if possible): I have 2 Proxmox VE servers in a small cluster, in pre-deployment. The servers are identical 1U Intel boxes, single-socket quad-core, with dual on-board gigabit NICs. One NIC is an e1000 chipset, while the other is an e1000e.
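For reference, one way to confirm which on-board port uses which chipset - a quick sketch, assuming ethtool is installed (apt-get install ethtool) and the ports are eth0 / eth1 on your box:

Identify the two on-board NICs by PCI device
lspci | grep -i ethernet

Check which driver each interface is bound to
ethtool -i eth0
ethtool -i eth1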

I believe the stock e1000e driver in Debian Etch is slightly flaky (as is the case with many distros! :) ), but there is a newer version of the e1000e driver available (I believe it is also available in the "etch and a half" interim release?).

On CentOS / RHEL boxes I've managed that need a good / working e1000e, the 'simple fix' is:

- install the kernel headers RPM via yum
- ensure GCC / compiler tools are present
- build the kernel module from the fresh (Dec-2008) e1000e sources using the stock make; make install cycle (see the sketch just below) - and then you are laughing.
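In rough outline, that cycle looks like the following - a sketch from memory, so package names / versions are indicative only (e1000e-0.5.8.2 is the tarball I used):

Headers + toolchain (on RHEL/CentOS the headers package is kernel-devel)
yum install kernel-devel gcc make

Unpack and build / install the module per the driver README
tar xzf e1000e-0.5.8.2.tar.gz
cd e1000e-0.5.8.2/src
make install

Swap the running module for the new one
rmmod e1000e; modprobe e1000e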

In particular, jumbo frame support is 'good' with the latest driver, while it is problematic with the earlier e1000e, from what I've seen / found.
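As an aside, on Debian the jumbo MTU can be made persistent in /etc/network/interfaces; a minimal sketch, where the address / netmask and interface names are placeholders for your own config (and I haven't verified how the pve bridge propagates MTU to its ports):

auto vmbr0
iface vmbr0 inet static
    address 10.10.2.5
    netmask 255.255.255.0
    bridge_ports eth1
    mtu 9000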

Alas, my experience doing this on Debian is rather more limited, to put it modestly. I know the Proxmox kernel is tweaked compared to a totally stock Debian Etch install (OpenVZ support, KVM support, etc.), and I'm reluctant to break things by barging ahead.

If anyone could offer a brief pointer on how to integrate the new e1000e driver / module -- without totally messing up the proxmox system -- such help would certainly be greatly appreciated.

Many thanks,


--Tim Chipman
 
Hi,

Just a footnote to mention,

I downloaded the kernel headers as hinted; also installed make and gcc; and then had no problem building the new e1000e driver and installing the updated kernel module using the standard "make install" for this source tarball. So all very nice.

Many thanks for the help,

--Tim


For reference, a capture of the commands is given below:

Confirm that gcc and make are not present; then install them:
which gcc
which make
apt-get install make gcc

Get the sources required and prep for the install / build:
cd /opt/src
ls -la
mkdir e1000e
cd e1000e/
wget http://download.proxmox.com/debian/...4/pve-headers-2.6.24-1-pve_2.6.24-4_amd64.deb

Install the kernel headers:
dpkg -i pve-headers-2.6.24-1-pve_2.6.24-4_amd64.deb

Get the e1000e source, latest version from ~Dec-08:
wget http://internap.dl.sourceforge.net/sourceforge/e1000/e1000e-0.5.8.2.tar.gz
gzip -d e1000e-0.5.8.2.tar.gz
tar xfv e1000e-0.5.8.2.tar
cd e1000e-0.5.8.2/src/
ls -la
make install

Confirm that the new kernel module exists where indicated, with an appropriate timestamp on the file:
ls -la /lib/modules/2.6.24-1-pve/kernel/drivers/net/e1000e/e1000e.ko
date

Confirm which e1000 modules are presently in use:
lsmod | grep -i e1000

Remove the current (older) module and load the new one:
rmmod e1000e; modprobe e1000e

Verify in dmesg that the new e1000e driver was announced / is in use:
dmesg

Confirm the NIC still appears to be configured, and do a ping connectivity test:
ifconfig -a | more
ping www.slashdot.org
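One extra sanity check I'd suggest (an assumption on my part rather than something that bit me): make sure the module dependency data is fresh, and if the e1000e module happens to be included in the initramfs, regenerate it so the new version is used at boot:

Confirm which module file modprobe will pick up; refresh dependencies (harmless to re-run)
modinfo -n e1000e
depmod -a

Only needed if the driver is loaded from the initramfs at boot
update-initramfs -u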
 
Hi,

Here we go:

------paste--------


pvm2:~# modinfo e1000e
filename: /lib/modules/2.6.24-1-pve/kernel/drivers/net/e1000e/e1000e.ko
author: Intel Corporation, <linux.nics@intel.com>
description: Intel(R) PRO/1000 Network Driver
license: GPL
version: 0.5.8.2-NAPI
vermagic: 2.6.24-1-pve SMP preempt mod_unload
depends:
alias: pci:v00008086d0000105Esv*sd*bc*sc*i*
.....etc.... for ~30 more lines here...
alias: pci:v00008086d000010DFsv*sd*bc*sc*i*
srcversion: A42F8874B58F240524B9A16
parm: CrcStripping:Enable CRC Stripping, disable if your BMC needs the CRC (array of int)
parm: KumeranLockLoss:Enable Kumeran lock loss workaround (array of int)
parm: SmartPowerDownEnable:Enable PHY smart power down (array of int)
parm: IntMode:Interrupt Mode (array of int)
parm: InterruptThrottleRate:Interrupt Throttling Rate (array of int)
parm: RxAbsIntDelay:Receive Absolute Interrupt Delay (array of int)
parm: RxIntDelay:Receive Interrupt Delay (array of int)
parm: TxAbsIntDelay:Transmit Absolute Interrupt Delay (array of int)
parm: TxIntDelay:Transmit Interrupt Delay (array of int)
parm: copybreak:Maximum size of packet that is copied to a new buffer on receive (uint)
pvm2:~#
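As an aside, those parm entries can be set at module load time; a hypothetical example (the InterruptThrottleRate values are purely illustrative, not a recommendation - one value per port):

modprobe e1000e InterruptThrottleRate=8000,8000

or, to make it persistent, Etch-era modprobe reads any file in /etc/modprobe.d/:

echo "options e1000e InterruptThrottleRate=8000,8000" > /etc/modprobe.d/e1000e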

note in dmesg, we see:

e1000e: Intel(R) PRO/1000 Network Driver - 0.5.8.2-NAPI
e1000e: Copyright (c) 1999-2008 Intel Corporation.
ACPI: PCI Interrupt 0000:00:19.0[A] -> GSI 20 (level, low) -> IRQ 20
PCI: Setting latency timer of device 0000:00:19.0 to 64
0000:00:19.0: eth1: (PCI Express:2.5GB/s:Width x1) 00:15:17:8e:38:a1
0000:00:19.0: eth1: Intel(R) PRO/1000 Network Connection
0000:00:19.0: eth1: MAC: 7, PHY: 6, PBA No: 0070ff-0ff
pvm2:~#
 
Hi, yes - I had problems with NIC performance when jumbo frames (MTU 9000) were enabled. Symptoms of the problem are:

- you can SSH in and do small-packet type work OK
- but as soon as you try anything less trivial, the connection lags / hangs terribly (ie, trying to scp a 2 MB or larger file into the host, for example)
- if you drop the MTU back to 1500, the problem goes away (see the quick test below)
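For anyone wanting to reproduce the test, toggling the MTU on the fly looks like this (a sketch; assuming eth1 is the e1000e port):

Enable jumbo frames, then try a non-trivial transfer (eg scp a few-MB file in)
ifconfig eth1 mtu 9000

Drop back to the standard MTU; the lag / hang should disappear
ifconfig eth1 mtu 1500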

I've seen the same behaviour in other Linux environments (RHEL / CentOS 5) as well - ie, the stock e1000e driver included there has known bugs, and the latest version addresses those bugs, but that latest driver is not yet available in the distro by default.

I believe you can find discussion of the issues in the e1000 SourceForge forums. But if you don't feel like digging in extensively, just using the latest e1000e driver seems to do the trick.


Tim
 
Hi - sure, I'll try to do some tests this week to confirm. I had noted, after doing a clean install of ProxVE on a new system this week, that the e1000 / e1000e drivers were magically updated now - very nice :)

Once I have feedback I'll let you know,


Tim
 
Hi Dietmar, just to let you know I've applied the available ProxVE updates to both of the systems with the e1000e NICs which had required the updated driver in the past to work well. Using the latest ProxVE 1.1 they both behave well / no problems.
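For anyone following along, those updates come in via the standard apt cycle on the ProxVE host, roughly (followed by a reboot to pick up a new kernel, if one arrives):

apt-get update
apt-get dist-upgrade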

The only other thing, which has been an ongoing 'minor' issue, that maybe I should mention here: I have an OpenVZ virtual machine (CentOS 5 template) running Cacti on one of these 2 ProxVE physical hosts. This cacti host is monitoring a number of servers and a switch in this group of hardware (5 devices monitored in total via SNMP). I note in my cacti graphs there are periods of 'network discontinuity' at fairly regular intervals, ie, maybe 12+ times per day cacti thinks it can't reach other systems. These devices are all connected together directly on the same physical gig-ether switch; a very simple LAN. Also 'interestingly', the cacti host never has trouble (ie, discontinuities of any kind) when monitoring itself, nor the physical ProxVE host upon which it resides.

I did also note some messages regarding short UDP packets logged in the output of 'dmesg', as follows:

---paste---
vmbr0: no IPv6 routers present
UDP: short packet: From 10.10.2.10:0 0/219 to 10.10.2.255:0
UDP: short packet: From 10.10.2.10:0 16896/219 to 10.10.2.255:2339
UDP: short packet: From 10.10.2.10:43616 3673/219 to 10.10.2.255:80
UDP: short packet: From 10.10.2.10:55344 3732/219 to 10.10.2.255:22
UDP: short packet: From 10.10.2.10:55344 3732/219 to 10.10.2.255:22
pvm1:~#

---endpaste---

(note: in this case, 10.10.2.10 is one of the local physical servers being monitored, and it does show this 'discontinuity' in the cacti graphs..)
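If anyone wants to dig into those short-packet messages, capturing the offending broadcasts should be possible with something like the following - a sketch, where vmbr0 is the bridge on my hosts (adjust the interface to suit):

tcpdump -i vmbr0 -vv udp and src host 10.10.2.10 and dst host 10.10.2.255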



I'm not certain if this type of behaviour has been observed before / elsewhere, or if the UDP short packet errors might be related in any way. I've just changed the config in cacti to monitor 'device availability' via ping and SNMP rather than just ping, and also to try 5 ping attempts before flagging a device as inaccessible (rather than the default, where a single failure marks the device inaccessible) -- we'll see if this makes any difference to the behaviour.
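The equivalent manual checks from inside the cacti container would be something like the following - a sketch, where 'public' is a placeholder for the real SNMP community string:

Several pings, the way cacti will now retry before declaring a fail
ping -c 5 10.10.2.10

Basic SNMP reachability test (the numeric OID is sysUpTime.0)
snmpget -v 2c -c public 10.10.2.10 1.3.6.1.2.1.1.3.0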

I know that I"ve run cacti in virtual (openvz) environment at another site (not using ProxVE - just straight Openvz install on CentOS physical host) -- and there were no general problems there with 'intermittent interruptions of network' like I'm seeing here..


This isn't urgent, but I thought I should mention it in case it was a known issue / or in case it was of interest.


Tim
 
If the container does no network traffic for at least a few minutes at a time and then starts up its polling cycle, it may see a delay on the traffic it attempts within the first half-second to each host on the same LAN segment that it has to ARP for. I have definitely seen slow ARP with OpenVZ containers using venet, but it doesn't seem to be an issue at all if there's continuous traffic.

Your fix of trying more than one ping and also trying snmp sounds like it should be effective.
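One way to sanity-check the ARP theory - a sketch, noting that with venet the ARP is done host-side, so the cache lives on the physical host rather than in the container:

On the physical host, watch the neighbour cache around a polling cycle
arp -n

Hypothetical cron entry inside the container to keep entries warm (repeat per monitored host)
* * * * * ping -c 1 -W 1 10.10.2.10 >/dev/null 2>&1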
 
Hi,

Just a brief footnote to follow up and confirm that the tweak to ping multiple times before flagging a fail in cacti did indeed resolve my issue. (In case this info is of help to others in the future ..)

--Tim
 
