Loosing ethernet connection e1000e

bohansen · Jul 6, 2010

Hi,

Just had a very strange error today. Suddenly I lost all connection to the Proxmox box (been running for 120 days). I looked at the console and found it running fine, but log-messages revealed the following:

Code:

Jul  6 13:31:10 jango kernel: NETDEV WATCHDOG: eth1: transmit timed out
Jul  6 13:31:10 jango kernel: bonding: bond0: link status definitely 
down for interface eth1, disabling it
Jul  6 13:31:13 jango kernel: 0000:05:00.1: eth1: Error reading PHY register
Jul  6 13:31:13 jango kernel: e1000e: eth1 NIC Link is Up 1000 Mbps Full 
Duplex, Flow Control: RX/TX
Jul  6 13:31:13 jango kernel: bonding: bond0: link status definitely up 
for interface eth1.
Jul  6 13:33:28 jango kernel: NETDEV WATCHDOG: eth1: transmit timed out
Jul  6 13:33:28 jango kernel: bonding: bond0: link status definitely 
down for interface eth1, disabling it
Jul  6 13:34:09 jango kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul  6 13:34:09 jango kernel: bonding: bond0: link status definitely 
down for interface eth0, disabling it
Jul  6 13:34:09 jango kernel: bonding: bond0: now running without any 
active interface !
Jul  6 13:34:12 jango kernel: 0000:05:00.0: eth0: Error reading PHY register
Jul  6 13:34:12 jango kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full 
Duplex, Flow Control: RX/TX
Jul  6 13:34:12 jango kernel: vmbr0: port 1(bond0) entering disabled state
Jul  6 13:34:12 jango kernel: bonding: bond0: link status definitely up 
for interface eth0.
Jul  6 13:34:12 jango kernel: bonding: bond0: first active interface up!
Jul  6 13:34:12 jango kernel: vmbr0: port 1(bond0) entering learning state
Jul  6 13:34:12 jango kernel: vmbr0: topology change detected, propagating
Jul  6 13:34:12 jango kernel: vmbr0: port 1(bond0) entering forwarding state
Jul  6 13:36:02 jango kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul  6 13:36:02 jango kernel: bonding: bond0: link status definitely 
down for interface eth0, disabling it
Jul  6 13:36:02 jango kernel: bonding: bond0: now running without any 
active interface !
Jul  6 13:36:02 jango kernel: 0000:05:00.0: eth0: Error reading PHY register
Jul  6 13:36:02 jango kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full 
Duplex, Flow Control: RX/TX
Jul  6 13:36:02 jango kernel: vmbr0: port 1(bond0) entering disabled state
Jul  6 13:36:03 jango kernel: bonding: bond0: link status definitely up 
for interface eth0.
Jul  6 13:36:03 jango kernel: bonding: bond0: first active interface up!
Jul  6 13:36:03 jango kernel: vmbr0: port 1(bond0) entering learning state
Jul  6 13:36:03 jango kernel: vmbr0: topology change detected, propagating
Jul  6 13:36:03 jango kernel: vmbr0: port 1(bond0) entering forwarding state
Jul  6 13:38:02 jango kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jul  6 13:38:02 jango kernel: bonding: bond0: link status definitely 
down for interface eth0, disabling it
Jul  6 13:38:02 jango kernel: bonding: bond0: now running without any 
active interface !
Jul  6 13:38:02 jango kernel: 0000:05:00.0: eth0: Error reading PHY register
Jul  6 13:38:02 jango kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full 
Duplex, Flow Control: RX/TX
Jul  6 13:38:02 jango kernel: vmbr0: port 1(bond0) entering disabled state
Jul  6 13:38:03 jango kernel: bonding: bond0: link status definitely up 
for interface eth0.
Jul  6 13:38:03 jango kernel: bonding: bond0: first active interface up!
Jul  6 13:38:03 jango kernel: vmbr0: port 1(bond0) entering learning state
Jul  6 13:38:03 jango kernel: vmbr0: topology change detected, propagating
Jul  6 13:38:03 jango kernel: vmbr0: port 1(bond0) entering forwarding state

After a restart everything is running fine. I did a search and found this thread: http://sourceforge.net/tracker/index.php?func=detail&aid=2908463&group_id=42302&atid=447449

Eth0 and eth1 is configured as balance-xor.

Is there any plans to upgrade the e1000e driver to either 1.1.19 or 1.2.8 soon? According to the thread it seems more likely to fix the problem than the current 1.1.2. I'm currently on 2.6.24-9 (e1000e 1.0.15).

Best regards,
Bo

dietmar · Jul 7, 2010

bohansen said:
Is there any plans to upgrade the e1000e driver to either 1.1.19 or 1.2.8 soon? According to the thread it seems more likely to fix the problem than the current 1.1.2. I'm currently on 2.6.24-9 (e1000e 1.0.15).

I just compiled and uploaded a new kernel to the pvetest repository. To install the new kernel do:

# wget ftp://download.proxmox.com/debian/dists/lenny/pvetest/binary-amd64/proxmox-ve-2.6.24_1.5-24_all.deb
# wget ftp://download.proxmox.com/debian/dists/lenny/pvetest/binary-amd64/pve-kernel-2.6.24-12-pve_2.6.24-24_amd64.deb
# dpkg -i proxmox-ve-2.6.24_1.5-24_all.deb pve-kernel-2.6.24-12-pve_2.6.24-24_amd64.deb

Please can you test?

bohansen · Jul 8, 2010

I'm lucky to have an almost identical test server at the moment. I've been running the 2.6.24-12 kernel the last 24 hours - installing and running a few VMs.
I've stressed the network a bit, but it is difficult to thoroughly test an error which popped up after 120 days.

Thank you for your quick reaction Dietmar!

Bo

MikeR · Mar 21, 2011

dietmar said:
I just compiled and uploaded a new kernel to the pvetest repository. To install the new kernel do:

# wget ftp://download.proxmox.com/debian/dists/lenny/pvetest/binary-amd64/proxmox-ve-2.6.24_1.5-24_all.deb
# wget ftp://download.proxmox.com/debian/d.../pve-kernel-2.6.24-12-pve_2.6.24-24_amd64.deb
# dpkg -i proxmox-ve-2.6.24_1.5-24_all.deb pve-kernel-2.6.24-12-pve_2.6.24-24_amd64.deb

Please can you test?

Hello Dietmar

I am new to proxmox and am impressed with it but I do seem to have the same problem that is described here but not with the e1000e and I have a new proxmox 1.7 running.
I have 2 networkcards in my proxmox 1.7 so I can split LAN and DMZ.
I was told by a college of mine that has more experiance with KVM that virtio would give the best performance so I set all disks and networkcards of my VM's to virtio.
I have 5 VM's 1 Windows 7 and 4 Linux distributions, 32 and 64 bit mixed.
All VM's loose their networkconnection when you start using them a bit.
At that point I cannot see if it has crashed or just lost its network connection.
I cannot get a stabile VM running and I'm sure that proxmox should be much more stabile than that and I hope you can point me in the right direction to solve my problem.
I like to get a stabile proxmox and you like to get information on how to make proxmox more stabile. I have an interesting system and am offering to test to find out how it can be so unstabile in my configuration

I found this in the loggings which I see as the source of my problem.
All VM's are connected to vmbr1 and only the vortual ports seem to disable:
vmbr1: port 2(tap104i1d0) entering disabled state
vmbr1: port 3(tap104i1d1) entering disabled state
vmbr1: port 4(tap103i1d0) entering disabled state
vmbr1: port 5(tap104i1d0) entering disabled state

I did some searching in my system and got the following info:
proxmox01:~# uname -r
2.6.32-4-pve

proxmox01:~# pveversion -v
pve-manager: 1.7-11 (pve-manager/1.7/5470)
running kernel: 2.6.32-4-pve
proxmox-ve-2.6.32: 1.7-30
pve-kernel-2.6.32-4-pve: 2.6.32-30
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4

proxmox01:~# ifconfig
eth0 Link encap:Ethernet HWaddr 00:1d:7d:04:84:b4
inet6 addr: fe80::21d:7dff:fe04:84b4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:12017 errors:0 dropped:0 overruns:0 frame:0
TX packets:22236 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:953223 (930.8 KiB) TX bytes:2860420 (2.7 MiB)
Interrupt:30 Base address:0xe000

eth1 Link encap:Ethernet HWaddr 00:1d:7d:04:84:a4
inet6 addr: fe80::21d:7dff:fe04:84a4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:376116199 errors:0 dropped:0 overruns:0 frame:0
TX packets:113473107 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:542622785601 (505.3 GiB) TX bytes:120127017301 (111.8 GiB)
Interrupt:31 Base address:0xa000

lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:444082 errors:0 dropped:0 overruns:0 frame:0
TX packets:444082 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:128527438 (122.5 MiB) TX bytes:128527438 (122.5 MiB)

tap101i1d0 Link encap:Ethernet HWaddr 56:d7:9b:1b:ba:59
inet6 addr: fe80::54d7:9bff:fe1b:ba59/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:2168343 errors:0 dropped:2 overruns:0 frame:0
TX packets:3972555 errors:0 dropped:0 overruns:1 carrier:0
collisions:0 txqueuelen:500
RX bytes:4851268176 (4.5 GiB) TX bytes:5558599341 (5.1 GiB)

tap105i1d0 Link encap:Ethernet HWaddr 2a:a2:8c:db:35:2e
inet6 addr: fe80::28a2:8cff:fedb:352e/64 Scope:Link
UP BROADCAST RUNNING PROMISC MULTICAST MTU:1500 Metric:1
RX packets:22735 errors:0 dropped:0 overruns:0 frame:0
TX packets:40295 errors:0 dropped:0 overruns:1 carrier:0
collisions:0 txqueuelen:500
RX bytes:1777064 (1.6 MiB) TX bytes:56532040 (53.9 MiB)

venet0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 -00
UP BROADCAST POINTOPOINT RUNNING NOARP MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 B) TX bytes:0 (0.0 B)

vmbr0 Link encap:Ethernet HWaddr 00:1d:7d:04:84:b4
inet addr:192.168.200.238 Bcast:192.168.200.255 Mask:255.255.255.224
inet6 addr: fe80::21d:7dff:fe04:84b4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:4 errors:0 dropped:0 overruns:0 frame:0
TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:228 (228.0 B) TX bytes:1931 (1.8 KiB)

vmbr1 Link encap:Ethernet HWaddr 00:1d:7d:04:84:a4
inet addr:172.22.22.238 Bcast:172.22.22.255 Mask:255.255.255.0
inet6 addr: fe80::21d:7dff:fe04:84a4/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:559 errors:0 dropped:0 overruns:0 frame:0
TX packets:390 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:106191 (103.7 KiB) TX bytes:121718 (118.8 KiB)

I also have a KVM bug that I already found on the internet described here:
https://bugzilla.redhat.com/show_bug.cgi?id=507085

Here is a part of my dmesg log:
kvm: 4391: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0x0
kvm: 4391: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x130076
kvm: 4391: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffdcbe90
kvm: 4391: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
kvm: 4391: cpu1 unimplemented perfctr wrmsr: 0xc0010004 data 0x0
kvm: 4391: cpu1 unimplemented perfctr wrmsr: 0xc0010000 data 0x130076
kvm: 4391: cpu1 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffdcbe90
kvm: 4391: cpu1 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
vmbr1: port 2(tap105i1d0) entering disabled state
vmbr1: port 2(tap105i1d0) entering disabled state
device tap105i1d0 entered promiscuous mode
vmbr1: port 2(tap105i1d0) entering forwarding state
tap105i1d0: no IPv6 routers present
kvm: 8694: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0x0
kvm: 8694: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x130076
kvm: 8694: cpu0 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffdcbe90
kvm: 8694: cpu0 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
kvm: 8694: cpu1 unimplemented perfctr wrmsr: 0xc0010004 data 0x0
kvm: 8694: cpu1 unimplemented perfctr wrmsr: 0xc0010000 data 0x130076
kvm: 8694: cpu1 unimplemented perfctr wrmsr: 0xc0010004 data 0xffffffffffdcbe90
kvm: 8694: cpu1 unimplemented perfctr wrmsr: 0xc0010000 data 0x530076

I'm not sure if it is connected to my primary problem but I guess I should mention it.

For a home setup I have a nice system:
AMD Athlon(tm) II X4 605e Processor
8 Gigabyte or memory on a Gigabyte GA-MA790FX-DQ6 motherboard.
HighPoint RocketRAID 3520 raid controller
3 x ST31000340NS in RAID5 making a 2 Terrabyte ext3 drive for the Proxmox setup.
5 x ST32000644NS in RAID5 making a 8 Terrabyte drive split up into 4 ext3 partitions for my samba shares for my local network.

I hope you have some good tips for me...

Mike R

dietmar · Mar 22, 2011

MikeR said:
I cannot get a stabile VM running and I'm sure that proxmox should be much more stabile than that and I hope you can point me in the right direction to solve my problem.

What kind of OS do you run inside the guest? Does it work when you use e1000 instead of virtio?

MikeR · Mar 22, 2011

dietmar said:
What kind of OS do you run inside the guest? Does it work when you use e1000 instead of virtio?

Hello Dietmar

Right now I am just testing to get to know proxmox and how it handles and I am running these guests:
Guest1: Ubuntu Desktop 10.10 32bit 40GB harddisk 1024 MB memory
Guest2: openSUSE 11.4 64bit 40GB harddisk 1024 MB memory
Guest3: Debian Squeeze 6.0 64 bit 40GB harddisk 1024 MB memory
Guest4: Windows 7 32bit 80GB harddisk 1024 MB memory
Guest5: Scientific Linux 6.0 64 bit 40GB harddisk 1536 MB memory

When I will be starting realy using it I would like to run several instances of SME server to get my domains running virtually while having more room on the same machine to run additional distro testing and some servers with additional functions.

switching to e1000 was my next step too find out what the problem is but I did not get to that just yet.
I will get back to you on that if I have tested it.

Questions:
1) Are there still limitations on using virtio for network or disks?
2) Are there issues using more than one networkcard in proxmox?
3) Will using the e1000 give performance degredation compared to using virtio?

I found that sometimes my VM's really crash and sometimes they just loose network connection.
In the case I only cannot connect to them they cannot be shutdown from the webinterface but I found that sometimes I can shutdown the VM with the command: qm stop <vmid>
Sometimes I have to kill it with this command: kill -QUIT <pid>
So I am not sure but it looks like I have a combination problem.
Like I said I will get back to you on that when I have tested with the e1000.

dietmar · Mar 22, 2011

MikeR said:
1) Are there still limitations on using virtio for network or disks?

it works here.

MikeR said:
2) Are there issues using more than one networkcard in proxmox?

no

MikeR said:
3) Will using the e1000 give performance degredation compared to using virtio?

Yes (if you use latest kernel (2.6.35) virtio is quite fast).

MikeR · Mar 28, 2011

dietmar said:
Yes (if you use latest kernel (2.6.35) virtio is quite fast).

Hello Dietmar

I switched all virtio networkcards for E1000 ones and it runs more stabile.
Ran the first 4 VM's for 5 days and after that I still was able to connect to them on the Open VNC Console so that's an improvement.
The 4 VM's did not do anymore than run, no extra processes than what the systems run with a default installation.
After that I started to download something with the first Ubuntu VM and when that ran okay I also started to copy a bigger local file from that Ubuntu VM to to a smb-share while downloading.
Unfortunately it did not take a very long time for the Open VNC Console to stop responding.
Shutting down the VM by the webinterface of the proxmox did not function anymore.
After that I started putty, connected to the proxmox and ran commands that I found somewhere online, commands like the following:
proxmox01:~# qm list
VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
101 Ubuntu_10.10_Desktop_32_bit stopped 1024 40.00 0
102 openSUSE_11.4_64_bit running 1024 40.00 19496
103 Debian_6.0_64_bit running 1024 40.00 3520
104 Windows_7_32_bit stopped 1024 80.00 0
105 Scientific_Linux_6 stopped 1536 40.00 0
106 GhostBSD_64bit stopped 1024 40.00 0
proxmox01:~# qm stop 102
trying to aquire lock...got timeout
proxmox01:~# qm stop 103
trying to aquire lock...got timeout
proxmox01:~# kill -QUIT 19496
proxmox01:~# kill -QUIT 3520
proxmox01:~# qm list
VMID NAME STATUS MEM(MB) BOOTDISK(GB) PID
101 Ubuntu_10.10_Desktop_32_bit stopped 1024 40.00 0
102 openSUSE_11.4_64_bit stopped 1024 40.00 0
103 Debian_6.0_64_bit stopped 1024 40.00 0
104 Windows_7_32_bit stopped 1024 80.00 0
105 Scientific_Linux_6 stopped 1536 40.00 0
106 GhostBSD_64bit stopped 1024 40.00 0

So, all in all the proxmox became more stabile but after pushing just one VM to use more performance, the system crashes.
I already changed the virtio networkcards but I still used virtio harddisks upto now.
Just today I changed the harddisks to SCSI disks to find out if they are of influence to the stability of the system.
I'l get back to you if I have more results on this harddisk change.
If you have any bright idea's that I am missing I would like to hear it of cause.

Mike R

Search

Search

Loosing ethernet connection e1000e

bohansen

Active Member

dietmar

Proxmox Staff Member

bohansen

Active Member

MikeR

New Member

dietmar

Proxmox Staff Member

MikeR

New Member

dietmar

Proxmox Staff Member

MikeR

New Member

We value your privacy