Proxmox network drivers are all quite unstable

felipe

Well-Known Member
Oct 28, 2013
We switched from a plain libvirt/KVM setup to Proxmox.
Before that we had a solid running setup and never any problems.
Now we run into one problem after another. It started with issues with the Windows IDE drivers,
and now all our Windows machines with rtl8139 have lost their network.
We changed to e1000, but deactivating IPv6 freezes the machine, and some Windows clients now suddenly hang without doing anything.
It's catastrophic to run a production cluster like this.
 
Strange, I have never had a problem in years with Windows and network drivers. Try a different NIC model and read the wiki notes about Windows guests.

Marco
 
Windows guests have always been a nightmare for me too. At the beginning I had tons of "strange" problems with the virtio drivers, then I switched to Intel e1000. I had some VMs that worked fine with rtl8139 and then got network freezes on a more recent Proxmox. Different Windows versions behave differently; in short, a minefield.
But it can't be a Proxmox issue, since the underlying technology is KVM and the Fedora virtio guest drivers, i.e. the same technology you used with libvirt and KVM.
If you use the same KVM version with libvirt and the same guest virtio drivers, you should experience the same problems there as well. If that is not the case, please report your exact config/versions, because if there is a bug on the Proxmox side I'm sure the developers will be happy to (try to) fix it.
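In case it helps, the usual way to collect that information on the Proxmox host is something along these lines (the VMID is a placeholder for the affected guest):

pveversion -v
qm config <VMID>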
 
We have about 50 different Windows machines. In the last 3 years we never had any problems with rtl8139 or any other NIC.
Now we moved to Proxmox and the problems started. I really like Proxmox, and the Proxmox-specific parts work perfectly,
but the kernel versions etc. don't seem as stable with these drivers as my old setup was. I would never have expected this, because so many people use Proxmox and I thought they would pick only the most stable components.
But we ran 3 different kernel versions before, even different Ubuntu versions, and never had any problems with rtl8139 in Windows (Linux guests are still stable).
The current kernel used by Proxmox seems to have big problems with Windows guests (network).

My last attempt to use a newer virtio driver resulted in a bluescreen, and the machine would not even boot anymore; I had to remove the driver by hand.
I have real trouble arguing with our Windows guys, because a simple Hyper-V setup runs a lot more stably than all the KVM stuff.
The sad thing is that before changing to Proxmox we never had any issues.
If any developer wants to know the exact details of what is not working:

pveversion -v
proxmox-ve-2.6.32: 3.1-114 (running kernel: 2.6.32-26-pve)
pve-manager: 3.1-24 (running version: 3.1-24/060bd5a6)
pve-kernel-2.6.32-26-pve: 2.6.32-114
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.5-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.0-2
pve-cluster: 3.0-8
qemu-server: 3.1-8
pve-firmware: 1.0-23
libpve-common-perl: 3.0-9
libpve-access-control: 3.0-8
libpve-storage-perl: 3.0-18
pve-libspice-server1: 0.12.4-2
vncterm: 1.1-6
vzctl: 4.0-1pve4
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 1.4-17
ksm-control-daemon: 1.1-1
glusterfs-client: 3.4.1-1

I would like to contribute to making this more stable!
 
I would love to have details of your previous working setup: kernel version, KVM version, etc. In my case, when Proxmox had an older KVM, rtl8139 in Windows worked fine; when I upgraded to a Proxmox with a more recent KVM version, I had (have) connectivity problems, i.e. if you test with iperf, after some seconds the connection stops on the guest side.
What puzzles me is that Proxmox uses standard and stable components, like the Red Hat kernel and stable KVM code, so I doubt that what they add on top can be the cause of such trouble. But if you provide more details or are able to do more tests (e.g. in your old working setup: install a newer KVM, the standard RH kernel, the kernel from Proxmox, or change the KVM flags used to run the VM, etc.), maybe the developers could see whether it is something they added.
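For anyone wanting to reproduce that kind of stall, a sustained iperf run makes the moment throughput dies easy to spot (iperf 2 syntax; the guest IP is a placeholder):

# on the guest
iperf -s

# on another machine on the same LAN, run for 5 minutes, reporting every 5 seconds
iperf -c <guest-ip> -t 300 -i 5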
 
I had:
3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
3.2.0-29-generic #46-Ubuntu SMP Fri Jul 27 17:03:23 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
and one Ubuntu 2.6.32.xxxx kernel.
All three worked perfectly with rtl8139.
Since I replaced the server I can't check the exact KVM versions anymore, but I can set up a new server to check whether the network drivers work there.

I was also expecting the Proxmox kernel/KVM to be more stable than other Ubuntu/Debian versions because of all the testing,
but it seems that with the current Proxmox version the rtl NIC does not work at all with Win7 guests and is very slow with Win2008 guests. On Linux I have not lost network connectivity so far, but the servers configured with the rtl NIC are not heavily used; still, I think Linux is a lot more stable than Windows here.
Generally I would say that on the current Proxmox it is risky to use the rtl card.
e1000 seems to work fine at the moment; stress tests over several hours did not cause any problems.

Is there a page in the wiki describing which Proxmox version and which network or disk drivers run well with which Linux and Windows versions? That would help people out there a lot, especially since some versions of the virtio drivers are very buggy while others work well.
That is what I really miss in the Proxmox wiki: knowing which Windows/Linux version works perfectly and stably with which Proxmox version and drivers! (I think the big problem with KVM is generally Windows.)
 


My last attempt to use a newer virtio driver resulted in a bluescreen, and the machine would not even boot anymore; I had to remove the driver by hand.

I have had this happen once: I let Windows search for the driver, and it used one from the wrong sub-folder of the virtio driver ISO.
There is a handy table here to help one choose the proper folder:
http://pve.proxmox.com/wiki/Paravirtualized_Network_Drivers_for_Windows#Introduction

I use virtio for disks and e1000 for network when using Windows.
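For reference, switching a guest's NIC model to e1000 is a one-liner; a rough sketch with a placeholder VMID and the default vmbr0 bridge (if no MAC address is given, a new one is generated, so reuse the old MAC if DHCP leases depend on it):

qm set <VMID> -net0 e1000,bridge=vmbr0

# the resulting line in /etc/pve/qemu-server/<VMID>.conf looks something like:
# net0: e1000=AA:BB:CC:DD:EE:FF,bridge=vmbr0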
 
are the "latest" drivers stable? i know it from a lot of people and it also happened to me that virtio drivers where really unstable...
on some machines we allready use older virtio drivers....
i changed 2 weeks ago a ide driver to virtio which resulted in blue screen- reading about lots of complains in the internet since then i am little bit afraid of the virtio drivers. e1000 allways worked and had enogh performance for most guests...
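For what it's worth, the bluescreen when switching the system disk from IDE to virtio is usually avoided by letting Windows install the virtio storage driver before the boot disk is moved. A rough sketch, with a placeholder VMID and assuming a storage named "local":

# 1) attach a small temporary virtio disk so Windows detects new hardware
qm set <VMID> -virtio1 local:1
# 2) boot the guest, install the virtio storage driver from the driver ISO, shut down
# 3) change the boot disk in /etc/pve/qemu-server/<VMID>.conf from "ide0: ..." to
#    "virtio0: ..." and set "bootdisk: virtio0", then remove the temporary disk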
I just got rid of all the rtl8139 cards, but about 20-30 machines still had that driver. Not fun work on production machines.
I was just really disappointed that the Proxmox kernel is not as stable with the NICs as my standard Ubuntu KVM setup was before.
I love the Proxmox interface and web GUI, but the kernel... that is far from what I expected, even considering that it is an old kernel.
As mentioned in another post today, it is not even possible to install Win2012 Remote Desktop Services because of the old kernel.
 
I am running the exact same version as you and am encountering similar issues. In my case my whole network has become destabilized by adding the Proxmox server. My previously stable router now crashes 3 times a week, and the Proxmox server has random network slowdowns. I did the following to make Proxmox somewhat more stable:
ethtool -K eth0 rx off

The server was up for 1 week before a crash, vs. 1 day before. There is nothing in the logs after a crash. I've configured 12 OpenVZ containers, I like Proxmox, and I am now trying to figure out what is going on. I'm not sure if this is a networking issue or if I just have bad memory, a bad network cable or other bad hardware. I will post a solution if I find one.

My hardware is a 3-5 year old SGI Rackable with 24 GB of RAM and 2 quad-core AMD CPUs.
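In case it helps anyone following along: that ethtool setting (it disables receive checksum offloading) does not survive a reboot. One common way to make it persistent, assuming eth0 is the NIC bridged into vmbr0 and using placeholder addresses, is a post-up line in /etc/network/interfaces:

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        bridge_ports eth0
        bridge_stp off
        bridge_fd 0
        post-up ethtool -K eth0 rx off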
 
Hi, about Windows and rtl8139: that is a confirmed bug, fixed in qemu git:
http://git.qemu.org/?p=qemu.git;a=commit;h=00b7ade807b5ce6779ddd86ce29c5521ec5c529a

(So it should be fixed in the next Proxmox release with qemu 1.7.)

As for e1000, I really don't know, I have never had a problem with it.
 
It turns out my issue was due to bad RAM :( No more networking issues once the bad RAM was removed.
 
Has anyone resolved this issue? I have the exact same problem, but only with Windows; CentOS and Ubuntu work great.
It seems like the network drivers just go to sleep. If I connect to the console and ping one of my other machines, the network card starts working and I can also ping this VM.

If I let it sit idle for 30 seconds and then try to ping the VM again, it "Request timed out" on me. I can then go back into the console, ping again, and the network is alive again.

It is very strange: if I log into the console and set up a persistent ping, it keeps the network running indefinitely; as soon as I stop the ping and wait a few seconds, the VM NIC stops responding.
 
As mentioned, rtl8139 has a bug that should be fixed with the update to Proxmox 3.2 (qemu 1.7), but I don't use rtl8139 cards anymore. Just change it to e1000; it works perfectly.
 
Thanks for the quick response. I played around with it a little more and it seems to be related to bonding. I set up both NICs in a bond using balance-rr (my switch does not support 802.3ad), and this does not seem to work very well. I changed it to balance-alb and that seems to work just fine.
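For readers who hit the same thing, a balance-alb bond under the Proxmox bridge looks roughly like this in /etc/network/interfaces (NIC names and addresses are placeholders):

auto bond0
iface bond0 inet manual
        slaves eth0 eth1
        bond_miimon 100
        bond_mode balance-alb

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.10
        netmask 255.255.255.0
        bridge_ports bond0
        bridge_stp off
        bridge_fd 0

balance-rr sends frames from the same source MAC out of every slave in turn, which reorders packets and makes the switch see that MAC flapping between ports, so symptoms like those above are common with it; balance-alb (or 802.3ad where the switch supports it) avoids that.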
 
I see you changed your bonding and resolved the problem; I just wanted to add a comment that might help future readers.

What you described is a classic symptom of a problem at layer 2 of the 7-layer burrito. http://en.m.wikipedia.org/wiki/OSI_model

A duplicated MAC or IP address, improperly configured VLANs, bridges, trunks, bonds, switches, etc. Switches and servers keep track of where a particular MAC is located and which MAC is used for a particular IP address. Due to a misconfiguration the MAC cannot be resolved properly, so the network stops working for the things using that MAC. When the problem MAC is actively used, e.g. by sending pings from it, the systems are 'reminded' where to find that MAC and the network starts working again.
Most switches I have used only cache MAC data for about 30 seconds, which is why the unreachable problem shows up after about 30 seconds of inactivity.
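On the Proxmox host itself you can watch this happening on the Linux bridge; a quick check, assuming the bridge is the default vmbr0:

# list the MACs the bridge has learned, with per-entry ageing timers
brctl showmacs vmbr0

# dump the bridge parameters, including the MAC ageing time
brctl showstp vmbr0 | grep -i ageing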

I know this and have seen it happen numerous times, and occasionally I still find myself scratching my head, thinking WTF, when these problems occur. It is too easy to overlook the basics when you are distracted by the cries of ten thousand users complaining they can't access the latest lolcats for five minutes. :-D