Lost link on Dell r720xd with Intel driver (Proxmox 2.2)

pstoneman

Hi all

We have a couple of Dell r720xds operating as a Proxmox cluster with DRBD and so on. We've got the Intel 4x1GigE Dell daughter card in the servers, with two interfaces bonded for 'public' and two bonded (balance-rr) for DRBD synchronisation.

Last night I upgraded them (one at a time) to Proxmox 2.2. Everything returned stable, drbd re-synchronised, all was good. A couple of hours later (after I'd gone home!), I saw this in the logs on both boxes: "kernel: igb: eth3 NIC Link is Down", followed 20 minutes later by "kernel: igb: eth2 NIC Link is Down". Obviously, drbd then failed, and the whole world ended :) When I logged on this morning, I saw eth2 on server1 in half-duplex mode, with no link on eth2 on server2. I saw server3 with eth3 in half-duplex, with eth3 on server1 with no link. I rebooted the servers, and everything came back as normal.
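To line up the link events across both boxes, something like the following could pull the igb messages out of syslog. This is only a minimal sketch: the "NIC Link is Down" text is copied from the log lines quoted above, while the matching "Up" message, the `link_events` name and the `/var/log/syslog` path are assumptions that may need adjusting.

```python
import re

# Matches igb link messages such as "kernel: igb: eth3 NIC Link is Down".
# The "Up" variant and any trailing detail are assumed, not taken from my logs.
LINK_RE = re.compile(
    r"igb: (?P<iface>\S+) NIC Link is (?P<state>Up|Down)(?P<detail>.*)"
)


def link_events(lines):
    """Return (iface, state, full_line) for every igb link message found."""
    events = []
    for line in lines:
        match = LINK_RE.search(line)
        if match:
            events.append(
                (match.group("iface"), match.group("state"), line.rstrip("\n"))
            )
    return events


if __name__ == "__main__":
    import sys

    # Hypothetical default path; pass another log file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    try:
        with open(path, errors="replace") as handle:
            for iface, state, line in link_events(handle):
                print(f"{iface:6} {state:5} {line}")
    except OSError as exc:
        print(f"could not read {path}: {exc}")
```

Running it against the syslog from each server and comparing timestamps should show whether both ends dropped at the same moment.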

There was nothing in any log file, in dmesg, or in the iDRAC/OMSA log to indicate why the network cards failed, so I'm a bit stumped as to why it happened. They're right next to each other, with eth2-eth2 and eth3-eth3 both via 1m brand-new network cables. They've been working and totally stable for a few months prior to last night. Since then (about 12 hours ago), everything's been stable, and there's been a normal amount of drbd traffic going over the backend interfaces, with no blips in dmesg/syslog.

Does anyone have any idea how I can start investigating it? I'm at a loss!

Thanks!

Phil
 
So this just happened again tonight. I'd just applied the latest Proxmox updates and rebooted. Lo and behold, about 4 hours later, both crossover'd NICs lost link, then one came back properly, but one came back at half duplex (the other end had no link). Running ifconfig eth2 down; ifconfig eth2 up fixed it.

In the meantime, DRBD had detected split-brain due to the loss of networking (although I've now recovered it).
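To catch the half-duplex / no-link state sooner, a small check could be run from cron on each node. This is a rough sketch that reads the standard Linux sysfs attributes under /sys/class/net; the function names are made up for illustration, and the interface list (eth2, eth3) is just the DRBD pair from this setup.

```python
from pathlib import Path


def read_link_state(iface, sysfs_root="/sys/class/net"):
    """Read carrier, speed, duplex and operstate for one interface.

    Attributes that cannot be read (missing interface, or link down so the
    kernel refuses the read) come back as None.
    """
    base = Path(sysfs_root) / iface
    state = {}
    for attr in ("carrier", "speed", "duplex", "operstate"):
        try:
            state[attr] = (base / attr).read_text().strip()
        except OSError:
            state[attr] = None
    return state


def link_problem(state):
    """Return a short description of the problem, or None if the link looks fine."""
    if state["carrier"] != "1":
        return "no link"
    if state["duplex"] != "full":
        return "duplex is %s" % state["duplex"]
    return None


if __name__ == "__main__":
    # Assumed interface names: the two back-end DRBD NICs.
    for iface in ("eth2", "eth3"):
        problem = link_problem(read_link_state(iface))
        print(iface, "OK" if problem is None else "PROBLEM: " + problem)
```

It only reports the state; bouncing the interface would still be a manual step until the root cause is understood.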

Any idea why the NICs might have died - or what a good way to start debugging this would be?

Thanks...
 