pstoneman
Guest
Hi all
We have a couple of Dell R720xd servers operating as a Proxmox cluster with DRBD and so on. We've got the Intel 4x1GigE Dell daughter card in the servers, with two interfaces bonded for 'public' traffic and two bonded (balance-rr) for DRBD synchronisation.
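For reference, the backend bond is defined roughly like this (a sketch of the relevant /etc/network/interfaces stanza; the bond name, slave names, and the 10.0.0.x address are from my setup and illustrative only):

```
# /etc/network/interfaces (fragment) - DRBD backend bond
auto bond1
iface bond1 inet static
    address 10.0.0.1
    netmask 255.255.255.0
    slaves eth2 eth3
    bond_mode balance-rr
    bond_miimon 100
```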
Last night I upgraded them (one at a time) to Proxmox 2.2. Everything returned stable, DRBD re-synchronised, and all was good. A couple of hours later (after I'd gone home!), I saw this in the logs on both boxes: "kernel: igb: eth3 NIC Link is Down", followed 20 minutes later by "kernel: igb: eth2 NIC Link is Down". Obviously, DRBD then failed, and the whole world ended. When I logged on this morning, eth2 on server1 was in half-duplex mode, with no link on eth2 on server2. Likewise, eth3 on server3 was in half-duplex, with no link on eth3 on server1. I rebooted the servers, and everything came back as normal.
There was nothing in any log file, in dmesg, or in the iDRAC/OMSA logs to indicate why the network cards failed, so I'm a bit stumped as to why it happened. The servers are right next to each other, connected back-to-back (eth2-eth2 and eth3-eth3) via brand-new 1m network cables. They'd been working and totally stable for a few months prior to last night. Since then (about 12 hours ago), everything's been stable, with a normal amount of DRBD traffic going over the backend interfaces and no blips in dmesg/syslog.
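In the meantime, one thing I can do is poll the kernel's view of each link so that if it happens again I capture the state straight away (a minimal sketch; eth2/eth3 are the names from my setup, and duplex checking would need ethtool on top of this):

```shell
# Print the kernel's carrier (link) state for each NIC given:
# 1 = link up, 0 = link down, ? = interface missing/not readable.
link_report() {
  for nic in "$@"; do
    state=$(cat "/sys/class/net/$nic/carrier" 2>/dev/null || echo "?")
    echo "$nic carrier=$state"
  done
}

# e.g. run from cron or a loop against the DRBD bond members:
# link_report eth2 eth3
```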
Does anyone have any idea how I can start investigating it? I'm at a loss!
Thanks!
Phil