Hello, we have a production cluster that has been running for about 5 months. It consists of 3 Dell R440 servers (Intel Xeon Gold), each with the same specs:
80 cores
128 GB RAM
Kingston DC600M 480 GB for Proxmox
Kingston DC600M 2 TB for the Ceph OSDs
1 Intel X520-DA2
1 Broadcom 2-port 1 GbE Ethernet adapter
Everything worked fine for about 5 months, but since yesterday the throughput on the X520 degrades rapidly (in less than 3 minutes) and only recovers when we restart the interface (ifdown -> ifup). The Ceph health status stays OK (full green) the whole time.
We have tried everything we could think of:
GRO, GSO, TSO and ntuple are turned off
MTU is 9000 everywhere
The connection between the servers is DAC (no switch)
Tried with VLAN aware on and off, with the same result
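To double-check that the offload settings above really stuck after each ifdown/ifup, something like the following sketch could be used. It parses the text printed by `ethtool -k <iface>` and reports which of the four features are not "off". The interface name passed to `read_features` is a placeholder, and the helper names are made up for this post.

```python
import subprocess

# Short names used in this post mapped to the names printed by `ethtool -k`.
FEATURES = {
    "gro": "generic-receive-offload",
    "gso": "generic-segmentation-offload",
    "tso": "tcp-segmentation-offload",
    "ntuple": "ntuple-filters",
}


def read_features(iface):
    """Return the raw `ethtool -k` output for iface (placeholder name)."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_features(text):
    """Turn 'name: on|off [fixed]' lines into a {name: state} dict."""
    state = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        parts = value.split()
        if sep and parts:
            state[name.strip()] = parts[0]
    return state


def still_enabled(text):
    """List the short names of features that are not reported as 'off'."""
    state = parse_features(text)
    return [short for short, full in FEATURES.items() if state.get(full) != "off"]
```

Calling `still_enabled(read_features("enp1s0f0"))` on each node (with the real interface name instead of the assumed `enp1s0f0`) should return an empty list if all four settings are off.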
We did an update yesterday in the hope that the ixgbe driver was corrupted, but the update did not solve the problem.
I was thinking: maybe it's physical? Maybe someone bumped into a cable or connector?
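One way the physical theory could be tested is to compare the NIC counters from `ethtool -S <iface>` taken before and after the slowdown and see whether any error-type counters grew. Below is a minimal sketch of that comparison. The substring filter (`err`, `crc`, `drop`, `miss`, `fault`) is only an assumption about which counter names are interesting, and the function names are made up for this post.

```python
import re

# Assumed pattern for "suspicious" counter names; adjust to the driver's output.
SUSPECT = re.compile(r"err|crc|drop|miss|fault", re.IGNORECASE)


def parse_counters(text):
    """Parse 'name: number' lines from `ethtool -S` output into a dict."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if sep and value.isdigit():
            counters[name.strip()] = int(value)
    return counters


def growing_errors(before, after):
    """Return {counter: increase} for suspicious counters that went up."""
    old, new = parse_counters(before), parse_counters(after)
    return {
        name: new[name] - old.get(name, 0)
        for name in new
        if SUSPECT.search(name) and new[name] > old.get(name, 0)
    }
```

If CRC or similar error counters climb while the speed drops, that would point towards the DAC cable or the SFP+ ports; if they stay flat, the cause is more likely in software.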
Has anyone had this issue, or does anyone have any ideas?
For now we are restarting the interface on one of the nodes every 2 minutes (crontab), but we know it's a bad idea.
Thanks in advance