This week I found a strange problem in a new DRBD cluster using two Dell servers (T610 with 2 x Xeon 5540, and T300 with Xeon 3363):
When one VM does intensive writes, the I/O wait skyrockets.
After some hours studying the issue, I found a problem with the Broadcom (BCM) network card in the Xeon 5540 server. The server has 2 ports (same onboard card), one connected to the local network switch (100 Mbit) and the other to the cluster switch (Gigabit).
Pinging the other node gives this (the ttl is normal, but some packets have a very high round-trip time):
64 bytes from 192.168.0.3: icmp_seq=1 ttl=64 time=0.247 ms
64 bytes from 192.168.0.3: icmp_seq=2 ttl=64 time=0.249 ms
64 bytes from 192.168.0.3: icmp_seq=3 ttl=64 time=0.174 ms
64 bytes from 192.168.0.3: icmp_seq=4 ttl=64 time=0.219 ms
64 bytes from 192.168.0.3: icmp_seq=5 ttl=64 time=0.202 ms
64 bytes from 192.168.0.3: icmp_seq=6 ttl=64 time=0.160 ms
64 bytes from 192.168.0.3: icmp_seq=7 ttl=64 time=0.173 ms
64 bytes from 192.168.0.3: icmp_seq=6 ttl=64 time=723 ms
64 bytes from 192.168.0.3: icmp_seq=7 ttl=64 time=0.247 ms
64 bytes from 192.168.0.3: icmp_seq=8 ttl=64 time=0.249 ms
64 bytes from 192.168.0.3: icmp_seq=9 ttl=64 time=0.174 ms
64 bytes from 192.168.0.3: icmp_seq=10 ttl=64 time=0.219 ms
64 bytes from 192.168.0.3: icmp_seq=6 ttl=64 time=410 ms
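To spot these spikes in a longer capture without reading every line, something like the following could help. It is only a minimal sketch: the function name `find_slow_replies` and the 10 ms threshold are my own assumptions, and it expects the Linux `ping` reply format shown above.

```python
import re

# Matches the reply lines printed by Linux ping, e.g.
# "64 bytes from 192.168.0.3: icmp_seq=6 ttl=64 time=723 ms"
PING_LINE = re.compile(r"icmp_seq=(\d+) ttl=(\d+) time=([\d.]+) ms")


def find_slow_replies(ping_output, threshold_ms=10.0):
    """Return (icmp_seq, time_ms) for every reply slower than threshold_ms.

    threshold_ms is an arbitrary cut-off chosen for illustration; on a
    healthy LAN replies are normally well under 1 ms.
    """
    slow = []
    for line in ping_output.splitlines():
        match = PING_LINE.search(line)
        if match is None:
            continue  # skip headers, statistics and anything else
        seq = int(match.group(1))
        time_ms = float(match.group(3))
        if time_ms > threshold_ms:
            slow.append((seq, time_ms))
    return slow


sample = """\
64 bytes from 192.168.0.3: icmp_seq=5 ttl=64 time=0.202 ms
64 bytes from 192.168.0.3: icmp_seq=6 ttl=64 time=723 ms
64 bytes from 192.168.0.3: icmp_seq=7 ttl=64 time=0.247 ms
"""
print(find_slow_replies(sample))  # [(6, 723.0)]
```

Saving the output of a long ping run to a file and feeding it to this function would show how often the delays happen and how large they get.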
Pinging through the 100 Mbit switch from the same node: no problem.
Swapping the ports: same result.
Swapping the cables: same result.
Pinging another host on the Gigabit switch: same result.
Pinging from the other cluster node: OK, so the Gigabit switch is not the problem.
I changed the cluster switch to a 100 Mbit one and the ping problem is gone.
So I think this is either a driver problem with the BCM 5709 or a hardware fault in this particular NIC.
This server is running the latest PVE version (1.6) with kernel 2.6.32-4; I also tried 2.6.18-4 with the same result.
Thanks in advance