Hi all,
I have a cluster of four Dell servers. On the two newest ones (PE R710 and PE R610) I am seeing network errors and web interface hangs. These errors do not appear on the other two, older servers (PE 2950 and PE 2900).
The PE R710 is the master node, and when I try to open, for example, the hardware tab of a VM, the web interface hangs.
This is what I just got in the web interface:
Code:
[3223]ERR: 24: Error in Perl code: 500 read timeout
Here are the errors I see in dmesg:
Code:
connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4349549670, last ping 4349550171, now 4349550672
connection1:0: detected conn error (1011)
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4349549745, last ping 4349550246, now 4349550747
connection2:0: detected conn error (1011)
There are also errors in syslog:
Code:
Feb 1 06:29:31 srv-kvm1 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4351052228, last ping 4351052729, now 4351053230
Feb 1 06:29:31 srv-kvm1 kernel: connection1:0: detected conn error (1011)
Feb 1 06:29:32 srv-kvm1 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Feb 1 06:29:45 srv-kvm1 kernel: connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4351053640, last ping 4351054141, now 4351054642
Feb 1 06:29:45 srv-kvm1 kernel: connection2:0: detected conn error (1011)
Feb 1 06:29:46 srv-kvm1 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
I should add that I have two iSCSI storage devices, connected on interface eth1 of each server.
Judging from syslog, the errors are related to iSCSI, but that should not make the web interface hang.
As I said, I see no problem on the other two nodes.
I wonder whether this could be a problem with the Ethernet controllers built into the PE R710 and R610?
Code:
# lspci
....
01:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
01:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.0 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme II BCM5709 Gigabit Ethernet (rev 20)
In the PE 2950, I have NetXtreme II BCM5708...
All servers are running PVE 1.7 with kernel 2.6.35-1 (KVM only):
Code:
srv-kvm1:/var/log# pveversion -v
pve-manager: 1.7-10 (pve-manager/1.7/5323)
running kernel: 2.6.35-1-pve
proxmox-ve-2.6.35: 1.7-9
pve-kernel-2.6.35-1-pve: 2.6.35-9
pve-kernel-2.6.24-8-pve: 2.6.24-16
qemu-server: 1.1-28
pve-firmware: 1.0-10
libpve-storage-perl: 1.0-16
vncterm: 0.9-2
vzctl: 3.0.24-1pve4
vzdump: 1.2-10
vzprocps: 2.0.11-1dso2
vzquota: 3.0.11-1
pve-qemu-kvm: 0.13.0-3
ksm-control-daemon: 1.0-4
Can anyone give me a clue about this annoying problem?
Thanks.
Alain
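A note on reading the "ping timeout" lines in the dmesg output above: the three counters on each line (last rx, last ping, now) can be compared directly. The small Python sketch below is only an illustration; the function name `parse_ping_timeouts` and the regular expression are my own, not part of any iSCSI tool. It parses the two dmesg lines and prints the gaps between the counters. Both gaps come out at 501 ticks on both connections, which would fit a 5-second ping interval followed by a 5-second unanswered ping if the counter advances about 100 ticks per second (that tick rate is an assumption, not something the logs state).

```python
import re

# The two "ping timeout" lines from the dmesg output in the post above.
DMESG = """\
connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4349549670, last ping 4349550171, now 4349550672
connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4349549745, last ping 4349550246, now 4349550747
"""

# Hypothetical pattern written for this message format only.
PATTERN = re.compile(
    r"connection(?P<conn>\d+:\d+): ping timeout of (?P<ping_timeout>\d+) secs expired, "
    r"recv timeout (?P<recv_timeout>\d+), last rx (?P<last_rx>\d+), "
    r"last ping (?P<last_ping>\d+), now (?P<now>\d+)"
)


def parse_ping_timeouts(text):
    """Return one dict per 'ping timeout' line, with the tick gaps computed."""
    events = []
    for match in PATTERN.finditer(text):
        last_rx = int(match["last_rx"])
        last_ping = int(match["last_ping"])
        now = int(match["now"])
        events.append({
            "conn": match["conn"],
            "ping_timeout_secs": int(match["ping_timeout"]),
            # Ticks between the last received data and the ping being sent.
            "rx_to_ping_ticks": last_ping - last_rx,
            # Ticks the ping stayed unanswered before the error was raised.
            "ping_to_now_ticks": now - last_ping,
        })
    return events


if __name__ == "__main__":
    for event in parse_ping_timeouts(DMESG):
        print(event)
```

Read this way, nothing at all was received from either target between "last rx" and "now", so the two connections look like they went completely silent rather than merely slow.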