Hi,
I have a few HP C7000 blade servers and I use them to host a Proxmox cluster. My cluster has 8 nodes (is this a reliable quorum?), and sometimes all of the nodes hosted on a blade enclosure reboot at once. Other servers in the same enclosure are not affected.
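To clarify what I mean by quorum, this is the kind of check I can run on a member node (pvecm and corosync-quorumtool are the standard tools; I am only sketching the commands here, not pasting real output from my cluster):
Code:
root@hv101:~# pvecm status
root@hv101:~# corosync-quorumtool -s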
Two weeks ago I decided to add one Dell PowerEdge R620 and migrate some important customers to that node to keep them safe from this failure.
For the first few incidents I checked and thought this was an HP firmware problem, but after a few more incidents I put all the related information together and figured out that this only happens to the Proxmox nodes.
We have 16 nodes in this blade enclosure:
- 6 servers run Virtuozzo (formerly Parallels Cloud Server): this cluster is still up to this day.
- 3 servers run CentOS 7: no impact.
- 7 servers run Proxmox: all of these nodes went down in the same incident.
From another member node, node02, the log is in the attachment: log_node02.txt
From the log of node hv102 I could see this error:
Code:
Jan 29 17:31:32 mycluster-hv102 systemd[1]: Started udev Coldplug all Devices.
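To get more context around the reboot I also pull the corosync and cluster messages from the previous boot on the affected node, roughly like this (this needs a persistent journal for -b -1 to work, so adjust as needed):
Code:
root@hv102:~# journalctl -b -1 -u corosync -u pve-cluster --no-pager | tail -n 200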
My Proxmox version:
Code:
root@hv101:~# pveversion -v
proxmox-ve: 5.0-19 (running kernel: 4.10.17-2-pve)
pve-manager: 5.0-30 (running version: 5.0-30/5ab26bc)
pve-kernel-4.10.17-2-pve: 4.10.17-19
libpve-http-server-perl: 2.0-6
lvm2: 2.02.168-pve3
corosync: 2.4.2-pve3
libqb0: 1.0.1-1
pve-cluster: 5.0-12
qemu-server: 5.0-15
pve-firmware: 2.0-2
libpve-common-perl: 5.0-16
libpve-guest-common-perl: 2.0-11
libpve-access-control: 5.0-6
libpve-storage-perl: 5.0-14
pve-libspice-server1: 0.12.8-3
vncterm: 1.5-2
pve-docs: 5.0-9
pve-qemu-kvm: 2.9.0-3
pve-container: 2.0-15
pve-firewall: 3.0-2
pve-ha-manager: 2.0-2
ksm-control-daemon: 1.2-2
glusterfs-client: 3.8.8-1
lxc-pve: 2.0.8-3
lxcfs: 2.0.7-pve4
criu: 2.11.1-1~bpo90
novnc-pve: 0.6-4
smartmontools: 6.5+svn4324-1
zfsutils-linux: 0.6.5.9-pve16~bpo90
openvswitch-switch: 2.6.2~pre+git20161223-3
root@hv101:~#
From my experience, this happens only to servers running Proxmox VE.
Does anyone have experience with this and know how to get past this issue? Is this a conflict caused by Proxmox software?
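One thing I would like to rule out, but have not confirmed, is whether the Proxmox HA stack (watchdog fencing) is involved. This is how I would check whether HA and the watchdog are active on a Proxmox node (ha-manager and watchdog-mux are standard PVE components):
Code:
root@hv101:~# ha-manager status
root@hv101:~# systemctl status watchdog-mux pve-ha-lrm pve-ha-crm corosync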
Thanks,