This is a four-node PVE / Ceph cluster running on identical HP DL380s, each with 2 x E5420 2.5GHz, 32G RAM, and six 250G SSDs. We purchased Community Support :).
I have a problem when attempting to restore.
Here is pveversion -v
proxmox-ve: 4.1-34 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-5 (running...
This is a new four-node install of a PVE4 cluster with Ceph.
It is built on four identical HP DL380 servers, each using six 250G SSDs.
Here is pveversion -v which is the same for all four nodes.
root@pmc01:~# pveversion -v
proxmox-ve: 4.1-34 (running kernel: 4.2.6-1-pve)
pve-manager: 4.1-5 (running...
The only relief I have found comes from running this command, which I discovered in another posting:
echo 1 > /sys/module/ipmi_si/parameters/kipmid_max_busy_us
This reduces kipmi0 CPU utilization to below 1%.
All four of these nodes were running PVE 3 for over a year with no problems.
I moved all VMs to another cluster in order to "upgrade by doing a fresh install" to version 4.
After wiping all drives, I completed fresh installs of PVE4. Then I completed the ceph install. Nothing else has been...
BIOS & firmware identical on the four nodes that are running on identical hardware.
TOP cmd on nodes 2 & 3 shows process kipmi0 using 100% CPU.
I see that kipmi0 is running on nodes 1 & 4 also but using virtually no CPU.
Dell C6100, dual six core L5640, 24G RAM
1 x 120G SanDisk SSD
2 x 240G Crucial SSD
4 x 160G Fujitsu SAS 15K
The problem I am seeing is that two of the four nodes present a constant load average of 3.
The other two present a load average of 0.
There are NO VMs currently configured.
The plan is to...
The omping test was successful.
But the /etc/hosts file revealed the problem. We had installed Proxmox while this server was on a different network and then changed the IP addresses by editing the /etc/network/interfaces file. We never modified the hosts file. Now that the hosts file is...
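For anyone hitting the same thing: after changing the address in /etc/network/interfaces, the node's own entry in /etc/hosts has to be updated to match. A minimal sketch, using placeholder addresses and the hostnames from this thread:

192.168.1.11    pmc1.localdomain pmc1
192.168.1.12    pmc2.localdomain pmc2

The line for the local node in particular must resolve to the new IP, otherwise the cluster stack keeps binding to the old address.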
Yes /etc/pve/cluster.conf is the same on all four nodes.
<?xml version="1.0"?>
<cluster name="PPC-Office" config_version="6">
<cman keyfile="/var/lib/pve-cluster/corosync.authkey">
</cman>
<clusternodes>
<clusternode name="pmc2" votes="1" nodeid="2"/><clusternode...
Here is the output from node one, which is the one we are attempting to get back into the cluster -
root@pmc1:~# pveversion -v
proxmox-ve-2.6.32: 3.4-163 (running kernel: 2.6.32-41-pve)
pve-manager: 3.4-11 (running version: 3.4-11/6502936f)
pve-kernel-2.6.32-41-pve: 2.6.32-163...
I have a four-node PVE cluster. All nodes are licensed with a PVE Community Subscription.
Node one has failed and must be replaced.
The cluster and all VM's are working fine on the remaining three nodes.
Following suggestions in other posts, I have re-installed Proxmox on new hardware to replace the failed...
I have attempted to add the node back to the PVE cluster using pvecm add ip-of-node2, but it fails at 'Waiting for quorum...'
And this shows up in the syslog -
Sep 27 18:59:35 pmc1 pmxcfs[3872]: [main] notice: teardown filesystem
Sep 27 18:59:47 pmc1 pmxcfs[551627]: [quorum] crit: quorum_initialize...
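In case it helps others who land here: before re-adding a node that was previously a cluster member under the same name, the stale membership usually has to be cleared first. A rough sketch, assuming the remaining nodes still have quorum and the failed node is named pmc1 (adapt names and IPs to your setup):

# on a node that is still in the cluster (e.g. pmc2)
pvecm status            # confirm the remaining nodes have quorum
pvecm nodes             # check whether pmc1 is still listed
pvecm delnode pmc1      # drop the stale entry for the failed node

# then, on the freshly installed pmc1
pvecm add ip-of-node2

If the remaining nodes themselves have lost quorum, pvecm add will sit at 'Waiting for quorum...' no matter what.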
This command failed with -
Invalid command: invalid chars = in host=pmc1
osd crush remove <name> (<ancestor>) : remove <name> from crush map (everywhere, or just at <ancestor>)
So I tried this command -
ceph osd crush remove pmc1
which seemed to work as it returned -
removed item id -2 name...
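For completeness, the individual OSDs from the dead node also have to be taken out of the cluster. A sketch using the standard ceph commands, assuming the node carried osd.0 (repeat for each of its OSD IDs):

ceph osd out osd.0          # mark it out, if it is not already
ceph osd crush remove osd.0 # drop it from the CRUSH map
ceph auth del osd.0         # remove its authentication key
ceph osd rm 0               # remove the OSD entry itself

Since the host is gone there is no daemon left to stop; once the last reference is removed, the devices should disappear from the GUI as well.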
OK, this removed pmc1 from showing up in the GUI under Datacenter.
And this took care of pmc1 showing up in the GUI under Ceph > Config & Monitor.
But it still exists in the GUI under Ceph > OSD & Crush
thanks - Ron
Both of those ceph commands worked but the Proxmox GUI still shows -
- in the Config panel: "mon.0"
- in the Monitor panel: "mon.0"
- in the OSD panel: "pmc1" (this is node 1)
- in the Crush panel:
  - "device 0"
  - "device 1"
  - "device 2"
  - under buckets: "host pmc1"
See...
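If it is only the GUI entries that remain: as far as I know the Config and Monitor panels are driven by /etc/pve/ceph.conf, so after removing the dead monitor from the monmap, its section has to go as well. A rough sketch, assuming the down monitor really is mon.0 on pmc1:

ceph mon remove 0           # drop the dead monitor from the monmap
# then delete the [mon.0] section from /etc/pve/ceph.conf

The OSD and Crush panels should clear up once the node's OSDs and the host bucket have been removed as shown above.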
Not to worry - the other three nodes in the cluster do not use a RAID array. Only node 1 was configured this way because it was different hardware (a Dell R620). The other three nodes are Dell R610s with an HD controller that supports pass-through mode. We have ordered another R610 to rebuild the failed...
From node 2
root@pmc2:~# ceph -s
cluster 773e19fe-60e7-427d-bdc9-6cbcc1301f6e
health HEALTH_WARN
1 mons down, quorum 1,2,3 1,2,3
monmap e4: 4 mons at...
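To confirm which monitor is down before cleaning it up, the monmap and quorum can be inspected. A brief sketch:

ceph mon dump               # lists all monitors and their addresses
ceph quorum_status          # shows which monitors are currently in quorum

With a four-monitor map and only 1,2,3 in quorum, mon.0 is the dead one, which matches the failed pmc1.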