We are using a three-node cluster (version 3.3-5). Currently we are having an issue with rgmanager.
On each node, rgmanager doesn't start after reboot.
So each time after boot I have to restart cman and afterwards start rgmanager manually.
After this procedure everything works fine until the next reboot. I have no clue why rgmanager doesn't start automatically. Currently we have one VM running in HA mode, so there is also an 'rm' section in the cluster config.
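For reference, the manual workaround I run after every reboot looks roughly like this; the runlevel check at the end is only my guess at where the cause might be:
Code:
# manual workaround after every reboot (on the affected node)
service cman restart
service rgmanager start

# check that both init scripts are enabled for the default runlevel
# (runlevel 2 on Debian Wheezy) - just a guess at where the cause might be
ls /etc/rc2.d/ | grep -E 'cman|rgmanager'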
Here are my configs:
Code:
pveversion -v
proxmox-ve-2.6.32: 3.3-139 (running kernel: 2.6.32-34-pve)
pve-manager: 3.3-5 (running version: 3.3-5/bfebec03)
pve-kernel-2.6.32-32-pve: 2.6.32-136
pve-kernel-2.6.32-29-pve: 2.6.32-126
pve-kernel-2.6.32-34-pve: 2.6.32-140
lvm2: 2.02.98-pve4
clvm: 2.02.98-pve4
corosync-pve: 1.4.7-1
openais-pve: 1.1.4-3
libqb0: 0.11.1-2
redhat-cluster-pve: 3.2.0-2
resource-agents-pve: 3.9.2-4
fence-agents-pve: 4.0.10-1
pve-cluster: 3.0-15
qemu-server: 3.3-3
pve-firmware: 1.1-3
libpve-common-perl: 3.0-19
libpve-access-control: 3.0-15
libpve-storage-perl: 3.0-25
pve-libspice-server1: 0.12.4-3
vncterm: 1.1-8
vzctl: 4.0-1pve6
vzprocps: 2.0.11-2
vzquota: 3.1-2
pve-qemu-kvm: 2.1-10
ksm-control-daemon: 1.1-1
glusterfs-client: 3.5.2-1
We are using three bonded networks. Bond0 is used for the Proxmox cluster, Bond1 for NFS backup connections, and Bond2 for Ceph communication.
Code:
# network interface settings
auto lo
iface lo inet loopback

iface eth0 inet manual
iface eth1 inet manual
iface eth2 inet manual
iface eth3 inet manual
iface eth4 inet manual
iface eth5 inet manual

auto bond0
iface bond0 inet manual
    slaves eth0 eth2
    bond_miimon 100
    bond_mode 802.3ad

auto bond1
iface bond1 inet manual
    slaves eth1 eth3
    bond_miimon 100
    bond_mode 802.3ad

auto bond2
iface bond2 inet manual
    slaves eth4 eth5
    bond_miimon 100
    bond_mode 802.3ad

auto vmbr1
iface vmbr1 inet static
    address 192.168.151.3
    netmask 255.255.255.0
    bridge_ports bond1
    bridge_stp off
    bridge_fd 3

auto vmbr1:0
iface vmbr1:0 inet static
    address 192.168.153.3
    netmask 255.255.255.0

auto vmbr1:1
iface vmbr1:1 inet static
    address 192.168.154.3
    netmask 255.255.255.0

auto vmbr0
iface vmbr0 inet static
    address 172.18.0.32
    netmask 255.255.252.0
    gateway 172.18.0.1
    bridge_ports bond0
    bridge_stp off
    bridge_fd 3

auto vmbr2
iface vmbr2 inet static
    address 192.168.152.3
    netmask 255.255.255.0
    bridge_ports bond2
    bridge_stp off
    bridge_fd 3
Code:
cat /etc/pve/cluster.conf

<?xml version="1.0"?>
<cluster config_version="18" name="dmc-cluster-ni">
  <cman keyfile="/var/lib/pve-cluster/corosync.authkey"/>
  <fencedevices>
    <fencedevice agent="fence_ipmilan" auth="password" ipaddr="172.18.0.33" lanplus="1" login="YY" name="lx-vmhost-ni0-ipmi" passwd="XX" power_wait="10"/>
    <fencedevice agent="fence_ipmilan" auth="password" ipaddr="172.18.0.36" lanplus="1" login="YY" name="lx-vmhost-ni1-ipmi" passwd="XX" power_wait="10"/>
    <fencedevice agent="fence_ipmilan" auth="password" ipaddr="172.18.0.38" lanplus="1" login="YY" name="lx-vmhost-ni2-ipmi" passwd="XX" power_wait="10"/>
  </fencedevices>
  <clusternodes>
    <clusternode name="lx-vmhost-ni1" nodeid="1" votes="1">
      <fence>
        <method name="power">
          <device name="lx-vmhost-ni1-ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="lx-vmhost-ni0" nodeid="2" votes="1">
      <fence>
        <method name="power">
          <device name="lx-vmhost-ni0-ipmi"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="lx-vmhost-ni2" nodeid="3" votes="1">
      <fence>
        <method name="power">
          <device name="lx-vmhost-ni2-ipmi"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
  <rm>
    <pvevm autostart="1" vmid="100"/>
  </rm>
</cluster>
We are also running into another problem: after each reboot of any of our nodes, one or two Ceph OSDs on the rebooted node are marked as 'out'. I can manually start the OSD and afterwards it works fine.
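For completeness, this is roughly how I bring the OSD back; osd.2 is only an example ID here, the affected OSD varies:
Code:
# see which OSDs are down/out after the reboot
ceph osd tree

# start the OSD daemon on the rebooted node (sysvinit on Proxmox 3.x)
service ceph start osd.2

# mark it back in, in case it does not rejoin on its own
ceph osd in 2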
Could we possibly have a network issue during the boot-up phase? I can't see any error messages after startup, but maybe I'm looking in the wrong places. Please give me a hint about what further information is needed to investigate these issues.
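So far I've mainly been looking at syslog. If it helps, these are the checks I can run right after a reboot and post here (just a rough list, happy to collect anything else):
Code:
# cluster / rgmanager state
clustat
fence_tool ls

# LACP state of the bond carrying the Proxmox cluster traffic
cat /proc/net/bonding/bond0

# boot-time messages from the cluster stack and the bonds
grep -iE 'cman|rgmanager|fenced|bond' /var/log/syslog

# Ceph OSD state after the reboot
ceph osd tree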
Any help is appreciated.