Configuration:
4-node Proxmox cluster:
PVE0 : pve-manager/5.1-46/ae8241d4
PVE1 : pve-manager/5.1-46/ae8241d4
PVE2 : pve-manager/5.1-46/ae8241d4
PVE3 : pve-manager/5.0-30/5ab26bc
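PVE3 is still on pve-manager 5.0-30, while the other three nodes run 5.1-46, so a mixed-version cluster may be worth ruling out. Here is a small sketch for spotting that kind of drift. It assumes input lines in the `NODE : pve-manager/VERSION/HASH` form used in the list above. On a live node, `pveversion` prints a line starting with `pve-manager/...`. The function name is illustrative.

```shell
#!/bin/sh
# Read "NODE : pve-manager/VERSION/HASH" lines on stdin and print each
# distinct pve-manager version followed by the nodes running it.
pve_versions() {
    awk -F' : ' '{
        split($2, parts, "/")                  # parts[2] is the version, e.g. 5.1-46
        nodes[parts[2]] = nodes[parts[2]] " " $1
    }
    END { for (v in nodes) print v ":" nodes[v] }' | sort
}

# Usage (host names pve0..pve3 are assumptions):
#   for n in pve0 pve1 pve2 pve3; do
#       echo "$n : $(ssh root@$n pveversion)"
#   done | pve_versions
```

More than one output line means the nodes are not all on the same pve-manager release.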
Networking:
Each host has 4 NICs:
2 for the backend, bonded with LACP (802.3ad) to the SAN switch, one IP per host (no multipath needed)
2 for the frontend, bonded with LACP (802.3ad) to the network switch, one IP per host
Open vSwitch (OVS) is used as the virtual switch.
Contents of /etc/network/interfaces (the same on all hosts; only the IP addresses differ):
auto lo
iface lo inet loopback

auto eno1
iface eno1 inet manual

auto eno2
iface eno2 inet manual

auto eno3
iface eno3 inet manual
    mtu 8996

auto eno4
iface eno4 inet manual
    mtu 8996

allow-vmbr1 bond1
iface bond1 inet manual
    ovs_bonds eno1 eno2
    ovs_type OVSBond
    ovs_bridge vmbr1
    ovs_options bond_mode=balance-tcp lacp=active
    other_config lacp-time=fast trunks=11,12,13,14,15,16,30,31

allow-vmbr1 mgmt_vlan30
iface mgmt_vlan30 inet static
    address 10.255.30.202
    netmask 255.255.255.0
    gateway 10.255.30.1
    ovs_type OVSIntPort
    ovs_bridge vmbr1
    ovs_options tag=30

auto vmbr1
iface vmbr1 inet manual
    ovs_type OVSBridge
    ovs_ports bond1 mgmt_vlan30

allow-vmbr2 bond2
iface bond2 inet manual
    ovs_bonds eno3 eno4
    ovs_type OVSBond
    ovs_bridge vmbr2
    ovs_options bond_mode=balance-tcp lacp=active
    other_config lacp-time=fast
    mtu 8996

allow-vmbr2 iscsi_vlan5
iface iscsi_vlan5 inet static
    address 10.0.5.12
    netmask 255.255.255.0
    ovs_type OVSIntPort
    ovs_bridge vmbr2
    mtu 8996

auto vmbr2
iface vmbr2 inet manual
    ovs_type OVSBridge
    ovs_ports bond2 iscsi_vlan5
    mtu 8996
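The iSCSI path depends on jumbo frames (MTU 8996 on eno3, eno4, bond2, iscsi_vlan5 and vmbr2). One quick sanity check is whether a full-size, unfragmented packet actually reaches the SAN from every host. The arithmetic works like this: after the 20-byte IPv4 header and the 8-byte ICMP header, the largest ping payload that fits in one 8996-byte frame is 8996 - 28 = 8968 bytes. The sketch below assumes a Linux host with iputils `ping`. `SAN_PORTAL` is a hypothetical placeholder for the Nimble's iSCSI data address, which is not given above but is presumably in 10.0.5.0/24.

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 frame:
# MTU minus the 20-byte IPv4 header (no options) minus the 8-byte ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Usage on each host. SAN_PORTAL is a hypothetical placeholder for the
# Nimble's iSCSI data IP. "-M do" makes iputils ping set Don't Fragment,
# so an oversized packet fails instead of being split.
#   ping -M do -s "$(icmp_payload_for_mtu 8996)" -c 3 "$SAN_PORTAL"
```

If the 8968-byte ping fails while a default-size ping to the same address succeeds, some hop on the path (NIC, OVS bond, switch port or SAN interface) is not passing 8996-byte frames.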
Storage:
iSCSI SAN storage (Nimble CS210), shared by all of the hosts.
Problem:
After some time, LVs disappear and the affected VMs can no longer boot.
lvs shows no trace of the missing LVs.
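Because the storage is shared, every node should list the same LVs in the VG, even though each LV is only activated on the node running its VM. Comparing the `lvs` listings from two nodes shows whether an LV is gone from the VG entirely or only missing from one host's view. This is a minimal sketch. It assumes each input file holds the output of `lvs --noheadings -o vg_name,lv_name` captured on one node. The host names pve0/pve2, the file names and the function names are illustrative assumptions.

```shell
#!/bin/sh
# Turn "lvs --noheadings -o vg_name,lv_name" output (indented, two columns)
# into sorted, de-duplicated "VG/LV" lines.
normalize_lvs() {
    awk 'NF >= 2 { print $1 "/" $2 }' "$1" | sort -u
}

# Print VG/LV pairs listed in the first file but absent from the second.
missing_lvs() {
    a=$(mktemp) && b=$(mktemp) || return 1
    normalize_lvs "$1" > "$a"
    normalize_lvs "$2" > "$b"
    comm -23 "$a" "$b"    # -23: keep only lines unique to the first file
    rm -f "$a" "$b"
}

# Usage (host names are assumptions):
#   ssh root@pve0 lvs --noheadings -o vg_name,lv_name > pve0.txt
#   ssh root@pve2 lvs --noheadings -o vg_name,lv_name > pve2.txt
#   missing_lvs pve0.txt pve2.txt
```

If an LV shows up on some nodes but not on others, that points toward the individual nodes' view of the shared VG rather than toward the volume being removed from the SAN.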
Our findings (these do not occur every time):
1. The VM that does not boot cannot find its LV disk.
2. The LV disk appears to be found on a different datastore: the console output below and the GUI (see attached file) do not match.
root@pve2:/dev/QA# ls -la
total 0
drwxr-xr-x 2 root root 100 Apr 6 10:10 .
drwxr-xr-x 24 root root 4940 Apr 6 10:10 ..
...
lrwxrwxrwx 1 root root 8 Apr 6 10:10 vm-132-disk-1 -> ../dm-20
...
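Finding 2 can be checked directly. The symlink only says that /dev/QA/vm-132-disk-1 points at dm-20. The kernel separately records which device-mapper mapping dm-20 actually is, in /sys/block/dm-20/dm/name. For LV vm-132-disk-1 in VG QA, LVM names that mapping QA-vm--132--disk--1: it joins VG and LV with a single "-" and doubles any "-" inside either name. The sketch below prints the link and the kernel's name side by side. It assumes a Linux host with GNU `readlink`, and the function names are illustrative.

```shell
#!/bin/sh
# For each LV symlink in a VG directory (e.g. /dev/QA), print the link name,
# the dm-N node it resolves to, and the device-mapper name the kernel reports
# for that node. The sysfs root defaults to /sys (overridable for testing).
show_dm_owners() {
    vgdir=$1
    sysfs=${2:-/sys}
    for link in "$vgdir"/*; do
        [ -L "$link" ] || continue
        dm=$(basename "$(readlink -f "$link")")    # e.g. dm-20
        name=$(cat "$sysfs/block/$dm/dm/name" 2>/dev/null || echo "?")
        printf '%s -> %s (%s)\n' "$(basename "$link")" "$dm" "$name"
    done
}

# Split a device-mapper name into "VG LV": "--" encodes a literal hyphen,
# and the first remaining single "-" separates VG from LV.
split_dm_name() {
    enc=$(printf '%s' "$1" | sed 's|--|/|g')    # "/" cannot occur in LVM names
    vg=${enc%%-*}
    lv=${enc#*-}
    printf '%s %s\n' "$(printf '%s' "$vg" | tr / -)" "$(printf '%s' "$lv" | tr / -)"
}

# Usage on the affected node:
#   show_dm_owners /dev/QA
#   split_dm_name QA-vm--132--disk--1
```

If the name reported for dm-20 does not decode to VG QA and LV vm-132-disk-1, the link under /dev/QA points at a mapping that belongs to something else. As a cross-check, `dmsetup ls` lists each mapping with its (major:minor), and the minor number is the N in dm-N.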
Any thoughts? Please help.