Hello,
I just installed a new node (v8.3.4) and added it to my Proxmox cluster (the existing nodes run v8.2.7), but for some reason it cannot connect to my external Ceph cluster; the two Ceph-backed storages just show grey question marks, and nothing I have tried gets them to connect. I have the networking and MTUs set identically to my other two hosts.
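In case the CLI view is useful, this is how I'm checking the storage state on the new node (both are stock Proxmox commands, nothing custom):
Code:
# Storage status as Proxmox sees it on the new node
pvesm status
# Package versions on the new node, for reference
pveversion -v | grep -E 'proxmox-ve|pve-manager|ceph'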
Here is the interfaces file from the new node:
Code:
auto lo
iface lo inet loopback

auto eno2
iface eno2 inet manual
#1GbE

auto eno1
iface eno1 inet manual
#1GbE

auto ens1f0
iface ens1f0 inet manual
    mtu 9000
#10GbE

auto ens1f1
iface ens1f1 inet manual
    mtu 9000
#10GbE

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2
    bond-miimon 100
    bond-mode active-backup
    bond-primary eno1
#Mgmt Network Bond interface

auto bond1
iface bond1 inet manual
    bond-slaves ens1f0 ens1f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    mtu 9000
#VM Network Bond interface

auto vmbr0
iface vmbr0 inet static
    address 10.3.127.16/24
    gateway 10.3.127.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
#Management Network

auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    mtu 9000
#VM Network

auto vmbr1.22
iface vmbr1.22 inet static
    address 10.22.0.16/24
    mtu 8972
#Storage Network

source /etc/network/interfaces.d/*
The vmbr1.22 VLAN interface is the connection to the storage VLAN where the Ceph cluster is located.
and here is the interfaces file from one of my nodes that can connect to the Ceph storage:
Code:
auto lo
iface lo inet loopback

auto eno8303
iface eno8303 inet manual
#1GbE

auto eno8403
iface eno8403 inet manual
#1GbE

auto eno12399np0
iface eno12399np0 inet manual
    mtu 9000
#10GbE

auto eno12409np1
iface eno12409np1 inet manual
    mtu 9000
#10GbE

auto bond1
iface bond1 inet manual
    bond-slaves eno12399np0 eno12409np1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    mtu 9000
#VM Network Bond interface

auto bond0
iface bond0 inet manual
    bond-slaves eno8303
    bond-miimon 100
    bond-mode active-backup
    bond-primary eno8303
#Mgmt Network Bond interface

auto vmbr0
iface vmbr0 inet static
    address 10.3.127.14/24
    gateway 10.3.127.1
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
#Management Network

auto vmbr1
iface vmbr1 inet manual
    bridge-ports bond1
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    mtu 9000
#VM Network

auto vmbr1.22
iface vmbr1.22 inet static
    address 10.22.0.14/24
    mtu 8972
#Storage Network
Except for the obvious things like interface names and IP addresses, I am not seeing any difference, but maybe another set of eyes or two can spot one?
I can, of course, ping from the vmbr1.22 interface IP to the 10.22.0.x IPs of the Ceph nodes, so there *is* connectivity to the Ceph cluster. I have verified with the network admin who manages the switches that the two switch ports the 10GbE interfaces connect to are configured as an LACP bonded pair, and that the MTU is set to 9000 on both ports as well as on the LACP bond itself (he even sent me a screenshot of the config).
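For completeness, these are the checks I'm running from the new node to confirm the jumbo-frame path end to end (10.22.0.50 below is just a placeholder for one of my Ceph monitor IPs; the ping payload is the 8972 interface MTU minus 28 bytes of IP/ICMP headers):
Code:
# Effective MTU on the bond, the VLAN-aware bridge, and the storage VLAN interface
ip -d link show bond1
ip -d link show vmbr1
ip -d link show vmbr1.22
# Bond mode and active slaves
cat /proc/net/bonding/bond1
# Ping a Ceph node with "don't fragment" set, sized to the 8972 MTU (8972 - 28 = 8944)
ping -M do -s 8944 -c 3 10.22.0.50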
I also checked the '/etc/ceph' directory; the ceph.client.admin.keyring and ceph.conf files were missing, so I copied them over from one of the working nodes and then rebooted the new node, but that didn't fix anything; the Ceph storages still show up with grey question marks and are inaccessible. journalctl shows timeouts (not surprisingly), but otherwise nothing helpful.
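In case it helps narrow things down, this is how I'm testing the Ceph connection directly from the new node with the copied files (assuming the ceph CLI from ceph-common is present and the standard /etc/ceph paths; 'rbd' is just a placeholder for one of my pool names):
Code:
# Talk to the external cluster directly with the copied ceph.conf and admin keyring
ceph -s --conf /etc/ceph/ceph.conf --keyring /etc/ceph/ceph.client.admin.keyring
# List images in one of the pools backing the Proxmox storage (placeholder pool name)
rbd ls --pool rbd --conf /etc/ceph/ceph.conf --keyring /etc/ceph/ceph.client.admin.keyring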
I am not sure what else to look at, or why the new host cannot connect to the Ceph cluster.
The only thing I can think of is that the node might be trying to reach Ceph over the management connection (which is only 1Gbit, and whose VLAN can also reach the storage network). The whole idea of adding the vmbr1.22 VLAN interface was to give the nodes a direct connection to the storage VLAN, so any traffic destined for 10.22.0.0/24 *should* automatically go out that interface, since the directly connected route is more specific than the default route out vmbr0.
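To see which way the traffic is actually routed, this is what I'm checking (10.22.0.50 again being a placeholder monitor IP); the directly connected 10.22.0.0/24 route on vmbr1.22 should be preferred over the default route:
Code:
# Which interface and source address the kernel picks for a Ceph monitor IP
ip route get 10.22.0.50
# Full routing table, to confirm 10.22.0.0/24 is attached to vmbr1.22
ip route show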
I can, of course, provide any other info you might need.
Your insight is appreciated.
