Howdy,
First post here. I've tried to figure this out on my own, but it's not working out great for me.
I have a 3-node Proxmox setup (R710s with quad 1Gb onboard NICs and an add-on single-port 10Gb NIC, eventually two).
Only one 1Gb NIC is connected, and the 10Gb NIC is connected to a 10Gb switch (eventually two separate switches).
I'm still learning and testing, so breaking this isn't a big deal right now.
Each node has an H700 controller (which was a mistake; I'll replace them with H200s so I can give the drives to Ceph directly without the dreaded single-disk RAID 0 configuration).
There are two 147GB SAS drives in RAID 1 for the Proxmox OS, and I added 2x 3TB NAS drives to test Ceph.
Public network 10.0.0.0/16
Storage network 10.1.1.0/24
Nodes - prox01, prox02, prox03
When I originally set up the Ceph cluster I used the IP of the 1Gb NIC as the public network and the 10Gb NIC as the storage network. While this all worked and the Ceph cluster was up and running, I did a speed test on a container and a VM built in Proxmox, both stored on Ceph. The speeds were OK but not as fast as I would have expected; it looked like they were maxing out a 1Gb link.
That's when I got to thinking that Proxmox uses the public network to connect to Ceph for its VMs/CTs, so I attempted to change the public network to 10.1.1.x.
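What I think I ultimately want in /etc/pve/ceph.conf is something like this, with both networks pointed at the 10Gb side (this is just my understanding of the relevant settings, not what's in the file right now):

[global]
     public_network = 10.1.1.0/24
     cluster_network = 10.1.1.0/24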
I created the new monitors with the following commands:
cd /home
mkdir tmp
# export the mon keyring and the current monmap
ceph auth get mon. -o tmp/key-ceph-prox01a
ceph mon getmap -o tmp/map-ceph-prox01a
# build the new monitor's data directory from that map and key
ceph-mon -i prox01a --mkfs --monmap tmp/map-ceph-prox01a --keyring tmp/key-ceph-prox01a
chown ceph:ceph -Rf /var/lib/ceph/mon/ceph-prox01a/
# start the new monitor bound to the 10Gb (storage) network address
ceph-mon -i prox01a --public-addr 10.1.1.1:6789
I did that on all three nodes. Once Proxmox showed Quorum: Yes, I removed the old monitors that were on the 10.0.0.0/16 network. Then I edited ceph.conf, removed the old monitor entries, and added the new ones.
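For reference, my understanding is that each new monitor's entry in ceph.conf should end up looking something like this (my assumption of the format, based on what Proxmox generated for the original monitors; shown for prox01a):

[mon.prox01a]
     host = prox01
     mon addr = 10.1.1.1:6789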
Since then Ceph hasn't behaved properly. The cluster itself seems to be running, as I can still access the VMs/CTs stored on Ceph, but Proxmox doesn't show anything other than "HEALTH_WARN - no active mgr".
Snip of error from log file:
mon.prox01a mon.0 10.1.1.1:6789/0 59228 : cluster [WRN] overall HEALTH_WARN no active mgr
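I'm guessing the manager daemon never came back (or was never recreated) after I swapped the monitors around. Is checking and recreating it along these lines the right direction? (prox01 is just the example node here, and I'm not sure of the exact pveceph syntax for my version.)

ceph -s                            # should show "mgr: no daemons active" if that's the problem
systemctl status ceph-mgr@prox01   # assuming the mgr id is the node name
pveceph createmgr                  # PVE 5.x syntax; I believe newer releases use "pveceph mgr create"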
Now that the back story is out of the way, 2 questions:
1. How can I change the setup so that the 10Gb NICs are used for both the OSDs (cluster network) and the Ceph 'public' network that Proxmox connects through?
2. If there is no way to change this cleanly, should I just destroy and rebuild all the nodes to save time?
Thanks!