I actually ended up backing up the VMs I wanted and reinstalling. I had previously blown up a Ceph install, and I also had a service that doesn't work with the new version of Proxmox (iDRAC Service Module), so I figured it was better to have a clean slate anyway.
Thank you though!
I'll give that a shot today, thank you. I had only tried rebooting.
As far as the IP address of node1 being the network's gateway:
I just split the network from being one massive /16 network into multiple smaller /24 networks. PVE-1 had an IP address of .1, which is now that subnet's default...
So, I was following the directions to separate the cluster network, and things seemed to be going alright with the first node.
However, when I rebooted the node, Datacenter view > Cluster says: missing ':' after key 'interface' (500). I then checked the /etc/pve/corosync.conf file...
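For comparison, a syntactically valid totem/interface section in corosync.conf (corosync 2.x style, as used by PVE 5.x; the cluster name and bindnetaddr here are placeholders) has every key followed by a ':' and a value:

totem {
  cluster_name: mycluster
  config_version: 3
  interface {
    bindnetaddr: 10.10.10.0
    ringnumber: 0
  }
  ip_version: ipv4
  secauth: on
  version: 2
}

The 500 error above comes from the PVE side failing to parse the file, so the first thing worth eyeballing is whether the interface section still follows that shape.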
Here's the solution for this, according to a page that was linked by @adamb
I just confirmed it's working, and the change didn't require a reboot (well, I rebooted to test HA, but the test VM migrated successfully, with no "after config" reboots).
Thomas Lamprecht 2019-01-07 14:02:28 CET
So, I ran
apt remove ceph ceph-base ceph-mon ceph-osd
That seems to have worked fine, but pveceph purge completed silently on only one node (PVE-Witness).
The other two returned:
root@PVE-1:~# pveceph purge
unable to get monitor info from DNS SRV with service name: ceph-mon
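In case anyone else lands here with the same message: it generally just means the client can't find any monitor address, i.e. there's no mon_host entry left in /etc/ceph/ceph.conf (on PVE that file is a symlink to /etc/pve/ceph.conf) and no DNS SRV records to fall back on. Quick things to look at (standard paths, nothing exotic):

ls -l /etc/ceph/ceph.conf
grep mon_host /etc/ceph/ceph.conf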
Thank you, tried that and now:
root@PVE-2:~# ceph-mon -i PVE-2 --mkfs
2019-06-28 11:00:25.671562 7f0e2cc90100 -1 mon.PVE-2@-1(probing) e0 unable to find a keyring file on /etc/pve/priv/ceph.mon.PVE-2.keyring: (2) No such file or directory
2019-06-28 11:00:25.671607 7f0e2cc90100 -1 ceph-mon...
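For anyone following along: --mkfs wants a mon keyring to seed the store. It can either be pulled from a still-working cluster or generated from scratch when rebuilding, then passed in explicitly. A rough sketch along the lines of the manual mon deployment docs (the /tmp path is just a scratch location):

# from a reachable cluster:
ceph auth get mon. -o /tmp/ceph.mon.keyring
# or, when rebuilding from nothing:
ceph-authtool --create-keyring /tmp/ceph.mon.keyring --gen-key -n mon. --cap mon 'allow *'

ceph-mon -i PVE-2 --mkfs --keyring /tmp/ceph.mon.keyring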
I was able to inject the tmpfile on PVE-1 and PVE-Witness (I had to delete /var/lib/ceph/mon/ceph-PVE-Witness/store.db/LOCK first), and it completed for those two.
However, for PVE-2, /var/lib/ceph/mon/ceph-PVE-2/store.db doesn't exist, and I didn't delete it...
I created the directories, and it still...
By "start the ceph-mon on this node" i'm assuming "service ceph start mon"?
I did that, and attempted to "systemctl start ceph.service" same thing.
Assuming the '-m' on the ceph command to specify the IP of the MON needs to be done before I can start the ceph.service, but I'm not sure which ceph...
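(For anyone else reading: as far as I can tell, the per-monitor systemd unit, ceph-mon@<id>, is what starts an individual mon rather than ceph.service, and -m just tells a client command which monitor address to talk to. Something like this, with the address being a placeholder for the mon's IP:

systemctl start ceph-mon@PVE-2
systemctl status ceph-mon@PVE-2
ceph -m 10.10.10.2 -s
)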
Well, I'm embarrassed about that part. I even did a search for it and didn't find it previously.
So, I'm able to inject now, but
root@PVE-1:~# ceph mon getmap -o tmpfile
still returns a timeout. What else do I need to get this to a healthy state?
Does ceph.service need to be running for that...
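(Partially answering myself: ceph mon getmap goes over the network to the monitors, so it needs at least one mon up and in quorum, which is exactly what isn't there yet. With the mon daemon stopped, a monmap can instead be extracted straight from a mon's on-disk store, e.g.:

ceph-mon -i PVE-1 --extract-monmap /tmp/monmap

and that file can then be edited with monmaptool or used for the inject.)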
I was able to delete the LOCK file, and am now trying the inject:
root@PVE-1:/var/lib/ceph/mon/ceph-PVE-1/store.db# ceph-mon -i PVE-1 --inject-monmap tmpfile
2019-05-21 22:38:49.287841 7f9d2c9de100 -1 unable to read monmap from tmpfile: can't open tmpfile: (2) No such file or directory
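Looking at it again, that last failure seems to just be a working-directory thing: getmap wrote tmpfile into the directory it was run from (the prompt above suggests /root), while the inject was run from inside store.db, so there is no tmpfile there to open. Pointing --inject-monmap at the full path (with the mon stopped) should get around it, roughly:

ceph-mon -i PVE-1 --inject-monmap /root/tmpfile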