I have a 3-node (S1, S2, S3) Ceph cluster running plenty of VMs and CTs.
At the moment I have a 4th server (S4, 50 TB) running bare-metal TrueNAS, which provides an NFS share to the whole cluster for backups.
I also run a media server in a CT that uses S4 for data storage (no backups for media storage...
I actually found my issue: the drives still had an old zpool from a previous installation on them.
I booted GParted Live, wiped everything clean, and then had no issues at all.
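For anyone hitting the same thing without wanting to boot GParted Live: the stale ZFS labels can also be cleared from a shell. A rough sketch, assuming the old pool disk is /dev/sdX (a placeholder, not a device from my setup — double-check the device name, this is destructive):

```shell
# WARNING: destroys all data/signatures on the device; /dev/sdX is a placeholder
zpool labelclear -f /dev/sdX   # drop leftover ZFS pool labels
wipefs --all /dev/sdX          # wipe remaining filesystem/RAID signatures
sgdisk --zap-all /dev/sdX      # clear GPT and protective MBR partition tables
```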
Also, pretty much every time, adding "nomodeset" to the launch parameters helps me.
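If "nomodeset" reliably helps, it can be made permanent so it doesn't have to be typed at every boot; a sketch of the relevant line in /etc/default/grub (the rest of the file stays as-is):

```
# /etc/default/grub -- append nomodeset to the default kernel command line
GRUB_CMDLINE_LINUX_DEFAULT="quiet nomodeset"
```

Then run `update-grub` and reboot.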
Thank you, updating the 10G subnet to /24 helped and the nodes can at least ping each other, but I am still not able to migrate.
"2023-12-03 18:39:02 100-0: end replication job with error: command '/usr/bin/ssh -e none -o 'BatchMode=yes' -o 'HostKeyAlias=myst2' root@192.168.1.202 pvecm mtunnel...
Spent a while today learning a bit about networking and traffic, and set up 3 different networks on my 3-node cluster.
Each of the 3 nodes has 2x 1G and 2x 10G ports.
I used eno3 for management, eno4 for cluster traffic, and eno1/eno2 for a point-to-point full mesh. Now when I try to start replication, I am...
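For reference, the point-to-point full mesh can be expressed in /etc/network/interfaces with the "routed" approach; a minimal sketch for one node, assuming a 10.15.15.0/24 mesh subnet with .50/.51/.52 as S1/S2/S3 (addresses and interface names are placeholders — each node mirrors this with its own address and peer routes):

```
# /etc/network/interfaces fragment for node 1 (all addresses are assumptions)
auto eno1
iface eno1 inet static
        address 10.15.15.50/24
        # direct route to node 2 over this link
        up   ip route add 10.15.15.51/32 dev eno1
        down ip route del 10.15.15.51/32

auto eno2
iface eno2 inet static
        address 10.15.15.50/24
        # direct route to node 3 over the other link
        up   ip route add 10.15.15.52/32 dev eno2
        down ip route del 10.15.15.52/32
```

With this layout each node reaches its two peers over dedicated links without needing a switch on the 10G side.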
I have a 3-node cluster of 3 identical Dell R630s. All were running Proxmox 7 and I eventually updated them to 8 ... all good.
Having played with stuff, I decided to upgrade storage and do a clean installation on all 3.
Got the Dell BIOS/iDRAC updated on all of them, etc.
Now 2 of my nodes got the installation done...
I have a 3-node Proxmox cluster on a managed switch (192.168.1.150/151/152).
All VMs are on VLAN 10.
All nodes have 2x 1G + 2x 10G ports. The nodes are running a Ceph cluster. For each node,
1x 1G (vmbr0) used for management (192.168.1.150/151/152)
2x 10G used for an IPv6/OSPFv6 ring network, but the network is unused...
I asked about this while setting up, and people said it won't help much to use a second interface on the server.
If I am hearing you right... should I do this and use those 4 extra 10G ports on my Netgear managed switch? Does it look right?
Or should I just skip the 1G connection and use the 10G ports only?
I guess I won't be angry if my name is put up as the most stupid person of the month at the top of this forum.
One of the nodes was not fully updated and the repo was bad on that one. The thing is that when you update Proxmox from the web shell (running 3 nodes) and switch nodes while one node is updating... you don't...
I definitely didn't mess with external tools until I had exhausted other options. Ceph was non-responsive before. You think a reinstall and reimporting the OSDs is possible? I just don't want to lose the data that I thought is copied across 6 disks :)
Another piece of info: a friend said RADOS is corrupted because of this message.
root@mystic1:/var/log/ceph# systemctl status ceph-radosgw.target
Unit ceph-radosgw.target could not be found.
root@mystic1:/var/log/ceph# systemctl status...
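As a side note, "Unit ceph-radosgw.target could not be found" on its own doesn't indicate RADOS corruption — it just means the RADOS Gateway (RGW) package isn't installed, which is normal on a plain Proxmox Ceph setup. The units worth checking are the mon/mgr/osd ones; a sketch (daemon names are assumptions, list what actually exists on each node):

```shell
# RGW (ceph-radosgw) is optional and usually absent on Proxmox; check the
# core daemon targets instead:
systemctl status ceph-mon.target ceph-mgr.target ceph-osd.target
# per-daemon units are templated on the daemon id, e.g.:
systemctl status 'ceph-mon@*'      # all monitor instances on this node
journalctl -u 'ceph-mon@*' -n 50   # recent monitor log lines
```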
Hmmm, now I suspect this whole thing also has something to do with the following error. Should I try to update Ceph to Reef or something?
root@mystic1:~# ./cephadm install
Installing packages ['cephadm']...
Non-zero exit code 100 from apt-get install -y cephadm
apt-get: stdout Reading...
Also, I tried to install cephadm and upgrade Proxmox, but now I'm getting this error. Wondering if some packages are damaged, maybe? Can I reinstall Ceph without losing the data on my disks?
Setting up proxmox-kernel-6.2 (6.2.16-12) ...
Errors were encountered while processing:
cephadm
E: Sub-process...
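For what it's worth, cephadm is the upstream Ceph orchestrator and isn't used on Proxmox VE at all — Ceph there is managed through pveceph and the GUI. So the half-installed cephadm package can probably just be removed to unblock apt; a sketch:

```shell
# cephadm is not needed on Proxmox VE (Ceph is managed via pveceph):
apt-get remove --purge cephadm
apt-get -f install   # let dpkg/apt finish any interrupted configuration
```

Removing the package doesn't touch the Ceph data on the OSD disks; it only removes the orchestrator tool that failed to configure.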
Below is what I see with "ceph --admin-daemon /run/ceph/ceph-mon.localhost.asok mon_status"
No quorum... I suspect the monitor is not running on node2/3, maybe?
The command runs on node1 (mystic1, even though the monitor name is localhost). The command doesn't run on nodes 2 and 3. The error says "admin_socket...
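If nodes 2 and 3 don't even have an admin socket, the monitor daemons there are most likely not running at all — and with only 1 of 3 mons up there can be no quorum. A quick per-node check (the socket name follows whatever mon id is configured, so list /run/ceph first; the hostname-based unit name is an assumption, on mystic1 the mon id is apparently "localhost"):

```shell
ls -l /run/ceph/                             # which admin sockets exist on this node?
systemctl status "ceph-mon@$(hostname -s)"   # is a monitor service running here?
systemctl start  "ceph-mon@$(hostname -s)"   # try starting it if it is stopped
```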
Another question: can I just set up Ceph again and import these disks somehow without losing data?
It's not like the data is super important, but now that things have failed, I should learn recovery rather than do a fresh install. I still have about 20 LXCs running.
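Regarding reimporting the disks: OSDs created with ceph-volume keep their own metadata on-disk, so as long as the monitor quorum can be recovered, existing OSDs can usually be re-activated rather than recreated. A hedged sketch, to be run only once the mons are healthy again:

```shell
# Existing LVM-based OSDs carry their metadata on the disks themselves;
# rescan them and start their systemd services:
ceph-volume lvm activate --all
ceph osd tree   # confirm the OSDs came back up under the right hosts
```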
1) PVE nodes are NOT on the VLAN (192.168.1.1), though they can ping 192.168.10.1.
auto lo
iface lo inet loopback
iface eno3 inet manual
auto vmbr0
iface vmbr0 inet static
address 192.168.1.150/24
gateway 192.168.1.1
bridge-ports eno3
bridge-stp off
bridge-fd...
1) ceph status returns a timeout as well.
2) Checked corosync.conf and ceph.conf and everything looks OK.
3) All hosts can ping each other.
My new setup: Modem - Firewall - Managed switch (VLAN config) - Unmanaged switch - Nodes
My old setup: Modem - Firewall - Managed switch (VLAN config) -...