I am trying to build a new cluster. This will be my 3rd one in the last 3 years, so I'd say I'm not a complete beginner. There are 3 nodes as a start. All nodes are connected via switches with VLANs, and bonding (2x10G) for the Ceph network.
Corosync and management are on separate physical interfaces. Over the bond, I would carry the two Ceph networks and the VM network.
I have this network config on all nodes:
Code:
auto lo
iface lo inet loopback

iface eno4 inet manual
#MGMT

auto eno1
iface eno1 inet manual
#BOND-1

auto eno2
iface eno2 inet manual
#BOND-2

auto eno3
iface eno3 inet static
        address 10.25.20.3/24
#COROSYNC

auto bond0
iface bond0 inet manual
        bond-slaves eno1 eno2
        bond-miimon 100
        bond-mode balance-xor
        bond-xmit-hash-policy layer3+4
        mtu 9582
#BOND0

auto vmbr0
iface vmbr0 inet static
        address 10.231.10.3/24
        gateway 10.231.10.1
        bridge-ports eno4
        bridge-stp off
        bridge-fd 0
#MGMT-BR

auto vmbr1
iface vmbr1 inet manual
        bridge-ports bond0
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9582
#BOND-BR

auto vmbr2050
iface vmbr2050 inet static
        address 10.205.20.3/24
        bridge-ports bond0.2050
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9582
#CLUSTER

auto vmbr2060
iface vmbr2060 inet static
        address 10.206.20.3/24
        bridge-ports bond0.2060
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
        mtu 9582
#PUBLIC

auto vmbr100
iface vmbr100 inet manual
        bridge-ports bond0.1
        bridge-stp off
        bridge-fd 0
        bridge-vlan-aware yes
        bridge-vids 2-4094
#LOSONCZI_LAN

source /etc/network/interfaces.d/*
I do the setup the usual way: install the nodes, update them, create all the network config and verify it. All nodes can ping each other via all of their interfaces.
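For what it's worth, this is roughly how I check connectivity between the nodes, including jumbo frames over the bond (just a sketch: the .4 addresses stand in for one of the other nodes, and the -s 9554 payload assumes the 9582 MTU minus 28 bytes of IP/ICMP headers):
Code:
# plain reachability on the two Ceph VLANs
ping -c 3 10.205.20.4
ping -c 3 10.206.20.4

# jumbo-frame check: don't fragment, payload sized for the 9582 MTU
ping -M do -s 9554 -c 3 10.205.20.4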
After I create the cluster and verify connectivity, I start installing Ceph via the GUI on the first node. I use the no-subscription repo with Reef (I also tried Quincy); after the package install, I chose the 2 networks, clicked next, done. On this node, MON and MGR show green, everything looks OK.
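For reference, my understanding is that the wizard does roughly the equivalent of this on the CLI (a sketch only; the subnets are the ones labelled #PUBLIC and #CLUSTER in my config above, and I have not verified that these are the exact commands the GUI runs):
Code:
# install the Ceph packages from the no-subscription repo (I picked Reef in the GUI)
pveceph install --repository no-subscription

# write the cluster-wide ceph.conf with the public and cluster networks
pveceph init --network 10.206.20.0/24 --cluster-network 10.205.20.0/24

# create the first monitor and manager on this node
pveceph mon create
pveceph mgr create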
When I go to the second node and do the same, installing via the GUI, it closes the shell window by itself and says "Got timeout (500)". It never gets to the last screen like it should. After this, when I open the Ceph screen on the 2nd node, it also shows "Got timeout (500)".
pveversion -v is identical on all nodes:
Code:
proxmox-ve: 8.2.0 (running kernel: 6.8.4-3-pve)
pve-manager: 8.2.2 (running version: 8.2.2/9355359cd7afbae4)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.4-3
proxmox-kernel-6.8.4-3-pve-signed: 6.8.4-3
proxmox-kernel-6.8.4-2-pve-signed: 6.8.4-2
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.6
libpve-cluster-perl: 8.0.6
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.2
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.2.1
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.3-1
proxmox-backup-file-restore: 3.2.3-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.6
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.6
pve-container: 5.1.10
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.0
pve-firewall: 5.0.7
pve-firmware: 3.11-1
pve-ha-manager: 4.0.4
pve-i18n: 3.2.2
pve-qemu-kvm: 8.1.5-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve2
ceph -s gives me this on the 1st node:
Code:
  cluster:
    id:     74a57cd3-5525-412e-bfa9-c20781d6e98c
    health: HEALTH_WARN
            OSD count 0 < osd_pool_default_size 3

  services:
    mon: 1 daemons, quorum pve1 (age 14m)
    mgr: pve1(active, since 14m)
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:
On the 2nd node:
Code:
Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)')
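If it helps, this is what I would check on the 2nd node to see whether the Ceph config ever made it there (a sketch, I have not pasted the real output; as far as I know /etc/ceph/ceph.conf should be a symlink to the cluster-wide /etc/pve/ceph.conf):
Code:
# is the cluster-wide config visible in pmxcfs on this node?
cat /etc/pve/ceph.conf

# does the local symlink exist?
ls -l /etc/ceph/ceph.conf

# cluster membership / quorum state
pvecm status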
I have re-installed the cluster 3 times now, and I don't understand what goes wrong with a fresh setup like this. Has anybody run into a similar issue?
I would appreciate any help.
Thank you