So I have been fighting this issue for a while, but I cannot seem to figure out what is happening. My setup is this:
ProxmoxVE 7.4-16
I have 2 datacenters, and each datacenter has 3 hosts. My main VLAN (Proxmox) is separate from my Ceph VLAN. In both datacenters I have one host that refuses to get all OSDs from all hosts into the up and in state. The issue tends to be isolated to a single host and shows up on all four of its OSDs, but which host is affected does change.
I will post details from a single host, as I am sure that whatever the issue is, once I can resolve it on one datacenter I should be able to follow the same steps and resolve the other datacenter.
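For context, the problem state I am describing is just what the standard status commands report for the affected host's four OSDs; roughly what I keep an eye on while it is happening:

Code:
# cluster-wide health summary (this is where the "N osds down" warning shows up)
ceph -s
# per-host tree view; the affected host's four OSDs sit in the down state here
ceph osd tree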
pve version:
Code:
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.74-1-pve: 5.15.74-1
ceph: 17.2.6-pve1
ceph-fuse: 17.2.6-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx4
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.3-1
proxmox-backup-file-restore: 2.4.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1
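(For reference, the version list above is the full CLI dump:)

Code:
pveversion -v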
ceph.conf
Code:
[global]
auth_client_required = cephx
auth_cluster_required = cephx
auth_service_required = cephx
cluster_network = Y.Y.10.Y/27
fsid = [redacted]
mon_allow_pool_delete = true
mon_host = X.X.X.66 X.X.X.67 X.X.X.68
ms_bind_ipv4 = true
ms_bind_ipv6 = false
osd_pool_default_min_size = 2
osd_pool_default_size = 3
public_network = X.X.X.66/27
[client]
keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.vmhost1]
host = vmhost1
mds_standby_for_name = pve
[mds.vmhost2]
host = vmhost2
mds_standby_for_name = pve
[mds.vmhost3]
host = vmhost3
mds_standby_for_name = pve
[mon.vmhost1]
public_addr = X.X.X.66
[mon.vmhost2]
public_addr = X.X.X.67
[mon.vmhost3]
public_addr = X.X.X.68
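One thing I am not completely sure how to verify: public_network sits on the Proxmox VLAN and cluster_network on the Ceph VLAN. Is checking the addresses each OSD and monitor registers, and comparing them against the vmbr1 and cephbr0 subnets, a sensible way to confirm everything is binding to the right networks? This is what I had in mind:

Code:
# each osd.N line in the dump lists the public and cluster addresses it is using
ceph osd dump | grep "^osd\."
# monitor addresses and quorum, to make sure the mons are where I expect them
ceph mon dump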
crush map
Code:
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54
# devices
device 0 osd.0 class ssd
device 1 osd.1 class ssd
device 2 osd.2 class ssd
device 3 osd.3 class ssd
device 4 osd.4 class ssd
device 5 osd.5 class ssd
device 6 osd.6 class ssd
device 7 osd.7 class ssd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 zone
type 10 region
type 11 root
# buckets
host vmhost4 {
    id -3 # do not change unnecessarily
    id -2 class ssd # do not change unnecessarily
    id -8 class hdd # do not change unnecessarily
    # weight 3.49316
    alg straw2
    hash 0 # rjenkins1
    item osd.0 weight 0.87329
    item osd.1 weight 0.87329
    item osd.2 weight 0.87329
    item osd.3 weight 0.87329
}
host vmhost5 {
    id -5 # do not change unnecessarily
    id -4 class ssd # do not change unnecessarily
    id -9 class hdd # do not change unnecessarily
    # weight 3.49316
    alg straw2
    hash 0 # rjenkins1
    item osd.4 weight 0.87329
    item osd.5 weight 0.87329
    item osd.6 weight 0.87329
    item osd.7 weight 0.87329
}
host vmhost6 {
    id -10 # do not change unnecessarily
    id -6 class ssd # do not change unnecessarily
    id -11 class hdd # do not change unnecessarily
    # weight 3.63678
    alg straw2
    hash 0 # rjenkins1
    item osd.8 weight 0.90919
    item osd.9 weight 0.90919
    item osd.10 weight 0.90919
    item osd.11 weight 0.90919
}
root default {
    id -1 # do not change unnecessarily
    id -7 class ssd # do not change unnecessarily
    id -12 class hdd # do not change unnecessarily
    # weight 10.62311
    alg straw2
    hash 0 # rjenkins1
    item vmhost4 weight 3.49316
    item vmhost5 weight 3.49316
    item vmhost6 weight 3.63678
}
# rules
rule replicated_rule {
    id 0
    type replicated
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
# end crush map
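(The map above is the decompiled text form; I pulled it more or less like this, in case I am looking at the wrong thing:)

Code:
# export the compiled CRUSH map and decompile it to the text posted above
ceph osd getcrushmap -o /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt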
My network configuration
Code:
auto lo
iface lo inet loopback

####################
# 1G interfaces
auto eno1
iface eno1 inet manual
    mtu 9000

auto eno2
iface eno2 inet manual
    mtu 9000

auto enp175s0f0
iface enp175s0f0 inet manual
    mtu 9000

auto enp175s0f1
iface enp175s0f1 inet manual
    mtu 9000

auto enp59s0f0
iface enp59s0f0 inet manual
    mtu 9000

auto enp59s0f1
iface enp59s0f1 inet manual
    mtu 9000

auto bond0
iface bond0 inet manual
    bond-slaves eno1 eno2 enp175s0f0 enp175s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
    mtu 9000
#1GB Data Network

auto bond1
iface bond1 inet manual
    bond-slaves enp59s0f0 enp59s0f1
    bond-miimon 100
    bond-mode 802.3ad
    bond-xmit-hash-policy layer2+3
    mtu 9000
#10G CEPH network

auto bond1.102
iface bond1.102 inet manual
    mtu 9000
#CEPH VLAN 102

auto bond0.101
iface bond0.101 inet manual
    mtu 9000
#KVM VLAN 101

auto vmbr1
iface vmbr1 inet static
    address X.X.X.66/27
    gateway X.X.X.65
    bridge-ports bond0.101
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 2-4094
    mtu 9000
    bridge-vds 101
#KVM VLAN

auto vmbr2
iface vmbr2 inet manual
    bridge-ports bond0
    bridge-stp off
    bridge-fd 0
    bridge-vlan-aware yes
    bridge-vids 100 103 104 105 106 107 108 109 110 111 112 113 114 115
    mtu 9000
#1G DATA Network

auto cephbr0
iface cephbr0 inet static
    address Y.Y.10.1/27
    bridge_ports bond1.102
    bridge_stp off
    bridge_vids 102
    bridge_vlan_aware 1
    mtu 9000

auto ep59s0f0
iface ep59s0f0 inet manual
    mtu 9000
    post-up ip route add default via X.X.X.65 dev bond0.101
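Since everything runs at MTU 9000 across two bonds, one thing I keep second-guessing is whether jumbo frames really pass end to end on the Ceph VLAN and whether LACP on the bond is healthy. Is this the right way to test that? (Y.Y.10.Y stands in for another host's cephbr0 address.)

Code:
# 8972 = 9000 - 20 (IP header) - 8 (ICMP header); -M do prohibits fragmentation
ping -M do -s 8972 -c 4 Y.Y.10.Y
# link and LACP state of the Ceph-facing bond
cat /proc/net/bonding/bond1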
Any assistance here would be greatly appreciated!