Upgraded to this:
		Code:
	
proxmox-ve: 7.4-1 (running kernel: 5.15.108-1-pve)
pve-manager: 7.4-16 (running version: 7.4-16/0f39f621)
pve-kernel-5.15: 7.4-4
pve-kernel-5.15.108-1-pve: 5.15.108-2
pve-kernel-5.15.104-1-pve: 5.15.104-2
ceph: 17.2.6-pve1
ceph-fuse: 17.2.6-pve1
corosync: 3.1.7-pve1
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown: residual config
ifupdown2: 3.1.0-1+pmx4
libjs-extjs: 7.0.0-1
libknet1: 1.24-pve2
libproxmox-acme-perl: 1.4.4
libproxmox-backup-qemu0: 1.3.1-1
libproxmox-rs-perl: 0.2.1
libpve-access-control: 7.4.1
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.4-2
libpve-guest-common-perl: 4.2-4
libpve-http-server-perl: 4.2-3
libpve-rs-perl: 0.7.7
libpve-storage-perl: 7.4-3
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 5.0.2-2
lxcfs: 5.0.3-pve1
novnc-pve: 1.4.0-1
proxmox-backup-client: 2.4.3-1
proxmox-backup-file-restore: 2.4.3-1
proxmox-kernel-helper: 7.4-1
proxmox-mail-forward: 0.1.1-1
proxmox-mini-journalreader: 1.3-1
proxmox-offline-mirror-helper: 0.5.2
proxmox-widget-toolkit: 3.7.3
pve-cluster: 7.3-3
pve-container: 4.4-6
pve-docs: 7.4-2
pve-edk2-firmware: 3.20230228-4~bpo11+1
pve-firewall: 4.3-5
pve-firmware: 3.6-5
pve-ha-manager: 3.6.1
pve-i18n: 2.12-1
pve-qemu-kvm: 7.2.0-8
pve-xtermjs: 4.16.0-2
qemu-server: 7.4-4
smartmontools: 7.2-pve3
spiceterm: 3.2-2
swtpm: 0.8.0~bpo11+3
vncterm: 1.7-1
zfsutils-linux: 2.1.11-pve1

All nodes are on the same version. Running a fully licensed cluster.
On boot, a node seems to cycle its Ceph OSDs in and out of service, the monitor/manager on one node is down and I can't re-add it, and we can't keep the CephFS mounts up, so we can't start any VMs.
Syslog reports that the OSD heartbeats are trying to use the Ceph 'public' network - is this right? Shouldn't they use the Ceph cluster network? The public heartbeats are failing, and I'm not sure why, so I'll try to figure that out - but I don't want this traffic on the public network in the first place.
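From what I can tell from the Ceph docs, OSDs send heartbeats on both the 'front' (public) and 'back' (cluster) interfaces, so the public-network pings themselves may be expected - what worries me is that they fail. This is roughly what I've been running to check which addresses the OSDs have registered and whether the public address from the log further down is reachable (osd.36 and 10.128.18.12 are taken from that log):
Code:
# Addresses each OSD registered with the monitors (public and cluster addr per OSD)
ceph osd dump | grep '^osd\.'
# Effective network settings of one OSD daemon, as reported to the mgr
ceph config show osd.36 | grep -i network
# Basic reachability of the other node's public address from this node
ping -c 3 10.128.18.12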
So what to do? I've been cruising the forums to try and fix this, but I'm really scratching my head.
Here's my ceph config:
		Code:
	
[global]
     auth_client_required = cephx
     auth_cluster_required = cephx
     auth_service_required = cephx
     cluster_network = 10.128.16.0/24
     fsid = 2c88d85e-8a28-4cdc-800e-1979903a8d09
     mon_allow_pool_delete = true
     mon_host = 10.128.16.11 10.128.16.12 10.128.16.10
     ms_bind_ipv4 = true
     ms_bind_ipv6 = false
     osd_pool_default_min_size = 2
     osd_pool_default_size = 3
     public_network = 10.128.18.0/24
[client]
     keyring = /etc/pve/priv/$cluster.$name.keyring
[mds]
     keyring = /var/lib/ceph/mds/ceph-$id/keyring
[mds.VAN3PM1]
     host = VAN3PM1
     mds standby for name = pve
[mds.VAN3PM2]
     host = VAN3PM2
     mds_standby_for_name = pve
[mds.VAN3PM3]
     host = VAN3PM3
     mds_standby_for_name = pve
[mon.VAN3PM1]
     public_addr = 10.128.16.10
[mon.VAN3PM2]
     public_addr = 10.128.16.11
[mon.VAN3PM3]
     public_addr = 10.128.16.12

And you can see that the OSDs are trying to do their heartbeats over the public network:
		Code:
	
Aug 10 14:36:46 VAN3PM1 ceph-osd[38621]: 2023-08-10T14:36:46.066-0700 7fdf06b8c700 -1 osd.36 8808 heartbeat_check: no reply from 10.128.18.12:6810 osd.34 ever on either front or back, first ping sent 2023-08-10T14:33:17.768929-0700 (oldest deadline 2023-08-10T14:33:37.768929-0700)

Help!
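In case it's useful, this is how I've been sanity-checking reachability between the nodes on both subnets from VAN3PM1 (the 10.128.16.x addresses are the monitor addresses from my ceph.conf above; 10.128.18.12 is the public address from the heartbeat log):
Code:
# Cluster network (10.128.16.0/24) - the other monitors
ping -c 3 10.128.16.11
ping -c 3 10.128.16.12
# Public network (10.128.18.0/24) - the address the heartbeat log complains about
ping -c 3 10.128.18.12
# Which addresses are the local OSD daemons actually listening on?
ss -tlnp | grep ceph-osd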