HA Ceph Breaks on Migration to 3rd Node...

AZDNice

Member
Jul 1, 2020
While testing my HA-Ceph installation, I received this error when trying to migrate to my 3rd node. The VM migrated but then stopped. I was then able to migrate it back to one of the other 2 nodes and restart it. Be gentle, I am very new to HA-Ceph. lol
Any insight you can provide would be great.

Also, there are no problems migrating from the 1st node to the 2nd and back... just anything going to the 3rd. No error appeared until I tried migrating from node 1 to 3 or from node 2 to 3.

Thanks in advance for ANY time given to a response.
I do understand that for the time being I could make an HA group and restrict migration to nodes 1-2, but that defeats the purpose of a total HA cluster.
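(For what it's worth, a restricted HA group of the kind mentioned above would be a sketch along these lines in /etc/pve/ha/groups.cfg; the group name here is made up, and the node names are taken from the outputs below:)

```
group: bass-daygo-only
        nodes bass,daygo
        restricted 1
```

With `restricted 1`, HA resources assigned to the group only run on the listed nodes, which is exactly the workaround being described, and exactly why it excludes the 3rd node from HA.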

root@bass:~# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 4.06088 root default
-3 2.00130 host bass
0 nvme 2.00130 osd.0 up 1.00000 1.00000
-7 1.81940 host daygo
1 ssd 1.81940 osd.1 up 1.00000 1.00000
-10 0.24019 host york
2 ssd 0.24019 osd.2 up 1.00000 1.00000

root@bass:~# ceph -s
cluster:
id: 241c8887-31d8-44f4-b252-c6e4eb5a14ed
health: HEALTH_WARN
Degraded data redundancy: 39/3354 objects degraded (1.163%), 4 pgs degraded, 4 pgs undersized

services:
mon: 3 daemons, quorum bass,daygo,york (age 19m)
mgr: bass(active, since 3h), standbys: daygo, york
osd: 3 osds: 3 up (since 2h), 3 in (since 2h); 1 remapped pgs

data:
pools: 2 pools, 129 pgs
objects: 1.12k objects, 4.2 GiB
usage: 220 GiB used, 3.8 TiB / 4.1 TiB avail
pgs: 39/3354 objects degraded (1.163%)
7/3354 objects misplaced (0.209%)
124 active+clean
4 active+undersized+degraded
1 active+clean+remapped

io:
client: 0 B/s rd, 85 B/s wr, 0 op/s rd, 0 op/s wr
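As a sanity check on the numbers above: with roughly 1118 objects and 3-way replication there are 3354 object copies in total, and the percentages Ceph reports are simply degraded/misplaced copies over that total. A quick sketch, with the counts copied from the ceph -s output above:

```python
# Counts taken from the `ceph -s` output above.
total_copies = 3354   # ~1118 objects x 3 replicas
degraded = 39
misplaced = 7

degraded_pct = round(degraded / total_copies * 100, 3)
misplaced_pct = round(misplaced / total_copies * 100, 3)

print(degraded_pct)   # 1.163, matching the reported 1.163%
print(misplaced_pct)  # 0.209, matching the reported 0.209%
```

So the warning is about a small fraction of object copies that currently lack their full replica count, not about lost data.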
 
Hi,
please share the full migration task log of a problematic migration, as well as the output of pveversion -v from the source and target nodes of the migration. Is there anything interesting in the system logs/journal around the time the issue happens?

root@bass:~# ceph osd tree
Code:
ID   CLASS  WEIGHT   TYPE NAME       STATUS  REWEIGHT  PRI-AFF
 -1         4.06088  root default                            
 -3         2.00130      host bass                           
  0   nvme  2.00130          osd.0       up   1.00000  1.00000
 -7         1.81940      host daygo                          
  1    ssd  1.81940          osd.1       up   1.00000  1.00000
-10         0.24019      host york                           
  2    ssd  0.24019          osd.2       up   1.00000  1.00000
Tip: you can use [CODE]output here[/CODE] tags to keep your information more readable.
It seems the weight for york/osd.2 is much lower than that of the other two; maybe that is causing the issue?
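To put numbers on that: the CRUSH weights are in TiB, so york backs the pool with roughly 0.24 TiB while bass and daygo bring about 2.0 and 1.8 TiB. A rough sketch of the comparison, with the weights copied from the tree above (whether the imbalance is actually what breaks the migration is a separate question):

```python
# CRUSH weights (in TiB) copied from the `ceph osd tree` output above.
host_weights = {"bass": 2.00130, "daygo": 1.81940, "york": 0.24019}

smallest = min(host_weights, key=host_weights.get)
largest = max(host_weights, key=host_weights.get)
ratio = host_weights[largest] / host_weights[smallest]

# With 3-way replication every PG needs a copy on each of the three hosts,
# so the smallest host caps how much data the pool can hold before it fills.
print(f"{smallest} is ~{ratio:.1f}x smaller than {largest}")  # york is ~8.3x smaller than bass
```

That capacity cap would show up as nearfull/full warnings rather than undersized PGs, but it is worth keeping in mind with such uneven OSDs.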
 
root@york:~# pveversion -v
proxmox-ve: 8.2.0 (running kernel: 6.8.8-3-pve)
pve-manager: 8.2.4 (running version: 8.2.4/faa83925c9641325)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.8: 6.8.8-3
proxmox-kernel-6.8.8-3-pve-signed: 6.8.8-3
ceph: 18.2.2-pve1
ceph-fuse: 18.2.2-pve1
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx9
ksm-control-daemon: 1.5-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.1
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.4
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.7
libpve-cluster-perl: 8.0.7
libpve-common-perl: 8.2.1
libpve-guest-common-perl: 5.1.3
libpve-http-server-perl: 5.1.0
libpve-network-perl: 0.9.8
libpve-rs-perl: 0.8.9
libpve-storage-perl: 8.2.3
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 6.0.0-1
lxcfs: 6.0.0-pve2
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.2.7-1
proxmox-backup-file-restore: 3.2.7-1
proxmox-firewall: 0.4.2
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-widget-toolkit: 4.2.3
pve-cluster: 8.0.7
pve-container: 5.1.12
pve-docs: 8.2.2
pve-edk2-firmware: 4.2023.08-4
pve-esxi-import-tools: 0.7.1
pve-firewall: 5.0.7
pve-firmware: 3.13-1
pve-ha-manager: 4.0.5
pve-i18n: 3.2.2
pve-qemu-kvm: 9.0.0-6
pve-xtermjs: 5.3.0-3
qemu-server: 8.2.2
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.4-pve1