In the last 6 hours no new hangs have occurred. [ hang = a task blocked for more than 122 seconds ]. I call it a hang because when it happens the keyboard freezes, certainly inside a KVM guest; I am not sure whether it also happens at the PVE CLI.
There are 3 nodes and a few KVMs with hangs in...
From the new node. Note we have not moved the subscription over yet, so it is using the testing repository:
# pveversion -v
proxmox-ve: 9.0.0 (running kernel: 6.14.8-2-pve)
pve-manager: 9.0.5 (running version: 9.0.5/9c5600b249dbfd2f)
proxmox-kernel-helper: 9.0.3...
Using PVE 9.0.5 on a 5-node Ceph cluster. The nodes have a mix of ZFS and non-ZFS root/boot disks, along with one large NVMe formatted ext4 for vzdumps. We also use PBS.
I have a cron script, which we have used for years, that checks this:
dmesg -T |...
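Roughly, the check greps the kernel ring buffer for hung-task messages and alerts on a match. A simplified sketch, not the exact script, assuming a working mail command is available:

#!/bin/sh
# simplified sketch: alert if the kernel logged any hung-task warnings
if dmesg -T | grep -qi 'blocked for more than'; then
    dmesg -T | grep -i 'blocked for more than' | mail -s "hung task on $(hostname)" root
fi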
We had a crashed node. It has been deleted from the cluster, but we did not have a chance to remove it as a Ceph monitor first. [ We did replace the monitor. ]
ceph -s shows:
health: HEALTH_WARN
..
mon: 4 daemons, quorum pve11,pve2,pve5 (age 7m), out of...
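If the stale monitor from the crashed node is what is keeping the cluster in HEALTH_WARN, my understanding is that it can be removed from the monmap directly with the Ceph tools; a sketch, where pveX is a placeholder for the dead node's mon name:

# confirm which monitors the cluster still knows about
ceph mon dump
# drop the stale monitor of the crashed node from the monmap
ceph mon remove pveX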
Generally I destroy the OSD on the old node, move the disk to the new node, and then create a new OSD.
Or is there a stable way to move the OSD itself? A few months ago I checked the forum threads and had no luck, and I did not see an OSD move procedure in the manual.
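For reference, the destroy/recreate workflow I use is roughly the following; the OSD ID (12) and device path (/dev/sdX) are placeholders, and the disk is physically moved between the destroy and create steps:

# stop new data landing on the OSD and let Ceph rebalance
ceph osd out 12
# on the old node, once the cluster is healthy again
systemctl stop ceph-osd@12
# remove the OSD and wipe its disk
pveceph osd destroy 12 --cleanup
# ... physically move the disk to the new node ...
# on the new node, create a fresh OSD on the moved disk
pveceph osd create /dev/sdX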
So I read the above and have an rpool device to replace. I read https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#_zfs_administration and want to verify that just this one step is needed:
zpool replace -f <pool> <old-device> <new-device>
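If I read the linked section right, that single command covers the ZFS side, but because this is rpool (the boot pool) the docs also describe cloning the partition layout and re-initialising the bootloader on the new disk. My understanding of the full sequence, with placeholder device names and assuming the default PVE partition layout (ESP on -part2, ZFS on -part3):

# copy the partition table from a healthy bootable disk to the new disk
sgdisk /dev/disk/by-id/<healthy-bootable-disk> -R /dev/disk/by-id/<new-disk>
# randomise the GUIDs on the new disk so they do not clash
sgdisk -G /dev/disk/by-id/<new-disk>
# replace the failed vdev with the new disk's ZFS partition
zpool replace -f rpool <old-device> /dev/disk/by-id/<new-disk>-part3
# set up the new ESP and make the new disk bootable
proxmox-boot-tool format /dev/disk/by-id/<new-disk>-part2
proxmox-boot-tool init /dev/disk/by-id/<new-disk>-part2

Afterwards, zpool status rpool and proxmox-boot-tool status can be used to check the result.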