Search results

  1. INFO: task kworker blocked for more than 122 seconds

    I also noticed the ext4 part. However, the issue starts 100% of the time when a node is in the process of shutting down for a reboot - shortly after the point where it turns off the OSDs, the hang starts on some or all of the remaining nodes. I did not think that ext4 had anything to do with Ceph...
  2. INFO: task kworker blocked for more than 122 seconds

    Also: has anyone else had the same issue? It could be that I have done something uniquely wrong.
  3. INFO: task kworker blocked for more than 122 seconds

    So with no node restarts there have been no hangs on PVE nodes and KVMs. I have had this set in sysctl.d since 2019 per https://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments#Sample-sysctlconf - could these settings be causing an issue? fs.file-max = 6553600...
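
    For reference, a minimal sketch of such a sysctl.d drop-in; the file name is hypothetical, and fs.file-max = 6553600 is the only value actually quoted above - the remaining keys would be copied from the linked Ceph wiki page:

        # /etc/sysctl.d/90-ceph-tuning.conf  (hypothetical file name)
        # Only fs.file-max is quoted in the post; further keys would come from
        # the Ceph "Tuning for All Flash Deployments" sample sysctl.conf.
        fs.file-max = 6553600

    Reload with sysctl --system after editing the drop-in.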
  4. [PVE 8] ZFS Mirror Boot Pool on PVE: Guide to Replace Failing Disk in 2025?

    Also, one thing I ran into: I tried to reboot the system after removing one of the rpool drives. I was unable to run zpool attach because for some reason the newly installed drive was in use, so I tried to reboot. The reboot failed. Changing the boot device in the BIOS would have fixed the...
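
    One possible cause of the "in use" error is stale signatures on the replacement disk; a rough sketch of checking and clearing it before retrying the attach, assuming the new disk is /dev/sdb (placeholder) and carries no data worth keeping:

        # See what is holding the new disk (old partitions, LVM, a stale zpool label, ...)
        lsblk /dev/sdb
        wipefs -a /dev/sdb          # clear leftover filesystem/RAID/ZFS signatures

        # Retry the attach against the healthy rpool member
        zpool attach rpool <existing-device> /dev/sdb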
  5. INFO: task kworker blocked for more than 122 seconds

    The hung KVMs run both bookworm and trixie, so the KVM kernel version is probably not at fault. Also, the hang does not persist.
  6. INFO: task kworker blocked for more than 122 seconds

    In the last 6 hours no new hangs occurred. [hangs = blocked for more than 122 seconds]. I call it a hang because when that occurs the keyboard hangs - certainly inside a KVM, not sure about the PVE CLI. There are 3 nodes and a few KVMs with hangs in dmesg; in all cases the time of the hang occurred...
  7. INFO: task kworker blocked for more than 122 seconds

    The other 4 nodes use the enterprise repo.

        # pveversion -v
        proxmox-ve: 9.0.0 (running kernel: 6.14.8-2-pve)
        pve-manager: 9.0.5 (running version: 9.0.5/9c5600b249dbfd2f)
        proxmox-kernel-helper: 9.0.3
        proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2
        proxmox-kernel-6.14: 6.14.8-2...
  8. INFO: task kworker blocked for more than 122 seconds

    From the new node. Note we have not moved the subscription over, so it is using the testing repo.

        # pveversion -v
        proxmox-ve: 9.0.0 (running kernel: 6.14.8-2-pve)
        pve-manager: 9.0.5 (running version: 9.0.5/9c5600b249dbfd2f)
        proxmox-kernel-helper: 9.0.3
        proxmox-kernel-6.14.8-2-pve-signed: 6.14.8-2...
  9. INFO: task kworker blocked for more than 122 seconds

    Using PVE 9.0.5, 5-node Ceph cluster. Nodes have a mix of ZFS and non-ZFS root/boot disks, along with one large NVMe formatted ext4 for vzdumps. We also use PBS. I have a cron script which we have used for years that checks this: dmesg -T | grep hung | grep -v vethXChung ## **URGENT**...
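
    A minimal sketch of what such a cron check might look like; the only line quoted from the post is the dmesg pipeline, everything else (the script path, mailing the result to root) is a hypothetical illustration:

        #!/bin/sh
        # /etc/cron.hourly/check-hung-tasks  (hypothetical path)
        # Report kernel "task ... blocked for more than N seconds" messages,
        # skipping veth interface names that happen to contain "hung".
        OUT=$(dmesg -T | grep hung | grep -v vethXChung)
        [ -n "$OUT" ] && echo "$OUT" | mail -s "hung tasks on $(hostname)" root
        exit 0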
  10. [SOLVED] ceph - how to remove a monitor for a deleted node

    We had a crashed node. It is deleted from the cluster. We did not have a chance to remove it as a Ceph monitor [we did replace the monitor]. ceph -s shows: health: HEALTH_WARN .. mon: 4 daemons, quorum pve11,pve2,pve5 (age 7m), out of quorum: pve4. How can I delete a monitor assigned...
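
    Since the thread is marked [SOLVED], the fix was presumably along these lines; a minimal sketch, assuming the stale monitor is pve4 as shown in the ceph -s output and that the node itself is already gone:

        # On any node with a healthy monitor: drop the dead mon from the monmap
        ceph mon remove pve4

        # Then remove any leftover [mon.pve4] section / mon_host entry from the
        # cluster-wide Ceph config so the monitor is not expected again
        nano /etc/pve/ceph.conf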
  11. [SOLVED] osd move to new server

    Generally I destroy the OSD on the old node, move the disk to the new node, then create a new OSD. Or is there a stable way to move the OSD itself? A few months ago I checked threads and had no luck. Also, I did not see an OSD move procedure in the manual.
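
    A rough sketch of the destroy-and-recreate workflow described above, using standard Ceph/PVE tooling; osd.12 and /dev/sdX are placeholders, and the cluster should be back to HEALTH_OK before and after each step:

        # On the old node: drain and remove the OSD
        ceph osd out 12
        systemctl stop ceph-osd@12
        ceph osd purge 12 --yes-i-really-mean-it

        # Move the disk to the new node, wipe it, and create a fresh OSD there
        ceph-volume lvm zap /dev/sdX --destroy
        pveceph osd create /dev/sdX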
  12. [PVE 8] ZFS Mirror Boot Pool on PVE: Guide to Replace Failing Disk in 2025?

    I see there are more steps to do; the above only takes care of partition 3. Okay, thanks.
  13. [PVE 8] ZFS Mirror Boot Pool on PVE: Guide to Replace Failing Disk in 2025?

    So I read the above and have an rpool device to replace. I read https://pve.proxmox.com/pve-docs/chapter-sysadmin.html#_zfs_administration and want to verify that just this one step is needed: zpool replace -f <pool> <old-device> <new-device>
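
    For a bootable rpool mirror member the linked chapter describes more than the single zpool replace; a rough sketch of the sequence for a system booting via proxmox-boot-tool, with /dev/sdY as the healthy mirror disk and /dev/sdZ as the replacement (both placeholders):

        # Copy the partition layout of the healthy disk to the new disk,
        # then randomize the new disk's GUIDs
        sgdisk /dev/sdY -R /dev/sdZ
        sgdisk -G /dev/sdZ

        # Replace only the ZFS partition (partition 3 on a default PVE layout)
        zpool replace -f rpool <old-device> /dev/sdZ3

        # Make the new disk bootable again (partition 2 is the ESP)
        proxmox-boot-tool format /dev/sdZ2
        proxmox-boot-tool init /dev/sdZ2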
  14. [SOLVED] after upgrade to 9, one node has ceph fail

    To find the device name:
    1 - plug in the cable
    2 - dmesg: [Fri Aug 8 09:19:59 2025] mlx5_core 0000:a8:00.1: Port module event: module 1, Cable plugged
    3 - find the device name: ls /sys/bus/pci/devices/0000\:a8\:00.1/net/
    And this is a great tool, I ran this first: see manual...
  15. [SOLVED] after upgrade to 9, one node has ceph fail

    I found the cause: the bad node cannot reach the cluster network. This is the only node where I did not use systemd aliases for network interface names...
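
    A minimal sketch of pinning an interface name with a systemd .link file, which is presumably what the aliases refer to; the file name, MAC address and the name en-cluster are all placeholders:

        # /etc/systemd/network/10-en-cluster.link  (hypothetical file name)
        [Match]
        MACAddress=aa:bb:cc:dd:ee:ff

        [Link]
        Name=en-cluster

    The file is applied by udev at boot, so /etc/network/interfaces can then reference the stable name instead of an enp/ens name that may change with hardware or firmware updates.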
  16. [SOLVED] after upgrade to 9, one node has ceph fail

    OSDs are mounted on the bad node:

        tmpfs  tmpfs  378G  24K  378G  1%  /var/lib/ceph/osd/ceph-6
        tmpfs  tmpfs  378G  24K  378G  1%  /var/lib/ceph/osd/ceph-2
        tmpfs  tmpfs  378G  24K  378G  1%  /var/lib/ceph/osd/ceph-23
        tmpfs...
  17. [SOLVED] after upgrade to 9, one node has ceph fail

    I noticed OSDs down on the PVE web page; tried to start them, they fail. On a node where Ceph is up:

        # ceph -s
          cluster:
            id:     220b9a53-4556-48e3-a73c-28deff665e45
            health: HEALTH_WARN
                    noout flag(s) set
                    10 osds down
                    1 host (10 osds) down
                    Degraded...
  18. [SOLVED] No network after Update to 4

    I agree, and I had to use the PVE console for the KVM.
  19. [SOLVED] No network after Update to 4

    Send the output of ifup enp1s0. Also, in case it applies, check: https://forum.proxmox.com/threads/main-exception-rawconfigparser-object-has-no-attribute-readfp.169352/#post-789444
  20. [SOLVED] ifupdown2 and /etc/network/interfaces issue

    Another KVM running trixie did not need the fix.

        source /etc/network/interfaces.d/*

        # The loopback network interface
        auto lo
        iface lo inet loopback

        # The primary network interface
        #allow-hotplug ens18
        #iface ens18 inet dhcp
        auto ens18
        iface ens18 inet static
            address 10.1.3.5/24...