Search results

  1. ceph not working monitors and managers lost

    still can't get ceph monitors to respond and no mds server appears up...
  2. ceph not working monitors and managers lost

    Looking at the logs, mon.stack1 went critical on space... so I go to /var/lib/ceph... I see the Proxmox setup has /var on the root partition... but root isn't full... kinda lost on why it would be reporting 1% available for mon.stack1
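
    Ceph judges a monitor's free space against the filesystem that holds its data directory, so it is worth confirming which filesystem that actually is. A minimal sketch, assuming the default /var/lib/ceph path:

        df -h /var/lib/ceph        # free space on the filesystem backing the mon data
        findmnt -T /var/lib/ceph   # which mount point that path really lives on (often the root fs on Proxmox)
        df -h /                    # compare with the root partition itself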
  3. [SOLVED] changed CEPH public network ... failed

    can you give me the steps and commands you used to revert to the backup and reinject the monmap so I can get my managers and mds back and hopefully rebuild the data from the OSD locations? I have a similar issue in another thread with all my details.... If you have any help I would appreciate it. How did you fix...
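
    For reference, the usual monmap extract/edit/inject cycle looks roughly like this; a sketch only, assuming a monitor id of stack1 and /tmp/monmap as a scratch file:

        systemctl stop ceph-mon@stack1                   # the mon must be stopped first
        ceph-mon -i stack1 --extract-monmap /tmp/monmap  # pull the monmap out of the mon's store
        monmaptool --print /tmp/monmap                   # inspect it (monmaptool can also add/remove entries)
        ceph-mon -i stack1 --inject-monmap /tmp/monmap   # push the restored/edited map back in
        systemctl start ceph-mon@stack1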
  4. ceph not working monitors and managers lost

    node 5 still times out, ceph -s just hangs there too. Also, node 5's HDD does not show the OSD anymore... as if the drive is initialized and not used for the OSD store... node5 systemctl status:
    root@node5:~# systemctl status
    ● node5
        State: running
        Jobs: 1 queued
        Failed: 0 units
        Since: Sun...
  5. ceph not working monitors and managers lost

    Looks like mon node7 went critical on space, MON_DISK_LOW, and mon.stack1 is 1% avail, so a couple of them ran out of space... how? Where are they filling up? Is the map on the PVE root filling up, or are the OSD data and the map stored on the OSD itself? Wondering how exactly it stores data for mds and mon - and where...?
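
    On a stock Proxmox/Ceph install the monitor keeps its database under /var/lib/ceph/mon/<cluster>-<id>/store.db on the node's root filesystem, while OSD payload data stays on the OSD disks themselves. A rough way to see what is filling up, assuming that default layout:

        du -sh /var/lib/ceph/*       # mon, mds, osd, crash, ... broken down by directory
        du -sh /var/lib/ceph/mon/*   # size of each monitor's store.db directory
        df -h /                      # the filesystem the MON_DISK_LOW warning is judging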
  6. ceph not working monitors and managers lost

    You did ask for most of that I think.. lol yeah I know... just frustrated with this now and cannot seem to figure out how to get any manager back and get Ceph to respond; ceph -s just sits there and hangs/freezes, then times out
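
    When ceph -s hangs like that it is usually waiting for a monitor quorum that never forms. A sketch of how to avoid the indefinite hang and ask a monitor directly over its local admin socket, assuming a mon id of stack1 on that host:

        ceph --connect-timeout 10 -s          # give up after 10 seconds instead of hanging
        ceph daemon mon.stack1 mon_status     # uses the local admin socket, works without quorum
        ceph daemon mon.stack1 quorum_status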
  7. ceph not working monitors and managers lost

    I rebooted them all one by one, doing apt update and upgrade, then pvecm updatecerts. All but node900 now show systemctl status -- running ok. node 3 is now off too..? All others are looking ok. Still no mds server; still need to see if the data map exists anywhere to be restored. No managers seem...
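
    To see, per node, which Ceph daemons are even defined and whether they are running, something like the following should do; a sketch assuming the Proxmox convention of naming mon/mgr/mds instances after the node:

        systemctl list-units 'ceph*' --all        # every ceph unit known on this node
        systemctl status "ceph-mon@$(hostname)" "ceph-mgr@$(hostname)" "ceph-mds@$(hostname)"
        journalctl -u "ceph-mon@$(hostname)" -n 50 --no-pager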
  8. ceph not working monitors and managers lost

    node 2 came back to life a little after reboot... humm
  9. ceph not working monitors and managers lost

    ceph log on node2:
    2022-02-21T11:23:39.813865-0600 osd.0 (osd.0) 4244643 : cluster [WRN] slow request osd_op(client.471789829.0:1612332 7.dc 7.2b14fadc (undecoded) ondisk+retry+write+known_if_redirected e967246) initiated 2022-02-20T22:30:29.742607-0600 currently delayed...
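
    The admin socket works on OSDs too, so one can ask osd.0 directly what those slow requests are stuck on; a sketch, assuming osd.0 is still running on node2:

        ceph daemon osd.0 dump_ops_in_flight   # ops currently queued or delayed
        ceph daemon osd.0 dump_historic_ops    # recently completed slow ops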
  10. ceph not working monitors and managers lost

    here is the startup log for node 2 -> well, at least some of it...
  11. ceph not working monitors and managers lost

    So for giggles I rebooted node 2 and am looking at the logs on reboot:
    Feb 28 21:23:36 node2 systemd[1]: Started The Proxmox VE cluster filesystem.
    Feb 28 21:23:36 node2 systemd[1]: Started Ceph metadata server daemon.
    Feb 28 21:23:36 node2 systemd[1]: Started Ceph cluster manager daemon.
    Feb 28...
  12. ceph not working monitors and managers lost

    Just tried to create a manager on node1 (stack1); no managers will start... I don't see an OSD listed on any node other than on the drive assignments themselves... wondering if the map is lost and whether there is any way to restore the managers and metadata before I lose it from all the nodes
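
    On Proxmox the manager is normally (re)created with pveceph and then runs as a systemd unit named after the node; a sketch of creating one on stack1 and checking why it will not start. Note that the mgr cannot register while the monitors have no quorum, so the mons need to come back first:

        pveceph mgr create                        # defaults to the local node name as the mgr id
        systemctl status ceph-mgr@stack1
        journalctl -u ceph-mgr@stack1 -n 100 --no-pager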
  13. ceph not working monitors and managers lost

    All nodes are identical on packages and are up to date:
    proxmox-ve: 7.1-1 (running kernel: 5.13.19-4-pve)
    pve-manager: 7.1-10 (running version: 7.1-10/6ddebafe)
    pve-kernel-helper: 7.1-12
    pve-kernel-5.13: 7.1-7
    pve-kernel-5.11: 7.0-10
    pve-kernel-5.4: 6.4-4
    pve-kernel-5.13.19-4-pve: 5.13.19-9...
  14. ceph not working monitors and managers lost

    I guess I forgot to add an OSD on node 5 - or the drive was purged or something... not sure... "node 5" no longer has an OSD on that 1TB HDD - like I have on all the others... I may not have set it up - skipped it or something - but pretty sure I had it in there... but the other nodes are all...
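
    A quick way to check whether that 1TB HDD on node 5 still carries an OSD at all; a sketch, assuming an LVM-based bluestore OSD as pveceph creates by default:

        lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT   # is there still an LVM/ceph layout on the disk?
        ceph-volume lvm list                        # OSDs ceph-volume knows about on this host
        ls /var/lib/ceph/osd/                       # leftover ceph-<id> directories, if any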
  15. ceph not working monitors and managers lost

    OSD Layout:
    Node 'node900'
    Node 'node8'
    Node 'node7'
    Node 'node5'
    Node 'node4'
    Node 'node3'
    Node 'node2'
    Node 'stack1'
  16. ceph not working monitors and managers lost

    node6 - removed from cluster (fixing server - will add back asap)
    node 7:
    root@node7:~# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo...
  17. ceph not working monitors and managers lost

    10.0.1.3
    root@node3:~# ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host...