Search results

  1.

    ceph not working monitors and managers lost

    10.0.1.1 root@stack1:~# ip a 1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00 inet 127.0.0.1/8 scope host lo valid_lft forever preferred_lft forever inet6 ::1/128 scope host...
  2.

    ceph not working monitors and managers lost

    root@node7:/etc/pve# ceph auth ls 2022-02-27T17:17:55.616-0600 7f907359e700 0 monclient(hunting): authenticate timed out after 300 2022-02-27T17:22:55.618-0600 7f907359e700 0 monclient(hunting): authenticate timed out after 300 2022-02-27T17:27:55.615-0600 7f907359e700 0 monclient(hunting)...
  3.

    Old server cluster - 6 x 1gb nic - best way to config?

    Also - it is weird that on most nodes it shows eno1 as the active port... but on node 7 it shows enps0f0 as the active NIC port... I currently only have one port connected - the on-board port, which SHOULD report as eno1, but for some odd reason Proxmox thinks it's the physical 4-port card...
  4.

    Old server cluster - 6 x 1gb nic - best way to config?

    ok so how are you saying to do this?
  5.

    ceph not working monitors and managers lost

    I see host node900 is down (it holds the bulk of the OSD storage), so that may be the one big thing that, if I can get it back online, will help rebalance things again...
  6.

    ceph not working monitors and managers lost

    current config: [global] auth_client_required = cephx auth_cluster_required = cephx auth_service_required = cephx cluster_network = 10.0.1.1/16 fsid = cfa7f7e5-64a7-48dd-bd77-466ff1e77bbb mon_allow_pool_delete = true mon_host = 10.0.1.2 10.0.1.1 10.0.1.6...
  7.

    ceph not working monitors and managers lost

    @Tmanok didn't mean to sound unappreciative! I do need the help... so there is that. I have 9 nodes (8 are active; node 6 is out while I put in a new power supply, so it is out of the cluster for now). They talk to each other fine... Nodes (Stack1-node8) all have a single 1TB HDD spinner that I...
  8.

    Old server cluster - 6 x 1gb nic - best way to config?

    OK, but assuming I don't want to redo everything and reinstall from scratch... just wondering where to change the config, or how it works to change it after it's already been on another subnet... or how to change to a different NIC within the config... oh well, I will look some more
  9.

    Old server cluster - 6 x 1gb nic - best way to config?

    How do I define Corosync and Ceph affinity to a specific NIC?
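    For context on the question above: neither Ceph nor Corosync is pinned to a NIC by name; each binds to whichever interface holds the configured address. A minimal sketch, assuming (hypothetically) the storage NIC carries 10.0.0.0/16 and a second NIC carries 192.168.10.0/24 for cluster traffic:

    ```ini
    ; /etc/ceph/ceph.conf - Ceph picks the NIC whose address falls in these subnets
    [global]
    public_network  = 10.0.0.0/16   ; client and monitor traffic
    cluster_network = 10.0.0.0/16   ; OSD replication traffic

    ; /etc/pve/corosync.conf (excerpt) - Corosync follows each node's ring0_addr;
    ; bump config_version whenever you edit this file
    ; nodelist {
    ;   node {
    ;     name: stack1
    ;     ring0_addr: 192.168.10.1   ; address on the NIC Corosync should use
    ;   }
    ; }
    ```

    The subnet values and the 192.168.10.x addressing are illustrative assumptions, not taken from the cluster described in these threads.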
  10.

    ceph not working monitors and managers lost

    #1 - I have 9 nodes. #2 - all nodes have plenty of resources. #4 - of course I rebooted all nodes... this happened after the Octopus-to-Pacific update with the automatic update-and-upgrade script... Aside from generic info, your answer has no value for my posted case... ceph just hangs, and the MDS and OSD...
  11.

    Ceph down after upgrade to Pacific

    Any way to recover the OSDs, get the managers back, and rescue the map? Nodes can see each other fine - just the Ceph managers are missing and no OSDs are showing up. ceph -s hangs; timeouts on any GUI screen and on most ceph commands. root@node900:/etc/pve/nodes/node2# ha-manager status quorum OK master node5...
  12.

    ceph not working monitors and managers lost

    Did you ever resolve this? I am having the same issue. ceph -s just sits there and freezes; timeout 500 on the GUI for the Ceph status page/dashboard. The config shows all the correct hosts for the monitors and the correct node IPs. Proxmox node-to-node connectivity is fine - just the Ceph MANAGERS are missing and no OSDs...
  13.

    Ceph Recovery after all monitors are lost?

    I somehow lost all my OSDs and the map too - when I did the PVE GUI update... after the reboot everything went to hell... Any ideas on any of this? ceph osd setcrushmap -i backup-crushmap and just about any ceph command hangs and/or times out... Monitors are listed but there is no quorum. No OSDs are listed...
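    For the "monitors listed but no quorum" symptom above, one standard recovery path from the Ceph documentation is to extract the monmap from a surviving monitor, remove the dead members, and inject it back. A hedged sketch using the stock Ceph tools - the monitor IDs (node2, node900) are placeholders for whichever monitors survived or died in your cluster:

    ```shell
    # stop the monitor whose store you will edit
    systemctl stop ceph-mon@node2

    # extract the current monmap from that monitor's store
    ceph-mon -i node2 --extract-monmap /tmp/monmap

    # inspect it, then remove monitors that are gone for good
    monmaptool --print /tmp/monmap
    monmaptool /tmp/monmap --rm node900

    # inject the trimmed map and restart the monitor
    ceph-mon -i node2 --inject-monmap /tmp/monmap
    systemctl start ceph-mon@node2
    ```

    This only helps when at least one monitor store is intact; repeat the inject step on every surviving monitor so they all agree on the new map.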
  14.

    Ceph down after upgrade to Pacific

    To be honest, I did not even look to see what upgrades happened until it was too late. The Octopus-to-Pacific upgrade apparently happened with the automatic GUI updates... I did not read the notes, and now my whole cluster's Ceph pool is dead as a rock. I noticed timeout after timeout... I manually...
  15.

    Old server cluster - 6 x 1gb nic - best way to config?

    I have a bunch of older servers - almost all have a 4-port 1Gb card and 2 onboard Gb ports. Right now I am only using one of the onboard NICs on all the nodes... I have a Linux bridge vmbr0 assigned to that on-board port, and then all the VMs/LXCs run over that...
  16.

    Warning: unable to close filehandle GEN5 properly: No space left on device at /usr/share/perl5/PVE/APIServer/AnyEvent.pm line 1804.

    Hey, you're not alone... been having the same issue with pve-root maxing out... seems something gets stuck on a recent upgrade... For me, the Ceph log and other logs were HUGE and taking up all the space... so I deleted the Ceph log and removed the OSD from that specific node... completed the update...
  17.

    node root pve-root 100% full - delete log - autoremove now need to extend root

    So this caused all sorts of issues: the node would not update, then it froze up badly... restarted, and it would not reconnect... got on the local console and saw it was up but out of root-partition space... SSHed to the node; apt-get autoremove failed... no space, everything I did failed... I found several...
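    The usual sequence for a full pve-root like the one above is to reclaim space first, then grow the root logical volume if the pve volume group has free extents. A sketch assuming the default Proxmox LVM layout (ext4 root on /dev/pve/root) - run with care, since lvextend changes are not reversible:

    ```shell
    # reclaim space: trim the systemd journal and find what is eating /
    journalctl --vacuum-size=100M
    du -xh --max-depth=2 / | sort -h | tail

    # if the pve VG has free extents ("Free PE / Size" in vgdisplay),
    # grow the root LV and resize the filesystem online
    vgdisplay pve
    lvextend -l +100%FREE /dev/pve/root
    resize2fs /dev/mapper/pve-root
    ```

    If the VG has no free space, deleting or truncating the oversized Ceph logs under /var/log/ceph (as the post above describes) is the only option before the node can update again.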