Search results

  1. grin

    [SOLVED] px5+ceph : TypeError: checks[key].summary is undefined

    # ceph osd versions { "ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)": 15 } But indeed: # ceph mon versions { "ceph version 12.1.2 (cd7bc3b11cdbe6fa94324b7322fb2a4716a052a7) luminous (rc)": 2, "ceph version 12.2.0...
  2. grin

    [SOLVED] px5+ceph : TypeError: checks[key].summary is undefined

    # ceph health detail -f json-pretty { "checks": { "TOO_MANY_PGS": { "severity": "HEALTH_WARN", "message": "too many PGs per OSD (972 > max 300)", "detail": [] } }, "status": "HEALTH_WARN" }
  3. grin

    [SOLVED] px5+ceph : TypeError: checks[key].summary is undefined

    I can give you the complete error (which comes after 2-3 XHRs into the ceph main dashboard). From then on the monitor tab is empty, osd is ok, configuration is often left side ok / right side 500/partial read, pools are empty, log is ok. For the ones which are broken there seem to be no XHRs towards the API anymore...
  4. grin

    [SOLVED] px5+ceph : TypeError: checks[key].summary is undefined

    ceph version 12.2.0 (36f6c5ea099d43087ff0276121fd34e71668ae0e) luminous (rc)
  5. grin

    [SOLVED] px5+ceph : TypeError: checks[key].summary is undefined

    The Ceph dashboard became very unreliable; the JavaScript in the GUI seems to neglect to even retrieve the data. I've seen one response with "500 / partial read" when trying to retrieve the pool data, but most often I see in the web console: TypeError: checks[key].summary is undefined; possibly it gets...
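
    A quick way to check for the version skew and the health-format change that the snippets above describe; this is only a sketch, not taken from the thread, and it assumes a luminous-era cluster with jq installed:

      # Compare reported daemon versions; a mon still on 12.1.2 next to
      # OSDs on 12.2.0 (as in result 1 above) is the kind of skew to look for.
      ceph mon versions
      ceph osd versions
      ceph versions        # per-daemon-type summary, available since luminous

      # Dump the health checks and print whichever text field is present:
      # the 12.2.0 rc output above carries "message" directly, while other
      # builds wrap it as summary.message, hence the jq "//" fallback.
      ceph health detail -f json | jq -r '
        .checks | to_entries[] |
        "\(.key): \(.value.severity) \(.value.summary.message // .value.message)"'
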
  6. grin

    New Ceph 12.1.1 packages for testing, GUI for creating bluestore OSDs

    Neither that nor /var/lib/ceph/mgr exists. It seems that for some magical reason the ceph-mgr package doesn't work unless it's actually installed. :-] The moral of the story, though, is that pveceph should verify that the directory exists (and thus that the package is installed)...
  7. grin

    New Ceph 12.1.1 packages for testing, GUI for creating bluestore OSDs

    root@bowie:~# pveceph createmgr creating manager directory '/var/lib/ceph/mgr/ceph-bowie' creating keys for 'mgr.bowie' unable to open file '/var/lib/ceph/mgr/ceph-bowie/keyring.tmp.24284' - No such file or directory
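
    Given the missing /var/lib/ceph/mgr described above, a minimal pre-flight check before pveceph createmgr could look like this; a sketch only, assuming the Debian ceph-mgr package from the configured repository is what provides that directory:

      # Make sure ceph-mgr is installed before creating a manager; the
      # keyring error above is what you get when /var/lib/ceph/mgr is absent.
      dpkg -s ceph-mgr >/dev/null 2>&1 || apt-get install -y ceph-mgr
      test -d /var/lib/ceph/mgr || echo "/var/lib/ceph/mgr is still missing" >&2

      pveceph createmgr
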
  8. grin

    [px5] new CT fail to start: mknod: …/rootfs/dev/rbd3: Operation not permitted

    Ah, damn, foreground puts the output on the console, but background doesn't capture it in the logfile. Stooopid! Thanks! Seems like the container is missing CAP_MKNOD, and the systemd (be it damned in the fires of hell forever) autodev feature is not used. That's quite a serious bug: no newly...
  9. grin

    [px5] new CT fail to start: mknod: …/rootfs/dev/rbd3: Operation not permitted

    [proxmox5] Newly created unprivileged lxc container fails to start. The failure is rather ugly, since there is basically no info on it: Aug 16 00:25:25 elton lxc-start[39248]: lxc-start: tools/lxc_start.c: main: 366 The container failed to start. Aug 16 00:25:25 elton lxc-start[39248]...
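
    To see the mknod failure directly and to check the capability/autodev situation mentioned above, something along these lines can help; a sketch only, where the CT id and log path are placeholders and the config keys are the LXC 2.x ones used on PVE 5:

      # Start the container in the foreground with debug logging, since the
      # background start does not put the mknod error into the logfile.
      lxc-start -n <ctid> -F -l DEBUG -o /tmp/lxc-<ctid>.log

      # Check whether CAP_MKNOD is dropped and whether autodev is enabled
      # in the Proxmox-side and generated LXC configs.
      grep -E 'lxc\.(cap\.drop|autodev)' /etc/pve/lxc/<ctid>.conf /var/lib/lxc/<ctid>/config
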
  10. grin

    proxmox 5.0 live migration fails

    Apart from that there have been no real problems so far upgrading 4.3 + ceph jewel to 5.0 + lumi (12.1.0). I even left all the stuff running on the server while it was in-place upgraded; no problems observed, and they were moved away before the reboot.
  11. grin

    proxmox 5.0 live migration fails

    Why, proudly Freddy and Elton! ;-) [Someone joked about it along the lines of "they would fire me if I named the servers as…" and I was, like, fuck yeah I can do that. ;-)] Okay, I said that your advice is obviously bullshit: why would the display(!!) setting matter when piping over the...
  12. grin

    proxmox 5.0 live migration fails

    This doesn't seem to be the already mentioned ssh problem (which is at 1:7.4p1-10 anyway): Jul 11 18:34:38 copying disk images Jul 11 18:34:38 starting VM 103 on remote node 'bowie' Jul 11 18:34:40 start remote tunnel Jul 11 18:34:40 starting online/live migration on...
  13. grin

    You are possibly using machine translation, and it doesn't work well. But try to open a new...

    You are possibly using machine translation, and it doesn't work well. But try to open a new forum thread and tell me where it is, and I'll try to read it.
  14. grin

    dist-upgrade / reboot / watchdog

    As we have already talked about reboots, here's a fresh one, from 4.4-5 to 4.4-13. The reboot is at the end. Apr 12 14:55:47 srv-01-szd systemd[1]: Stopped Corosync Cluster Engine. Apr 12 14:55:47 srv-01-szd systemd[1]: Starting Corosync Cluster Engine... Apr 12 14:55:47 srv-01-szd corosync[26358]...
  15. grin

    temporarily disable out of the box self-fencing

    (Sidenote: I have just tried to dist-upgrade 4.4 from a previous state, and fencing booted the machine off while it was configuring pve-kernel.)
  16. grin

    temporarily disable out of the box self-fencing

    Last time I tried to follow where it hangs; here's an strace fragment: 10:23:06.155779 socket(PF_LOCAL, SOCK_STREAM, 0) = 3 10:23:06.155798 fcntl(3, F_GETFD) = 0 10:23:06.155811 fcntl(3, F_SETFD, FD_CLOEXEC) = 0 10:23:06.155824 fcntl(3, F_SETFL, O_RDONLY|O_NONBLOCK) = 0...
  17. grin

    temporarily disable out of the box self-fencing

    1) If I dist-upgrade it and the disk is not magically fast, there may be more than 60 seconds between the start of the upgrade (stopping the daemon) and the end of setup (starting the daemon), which causes a reboot. When I upgrade all 3 nodes at once, corosync may lose quorum for many minutes. Why? I have asked the same...
  18. grin

    temporarily disable out of the box self-fencing

    But the easiest example for you: upgrading the hosts. If you upgrade them at once, the pve-cluster (and related pkgs) upgrade almost always triggers a reboot. Today's wasn't that one, but we have had full cluster reboots due to upgrades way too many times in the past.
  19. grin

    temporarily disable out of the box self-fencing

    Oh please, don't. I can detail how it can f*ck itself over big time, apart from the bug you fixed around 4.4, but really, the question wasn't why I need it, but how it should be done when I do need it. [Guess what, today another node rebooted since that rotten systemd didn't...
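
    For the upgrade-triggered fencing described in the last few results (the 60-second window between stopping and restarting the daemons), one commonly used workaround is to release the HA watchdog on the node before upgrading and re-arm it afterwards; a sketch, assuming the stock pve-ha-lrm / pve-ha-crm units are what hold the watchdog:

      # One node at a time: stop the HA services so the watchdog is closed
      # cleanly and a slow pve-cluster/corosync restart cannot trip the fence.
      systemctl stop pve-ha-lrm
      systemctl stop pve-ha-crm

      apt-get update && apt-get dist-upgrade

      # Re-arm HA once the upgrade (and any reboot you schedule) is done.
      systemctl start pve-ha-crm
      systemctl start pve-ha-lrm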