Search results

  1. Proxmox on Ceph: After PG_NUM increase high read io on NVMEs - VMs unusable

    So, Monday was horrible, our customers started to hate us again... Carefully (!!!) restarting the OSD processes and using the recoveries to keep the backfilling from starting up got us over the day, and late in the evening we found the bluefs_buffered_io setting. So that shifted the read IO...
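    For reference, on Ceph releases with the centralized config store (Nautilus and later) the setting mentioned above can be toggled roughly like this; the OSD id is just a placeholder, and depending on the release the OSDs may still need a restart to pick it up:
      # enable buffered reads for BlueFS/RocksDB metadata instead of direct IO
      ceph config set osd bluefs_buffered_io true
      # check what a particular OSD actually uses
      ceph config get osd.0 bluefs_buffered_io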
  2. Proxmox on Ceph: After PG_NUM increase high read io on NVMEs - VMs unusable

    Hi, the autoscaler increased the number of PGs on our Ceph storage (hardware like this, but 5 nodes). As soon as the backfill starts the VMs become unusable, and we started killing OSD processes that cause high read IO load. So as in this picture we would kill the ceph-osd process working on...
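    As a hedged aside, the autoscaler behaviour described here can be inspected and frozen per pool with commands along these lines (the pool name is a placeholder):
      # show what the autoscaler wants to do
      ceph osd pool autoscale-status
      # stop it from changing pg_num for this pool
      ceph osd pool set <poolname> pg_autoscale_mode off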
  3. PBS server full: two days later almost empty?!?!?!?

    So, here comes the next uncertainty from my side: I generally disable atime on ZFS pools, and this is then inherited by the volumes. If you rely on atime, does this still work?
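    For illustration, the inheritance being asked about can be checked like this (dataset names are examples, not necessarily the author's layout):
      # disable atime on the pool's top-level dataset; children inherit it unless overridden
      zfs set atime=off backup
      # show the effective value and its source for every child dataset
      zfs get -r -o name,property,value,source atime backup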
  4. PBS server full: two days later almost empty?!?!?!?

    So I let the log output continue to run and after about 2 hours and about one GByte I stopped it again. The content is just about the same for the whole file.
  5. PBS server full: two days later almost empty?!?!?!?

    Here are parts of the log: starting garbage collection on store pve-prod task triggered by schedule '15:40' Start GC phase1 (mark used chunks) marked 1% (17 of 1651 index files) WARN: warning: unable to access chunk 2cd5a53b5d8aa3c9d530c2ee2b89ccf7ed238ad7bf97afb1a3424666784656d0, required by...
  6. PBS server full: two days later almost empty?!?!?!?

    Still venting air here... Underlying storage is a ZFS pool, which looks ok: There are so many warnings that the browser stops working when I try to have a look at them: Mount looks ok: root@pbs01:~# mount | grep pbs rpool/ROOT/pbs-1 on / type zfs (rw,relatime,xattr,noacl) backup/pbs on...
  7. PBS server full: two days later almost empty?!?!?!?

    No idea how that happened! But 206 GByte is definitely not the size of my source five-node PVE cluster. Any idea how that could happen? So on 27.01.2021 it went full, and two days later it is almost empty... I am pretty shocked right now... Rainer
  8. HA NFS service for KVM VMs on a Proxmox Cluster with Ceph

    But it does not properly support exporting .snap directories... At least a bug report was created for this missing feature in NFS-Ganesha: https://tracker.ceph.com/issues/48991
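    For context, the .snap directories in question are the CephFS snapshot entry points; on a directly mounted CephFS they behave roughly like this (paths are examples):
      # creating a directory inside .snap takes a snapshot of that subtree
      mkdir /mnt/cephfs/shares/.snap/before-upgrade
      # listing .snap shows the existing snapshots
      ls /mnt/cephfs/shares/.snap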
  9. HA NFS service for KVM VMs on a Proxmox Cluster with Ceph

    No idea why people are using NFS-Ganesha??? Created a fresh CT, then copied, adjusted and reloaded an AppArmor profile for it: root@proxmox07:~# cat /etc/apparmor.d/lxc/lxc-default-with-nfs2ceph # Do not load this file. Rather, load /etc/apparmor.d/lxc-containers, which # will source all profiles...
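    A minimal sketch of what such a profile might contain, purely as an assumption and not the author's actual file (the allowed mount fstypes in particular are guesses):
      # /etc/apparmor.d/lxc/lxc-default-with-nfs2ceph (hypothetical sketch)
      profile lxc-default-with-nfs2ceph flags=(attach_disconnected,mediate_deleted) {
        #include <abstractions/lxc/container-base>
        mount fstype=rpc_pipefs,
        mount fstype=nfsd,
        mount fstype=ceph,
      }
      # reload via the file that sources all LXC profiles, as the header comment says
      apparmor_parser -r /etc/apparmor.d/lxc-containers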
  10. HA NFS service for KVM VMs on a Proxmox Cluster with Ceph

    ...and that was it with NFS-Ganesha: https://github.com/nfs-ganesha/nfs-ganesha/blob/4e0b839f74608ce7005e533eda1431c730257662/src/FSAL/FSAL_CEPH/export.c#L307 * Currently, there is no interface for looking up a snapped * inode, so we just bail here in that case. */...
  11. HA NFS service for KVM VMs on a Proxmox Cluster with Ceph

    Yes it does. But as soon as the cron.d killed all the ganesha.nfsd processes on all five CTs, there is nowhere to move the IP to. This is a part of my keepalived config: rstumbaum@controlnode01.dc1:~$ cat keepalived/conf.d/check_proc_ganesha.conf vrrp_script check_proc_ganesha {...
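    The config is cut off above; a generic keepalived process check of this shape would look roughly like the following (script path and timings are assumptions, not the author's values):
      vrrp_script check_proc_ganesha {
          script "/usr/bin/pgrep -x ganesha.nfsd"
          interval 2
          fall 2
          rise 2
      }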
  12. HA NFS service for KVM VMs on a Proxmox Cluster with Ceph

    Excellent test! The nfs-ganesha systemd unit file is crap! After a pkill -9 it does not start automatically again, so I am going to lose the NFS exports as soon as I am through with the cycle! Have to add Restart=always there...
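    One way to add that without touching the packaged unit is a systemd drop-in, roughly like this (the unit name nfs-ganesha.service and the RestartSec value are assumptions):
      # /etc/systemd/system/nfs-ganesha.service.d/override.conf
      [Service]
      Restart=always
      RestartSec=5
    Followed by a systemctl daemon-reload and a restart of the service.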
  13. HA NFS service for KVM VMs on a Proxmox Cluster with Ceph

    Good idea! Trying that now! Yes. The NFS servers each have 7 Ethernet devices: admin access, Ceph Public Network, and 5 storage networks dedicated to NFS traffic to the VMs. Each VM has two network interfaces: storage access and application network. Storage access is an MTU 9000 non-routed network...
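    Purely as an illustration of the MTU 9000 non-routed setup, an ifupdown stanza for one storage interface might look like this (interface name and address are made up):
      auto ens19
      iface ens19 inet static
          address 192.168.100.11/24
          mtu 9000
          # no gateway on purpose: the storage network is not routed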
  14. HA NFS service for KVM VMs on a Proxmox Cluster with Ceph

    From the NFS client it is barely noticeable. I currently run a cron.d reboot script like this: 1-59/5 * * * * root hostname | grep -qE 'nfsshares-a' && /bin/systemctl reboot 2-59/5 * * * * root hostname | grep -qE 'nfsshares-b' && /bin/systemctl reboot 3-59/5 * * * * root hostname | grep -qE...
  15. HA NFS service for KVM VMs on a Proxmox Cluster with Ceph

    I am building NFS-Ganesha now using a Docker container and the Debian build tools. rstumbaum@controlnode01.dc1:~/docker-nfs-ganesha-build$ cat Dockerfile ARG DEBIAN_RELEASE="buster" ARG CEPH_RELEASE_PVE="nautilus" FROM debian:${DEBIAN_RELEASE} AS build-env ARG DEBIAN_RELEASE ARG...
  16. CEPHS NFS-Ganesha

    I have picked this up again here; you could give this post a like: https://forum.proxmox.com/threads/ha-nfs-service-for-kvm-vms-on-a-proxmox-cluster-with-ceph.80967/post-363314
  17. Proxmox VE 6.1 + in LXC: Ubuntu 18.04 + nfs-server: rpc-gssd.service: Job rpc-gssd.service/start failed with result 'dependency'.

    You might want to vote this up: https://forum.proxmox.com/threads/ha-nfs-service-for-kvm-vms-on-a-proxmox-cluster-with-ceph.80967/post-363314
  18. nfs-kernel-server on lxc: yes or no?

    You might want to vote this up: https://forum.proxmox.com/threads/ha-nfs-service-for-kvm-vms-on-a-proxmox-cluster-with-ceph.80967/post-363314
  19. nfs error in lxc

    You might want to vote this up: https://forum.proxmox.com/threads/ha-nfs-service-for-kvm-vms-on-a-proxmox-cluster-with-ceph.80967/post-363314
  20. HA NFS service for KVM VMs on a Proxmox Cluster with Ceph

    So I am going down this path now: - On the 5 production nodes, install 5 minimal CTs with NFS-Ganesha on Debian root@nfsshares-a:~# grep '^[[:blank:]]*[^[:blank:]#;]' /etc/ganesha/ganesha.conf NFS_CORE_PARAM { Enable_NLM = false; Enable_RQUOTA = false; Protocols = 3,4...
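    To round this off, an export block for CephFS through the Ganesha CEPH FSAL typically looks something like the following; the values are a sketch, not the author's configuration:
      EXPORT {
          Export_Id = 1;
          Path = "/";
          Pseudo = "/cephfs";
          Access_Type = RW;
          Squash = No_Root_Squash;
          FSAL {
              Name = CEPH;
          }
      }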