Search results

  1.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    Solved !!!! Thank you so much, that's the problem. With the routed mesh network it is working like before! I don't know what the changes are between Proxmox 6.4-x and Proxmox 7.0-x for Ceph and the broadcast network, but now it works fine :-)
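    A minimal sketch of the routed full-mesh variant that solved it here, assuming three nodes on a 10.15.15.0/24 mesh subnet and interface names ens1f0/ens1f1 (all names and addresses are examples, not the poster's actual config):

        # /etc/network/interfaces fragment on node 1 (10.15.15.1)
        # each port carries the node's mesh address plus a /32 route to the peer behind it
        auto ens1f0
        iface ens1f0 inet static
            address 10.15.15.1/24
            up   ip route add 10.15.15.2/32 dev ens1f0
            down ip route del 10.15.15.2/32 dev ens1f0

        auto ens1f1
        iface ens1f1 inet static
            address 10.15.15.1/24
            up   ip route add 10.15.15.3/32 dev ens1f1
            down ip route del 10.15.15.3/32 dev ens1f1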
  2.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    Same problem under Proxmox 7.0-2 and Ceph 15.2-15, so it must have something to do with Debian 11 and Proxmox. On Proxmox 6.4-x everything runs fine. I have no more ideas. Any suggestions?
  3.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    Using the Mellanox OFED driver 5.4-3.0.3.0 does not resolve the issue. Performance is decreased with the OFED drivers and more slow ops occur.
  4.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    This is what happens in the OSD log, osd.13 is currently delayed: 2021-11-11T21:41:00.063+0100 7f4184d78700 0 log_channel(cluster) log [WRN] : slow request osd_op(client.519295.0:13290 2.1b6 2:6d99ebb9:::rbd_data.7ec7985ce9a9f.0000000000000406:head [write 147456~4096 in=4096b] snapc 0=[]...
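    If you hit the same warnings, the stuck requests can be inspected on the node that hosts the affected OSD; osd.13 below is taken from the log line above, and these are standard admin-socket commands rather than anything specific to this thread:

        # requests currently queued/blocked on osd.13
        ceph daemon osd.13 dump_ops_in_flight
        # recently completed (slow) requests with per-step timestamps
        ceph daemon osd.13 dump_historic_ops
        # cluster-wide summary of which OSDs report slow ops
        ceph health detail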
  5.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    I did not specifically configure Spanning Tree Protocol or Rapid Spanning Tree Protocol. Each server has a dual-port ConnectX-6 card and is connected to the other servers. The Linux bond is configured as broadcast; this works fine in Proxmox 6.4-13, but on Proxmox 7.x the configuration...
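    For comparison, a sketch of the broadcast-bond mesh variant being described, assuming interface names ens1f0/ens1f1 and example addressing (adapt to your own setup):

        # /etc/network/interfaces fragment, broadcast bond over both ConnectX-6 ports
        auto bond0
        iface bond0 inet static
            address 10.15.15.1/24
            bond-slaves ens1f0 ens1f1
            bond-mode broadcast
            bond-miimon 100
            mtu 9000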
  6.

    Ceph Outdated OSD's even though on 16.2.6

    I also had this "feature". Remove Ceph entirely and reinstall on pve-manager 7.0-14+1, create new OSDs and a pool. Wait a little bit and the outdated OSDs issue is resolved.
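    A rough outline of that reinstall on a PVE 7 node using the pveceph tooling; the device, network and pool names are placeholders, and pveceph purge is destructive, so double-check against the Proxmox docs before running it:

        pveceph purge                         # remove Ceph packages/config from this node
        pveceph install                       # reinstall Ceph (Pacific on PVE 7)
        pveceph init --network 10.15.15.0/24  # once, on the first node
        pveceph mon create
        pveceph osd create /dev/nvme0n1       # repeat per disk
        pveceph pool create ceph-vm           # new pool for the VM disks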
  7.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    Seems to be resolved with the latest Mellanox firmware from the Ubuntu OFED driver package 5.4-3.0.3.0. No blocked requests and the VMs are responsive. Nope, if write access occurs while rebooting a node, the same thing happens: slow ops, which end in unresponsive VMs.
  8.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    Destroying the cluster, removing Ceph and reinstalling it solved the issue of the outdated OSDs. The slow ops seem to be gone. But I get OSD_SLOW_PING_TIME_BACK and OSD_SLOW_PING_TIME_FRONT (slow heartbeats) on the Mellanox mesh interface while rebooting a node. The UI is also getting some timeouts. I use the...
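    To see which OSD pairs report the slow heartbeats and whether the front or back network is affected, the usual checks are (osd.0 is a placeholder; dump_osd_network is available on recent Ceph releases):

        # lists the OSD pairs and networks behind OSD_SLOW_PING_TIME_BACK/FRONT
        ceph health detail
        # per-OSD heartbeat ping times via the admin socket (argument = threshold in ms, 0 = show all)
        ceph daemon osd.0 dump_osd_network 0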
  9.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    After recreating the OSDs, Ceph shows Outdated OSDs... but version 16.2.6 is installed on the mons, mgrs and OSDs. I restarted the mons node by node, then the mgrs and after that each OSD. I also destroyed and recreated a monitor, then a manager... but no success. Still Outdated OSDs, but the VMs are working fine, even if a node is down.
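    To verify what each daemon actually reports and to restart them in the usual mon, then mgr, then OSD order, one way is (hostnames and OSD IDs are examples):

        # per-daemon version summary; outdated daemons stand out here
        ceph versions
        # restart via systemd on the respective node
        systemctl restart ceph-mon@pve-01.service
        systemctl restart ceph-mgr@pve-01.service
        systemctl restart ceph-osd@13.service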
  10.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    After recreating the encrypted OSDs, the Ceph config was changed to incorrect values (osd_pool_default_min_size = 1, osd_pool_default_size = 2), so I got unresponsive Ceph storage.
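    If the defaults got flipped like that, they can be put back with ceph config; 3/2 is the usual replicated default, and existing pools keep their own values, so set those explicitly too (the pool name is an example):

        ceph config set global osd_pool_default_size 3
        ceph config set global osd_pool_default_min_size 2
        # existing pools are not changed by the defaults above
        ceph osd pool set ceph-vm size 3
        ceph osd pool set ceph-vm min_size 2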
  11.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    Seems to be the autoscaler on the cluster. I rebooted the entire cluster and now I do not see any slow ops anymore. I will recreate all OSDs again with the encrypted option and reboot the entire cluster. Let's see if the slow ops are gone and the VMs are responsive.
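    To check what the PG autoscaler is doing, and to stop it from changing PG counts on its own while debugging, something like this can help (the pool name is an example):

        # shows current vs. target PG counts per pool
        ceph osd pool autoscale-status
        # only warn instead of auto-adjusting (use "on" to re-enable later)
        ceph osd pool set ceph-vm pg_autoscale_mode warn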
  12.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    The last entry in ceph.log is: 2021-11-10T11:50:17.966964+0100 mon.pve-01 (mon.0) 4215 : cluster [WRN] Health check failed: 8 osds down (OSD_DOWN) 2021-11-10T11:50:17.967048+0100 mon.pve-01 (mon.0) 4216 : cluster [WRN] Health check failed: 1 host (8 osds) down (OSD_HOST_DOWN)...
  13.

    Ceph Slow Ops if one node is rebooting (Proxmox 7.0-14 Ceph 16.2.6)

    Hello, I've upgraded a Proxmox 6.4-13 cluster with Ceph 15.2.x, which worked fine without any issues, to Proxmox 7.0-14 and Ceph 16.2.6. The cluster works fine without any issues until a node is rebooted. The OSDs which generate the front and back slow ops are not predictable...
  14.

    Update PVE 6 to 7 with Installed Mellanox ConnectX-6 DKMS Drivers, Ceph not working

    Solved: Using the Ubuntu 21.04 Mellanox OFED drivers solved the problem. Use this apt list: # # Mellanox Technologies Ltd. public repository configuration file. # For more information, refer to http://linux.mellanox.com # # [mlnx_ofed_latest_base] #deb...
  15.

    Update PVE 6 to 7 with Installed Mellanox ConnectX-6 DKMS Drivers, Ceph not working

    Hello, I just made an in-place upgrade from PVE 6.4-13 to PVE 7 with the latest Mellanox OFED drivers (Debian 10.8). The Mellanox ConnectX-6 cards are used for a Ceph Nautilus cluster (latest version). The Mellanox cards are running in Ethernet mode with RoCEv2. I tested a virtual PVE cluster to...
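    To confirm which mlx5 driver is actually in use after such an upgrade, a quick check could be (ofed_info is only present when MLNX_OFED itself is installed):

        dkms status                              # DKMS-built modules and their state
        ofed_info -s                             # installed MLNX_OFED release
        modinfo mlx5_core | grep -i '^version'   # may be empty for the inbox Debian driver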
  16.

    Mellanox Connect-X 6 100G is limited to Bitrate ~34Gbits/s

    Hello, it was the CPU C-state. I had to set the CPU to performance with the following for Intel Xeon Gold in /etc/default/grub: intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll intel_pstate=disable. Now I get nearly 100 Gbit/s, and the VMs get a throughput of 2.5 GB/s write and 6.3 GB/s read...
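    For reference, how that usually looks in /etc/default/grub, followed by applying it; the exact GRUB_CMDLINE_LINUX_DEFAULT contents are an example, and disabling C-states raises idle power consumption:

        # /etc/default/grub
        GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_idle.max_cstate=0 processor.max_cstate=0 idle=poll intel_pstate=disable"

        # apply and reboot
        update-grub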
  17.

    Mellanox Connect-X 6 100G is limited to Bitrate ~34Gbits/s

    It is iperf. I did the same things, except the AMD-specific ones, as described in this thread: Benchmark: 3 node AMD EPYC 7742 64-Core, 512G RAM, 3x3 6,4TB Micron 9300 MAX NVMe
  18.

    Mellanox Connect-X 6 100G is limited to Bitrate ~34Gbits/s

    So I did some research. What I do is tune the TCP settings recommended by Mellanox: sysctl -w net.ipv4.tcp_timestamps=0 sysctl -w net.ipv4.tcp_sack=1 sysctl -w net.core.netdev_max_backlog=250000 sysctl -w net.core.rmem_max=4194304 sysctl -w net.core.wmem_max=4194304 sysctl -w...
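    To make those values survive a reboot instead of re-running sysctl -w, they can go into a drop-in file (the filename is an example; only the values shown above are included):

        # /etc/sysctl.d/90-mellanox-tcp.conf
        net.ipv4.tcp_timestamps = 0
        net.ipv4.tcp_sack = 1
        net.core.netdev_max_backlog = 250000
        net.core.rmem_max = 4194304
        net.core.wmem_max = 4194304

        # load without rebooting
        sysctl --system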
  19.

    Mellanox Connect-X 6 100G is limited to Bitrate ~34Gbits/s

    Thanks for the hint, now I get roughly 54 Gbit/s with iperf3 -P 24 -l 64K -w 256K. As I understand it, the 24 streams belong to one processor; to get the full speed I need to run multiple iperf servers on different ports.
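    A sketch of running several iperf3 server/client pairs on different ports so the load is spread over more than one core; the ports, IP address and stream counts are examples:

        # on the server node
        iperf3 -s -p 5201 &
        iperf3 -s -p 5202 &

        # on the client node, one client per server port
        iperf3 -c 10.15.15.1 -p 5201 -P 12 -l 64K -w 256K -t 30 &
        iperf3 -c 10.15.15.1 -p 5202 -P 12 -l 64K -w 256K -t 30 &
        wait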
  20.

    Mellanox Connect-X 6 100G is limited to Bitrate ~34Gbits/s

    We have 3 nodes (Proxmox 6.4-13, latest version) with Mellanox dual-port ConnectX-6 100G cards connected as a mesh network in Ethernet mode with RoCEv2, driver OFED 5.4-1.0.3. The cards use PCIe x16 Gen 3.0 (8 GT/s). MTU is configured to 9000, so they should have more throughput. 3b:00.0 Ethernet controller...
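    To rule out a PCIe or MTU bottleneck, the negotiated link and the interface settings can be checked like this (the PCI address is taken from the post; the interface name is an example):

        # negotiated PCIe width/speed vs. what the card supports
        lspci -s 3b:00.0 -vv | grep -E 'LnkCap|LnkSta'
        # confirm MTU 9000 and the 100G link speed on the mesh interface
        ip link show ens1f0
        ethtool ens1f0 | grep -i speed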