Search results

  1. Separate Cluster Network Wiki "bindnet0_addr" command

    I am getting the same error on Proxmox 6.0.4. I tried with both the single - and double -- option forms:
    pvecm create live-03 -bindnet0_addr 10.30.10.11 -ring0_addr 10.30.10.11 -bindnet1_addr 10.30.11.11 -ring1_addr 10.30.11.11
    pvecm create live-03 --bindnet0_addr 10.30.10.11 --ring0_addr 10.30.10.11...
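    (Note: Proxmox VE 6 moved to corosync 3, where the old bindnetX_addr/ringX_addr options were replaced by link definitions, which would explain the error. A minimal sketch of the equivalent command under that assumption, reusing the addresses from the post:)

      # PVE 6.x syntax: each separate cluster network becomes a kronosnet link
      pvecm create live-03 --link0 10.30.10.11 --link1 10.30.11.11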
  2. Proxmox VE Ceph Benchmark 2018/02

    Hello, I am a little confused by the IOPS results in the benchmark PDF: around 200 write IOPS on a 10Gb network and around 300 IOPS on a 100Gb/s network? Then in this thread people talk about reaching thousands of write IOPS on Ceph, so what am I missing? If I have 30 LXCs that work with a large number of small files...
  3. Packet dropped on lxc

    Setting more channels on the NIC seems to have great benefits (still getting 2% packet loss with 6 channels): ethtool -L eno1 combined 6
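    (As a sketch of what the post is doing, with the interface name taken from the thread; how many combined queues a NIC supports depends on the driver:)

      # show supported vs. currently configured queue counts
      ethtool -l eno1
      # spread RX/TX processing across 6 combined queues
      ethtool -L eno1 combined 6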
  4. Packet dropped on lxc

    The issue seems related to the NIC card. In another cluster (working well) I have a BCM5720 and I see in /proc/interrupts 1 TX channel and 4 RX channels assigned to different CPU cores. In the server with packet loss and an I350 I see only a single TxRx queue assigned to a single CPU. ethtool -l eno1 for...
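    (A rough way to reproduce that comparison on any node, assuming the same interface name; the exact output format varies by driver:)

      # one line per queue IRQ, with per-CPU interrupt counts
      grep eno1 /proc/interrupts
      # which CPUs each IRQ is allowed to run on
      cat /proc/irq/*/smp_affinity_list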
  5. Packet dropped on lxc

    I tried removing the bond but I still have packet loss, only in LXC! Netstat output:
    Kernel Interface table
    Iface   MTU    RX-OK     RX-ERR RX-DRP RX-OVR TX-OK   TX-ERR TX-DRP TX-OVR Flg
    eno1    1500   15228877  0      193    0      200224  0      0      0      BMRU
    lo      65536  278432...
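    (For reference, the same drop counters can be watched with either of these, assuming the interface name from the post:)

      # classic netstat interface table, as quoted above
      netstat -i
      # iproute2 equivalent with per-interface RX/TX error and drop counters
      ip -s link show eno1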
  6. Moving a node with Ceph from one cluster to a different one

    Thank you. It worked. I stopped the OSDs and removed them. Then I zapped them to bring the disks back to their default state: ceph-volume lvm zap /dev/sdX --destroy Then I stopped and removed the MON from the GUI (this seems to remove the MGR too). I finally removed the node with pvecm delnode NODE-NAME Only one thing: I...
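    (A condensed sketch of that sequence using the pveceph/pvecm CLI instead of the GUI; the OSD ID, device path, and node name are placeholders:)

      # for each OSD on the node: mark it out, stop it, destroy it, then wipe the disk
      ceph osd out <OSD-ID>
      systemctl stop ceph-osd@<OSD-ID>
      pveceph osd destroy <OSD-ID>
      ceph-volume lvm zap /dev/sdX --destroy
      # remove the monitor (the manager goes with it, as noted in the post)
      pveceph mon destroy <NODE-NAME>
      # finally drop the node from the Proxmox cluster
      pvecm delnode NODE-NAME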
  7. Moving a node with Ceph from one cluster to a different one

    Hello, I decided to redistribute my cluster nodes in a different way, so I want to move one node from one cluster to another. What is the correct sequence to remove the node from the Ceph cluster? The docs only describe node addition, not removal. Can it be done just from the GUI? After node removal...
  8. Traffic issue with Routed setup with multiple IP ranges

    Hello, I have 2 clusters sharing the same switches and some large IP ranges. All servers have their main public IP in 5.x.x.X assigned to the bridge. Servers may have one or more additional ranges added with a routed setup on the bridge, so I can migrate LXCs with different ranges to all nodes of the cluster...
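    (For context, a minimal sketch of what such a routed additional range can look like in /etc/network/interfaces; the addresses, range, and bridge layout here are assumptions for illustration, not the poster's actual config:)

      auto vmbr0
      iface vmbr0 inet static
          address 192.0.2.10/24        # hypothetical main public IP on the bridge
          gateway 192.0.2.1
          bridge-ports eno1
          bridge-stp off
          bridge-fd 0
          # additional range routed onto the bridge so guests using it are reachable
          up ip route add 198.51.100.0/28 dev vmbr0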
  9. Packet dropped on lxc

    Hello, I am asking for suggestions on how to troubleshoot packets dropped on an LXC. The Proxmox host has a public IP plus a different public IP range routed on the bridge. The bridge sits on a Linux bond. Ping between the Proxmox IP (on the same bond/bridge) and other servers works without packet loss. Ping and...
  10. Need to disconnect nodes from switch one by one

    Hello, due to maintenance I need to reboot one switch. I have some nodes of a cluster connected to it for two networks, the WAN bridge and the cluster network (not storage). If I am right, a short disconnection from the WAN should not cause trouble, and the Linux LXCs should handle reconnection. What is the...
  11. Analyzing Ceph load

    Yes, but this excludes both disk slowness and HBA/RAID mode performance as causes.
  12. Analyzing Ceph load

    One strange update: I had to restart all nodes one by one, and after that the I/O delay is back to normal with the same VM usage. Could it be a VM creating unusual load?
  13. Analyzing Ceph load

    On the same network 10.10.20.0/24
  14. Analyzing Ceph load

    Does the fact that the ceph-osd daemon has > 100% CPU usage even with no LXC running give any hints?
  15. Analyzing Ceph load

    - Corosync is on net 10.10.10.0/24 on a 1Gb NIC
    - Ceph public network is on net 10.10.20.0/24 on a 10Gb NIC
    I will take a closer look at the RAID controller options. Thank you P.
  16. Analyzing Ceph load

    Thanks again Alwin. So this could be a reason. How can I measure the total IOPS ongoing on an OSD? Do you think adding more OSDs of the same disk type could help? The Ceph net is on a 10Gb NIC; the cluster net and the WAN net (bridge) are on two separate 1Gb NICs. Is rados bench destructive? I need to stop...
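    (On the measuring side, a sketch of commands commonly used for this; the pool name is a placeholder, and rados bench writes benchmark objects into the pool, so it is normally pointed at a dedicated test pool rather than a production one:)

      # per-OSD commit/apply latency
      ceph osd perf
      # cluster-wide client IOPS and throughput appear in the status output
      ceph status
      # 60-second write benchmark; --no-cleanup keeps the objects for a later read test
      rados bench -p testpool 60 write --no-cleanup
      rados bench -p testpool 60 seq
      # remove the benchmark objects afterwards
      rados -p testpool cleanup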
  17. Analyzing Ceph load

    Hello Alwin, thanks for your reply, after some days I am back at troubleshooting I/O wait. Cluster:
    - 6 nodes with Ceph on PVE 5.4
    - WAN network on NIC 1, 1Gb
    - Cluster network on NIC 2, 1Gb
    CEPH:
    - 12 OSD
    - 3 monitors
    - 256 pgs
    - data replication x3 (osd pool default size=3)
    - osd journal...
  18. Analyzing Ceph load

    Hello, I want to understand if I am reaching a bottleneck in my hyperconverged Proxmox + Ceph cluster. In moments of high load (multiple LXCs with high I/O on small files) I see one node with:
    - IO delay at 40%
    - around 50% CPU usage
    - load at 200 (40 total cores / 2 x Intel(R) Xeon CPU E5-2660...
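    (A sketch of how that per-node picture is usually collected on the host; iostat and mpstat come from the sysstat package:)

      # %iowait and per-disk utilisation, refreshed every 2 seconds
      iostat -x 2
      # per-core CPU usage
      mpstat -P ALL 2
      # load average
      uptime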
  19. [SOLVED] LXC syslog messages related to different container

    Great, thank you, one thing less to worry about ;-)
  20. [SOLVED] LXC syslog messages related to different container

    Hello, in the syslog of the LXC with CTID 121 I found messages related to another LXC with a different CTID:
    May 23 12:09:30 dq kernel: [5441664.537117] dump_stack+0x63/0x8b
    May 23 12:09:30 dq kernel: [5441664.545080] mm_fault_error+0x8f/0x190
    May 23 12:09:30 dq kernel: [5441664.552711] RBP...