pvedaemon (API) becomes slow (optimization tips?)

Discussion in 'Proxmox VE: Installation and configuration' started by encore, Jul 8, 2018.

  1. encore

    encore Member

    Joined:
    May 4, 2018
    Messages:
    38
    Likes Received:
    0
    Hi there,

    we are hosting thousands of CTs on Proxmox using your API.
    From time to time it seems like the pvedaemon worker processes get stuck.
    I see 3 pvedaemon worker processes permanently at 100% CPU each. When I kill them and restart the pvedaemon service, they run smoothly again at 10-40% CPU usage.

    When this 100% issue occurs, the API responds very slowly.
    A kill & restart fixes it for a while.

    Any ideas for optimizing pvedaemon? Maybe tell it to use more workers? Is there a log where I might find the reason for these stuck processes?
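
    To be concrete, something along these lines is what I would like to be able to do to narrow it down (journalctl and strace are standard Debian tools, nothing Proxmox-specific, and the PID below is just a placeholder):

    Code:
    # check the pvedaemon journal for errors around the time the workers start spinning
    journalctl -u pvedaemon --since "1 hour ago"

    # list the pvedaemon processes sorted by CPU usage to find the busy worker PIDs
    ps -eo pid,pcpu,etime,args --sort=-pcpu | grep '[p]vedaemon'

    # attach to one of the spinning workers and watch its system calls
    # (replace 12345 with the actual worker PID; strace writes to stderr)
    strace -p 12345 -f -tt 2>&1 | head -n 50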

    Thank you,
    Marvin
     
  2. Alwin

    Alwin Proxmox Staff Member
    Staff Member

    Joined:
    Aug 1, 2017
    Messages:
    1,972
    Likes Received:
    168
  3. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    501
    Likes Received:
    55
    I am having a problem with this as well. On some clusters API responses are reasonable, on others they can be really slow (as in 20 seconds for a reply). Each node has between 80 and 100 CTs. I tried sending the request to different nodes, but the end result is the same:

    Code:
    # time pvesh get /nodes/node15/lxc
    ...
    real    0m19.848s
    user    0m1.148s
    sys     0m0.150s
    
    Code:
    # pveversion -v
    proxmox-ve: 5.2-2 (running kernel: 4.15.17-2-pve)
    pve-manager: 5.2-1 (running version: 5.2-1/0fcd7879)
    pve-kernel-4.15: 5.2-2
    pve-kernel-4.15.17-2-pve: 4.15.17-10
    pve-kernel-4.13.13-6-pve: 4.13.13-42
    pve-kernel-4.13.13-2-pve: 4.13.13-33
    ceph: 12.2.5-pve1
    corosync: 2.4.2-pve5
    criu: 2.11.1-1~bpo90
    glusterfs-client: 3.8.8-1
    ksm-control-daemon: 1.2-2
    libjs-extjs: 6.0.1-2
    libpve-access-control: 5.0-8
    libpve-apiclient-perl: 2.0-4
    libpve-common-perl: 5.0-31
    libpve-guest-common-perl: 2.0-16
    libpve-http-server-perl: 2.0-8
    libpve-storage-perl: 5.0-23
    libqb0: 1.0.1-1
    lvm2: 2.02.168-pve6
    lxc-pve: 3.0.0-3
    lxcfs: 3.0.0-1
    novnc-pve: 0.6-4
    openvswitch-switch: 2.7.0-2
    proxmox-widget-toolkit: 1.0-18
    pve-cluster: 5.0-27
    pve-container: 2.0-23
    pve-docs: 5.2-4
    pve-firewall: 3.0-9
    pve-firmware: 2.0-4
    pve-ha-manager: 2.0-5
    pve-i18n: 1.0-5
    pve-libspice-server1: 0.12.8-3
    pve-qemu-kvm: 2.11.1-5
    pve-xtermjs: 1.0-5
    qemu-server: 5.0-26
    smartmontools: 6.5+svn4324-1
    spiceterm: 3.0-5
    vncterm: 1.5-3
    zfsutils-linux: 0.7.9-pve1~bpo9
    Code:
    # pvecm status
    Quorum information
    ------------------
    Date:             Fri Aug 24 11:16:35 2018
    Quorum provider:  corosync_votequorum
    Nodes:            9
    Node ID:          0x00000008
    Ring ID:          1/752
    Quorate:          Yes
    
    Votequorum information
    ----------------------
    Expected votes:   9
    Highest expected: 9
    Total votes:      9
    Quorum:           5
    Flags:            Quorate
    
    Membership information
    ----------------------
        Nodeid      Votes Name
    0x00000001          1 10.19.1.8
    0x00000002          1 10.19.1.9
    0x00000003          1 10.19.1.10
    0x00000004          1 10.19.1.11
    0x00000005          1 10.19.1.12
    0x00000006          1 10.19.1.13
    0x00000007          1 10.19.1.14
    0x00000008          1 10.19.1.15 (local)
    0x00000009          1 10.19.1.16
    
    sysctl fs variables:
    Code:
    fs.aio-max-nr = 1048576
    fs.aio-nr = 62896
    fs.binfmt_misc.status = enabled
    fs.dentry-state = 2829612 2404098 45 0 0 0
    fs.dir-notify-enable = 1
    fs.epoll.max_user_watches = 40549478
    fs.file-max = 19769117
    fs.file-nr = 73680 0 19769117
    fs.inode-nr = 1748660 130375
    fs.inode-state = 1748660 130375 0 0 0 0 0
    fs.inotify.max_queued_events = 16384
    fs.inotify.max_user_instances = 131072
    fs.inotify.max_user_watches = 524288
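
    For comparison, a rough loop like the one below shows whether the latency is tied to specific nodes or uniform across the cluster (the node names are placeholders for the ones in the membership list above; the output is discarded since only the timing matters):

    Code:
    # time the same container listing against several nodes
    for n in node08 node12 node15; do
        echo "== $n =="
        time pvesh get /nodes/$n/lxc > /dev/null
    done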
     
  4. Ahmet Bas

    Ahmet Bas New Member

    Joined:
    Aug 3, 2018
    Messages:
    15
    Likes Received:
    0
    Were you able to fix your slow API response?
     
  5. alexskysilk

    alexskysilk Active Member

    Joined:
    Oct 16, 2015
    Messages:
    501
    Likes Received:
    55
    No. I just live with it and know that 100-120 CTs/node is the operational ceiling. Mind you, this isn't the only limit; the number of nodes in the cluster also makes a difference. Does anyone have experience running nodes/clusters more densely?
     
  6. Ahmet Bas

    Ahmet Bas New Member

    Joined:
    Aug 3, 2018
    Messages:
    15
    Likes Received:
    0
    We have a cluster with 5 hypervisors and 50-60 VMs per node, but the API response is very slow: it takes 5-20 seconds before all the data is loaded. We are curious whether there is a way to make this faster. The cluster is healthy, but the API response is too slow.
     
  7. encore

    encore Member

    Joined:
    May 4, 2018
    Messages:
    38
    Likes Received:
    0
    We are still having this problem, and it is very annoying. We implemented a workaround by caching information from Proxmox, but this is a really dirty solution.
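
    For the curious, the general idea is just a short-lived file cache in front of the expensive API calls; a minimal sketch of that idea (the path, the 30-second TTL and the node name are arbitrary placeholders, not our real setup):

    Code:
    #!/bin/bash
    # Serve a cached copy of a pvesh call if it is younger than TTL seconds,
    # otherwise refresh the cache first.
    TTL=30
    CACHE=/var/cache/pve-api/nodes-lxc.out

    mkdir -p "$(dirname "$CACHE")"

    # refresh when the cache file is missing or older than TTL
    if [ ! -f "$CACHE" ] || [ $(( $(date +%s) - $(stat -c %Y "$CACHE") )) -gt "$TTL" ]; then
        pvesh get /nodes/node15/lxc > "$CACHE.tmp" && mv "$CACHE.tmp" "$CACHE"
    fi

    cat "$CACHE"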
     
  8. Ahmet Bas

    Ahmet Bas New Member

    Joined:
    Aug 3, 2018
    Messages:
    15
    Likes Received:
    0
    Could you go into more detail on how your caching workaround works in practice and how well it resolves the issue for now? Is there nobody who can help us identify and resolve our problem, or someone who had this problem before and was able to resolve it?
     