SNMP monitoring

Discussion in 'Proxmox VE: Installation and configuration' started by caplam, Nov 27, 2018.

  1. caplam

    caplam New Member

    Joined:
    Nov 14, 2018
    Messages:
    16
    Likes Received:
    0
    I'm trying to set up a LibreNMS monitoring server in an LXC container.
    I can monitor my switch, a QNAP NAS, and a VM, but I'm unable to get it working for a Proxmox host.
    I installed snmpd with a standard Debian config. The service starts without problems and is active.
    But when I try snmpwalk from the Proxmox host or from the NMS server, I get a timeout.
    Still, the snmpd service remains active.
    When I restarted the container, I saw this in /var/log/messages on the host:
    Code:
    pve kernel: [511726.443462] audit: type=1400 audit(1543264741.686:173): apparmor="DENIED" operation="mount" info="failed flags match" error=-13 profile="lxc-108_</var/lib/lxc>" name="/" pid=953 comm="(ionclean)" flags="rw, rslave
    
    Here is the conf file of the container:
    Code:
    arch: amd64
    cores: 1
    hostname: nms
    memory: 512
    net0: name=eth0,bridge=vmbr0,hwaddr=2E:76:DB:26:95:E2,ip=dhcp,type=veth
    ostype: debian
    rootfs: local-lvm:vm-108-disk-0,size=16G
    swap: 512
    
    Edit:
    I can add that starting and stopping the snmpd service is fast.
    Once I have run an snmpwalk, stopping the service or getting its status is very slow.

    Edit 2: With ss -ulnp I can see that the Proxmox host is not listening on UDP port 161. In fact it's not listening at all, and yet systemctl status snmpd tells me the snmpd service is running fine.
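    That check can be scripted; no output from the grep means nothing is bound to UDP 161, whatever systemctl claims (a sketch, assuming a standard Linux host with iproute2):

    ```shell
    # Show UDP listeners and filter for the SNMP port; on a healthy host
    # this prints a line naming snmpd as the owning process.
    ss -ulnp | grep ':161'
    ```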
     
    #1 caplam, Nov 27, 2018
    Last edited: Nov 27, 2018
  2. caplam

    caplam New Member

    Joined:
    Nov 14, 2018
    Messages:
    16
    Likes Received:
    0
    I made some changes to my SNMP config, copied from https://www.svennd.be/how-to-install-snmp-service-on-proxmox/
    Now I can see that something is listening on port 161.
    With ss -au I can see that Recv-Q is growing, which means no service is reading data from the socket. It stops growing at 166656, which I suppose is the size of the buffer.
    systemctl status snmpd still shows the service as running.
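    For reference, the core of a setup like the one in that guide boils down to binding the agent to UDP 161 and defining a read-only community (a sketch; the community string, subnet, and contact details are placeholders, not values from this thread):

    ```
    # /etc/snmp/snmpd.conf (sketch; community and subnet are placeholders)
    agentAddress udp:161
    rocommunity public 192.168.0.0/24
    sysLocation "server room"
    sysContact admin@example.com
    ```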
     
  3. vshaulsk

    vshaulsk New Member

    Joined:
    Oct 24, 2017
    Messages:
    19
    Likes Received:
    1
    I wonder if the issue has to do with running LibreNMS as a container and trying to listen to the host.

    I have LibreNMS installed as a full VM and it monitors the various Proxmox nodes through SNMP (I am actually in the middle of moving LibreNMS to a container...).

    You may want to try installing the LibreNMS VM from their website and see if it connects without issue, just to check whether your Proxmox SNMP is set up properly.
     
  4. caplam

    caplam New Member

    Joined:
    Nov 14, 2018
    Messages:
    16
    Likes Received:
    0
    I migrated the container to another host.
    Even when I try snmpwalk from the Proxmox host itself I get a timeout.
    I don't think it's related to LibreNMS: it has no problem getting data from a ProCurve switch, a QNAP NAS, or a Debian VM on the host.
     
  5. caplam

    caplam New Member

    Joined:
    Nov 14, 2018
    Messages:
    16
    Likes Received:
    0
    I finally got it working with... a reboot, but only for a few days.
    Last night my two nodes became unreachable by SNMP.
    I tried stopping and starting the snmpd daemon, killing snmpd and restarting it, and restarting networking.service.
    When I restarted networking, all the guests on the host became unreachable, even by ping:
    vmbr0 was no longer bridging any guest interfaces.
    The only thing that worked was rebooting the host.

    I dug a bit with my very limited knowledge. I found that when the host becomes unreachable by SNMP, snmpd is still running,
    and ss -au shows that Recv-Q is growing.
    There might be something Proxmox-specific here, because my two hosts are the only machines behaving like this. Guests (LXC or QEMU with Debian) keep answering SNMP requests.
    Last night my two hosts went down (per the SNMP check) within five minutes of each other.
    I will continue to dig around.
     
  6. Stoiko Ivanov

    Stoiko Ivanov Proxmox Staff Member
    Staff Member

    Joined:
    May 2, 2018
    Messages:
    391
    Likes Received:
    32
    This is (sadly) the normal behaviour of ifupdown. You could try switching to ifupdown2 (recently added to our repositories), which should support reloading the network configuration.

    As for snmpd: do you see anything from it in the logs (by default it logs to the journal on Debian, and thus on PVE)?
    Otherwise you could configure a higher log level and see if it shows something relevant:

    https://prefetch.net/blog/2009/04/16/debugging-net-snmp-problems/
    http://net-snmp.sourceforge.net/wiki/index.php/Debug_tokens
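    On Debian/PVE the daemon's command-line options live in /etc/default/snmpd; a sketch of raising verbosity along the lines of those links (the -DALL token is just an example; pick a specific debug token from the net-snmp wiki, since -DALL is extremely noisy):

    ```
    # /etc/default/snmpd (sketch)
    # -Lsd  = log to syslog, daemon facility
    # -DALL = enable all debug tokens (very noisy; prefer one specific token)
    SNMPDOPTS='-Lsd -DALL -u snmp -g snmp -p /run/snmpd.pid'
    ```

    After editing, restart with `systemctl restart snmpd` and follow the output with `journalctl -u snmpd -f`.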
     
  7. caplam

    caplam New Member

    Joined:
    Nov 14, 2018
    Messages:
    16
    Likes Received:
    0
    In the journal I see an error on both hosts, but it is not time-correlated with the alert in the NMS.
    The error is:
    Code:
    inetNetToMediaTable:_add_or_update_arpentry: unsupported address type, len = 0
    
    I didn't find other examples of this error.
    It seems linked to a change in the bridge config that doesn't update the ARP table: a message then arrives with no IP address and should be dropped, but isn't.
    I don't understand why the daemon doesn't read (and empty) the buffer.
    I need to find more debug messages.
    I'm not sure about what I wrote above; I'm far outside my field of expertise.
     
  8. Stoiko Ivanov

    Stoiko Ivanov Proxmox Staff Member
    Staff Member

    Joined:
    May 2, 2018
    Messages:
    391
    Likes Received:
    32
  9. caplam

    caplam New Member

    Joined:
    Nov 14, 2018
    Messages:
    16
    Likes Received:
    0
    I may have an idea.
    I ran strace on the host and ran snmpwalk in another shell. This host does not respond to SNMP: snmpwalk returns dozens of lines and then hangs until it times out.
    The last line of the trace is:
    Code:
    statfs("/mnt/pve/TS639",
    
    Where /mnt/pve/TS639 is the mount point of an NFS shared storage that is unresponsive. I had deactivated it in the GUI.
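    For anyone wanting to reproduce that observation, attaching strace to the running daemon and filtering for filesystem-stat calls is enough (a sketch; flags are standard strace options):

    ```shell
    # Attach to the running snmpd and watch only filesystem-related calls;
    # a hang on statfs("/mnt/pve/...") points at a dead network mount.
    strace -f -p "$(pidof snmpd)" -e trace=statfs,stat,openat
    ```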

    On the other host, which has been rebooted and responds to SNMP, I tried the same thing:
    snmpwalk finishes normally and the trace contains no occurrence of /mnt/pve/TS639.

    My guess is that if I deactivate the share, the host acts as if it were still active until a reboot. If I reboot the first host, I expect it will be fine, and I suppose both my hosts became unresponsive at the moment the share did.
    This shared storage is a very old QNAP NAS which is 99% full of backups. It answers only to ping; SSH is not possible. I think I will have to hard-reboot it, which I'd rather avoid: I'll then have to run fsck, and the NAS doesn't have enough memory for that, so I'll have to mount a USB key and put a swap file on it.
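    If the hang really is snmpd calling statfs() on the dead NFS mount while walking the HOST-RESOURCES storage table, net-snmp has a directive to leave NFS mounts out of that table entirely (worth verifying against your net-snmp version):

    ```
    # /etc/snmp/snmpd.conf
    # Skip NFS mounts when building hrStorageTable, so a dead share
    # cannot stall the whole agent on statfs().
    skipNFSInHostResources true
    ```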

    And that makes me think about the initial installation of snmpd: I had to reboot both hosts to make it work.
    I think it's because I had another network share, a Synology DS1815, that went down because of the Intel Atom bug. I had deactivated it as a network share in the Proxmox GUI but hadn't rebooted the hosts.

    So I have a question: is it possible to really deactivate a network share storage without rebooting?

    Edit: I found the same error on the second host (inetNetToMediaTable:_add_or_update_arpentry: unsupported address type, len = 0), with no effect on the SNMP communication.
     
    #9 caplam, Dec 5, 2018
    Last edited: Dec 5, 2018
  10. Stoiko Ivanov

    Stoiko Ivanov Proxmox Staff Member
    Staff Member

    Joined:
    May 2, 2018
    Messages:
    391
    Likes Received:
    32
    * Check the output of `mount` on your host; if the NFS share is still mounted, you can try to unmount it.
    * Hanging NFS shares are quite tricky (the kernel can wait for them indefinitely) and may require a reboot to get rid of; YMMV.
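    When a plain unmount blocks on a dead NFS server, the usual escalation is a forced, then a lazy, unmount (a sketch; `-f` and `-l` are standard umount options, and a lazy unmount only detaches the mount point, cleaning up once nothing references it):

    ```shell
    # Normal unmount first; fall back to force, then lazy, for a hung share.
    umount /mnt/pve/TS639 \
      || umount -f /mnt/pve/TS639 \
      || umount -l /mnt/pve/TS639
    ```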
     
    caplam likes this.
  11. caplam

    caplam New Member

    Joined:
    Nov 14, 2018
    Messages:
    16
    Likes Received:
    0
    here is the output of mount
    Code:
    sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
    proc on /proc type proc (rw,relatime)
    udev on /dev type devtmpfs (rw,nosuid,relatime,size=3965716k,nr_inodes=991429,mode=755)
    devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
    tmpfs on /run type tmpfs (rw,nosuid,noexec,relatime,size=803944k,mode=755)
    /dev/mapper/pve-root on / type ext4 (rw,relatime,errors=remount-ro,data=ordered)
    securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
    tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
    tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
    tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
    cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/lib/systemd/systemd-cgroups-agent,name=systemd)
    pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
    cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
    cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio)
    cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
    cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
    cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct)
    cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
    cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
    cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
    cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
    cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
    cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,rdma)
    systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=37,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=12836)
    hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime,pagesize=2M)
    debugfs on /sys/kernel/debug type debugfs (rw,relatime)
    mqueue on /dev/mqueue type mqueue (rw,relatime)
    sunrpc on /run/rpc_pipefs type rpc_pipefs (rw,relatime)
    configfs on /sys/kernel/config type configfs (rw,relatime)
    fusectl on /sys/fs/fuse/connections type fusectl (rw,relatime)
    lxcfs on /var/lib/lxcfs type fuse.lxcfs (rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other)
    binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
    192.168.0.2:/Proxmox on /mnt/pve/TS639 type nfs (rw,relatime,vers=3,rsize=32768,wsize=32768,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=192.168.0.2,mountvers=3,mountport=30000,mountproto=udp,local_lock=none,addr=192.168.0.2)
    tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=803940k,mode=700)
    /dev/fuse on /etc/pve type fuse (rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other)
    tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
    
    So it seems the share is still mounted.
    If I run the same command on the host that has been rebooted, I don't see the share.
    I deactivated the share in the datacenter GUI yesterday.

    My NAS (TS639) came back online an hour ago (e2fsck finished and the volume remounted). It's still deactivated for the time being.
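    Rather than scanning the full `mount` output, you can ask the kernel's mount table for NFS entries only; empty output means no NFS mounts remain (a sketch using /proc/mounts, whose third field is the filesystem type):

    ```shell
    # List any remaining NFS mounts straight from the kernel's mount table.
    grep -E ' nfs4? ' /proc/mounts
    ```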
     
  12. Stoiko Ivanov

    Stoiko Ivanov Proxmox Staff Member
    Staff Member

    Joined:
    May 2, 2018
    Messages:
    391
    Likes Received:
    32
    Try to unmount it:
    `umount /mnt/pve/TS639`
     
  13. caplam

    caplam New Member

    Joined:
    Nov 14, 2018
    Messages:
    16
    Likes Received:
    0
    The share has been properly unmounted and the snmpd daemon can read the buffer normally again.
    snmpwalk is OK, and I guess in five minutes I will see the host in my NMS.
     