[SOLVED] snmp not responding

Dec 31, 2019
Hello,
I have a cluster of 5 nodes running version 7.3.3.
They are monitored by Cacti and Observium (LibreNMS).
For the past week, neither monitoring system has been able to reach two of the five nodes over SNMP.
Before that, it worked properly.
I don't think it's related to the upgrade to 7.3.3, since the problem didn't start at the same time.

When I run tcpdump on the Proxmox server, I can see the request arrive, but the server doesn't answer.
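For anyone wanting to reproduce that observation, here is a minimal sketch that counts requests and replies in tcpdump's text output. The helper name `count_snmp_directions` is made up for illustration, and it assumes tcpdump's usual one-line `src.port > dst.port:` format with numeric ports (`-n`) and the agent on UDP port 161, as in the configuration below.

```shell
# Hypothetical helper: summarise SNMP traffic direction from tcpdump text output.
# A request has port 161 on the destination side, a reply has it on the source side.
count_snmp_directions() {
    awk '
        / > [^ ]+\.161: / { req++ }   # packet sent to the agent
        / [^ ]+\.161 > /  { rep++ }   # packet sent by the agent
        END { printf "requests=%d replies=%d\n", req, rep }
    '
}

# Usage (as root): capture 30 seconds of SNMP traffic and summarise it.
# timeout 30 tcpdump -l -ni any udp port 161 2>/dev/null | count_snmp_directions
```

A result with requests but zero replies would match what is described here: the packets reach the host, but snmpd never answers.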

What is strange is that when I do a "systemctl restart snmpd" it takes a long time, but once the service is restarted it works for a while.

I am completely lost and I do not understand.

My snmp configuration:
rocommunity MyCommunity
agentAddress udp:161

Did you have a similar problem?

Thank you.
Anthony
 
Anything in the journal from snmpd?
 
I found nothing in /var/log/messages or /var/log/syslog.
Is snmpd running at all? What gets printed to the journal when you restart it:
* open `journalctl -f` in a shell
* run `systemctl restart snmpd` in another
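A small variation on the two steps above, as a sketch: `journalctl -u snmpd -f` limits the output to the snmpd unit, and a tiny wrapper (the name `elapsed_seconds` is hypothetical) reports how long the restart takes.

```shell
# Hypothetical helper: run a command and print how many whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# Shell 1: follow only the snmpd unit instead of the whole journal.
# journalctl -u snmpd -f
# Shell 2: restart the service and print the duration in seconds.
# elapsed_seconds systemctl restart snmpd
```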
 
Below are the results of the two commands.

We can see that restarting snmpd takes more than one minute.

-- Journal begins at Fri 2022-08-05 11:31:56 CEST. --
Nov 30 10:51:30 pxe1-infra systemd-logind[1104]: New session 115682 of user root.
Nov 30 10:51:30 pxe1-infra systemd[1]: Started Session 115682 of user root.
Nov 30 10:51:56 pxe1-infra systemd[1]: Stopping Simple Network Management Protocol (SNMP) Daemon....
Nov 30 10:52:54 pxe1-infra pveproxy[2801197]: Clearing outdated entries from certificate cache
Nov 30 10:53:00 pxe1-infra pveproxy[2801196]: Clearing outdated entries from certificate cache
Nov 30 10:53:26 pxe1-infra systemd[1]: snmpd.service: State 'stop-sigterm' timed out. Killing.
Nov 30 10:53:26 pxe1-infra systemd[1]: snmpd.service: Killing process 2500905 (snmpd) with signal SIGKILL.
Nov 30 10:53:26 pxe1-infra systemd[1]: snmpd.service: Main process exited, code=killed, status=9/KILL
Nov 30 10:53:26 pxe1-infra systemd[1]: snmpd.service: Failed with result 'timeout'.
Nov 30 10:53:26 pxe1-infra systemd[1]: Stopped Simple Network Management Protocol (SNMP) Daemon..
Nov 30 10:53:26 pxe1-infra systemd[1]: Starting Simple Network Management Protocol (SNMP) Daemon....
Nov 30 10:53:26 pxe1-infra systemd[1]: Started Simple Network Management Protocol (SNMP) Daemon..
Nov 30 10:53:26 pxe1-infra snmpd[3246061]: User ID has already been set -- can not change

root@pxe1-infra:~# systemctl status snmpd
● snmpd.service - Simple Network Management Protocol (SNMP) Daemon.
Loaded: loaded (/lib/systemd/system/snmpd.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2022-11-30 10:53:26 CET; 1min 38s ago
Process: 3246060 ExecStartPre=/bin/mkdir -p /var/run/agentx (code=exited, status=0/SUCCESS)
Main PID: 3246061 (snmpd)
Tasks: 1 (limit: 309324)
Memory: 10.2M
CPU: 55ms
CGroup: /system.slice/snmpd.service
└─3246061 /usr/sbin/snmpd -LOw -u Debian-snmp -g Debian-snmp -I -smux mteTrigger mteTriggerConf -f -p /run/snmpd.pid

Nov 30 10:53:26 pxe1-infra systemd[1]: Starting Simple Network Management Protocol (SNMP) Daemon....
Nov 30 10:53:26 pxe1-infra systemd[1]: Started Simple Network Management Protocol (SNMP) Daemon..
Nov 30 10:53:26 pxe1-infra snmpd[3246061]: User ID has already been set -- can not change
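
A note on the timing in the journal above: the gap between "Stopping" (10:51:56) and "Killing" (10:53:26) can be worked out with a short helper. This is only a sketch, and `hms_diff` is a made-up name.

```shell
# Hypothetical helper: seconds between two HH:MM:SS timestamps on the same day.
hms_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, s, ":"); split(b, e, ":")
        print (e[1]*3600 + e[2]*60 + e[3]) - (s[1]*3600 + s[2]*60 + s[3])
    }'
}

# "Stopping" vs. "Killing" timestamps taken from the journal output.
hms_diff 10:51:56 10:53:26

# To see the stop timeout configured for the unit:
# systemctl show snmpd -p TimeoutStopUSec
```

The difference is 90 seconds, which matches systemd's default stop timeout (90 s unless the unit or system configuration overrides it). So the slow restart looks like the old snmpd process ignoring SIGTERM until systemd gives up and sends SIGKILL, rather than a slow start.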
 
Hello,
One of my 5 nodes doesn't host any VMs, so I rebooted it, and since then SNMP has worked fine on it.
It's very strange.
I don't see what could be causing this.
 
Dear Forum.

I don't want to hijack this thread but I'm also having problems with snmpd.
No cluster, just one Proxmox hypervisor running PVE 7.2.3.
If I start the service I can snmpwalk the Proxmox partially only once and then it times out. If I try to snmpwalk again, nothing happens.
If I restart the service, I can snmpwalk only once more before it times out again. I get a "Timeout no response from ... " message when the first walk times out.
Neither syslog nor journalctl shows an error.
Could someone please help?

Thank you.

Kind regards,
J.
 
If I start the service I can snmpwalk the Proxmox partially only once and then it times out.
Just a blind guess: SNMP also reports information about filesystems. If (for example) an NFS storage is configured but not actually reachable, the query will eventually fail. To check this specific aspect: can you run df without a timeout or error message?
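
A rough sketch of that check, with assumptions spelled out: the helper names are made up, the 10-second limit is arbitrary, and only the nfs, nfs4 and cifs filesystem types are treated as network storage.

```shell
# Hypothetical helper: print mount points of network filesystems.
# Reads /proc/mounts by default; another mounts table can be passed as $1.
list_network_mounts() {
    awk '$3 ~ /^(nfs|nfs4|cifs)$/ { print $2 }' "${1:-/proc/mounts}"
}

# Hypothetical helper: run df with a time limit so a dead mount cannot
# block the shell forever (SIGTERM after 10 s, SIGKILL 5 s later).
check_df() {
    if timeout -k 5 10 df -h >/dev/null 2>&1; then
        echo "df ok"
    else
        echo "df hung or failed"
    fi
}

# Usage:
# list_network_mounts
# check_df
```

If `check_df` reports a hang, the mount points printed by `list_network_mounts` are the first candidates to look at.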

What was the error message? Just "Timeout no response from ... "? No hint regarding the triggering OID? What is listed directly before that error? Please post your actual command line and the last 10 lines plus the error message. (And please use [CODE]xyz[/CODE]-Tags for this.)
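
One way to collect exactly that is sketched below. `last_lines` is a hypothetical wrapper, and the snmpwalk arguments in the usage line (SNMP v2c, the community string, the address 192.0.2.10) are placeholders to replace with your own. `-On` prints OIDs numerically, which makes the last OID before the timeout easier to identify.

```shell
# Hypothetical helper: print the command line, then the last 11 lines of its
# combined output (10 result lines plus the final error message).
last_lines() {
    printf '%s\n' "\$ $*"
    "$@" 2>&1 | tail -n 11
}

# Usage (placeholder community and address):
# last_lines snmpwalk -v2c -c MyCommunity -On 192.0.2.10
```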
 
