Polling the Host Stops when sysDescr returns "noSuchObject" #366

Open
penfold1972 opened this issue Feb 5, 2025 · 2 comments


penfold1972 commented Feb 5, 2025

Describe the bug
After updating spine from 1.2.16 to 1.2.22, some devices stopped graphing, while spine 1.2.21 works fine. I captured packets while polling the host with 1.2.21 and with 1.2.22, and the latter polls only the uptime and then sysDescr before stopping. The packet capture shows the response is "noSuchObject" with either version.

To Reproduce
Steps to reproduce the behavior:

  1. I have compiled binaries spine-1.2.21 and spine-1.2.22 in /usr/local/spine/bin
  2. A symlink for /usr/local/spine/bin/spine points to spine-1.2.21 and this device's graphs populate as expected
  3. unlink spine && ln -s spine-1.2.22 spine # Change symlink to 1.2.22
  4. Graphs stop updating until poller is changed back.
  5. Spine appears to continue polling the remaining hosts

Expected behavior
I expect spine to receive the "noSuchObject" and continue gathering data like the versions previous to 1.2.22 did and continue to process the data.
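
To make the expected behavior concrete, here is a minimal, self-contained sketch of the two polling policies. It is not spine's code: `mock_snmp_get`, `poll_host`, and the `abort_on_missing` flag are hypothetical names made up for illustration, and the mock agent simply answers "noSuchObject" for sysDescr and a dummy value for everything else.

```c
#include <stddef.h>
#include <string.h>

#define NO_SUCH_OBJECT "noSuchObject"
#define SYSDESCR_OID   ".1.3.6.1.2.1.1.1.0"

/* Hypothetical mock agent: only sysDescr is missing on this device. */
const char *mock_snmp_get(const char *oid) {
    return strcmp(oid, SYSDESCR_OID) == 0 ? NO_SUCH_OBJECT : "value";
}

/*
 * Walk the OID list and return how many OIDs were actually requested.
 * abort_on_missing = 1 models the behavior observed with 1.2.22
 * (polling stops at the first noSuchObject); abort_on_missing = 0
 * models the expected behavior (the OID is skipped, polling continues).
 */
size_t poll_host(const char *oids[], size_t n, int abort_on_missing) {
    size_t requested = 0;

    for (size_t i = 0; i < n; i++) {
        const char *value = mock_snmp_get(oids[i]);
        requested++;

        if (strcmp(value, NO_SUCH_OBJECT) == 0 && abort_on_missing) {
            break; /* treat the missing object as a host-level failure */
        }
    }
    return requested;
}
```

With an OID list of sysUpTime, sysDescr, sysObjectID, and sysContact, the aborting policy requests two OIDs and stops, which matches what the packet capture shows for 1.2.22, while the continuing policy requests all four.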

Screenshots
If applicable, add screenshots to help explain your problem.

Server (please complete the following information):

  • OS: Rocky Linux
  • Version 9

Compiling (please complete the following information):

  • compiler: gcc version 11.5.0 20240719 (Red Hat 11.5.0-2) (GCC)
  • autoconf: autoconf (GNU Autoconf) 2.69
  • glibc: ldd (GNU libc) 2.34
  • source: release tarballs

Note: you can find glibc version by running ldd --version

Note: if source is github, please include last commit reference

Additional context
I am migrating a CentOS 7 server to Rocky Linux 9, and I originally had this issue on CentOS 7. Since I am rebuilding the server and trying to get everything right, I have been able to dig into the problem and narrow it down to what I describe here. Currently, the production server is running cacti-1.2.27 with spine-1.2.16. It appears I happened to upgrade from 1.2.16 when 1.2.22 was the current version; when I noticed the issue a few days later, I put it on my TODO list and went back to the previous working version of spine.

I am not sure if this is helpful, but I captured a log of both versions running against the host and diffed the logs. Here is a screenshot of the portion that I think shows one version (-) gathering all the data while the other (+) stops after it gets the result for '.1.3.6.1.2.1.1.1.0', which agrees with the packet capture. (Should "with Status[1]" be on its own line in 1.2.22?)
[Image: screenshot of the log diff]

@penfold1972 penfold1972 changed the title Polling Stops when sysDescr returns "noSuchObject" Polling the Host Stops when sysDescr returns "noSuchObject" Feb 5, 2025
penfold1972 (Author) commented

I have figured out that the 7453f83 commit for issue #272 appears to be the change related to my issue. I tested by commenting out lines 445 and 446 in snmp.c and spine polling worked as before.

            if (vars->type == SNMP_NOSUCHOBJECT) {
                /* SET_UNDEFINED(result_string);
                status = STAT_ERROR; */
                snprint_value(temp_result, RESULTS_BUFFER, vars->name, vars->name_length, vars);
                SPINE_LOG_HIGH(("ERROR: No such Object for oid '%s' for Device[%d] with Status[%d] temp_result='%s'",  snmp_oid, current_host->id, status, temp_result));

With those two lines commented out, the status is no longer set to an error, but the log message is still generated, and I see it several times in the debug output:

grep temp_result ~/cacti/spine/testing/spine-1_2_21-xyz.txt
Total[0.0558] ERROR: No such Object for oid '.1.3.6.1.2.1.1.3.0' for Device[68] with Status[0] temp_result='No Such Object available on this agent at this OID'
Total[0.0567] ERROR: No such Object for oid '.1.3.6.1.2.1.1.1.0' for Device[68] with Status[0] temp_result='No Such Object available on this agent at this OID'
Total[0.0575] ERROR: No such Object for oid '.1.3.6.1.2.1.1.2.0' for Device[68] with Status[0] temp_result='No Such Object available on this agent at this OID'
Total[0.0582] ERROR: No such Object for oid '.1.3.6.1.6.3.10.2.1.3.0' for Device[68] with Status[0] temp_result='No Such Object available on this agent at this OID'
Total[0.0589] ERROR: No such Object for oid '.1.3.6.1.2.1.1.3.0' for Device[68] with Status[0] temp_result='No Such Object available on this agent at this OID'
Total[0.0596] ERROR: No such Object for oid '.1.3.6.1.2.1.1.4.0' for Device[68] with Status[0] temp_result='No Such Object available on this agent at this OID'
Total[0.0603] ERROR: No such Object for oid '.1.3.6.1.2.1.1.5.0' for Device[68] with Status[0] temp_result='No Such Object available on this agent at this OID'
Total[0.0610] ERROR: No such Object for oid '.1.3.6.1.2.1.1.6.0' for Device[68] with Status[0] temp_result='No Such Object available on this agent at this OID'
Total[0.0623] ERROR: No such Object for oid '.1.3.6.1.6.3.10.2.1.3.0' for Device[68] with Status[0] temp_result='No Such Object available on this agent at this OID'
Total[0.0631] ERROR: No such Object for oid '.1.3.6.1.2.1.1.3.0' for Device[68] with Status[0] temp_result='No Such Object available on this agent at this OID'

I've applied this change to my copy of 1.2.27, since that is the same version as the Cacti I am running for testing.
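
For anyone who wants to see the effect of the change in isolation, below is a rough standalone sketch of the branch. It does not use spine or Net-SNMP: `sketch_varbind`, `handle_varbind`, the `strict` flag, and the `SKETCH_*` constants are placeholders I made up, and writing "U" for an undefined result is my assumption about what `SET_UNDEFINED` does. What the real code does with the result string in the patched case is also not shown in the excerpt, so copying the agent's text here is only for illustration.

```c
#include <stdio.h>
#include <stddef.h>

/* Placeholder values for illustration; the real code uses its own constants. */
#define SKETCH_NOSUCHOBJECT 0x80
#define SKETCH_STAT_OK      0
#define SKETCH_STAT_ERROR   1

/* Hypothetical, simplified stand-in for an SNMP variable binding. */
struct sketch_varbind {
    int type;         /* response type tag */
    const char *text; /* value already rendered as text */
};

/*
 * strict = 1: mark the result undefined and return an error status,
 *             like the two lines added by the commit referenced above.
 * strict = 0: leave the status alone, like the build with those two
 *             lines commented out. The log line is written either way.
 */
int handle_varbind(const struct sketch_varbind *vb, int strict,
                   char *result, size_t result_len) {
    int status = SKETCH_STAT_OK;

    if (vb->type == SKETCH_NOSUCHOBJECT && strict) {
        snprintf(result, result_len, "U"); /* assumed "undefined" marker */
        status = SKETCH_STAT_ERROR;
    } else {
        snprintf(result, result_len, "%s", vb->text);
    }

    if (vb->type == SKETCH_NOSUCHOBJECT) {
        fprintf(stderr, "No such Object with Status[%d] temp_result='%s'\n",
                status, vb->text);
    }
    return status;
}
```

In the non-strict case the caller still gets a log line for every missing object, which is consistent with the `Status[0]` entries in the grep output above.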

penfold1972 (Author) commented

I have submitted PR 367 with the change I tested on version 1.2.27
