[Bug]: Traceroutes via ROUTER_LATE node dont end up in the TX Queue after modification (to add self hop in rebroadcast) on return path! #5951

Open
Talie5in opened this issue Jan 28, 2025 · 8 comments
Labels
question Further information is requested

Comments

@Talie5in
Contributor

Talie5in commented Jan 28, 2025

Category

Other

Hardware

Linux Native

Firmware Version

2.5.20

Description

When running a traceroute via a ROUTER_LATE node, the request is seen leaving and the response is seen coming back, but the node does not modify the response (to add its own hop) and put it into the TX queue for rebroadcast. The traceroute therefore never gets returned to the source node, so this is a return-path-only issue.

Captured in the DEBUG log on a node set up as ROUTER_LATE:

TR_Repose_NoRequeue.txt

Relatively isolated test environment.
If I take the source node for a walk to where it does in fact get direct line of sight to another CLIENT node, or to the destination node directly, the traceroute works from a RAK4631 in CLIENT_MUTE and it receives the response.

Relevant log output

@Talie5in Talie5in added the bug Something isn't working label Jan 28, 2025
@Talie5in Talie5in changed the title [Bug]: Traceroutes via router_late dont end up in the TX Queue after modification (add self hop in rebroadvast) [Bug]: Traceroutes via ROUTER_LATE node dont end up in the TX Queue after modification (to add self hop in rebroadcast) Jan 28, 2025
@Talie5in Talie5in changed the title [Bug]: Traceroutes via ROUTER_LATE node dont end up in the TX Queue after modification (to add self hop in rebroadcast) [Bug]: Traceroutes via ROUTER_LATE node dont end up in the TX Queue after modification (to add self hop in rebroadcast) on return path! Jan 28, 2025
@erayd
Contributor

erayd commented Jan 28, 2025

Thanks @Talie5in - I will try to tackle this tomorrow or Thursday. I need to dig into the traceroute code anyway for #5534, so this is good additional motivation for me to do so!

@erayd
Contributor

erayd commented Jan 28, 2025

Have managed to replicate the lack of traceroute response via ROUTER_LATE. Now I just need to figure out why it's happening...

@GUVWAF
Member

GUVWAF commented Jan 30, 2025

@Talie5in Which exact commit were you using when testing this? I can't find the string "Incoming msg will be filtered, from" from your log in the source code, so I'm not sure where that is coming from. A bit later the log shows "cancelSending id=0x827fe923, removed=1", meaning it removed that packet from the Tx queue.

@GUVWAF
Member

GUVWAF commented Jan 30, 2025

It seems to be coming from your modified firmware:
https://github.com/Talie5in/mt-device-firmware/blob/02b2ee8883663618a0c6319fc37fe137a6d2ac25/src/mesh/Router.cpp#L574

I believe this is your issue. You're canceling a packet in the Tx queue when another arrives. For ROUTER_LATE this is more likely to happen as it delays the rebroadcast.
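
To make the mechanism concrete, here is a toy model of a delayed Tx queue in which the receive path cancels a queued packet when another one arrives. It is a sketch under stated assumptions, not the firmware's code: `ToyTxQueue`, `PendingTx` and `rebroadcastSurvives` are hypothetical names, the `cancelSending` signature is assumed (only the name and the packet id come from the log line quoted above), and the delay values are illustrative rather than real ROUTER_LATE timings.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model: a rebroadcast waiting in the queue for its send time.
struct PendingTx {
    uint32_t id;
    uint32_t sendAtMs;
};

class ToyTxQueue {
  public:
    // Queue packet `id` to be sent `delayMs` after `nowMs`.
    void enqueue(uint32_t id, uint32_t nowMs, uint32_t delayMs) {
        assert(delayMs <= UINT32_MAX - nowMs); // guard against overflow
        pending_.push_back({id, nowMs + delayMs});
    }

    // Remove a still-pending packet; returns true if one was removed
    // (the "removed=1" case in the log).
    bool cancelSending(uint32_t id) {
        auto it = std::find_if(pending_.begin(), pending_.end(),
                               [id](const PendingTx &p) { return p.id == id; });
        if (it == pending_.end())
            return false;
        pending_.erase(it);
        return true;
    }

    // Transmit everything that is due at `nowMs`; returns the ids sent.
    std::vector<uint32_t> sendDue(uint32_t nowMs) {
        std::vector<uint32_t> sent;
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (it->sendAtMs <= nowMs) {
                sent.push_back(it->id);
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
        return sent;
    }

  private:
    std::vector<PendingTx> pending_;
};

// Does a rebroadcast queued at t=0 with `delayMs` still go out if another
// packet arrives at `otherArrivalMs` and the receive path cancels it?
bool rebroadcastSurvives(uint32_t delayMs, uint32_t otherArrivalMs) {
    ToyTxQueue q;
    const uint32_t id = 0x827fe923; // id taken from the quoted log line
    q.enqueue(id, 0, delayMs);
    if (!q.sendDue(otherArrivalMs).empty())
        return true;     // already transmitted before the other packet arrived
    q.cancelSending(id); // the cancel-on-arrival behaviour under discussion
    return !q.sendDue(delayMs).empty();
}
```

With these illustrative numbers, `rebroadcastSurvives(100, 500)` is true and `rebroadcastSurvives(2000, 500)` is false: the longer the rebroadcast waits in the queue, the wider the window in which an arriving packet can cancel it.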

@GUVWAF GUVWAF added question Further information is requested and removed bug Something isn't working labels Jan 30, 2025
@erayd
Contributor

erayd commented Jan 30, 2025

I wonder what it was that I was reproducing then? Because I can get the behaviour to recur here.

@Talie5in
Contributor Author

Talie5in commented Feb 3, 2025

@GUVWAF Yup, it appears that is the culprit in those logs. I just got around to retesting this (back on 2.5.20.4c97351) and I do eventually get the TR back (which is valid and in line with ROUTER_LATE). I was switching between that build and the original Meshtastic release while testing things, and didn't realise I wasn't on the right build at the time.

Apologies for the delayed response.

However, I am still getting some that just never make it back (I do see them hit the device in the debug logs, but they never make it to the source device). I'm curious whether that's just hitting some kind of "took too long for a result, so I stopped tracking the traceroute" limit.

@erayd Not sure if you've come across anything further?

If I can find more time in the coming week, I'll trail logs between the ROUTER_LATE on the roof and the node on my desk and see if I can line them up for a submission.

@erayd
Contributor

erayd commented Feb 3, 2025

> Not sure if you've come across anything further?

Not yet, but I haven't yet had the opportunity to watch the logs of a ROUTER_LATE in a location where there's no other path back to my test node. Downside of having a mesh with quite good coverage.

It's easy enough to engineer the no-response thing by just going to one of the infill areas. But I can't watch the logs at the same time. Need to find a time to enlist help I think. Get someone else to run the traces while I sit up at the RL site and watch the logs.

@todd-herbert
Contributor

Just thinking out loud: I wonder if you could reproduce it at home with a three node test setup. The three nodes on their own frequency slot, tx power turned down, with nodes A and C placed far enough apart to ensure that they hop through B.
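
The placement condition for such a bench setup can be written down as a small check. This is only a sketch under a simplifying assumption: the nodes sit on a line and share a single hard range cutoff, which real radios do not have. `Layout`, `inRange` and `forcesHopThroughB` are hypothetical names, and the distances used with them are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical 1-D layout: positions in metres along a line, with one
// assumed usable range at the reduced tx power.
struct Layout {
    double a, b, c; // positions of nodes A, B and C
    double rangeM;  // assumed usable radio range in metres
};

inline bool inRange(double x, double y, double rangeM) {
    return std::fabs(x - y) <= rangeM;
}

// True when A and C cannot hear each other but both can hear B,
// so any A<->C traffic has to hop through B.
bool forcesHopThroughB(const Layout &l) {
    assert(l.rangeM > 0.0);
    return !inRange(l.a, l.c, l.rangeM) &&
           inRange(l.a, l.b, l.rangeM) &&
           inRange(l.b, l.c, l.rangeM);
}
```

For example, with an assumed 50 m range, nodes at 0, 40 and 80 m satisfy the check, while nodes at 0, 20 and 40 m do not, because A and C can then reach each other directly.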

4 participants