[BUG] coreMQTT keep alive handling fails and never reconnects #48
Comments
Hey @lhammond, thanks for submitting this issue! I've reached out to the team that works directly with coreMQTT and with the ESP32 boards to see if we can get to the bottom of what is causing these issues. |
Hello :) and thank you.
We have a small batch of production hardware that needs this fix and are
eager to support resolution.
cheers from our team!
…On Tue, Sep 5, 2023 at 8:12 PM Soren Ptak ***@***.***> wrote:
Hey @lhammond, thanks for submitting this issue!
It appears this is an issue many people are running into, since #41, #45,
#46, and #47 all appear to be similar issues with the coreMQTT connection
to the AWS IoT broker.
I've reached out to the team that works directly with coreMQTT and with
the ESP32 boards to see if we can get to the bottom of what is causing
these issues.
|
@Skptak Is there anything we can do to help push towards resolution? Should I be monitoring this situation in another place? Thank you! |
@lhammond, I was not able to reproduce your problem; here is a small sectional screenshot of my logs. For your reference, I followed this README: https://github.com/FreeRTOS/iot-reference-esp32c3/blob/2dccbcad1a0e54ec2e32cc242d4bf4f4ab6c1274/GettingStartedGuide.md |
Can you also please paste your sdkconfig file for the S3? |
@rawalexe Have you tested it over a long period of time? |
Hi @rawalexe you can see the version at the top of this issue thread .. v202212.00-23-gd25036b You will see the issue I'm experiencing in the screenshot in the original post. If left alone, the device eventually disconnects and continues to output the "no command structure" forever. |
Hey @lhammond, sorry for the delay in getting back to you. The team has been looking into this issue to try and provide support. I've ordered an ESP32-S3 so I can try and replicate your exact environment as we can't seem to replicate this issue on the ESP32-C3. While I wait for the board to get here I'm wondering if you tried this potential fix that @ActoryOu mentioned in #46?
I'm wondering if the timeout on the TLS transport send/receives might be what is causing the MQTT agent to go down. Thanks again for your patience with this! |
@gavin-hy what MCU are you running? |
Hey @Skptak .. I'm away from my lab for a day and will try the potential fix you mention upon return. Thanks! |
ESP32-C3 |
Hello @lhammond and @gavin-hy |
great, thank you very much.
…On Thu, Sep 14, 2023 at 1:47 PM rawalexe ***@***.***> wrote:
Hi @rawalexe <https://github.com/rawalexe> you can see the version at the
top of this issue thread .. v202212.00-23-gd25036b I am using the LED pub
sub demo, not OTA.
You will see the issue I'm experiencing in the screenshot in the original
post. If left alone, the device eventually disconnects and continues to
output the "no command structure" forever.
I am running all the demos, if you look into my attached screenshot. I'll
try to replicate your issue by just running the temp sub pub over long
period of time.
|
@lhammond can you please provide your whole sdkconfig file for the S3, with your endpoint removed, so that I have a 1:1 setup for replicating your issue on the S3? Thank you, |
@Skptak changing the TLS transport send/receive timeout to 10000 did not change the behavior. |
@rawalexe yes, I have made some modifications. How can I send you the zip file? |
I am not using OTA demo nor S3 .. do you still need the sdkconfig? |
@rawalexe @anubhavrawal It's too big to email, so I just shared a Google Drive link to your email .. let me know if you can't download it. I'm happy to get on a Google Meet if you'd like. My edits were intended to comment out the publish loop (not using a temp sensor) and add a few helper functions to control a NeoPixel 16 ring. Here's my git status ![]() |
@lhammond , Thank you for the file, I was able to download it and will try to replicate it today. I'll keep you posted on my progress Best Regards, |
great thanks @rawalexe
…On Tue, Sep 19, 2023 at 12:07 PM rawalexe ***@***.***> wrote:
@lhammond <https://github.com/lhammond> ,
Thank you for sharing the code, I was running all the demos to see if any
other fail as well. However, for my other runs I disabled the demos from
menuconfig and proceeded with the possible replication process of killing
the internet connection.
Thank you for the file, I was able to download it and will try to
replicate it today. I'll keep you posted on my progress
Best Regards,
AR
|
@lhammond If you try to use the tagged version at commit Best Regards, |
@rawalexe I will try with the versions above and let you know |
@rawalexe I am preparing to test with the new versions. I did want to point out that I am using a NeoPixel ring with (RMT - Addressable LED) .. this option does not appear in menuconfig for commit 2dccbca. I'm guessing all of that functionality is implemented in the demo's .c or app_driver.c and I can port it over. ![]() |
@rawalexe After adding I am seeing the below ![]() |
@rawalexe I kept the publish while(true) loop but commented out the logic and it resolved the above sensor-related error. I have the pub/sub temperature LED demo running now with the versions you requested. I started it at 3:03 PM EST .. going to monitor it long term. |
@rawalexe there is no TLS timeout in this version .. but maybe CONNACK is the same .. I made these changes and rerunning the test ![]() |
@rawalexe @anubhavrawal @Skptak The versions above with a CONNACK of 10000 have been running for two days. I commented out the publish logic inside the while(true) loop. So the question is, do I backport the LED demo logic to this version, or is there a plan to fix the latest branches to address the connectivity issue? Thanks |
Hello @lhammond, Best Regards, |
@rawalexe ok, I'll see what I can do. I need to push these production devices out asap, so will probably backport the LED control logic first and will try to find some time to look around for the connectivity issue. Do you have an idea of which repo to look in? Is it a submodule or in this repo? Would you guys be looking to apply any fixes to version 5.x? |
The repo aims to work with the latest esp-idf. But after observing the issues for a while, it looks like having a single submoduled esp-idf might be a better idea, with support for the latest esp-idf on a best-effort basis. The next fixes will be to ensure full compatibility with v5.x. |
I can confirm that the commit d25036b is definitely the cause of it not reconnecting. I had previously reported here #34 (comment), when it was still a patch. After reverting the changes to the previous version of the agent manager, the device reconnected on any timeout or disconnection. |
Hello @lhammond @txf-, https://github.com/rawalexe/iot-reference-esp32c3/tree/newEsp It has a few updated instructions and an esp-idf v5.1.1 submodule. There are some build warnings, but these will be improved further. Best Regards, |
I'm UTC -5 and will do first thing tomorrow AM.
awesome that you did this.
…On Sun, Oct 1, 2023 at 6:01 PM Anubhav Rawal ***@***.***> wrote:
Hello @lhammond <https://github.com/lhammond> @txf-
<https://github.com/txf->,
I have created this repo and tested out on my local device, can either of
you test out to fit in your use cases?
https://github.com/rawalexe/iot-reference-esp32c3/tree/newEsp
It has few updated instructions and esp-idf v5.1.1 submodule
Best Regards,
AR
|
The reconnection issues were fixed by reverting the optimizations in core_mqtt_agent_manager.c. I can't actually tell which changes in newEsp affect this. Is this repo just adjustments to make it work with idf 5.x? |
Yes the changes are the documentation on using Amazon's version of FreeRTOS and submodule to latest esp-idf. I did not have any problem building the project or running them so am looking for verification that this works on the previously problematic scenarios. Best Regards, |
I'm working on this now .. i'm using a idf.py I had installed already. Do
I need to install 5.x tools?
…On Sun, Oct 1, 2023 at 6:51 PM Anubhav Rawal ***@***.***> wrote:
Yes the changes are the documentation on using Amazon's version of
FreeRTOS and submodule to latest esp-idf. I did not have any problem
building the project or running them so am looking for verification that
this works on the previously problematic scenarios.
Best Regards,
AR
|
I do not see the Amazon kernel option in newEsp branch. Screenshot of
menuconfig below
[image: image.png]
On Tue, Oct 3, 2023 at 3:46 PM C.L. Hammond ***@***.***>
wrote:
… I'm working on this now .. i'm using a idf.py I had installed already. Do
I need to install 5.x tools?
|
Hello @lhammond, I would recommend using the submoduled esp-idf for standardization purposes, but any 5.0+ idf should behave mostly the same. I am sorry, but I was not able to see the attached image within the comments as it only shows like Thank you Best Regards, |
@rawalexe ok, finally got it sorted and just started a long running test. stay tuned! |
That's quite unfortunate. Give me some time and I'll see if we can fix this. Best Regards, |
Hi @rawalexe .. I'm back on this project again. Have you made any progress? |
Hello @lhammond, |
Hello @lhammond , Best Regards, |
@rawalexe I just kicked off a test using that commit hash.
I followed the esp-idf setup from here : https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/get-started/linux-macos-setup.html
…On Fri, Nov 17, 2023 at 7:27 PM Anubhav Rawal ***@***.***> wrote:
Hello @lhammond <https://github.com/lhammond> ,
After talking with Espressif, they mentioned that adding the process loop was
indeed a known problem, and we noticed that the file you sent us didn't contain
the commit reverting the process loop (commit id f4fe11e). I apologize for
asking you to run these under different conditions, but I just want to make
sure that the known issues aren't creating any problems. Can you please make
sure that you are using the latest changes in main and are still facing the
issue? I attempted these changes on my device and the demo ran as expected
for 30 minutes or so, though I didn't repeat the test over long hours.
Best Regards,
AR
|
I provided the commit id to make sure that it's included in the repo you are testing with. If that commit id is in your git log history, your code should run without any problem. Now that you are actually running the demo, please let us know how it goes. If it fails, can you also make sure that your internet connection isn't down by visiting a website, just in case? |
@rawalexe The LED/temperature demo at commit hash f4fe11e has been running for about 3.5 days. I have not yet tried pulling the AP's network cable to check behavior, but this is encouraging. I will try that test sometime this weekend. If I understand your message, I need to make sure that commit hash is in git log. I will try with main now. |
Hello @lhammond, Best Regards, |
As there is no further concern from you, I am closing this issue as resolved, if the problem persists please feel free to reopen the issue or open a new one. |
Describe the bug
Please provide a clear and concise description explaining the bug.
System information
Expected behavior
Expected behavior would be for the MQTT subsystem to continue retries until reconnected.
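For readers landing here later, the "retry until reconnected" behavior described above can be sketched roughly as follows. This is only a minimal illustration in plain C, not the code in core_mqtt_agent_manager.c or any coreMQTT API: every name here (`backoff_t`, `reconnect_until_connected`, `fake_connect`, `fake_sleep`) is hypothetical, and the connect attempt is a stand-in that fails a fixed number of times.

```c
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical and for illustration only;
 * none of them come from coreMQTT or the reference project. */

typedef struct {
    uint32_t next_delay_ms; /* delay returned by the next call */
    uint32_t max_delay_ms;  /* upper bound on the delay */
} backoff_t;

typedef bool (*connect_fn_t)(void *ctx);
typedef void (*sleep_fn_t)(void *ctx, uint32_t ms);

void backoff_init(backoff_t *b, uint32_t base_ms, uint32_t max_ms)
{
    b->next_delay_ms = (base_ms < max_ms) ? base_ms : max_ms;
    b->max_delay_ms = max_ms;
}

/* Returns the current delay, then doubles it for next time (capped). */
uint32_t backoff_next(backoff_t *b)
{
    uint32_t delay = b->next_delay_ms;

    if (b->next_delay_ms > b->max_delay_ms / 2u) {
        b->next_delay_ms = b->max_delay_ms; /* doubling would exceed the cap */
    } else {
        b->next_delay_ms *= 2u;
    }
    return delay;
}

/* Keeps trying to connect and never gives up; returns the number of
 * attempts it took. The delay between attempts grows up to max_ms. */
uint32_t reconnect_until_connected(connect_fn_t try_connect,
                                   sleep_fn_t sleep_ms,
                                   void *ctx,
                                   uint32_t base_ms,
                                   uint32_t max_ms)
{
    backoff_t b;
    uint32_t attempts = 0u;

    backoff_init(&b, base_ms, max_ms);
    for (;;) {
        attempts++;
        if (try_connect(ctx)) {
            return attempts;
        }
        sleep_ms(ctx, backoff_next(&b));
    }
}

/* Stand-in for a real connection attempt: fails a fixed number of
 * times, then succeeds. It also records how long we "slept". */
typedef struct {
    uint32_t failures_left;
    uint32_t total_slept_ms;
} fake_link_t;

bool fake_connect(void *ctx)
{
    fake_link_t *link = ctx;

    if (link->failures_left > 0u) {
        link->failures_left--;
        return false;
    }
    return true;
}

void fake_sleep(void *ctx, uint32_t ms)
{
    fake_link_t *link = ctx;

    link->total_slept_ms += ms; /* no real delay in this sketch */
}
```

With a base delay of 500 and a cap of 4000, a link that fails three times connects on the fourth attempt after waits of 500, 1000 and 2000. Real firmware would replace the fake functions with its own connect and delay calls, and would usually add random jitter so that many devices do not retry in lockstep.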
Screenshots or console output

Steps to reproduce bug
Example:
1. "I am using project [ ... ], and have configured with [ ... ]"
2. "When run on [ ... ], I observed that [ ... ]"
Code to reproduce bug