Arduino LMIC stops after some time (6-9 hours) and doesn’t send data #968
Comments
Hi, Additional information, as I think I have the same problem with an STM32L081CZ and an SX1272 radio. [2024-10-20 00:26:40.749] Current interrupts:1000 Or the next one, with a bit more detail: Current interrupts:1000 The ostime_t is a uint32_t, so it should neither overflow nor become negative, and I don't understand that behavior. |
Negative time values are expected and not a problem. The LMIC is coded taking that overflow into account. It's one of the two possible styles of handling time overflow. The classic one is using |
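The overflow-tolerant style of time handling mentioned above can be illustrated with a minimal C sketch. It assumes `ostime_t` is a signed 32-bit tick counter (as the original post describes it) and that deadlines are never more than half the counter range (2^31 ticks) away. The helper names `time_delta` and `time_is_due` are hypothetical and are not part of the LMIC API.

```c
#include <stdint.h>

/* Assumption: a signed 32-bit tick counter, as ostime_t is described in this thread. */
typedef int32_t ostime_t;

/* Signed distance from b to a. The subtraction is done in unsigned
 * arithmetic, which is well defined on wraparound, and the result is then
 * reinterpreted as signed (two's complement, as on all common targets). */
static ostime_t time_delta(ostime_t a, ostime_t b)
{
    return (ostime_t)((uint32_t)a - (uint32_t)b);
}

/* True once `now` has reached or passed `deadline`, even if the counter
 * went from positive to negative in between. */
static int time_is_due(ostime_t now, ostime_t deadline)
{
    return time_delta(now, deadline) >= 0;
}
```

For example, with `deadline = INT32_MAX - 5` and `now = INT32_MIN + 5` (11 ticks later, just after the sign change), `time_delta(now, deadline)` is 11, so the deadline is reported as due. A direct `now >= deadline` comparison would give the wrong answer in that case.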
Thank you very much for your quick reply, I really appreciate it! |
Sorry you're having problems. Because other users don't have this problem (specifically, because MCCI has run many devices for many months at a time), I have to guess that it's a system problem (the combination of LMIC and the rest of the firmware on the system) rather than a bug that is localized to the LMIC. To solve these, one needs to look at the entire code base. I suggest that you file a separate ticket, and attach a complete code base that demonstrates the problem -- of course, if you don't want to disclose your full code, you may need to develop a toy version that shows the bug. Sometimes preparing the toy version makes the bug go away; and that also gives a clue as to what to look for. |
Thank You, I will do so! |
Please remember to tag me there, I am also having the problem. |
In the last few days I have been testing my node with the basic ttn-otaa example program, which I have also attached, but the problem still exists. The behavior of the node is the same. The following is from a screenshot of the ChirpStack server DeviceData: It can be seen that scheduled data transmissions start being missed after some time (usually 7-10 hours), and the node then uploads data only once or twice a day, at apparently random times. The original transmission interval was 10 minutes, but the phenomenon is the same with 5 minutes as well. After a restart, everything starts from the beginning; that is the workaround applied by @mirhamza708 |
Once again. I'm sure everyone thinks they've given me an exact repro, but nobody has. It does not happen for me, with thousands of devices deployed and a variety of applications. In order to debug this, I need:
I know this might be too much work. But I won't be able to respond unless someone undertakes this. This is free software, folks: if you have a problem, the onus is partially on you to help replicate in order to get support. Best regards, |
I will try to share the requested information. I have not used TTN but I will set this up. I will try to share a complete example with the issue being properly elaborated. Thank you. |
Hi, Many thanks and Regards, |
Thank you for posting the files. I have reviewed the code. I see you have serial debug output enabled. Can you please also post the debug output when things start to malfunction? Best regards, --Terry |
Hi, Current interrupts:1000 I have also attached a complete debug log file, covering the period from node startup until the malfunction happens. Thanks & Regards, |
Thank you for posting the log file. At line 1509, I see I see that ALL transmits in the recent past prior to the failure are on channel 867,700,000 Hz. This behavior started at around line 269; prior to that it was using the other channels. I don't know why the LMIC started to stick on one channel. The device is transmitting at SF12. Based on the times printed, it appears that your message takes about 2.3 seconds of airtime. The fact that there are no more launches of the transmission strongly indicates that It is possible that on your BSP, something bad is happening to the LMIC's idea of time -- however, that wouldn't account for the missing text between 1511 and 1512. In addition, by the way, you have an error in your
suggesting that the send was scheduled, even though it was not. The Try fixing the |
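The airtime estimate of about 2.3 seconds at SF12 can be sanity-checked with the LoRa time-on-air formula from the Semtech SX127x datasheets. The sketch below is only an illustration: the function name `lora_airtime_us` is hypothetical, and the settings used in the note after it (125 kHz bandwidth, 8-symbol preamble, coding rate 4/5, explicit header, CRC on, low data rate optimization on) are assumed typical uplink settings, not values taken from the log.

```c
#include <stdint.h>

/* Time on air in microseconds for one LoRa packet, using integer math.
 * sf: spreading factor (7..12); bw_hz: bandwidth in Hz;
 * payload_len: PHY payload length in bytes; preamble_syms: programmed preamble symbols;
 * cr: coding rate index 1..4 (4/5 .. 4/8);
 * crc_on, implicit_header, low_dr_opt: 0 or 1.
 * Exact for 125/250/500 kHz; other bandwidths may truncate the symbol time. */
static uint32_t lora_airtime_us(int sf, uint32_t bw_hz, int payload_len,
                                int preamble_syms, int cr, int crc_on,
                                int implicit_header, int low_dr_opt)
{
    /* Symbol duration: 2^SF / BW. */
    uint64_t tsym_us = ((uint64_t)1 << sf) * 1000000u / bw_hz;

    /* The preamble lasts (n + 4.25) symbols; scaled by 4 to stay in integers. */
    uint64_t preamble_us = (uint64_t)(4 * preamble_syms + 17) * tsym_us / 4;

    /* Payload symbols: 8 + max(ceil(num / den) * (CR + 4), 0). */
    int num = 8 * payload_len - 4 * sf + 28 + 16 * crc_on - 20 * implicit_header;
    int den = 4 * (sf - 2 * low_dr_opt);
    int extra = 0;
    if (num > 0) {
        extra = ((num + den - 1) / den) * (cr + 4);
    }
    uint64_t payload_syms = 8u + (uint64_t)extra;

    return (uint32_t)(preamble_us + payload_syms * tsym_us);
}
```

Under these assumptions, `lora_airtime_us(12, 125000, 48, 8, 1, 1, 0, 1)` gives 2,301,952 µs, and any PHY payload of 46 to 50 bytes gives the same value, which is consistent with the roughly 2.3 s read from the log. The actual payload size in the log is not known here.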
Sadly I have the same problem here: getting stuck in a loop after 6-9 h. I added some code inside the engineUpdate function to help me tag the problem (basically printing NOIDLE), but it did not help. The code is otherwise still the same as the original ttn-otaa example. |
Hi, I modified the do_send() function of my otaa sketch to match the one in the ttn-otaa.ino example:
The test with this sketch ended with the same result; a new log file is uploaded. However, I have important feedback regarding the time at which the problem occurs. [2024-11-14 20:55:49.721] Leaving onEvent There are no lost lines; logging is continuous when the system time changes its sign. Regards, |
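To relate the sign change of the system time to uptime, here is a small C sketch. It assumes the tick counter is a signed 32-bit value that starts at zero at boot; the function name `seconds_until_sign_change` is hypothetical, and the tick rates used in the note after it are assumptions about the build configuration, so substitute your own.

```c
#include <stdint.h>

/* Seconds of uptime until a signed 32-bit tick counter that starts at zero
 * first becomes negative, i.e. after 2^31 ticks (rounded down). */
static uint32_t seconds_until_sign_change(uint32_t ticks_per_sec)
{
    return (uint32_t)(((uint64_t)1 << 31) / ticks_per_sec);
}
```

With 16 µs ticks (62,500 ticks per second) this gives 34,359 s, about 9 h 32 min. With 32,768 ticks per second it gives 65,536 s, about 18 h 12 min. Comparing the result for your tick rate with the uptime at which the node stalls shows whether the two events coincide.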
@robepapp Thanks for the log. @Gooseman42 sorry, wanted to get v5 out, and so that took all my time this weekend.
@robepapp Your code needs to capture (and print) the result of calling However, in this case, it probably didn't fail. The opmode shows that the LMIC has chosen a new channel ( I think the thing to do is confirm the path the code is taking. Please change line 2726 of
LMIC_X_DEBUG_PRINTF("%"LMIC_PRId_ostime_t": next engine update in %"LMIC_PRId_ostime_t"\n", now, txbeg-TX_RAMPUP);
to
LMIC_DEBUG_PRINTF("%"LMIC_PRId_ostime_t": next engine update in %"LMIC_PRId_ostime_t"\n", now, txbeg-TX_RAMPUP);
I'm pretty certain that this path is being taken. If you are feeling brave, please add a couple of extra items to that print:
LMIC_DEBUG_PRINTF("%"LMIC_PRId_ostime_t": next engine update in %"LMIC_PRId_ostime_t
". globalDutyRate=%" LMIC_PRId_ostime_t
" globalDutyAvail=%" LMIC_PRId_ostime_t
" txend=%" LMIC_PRId_ostime_t
" txChnl=%d "
"band=%d "
"band.avail=%" LMIC_PRId_ostime_t
"\n",
now, txbeg-TX_RAMPUP, LMIC.globalDutyRate, LMIC.globalDutyAvail, LMIC.txend,
LMIC.txChnl, LMIC.channelFreq[LMIC.txChnl] & 0x3,
LMIC.bands[LMIC.channelFreq[LMIC.txChnl] & 0x3].avail); This will show exactly what is going into the incorrect decision and will probably lead to a patch. |
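The quantities in the extended print can be read against a simplified model of per-band duty-cycle bookkeeping. The sketch below is an illustration under stated assumptions, not the LMIC's code: `band_model_t`, `band_update_after_tx` and `band_is_available` are hypothetical names, and the rule "block the band until `txbeg + airtime * txcap`" is assumed for the sake of the example, with `txcap = 100` standing for a 1% duty cycle.

```c
#include <stdint.h>

/* Assumption: signed 32-bit tick counter, as ostime_t is described in this thread. */
typedef int32_t ostime_t;

/* Hypothetical, simplified band record: duty-cycle divisor and next free time. */
typedef struct {
    uint16_t txcap;   /* 100 means the band may be used 1/100 of the time */
    ostime_t avail;   /* tick value at which the band may be used again */
} band_model_t;

/* Wraparound-aware signed distance from b to a. */
static ostime_t time_delta(ostime_t a, ostime_t b)
{
    return (ostime_t)((uint32_t)a - (uint32_t)b);
}

/* After a transmission that starts at txbeg and lasts `airtime` ticks,
 * block the band until txbeg + airtime * txcap (computed modulo 2^32). */
static void band_update_after_tx(band_model_t *band, ostime_t txbeg, ostime_t airtime)
{
    band->avail = (ostime_t)((uint32_t)txbeg + (uint32_t)airtime * band->txcap);
}

/* The band is free once `now` has reached `avail`, regardless of sign changes. */
static int band_is_available(const band_model_t *band, ostime_t now)
{
    return time_delta(now, band->avail) >= 0;
}
```

With 16 µs ticks (an assumption), a 2.3 s uplink is 143,750 ticks, so a 1% band stays blocked for 14,375,000 ticks, about 230 s. If such a transmission starts shortly before the counter changes sign, `avail` ends up negative while `now` is still positive, and only a wraparound-aware comparison reports the band as free again at the right time.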
Hello guys,
I need help with this because I have been trying to fix it for a month now. I have a node device and I have done all the configuration according to the MCCI documentation on GitHub.
I use an RFM96 chip with an stm32f103c8t6, and the pin map is as follows
my platformio.ini file is as follows
The device functions properly; only the LoRaWAN side of things doesn't seem right. Here is an image from the ChirpStack server.
A screenshot from the serial port, showing the join at 9:07 pm, is here:
I don’t understand why the time overflows. It's a signed integer in the library; can someone explain why a signed integer is used for timekeeping? I saw a comment in the source saying that a negative value means the time has already passed. It's confusing, so if someone can explain, I will be really thankful.
here is an image from the chirpstack server
I have added a reset of the LoRaWAN stack every 8 hours, just to avoid a complete stop:
Without the 10% clock error setting (the clock is set up at 24 MHz, using the 8 MHz external oscillator passed through the PLL multiplier), the join procedure takes 10-15 minutes. The hardware was designed by someone else, so I can't really say much about it, but the question is: if it runs for 7-8 hours, what makes it stall?
Thank you guys, and if any other information is required, please let me know.