-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kernel panic on Arduino v2.0 #181
Comments
@olicooper sorry for the trouble. Thanks for the detailed report. Must be an portability problem revealed by the new compiler. The code looks correct, so you might have found a compiler bug. If you can get the assembly-language instructions preceding the fault, it will help with diagnosis. I don't have a test platform, unfortunately. |
Thanks for the quick response. I will have to look in to how I can the the ASL for you because I don't have a debugging facility for the device and I haven't done it before. |
We haven't tested on Arduino v2.0.0, and I unfortunately don't even use PlatformIO, though of course we support it -- it doesn't have support for our BSPs. And.. this is the first I've heard of Arduino 2.0. So I'm really out in the weeds and not able to invest much time short term. You can possibly use |
I see this is Arduino v2 -- which is still beta. That's good, because if you can identify a compiler issue, they can back up to a working release or try to get a fix. Since freq is a uint8_t &, the compiler should probably not be making assumptions about alignment; but from the fault is clearly is. (It's doing a store-halfword to an odd addresss, almost certainly.) This happens from time to time; the compiler writers find a new trick that ought to work, is arguably portable; and then we find out where that breaks existing code. There is in fact a union involved in this part of the code, and it's possible that this is confusing the compiler or me into thinking that an operation is safe. No pointers involved so I think the compiler has dropped a stitch, but it's possible I've once again been too clever by half. Were there no templates involved, I would say that you should try splitting out the inline (since you're on an ESP32, there's no real need). But... there's a template... no wonder the compiler is doing the wrong thing. But the template and channel numbers were only to do type checking, so we can refactor. In the header file: // outside the template, e.g. at line 275:
static bool setFrequencyRaw(uint8_t *pFreq, unsigned iCh, uint32t_t frequency); Then change setFrequency: bool setFrequency(uint8_t (&freq)[nCh * 3], unsigned iCh, uint32_t frequency)
{
// check the template parameter while it's in scope.
if (iCh > nCh)
return false;
// call an external function to make it harder for optimizer to do the wrong thing (for test)
return Arduino_LoRaWAN::setFrequencyHelper(freq, ich, frequency);
} And then in a separate C++ file: // this is the former body of setFrequency(), with &freq changed to pFreq
/* static */
bool Arduino_LoRaWAN::setFrequencyRaw(uint8_t *pFreq, unsigned iCh, uint32_t frequency)
{
const uint32_t reducedFreq = frequency / 100;
if (reducedFreq > 0xFFFFFFu)
return false;
auto const chPtr = pFreq + iCh * 3;
chPtr[0] = uint8_t(reducedFreq >> 16);
chPtr[1] = uint8_t(reducedFreq >> 8);
chPtr[2] = uint8_t(reducedFreq);
} If this fixes the problem, it may make it easier for the compiler people to either tell me what I've gotten wrong, or for them to figure out what has gone wrong. |
Sorry found an error in review; moved an if from the external function to the inline. |
I've managed to generate some assembly code for you but there are too many lines to strip sensitive info so I have emailed them to you instead. I hope that is okay? How to get assembly codeFor community reference, I was able to use Espressif's SDK to get the assembly code. It looks like it outputs the source code too. There are multiple ways of doing it but I used the following methods. PlatformIO (the one I used)
Command lineEither run the command on the
|
@olicooper email is fine, but it hasn't come yet. Checked spam filters also, nothing there. |
Sorry about that, I think gmail is causing the issue. I've resent it. |
got it. |
can you send the corresponding link map? |
The problem is not what I thought. The code for doing the store is correct. Removing the debug
Those are three byte writes to the pointer. So I need the link map to find exactly where things are failing. But... could this be related to initialization and switching back and forth (described in other report)? The LMIC does not do much checking for invalid state, and since the compiled code is correct, my guess is that some preconditions we are depending on are not true. |
In my code I read from NVS storage during boot to decide if I should start the lora 'module' or not. If I have enabled the module it will start normally by calling the required |
To simplify things I have created a simple reproducible example for you which can be found here on the
|
I know you said that the problem wasn't what you originally thought, but I tried the modifications you suggested above #181 (comment) and it fixed the issue!.. so I have updated the lmic code here https://github.com/olicooper/arduino-lorawan/tree/arduino-v2-fix and I will try the changes in my main project as I haven't fully tested it yet. Great work! What's the next step? Is it possible for a patch be written to your library to help in the interim if this is the culprit? |
Well, there's no real problem with a patch, but... it looks like a compiler issue to me, and should be brought to their attention, as it seems unlikely that this would only affect my code (out of the whole world). First, however, I can believe, especially as this is a beta project. Since the projects are all open source, it should not be necessary to do the usual heroic efforts needed to demo a compiler issue to the maintainers. (With global optimization, anyway, tomorrow they could apply the same change globally and crash the library again.) |
Hi @olicooper and @terrillmoore, I'd like to point out one thing from the original crash report:
Here EXCVADDR points to the memory address which the load or store instruction has tried to access. On the ESP32, 0x40013b1e is an address inside ROM, in the middle of .text section. It looks very odd that the code of this library would be performing a load or especially store from this address. It doesn't look like an alignment issue, because even if the alignment was correct, it wouldn't make sense to access (read or write) this ROM location. To me it looks like memory corruption, where some pointer gets overwritten by a garbage value, and then the value gets dereferenced. @olicooper since you have a reproducer for this issue, could you please upload the ELF file without the workaround and the matching crash dump? I am sorry that we aren't able to reproduce based on your example, because it requires PlatformIO and seemingly a LoRa module... |
@igrr I mentioned on the other ticket that this should be the correct ELF, but I will recompile the elf again against the Arduino |
Thanks, I have checked the ELF you posted, and like @terrillmoore has observed here I don't see anything wrong with code generation, at least as far as alignment is concerned. The PC at the point of the exception points to:
which is a store instruction, which should most definitely not be trying to write to an area in ROM. I have tried tracing where the corrupted value could come from by looking at the assembly code, but didn't get too far unfortunately. A compiler bug is a rare occasion but if there is a strong hint that one is present we will definitely investigate this. However, at the moment it looks more like heap corruption. Upgrading from one Arduino-esp32 and IDF release to another can change many things, including the layout of objects in heap. This might cause, for example, an out-of-bounds write to show up in such a way, while previously it could be corrupting some padding space or a mostly-unused variable. |
Thanks for posting the two ELF files, I will definitely look more into the differences between them. |
@terrillmoore Quick update, I have tested against an the v4.0.2 release of PlatformIO today. It still seems to be broken, but I get an assert issue instead of a kernel panic now.. espressif/arduino-esp32#5783 (comment) The assert that is failing is this: https://github.com/mcci-catena/arduino-lmic/blob/8d378ea410887d3fb08ea2c9acce95cc4c047788/src/lmic/radio.c#L1164 |
The assert issue is due to calling |
Missing return statement on Arduino_LoRaWAN::setFrequency - causes compile errors. Could be related to an earlier issue I raised? mcci-catena#181
The library worked fine on Arduino 1.0.6, but after upgrading to v2.0.0 the device reboots on the next loop cycle after calling
Arduino_LoRaWAN::SendBuffer()
. I have aLora
wrapper class which simply callsArduino_LoRaWAN::loop()
- it is failing at this point.environment:
mem-optimized
branch)code
core dump
Related source code
arduino-lorawan/src/lib/arduino_lorawan_sessionstate.cpp
Line 190 in caceeb4
arduino-lorawan/src/lib/arduino_lorawan_sessionstate.cpp
Line 114 in caceeb4
arduino-lorawan/src/Arduino_LoRaWAN.h
Lines 359 to 371 in caceeb4
The text was updated successfully, but these errors were encountered: