Mixing OTA and serial on development builds. Any of you get stuck on an old executable? #449
-
Thanks. I figured I would be the first to comment. It's probably also due to this OTA issue. In short, I won't use OTA now unless I'm about to put the device up in the attic.
-
I am having this problem with OTA as well. I thought I was going crazy when the firmware wasn't changing, but PlatformIO was reporting a successful upload. I am seeing inconsistent behavior: some days I'll upload over OTA and serial with no issues, and sometimes one of my ESPs will get 'stuck' and stop accepting new firmware. I verified this by changing FLASH_VERSION. Did anyone ever find a fix for this?
-
Did anyone ever find a fix to this?
Yes. Since OTA is sooo much slower than a real (serial) download anyway, and sometimes sends you on that hours-long journey of wondering WTH?!?!, I just learned not to do that. ("Doctor, Doctor, it hurts when I do that...") Since "Arduino" is literally in the function names, it also fits well with my snobbish desire to avoid as much code from that ecosystem as I can. :-) Since the angry page of words I typed didn't show that anyone else was ever affected, I assumed I was just somehow uniquely blessed, but it seems like we have a growing number of affected people.
I was also observing it at a time when our partition table was unstable, so I tended to shrug it off.
My working theory is that it's writing into one partition but then not booting from it, as if some validation check is failing on the new image and it's not updating the active partition before rebooting. {shrug} That seems to be way below the level where we operate, which is receiving a packet and calling handle(); it looks like it SHOULD BE out of our hands.
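A toy Python model of that theory, purely to make it concrete. It assumes the general A/B scheme (the new image is written to the slot that isn't running, and the boot selection only moves after the image passes validation), and it deliberately has the host see "success" either way to mirror the symptom reported above. `ToyDevice` and its methods are hypothetical; this is not arduino-esp32's `Update`/`ArduinoOTA` API, and whether the real upload handshake can report success after a failed validation is exactly the open question.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDevice:
    """Hypothetical two-slot (A/B) device; not a real ESP32 API."""
    slots: list = field(default_factory=lambda: ["v1", None])  # slot 0 holds v1
    active: int = 0  # slot the bootloader will boot next

    def running(self):
        return self.slots[self.active]

    def ota_update(self, image, image_valid):
        target = 1 - self.active      # always write the slot that isn't running
        self.slots[target] = image    # the bytes land in flash...
        if image_valid:
            self.active = target      # ...but only a passing check moves the boot selection
        return True                   # toy assumption: the host is told "success" regardless


dev = ToyDevice()
print(dev.ota_update("v2", image_valid=False), dev.running())  # True v1
print(dev.ota_update("v3", image_valid=True), dev.running())   # True v3
```

In this model a failed check is invisible from the host: the upload "succeeds" and the old build keeps running.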
I've poked through https://github.com/espressif/arduino-esp32/issues and don't know why they're not buried in reports of this issue, since it seems like several of us have seen it. I'd think that if it were a library problem of this severity, there would be much wailing there. The code in https://github.com/espressif/arduino-esp32 seems quite actively maintained (last updated 59 minutes ago, and it's 5:44am my time...) and quite clueful (maintenance activity, support for the latest chips, etc.), given my generally low expectations of any project bearing that name.
If anyone can produce a 100% reproducible test case that starts with a blank (esptool erase_flash) device and then proceeds through any given combination of wired and wireless updates that results in it NOT working, I'd help debug it. If you can script a failure, I'll chase it to the ends of the earth. I just can't justify chasing the test case myself.
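A scripted repro along those lines could enumerate every short sequence of wired and wireless uploads, each starting from a freshly erased board. The sketch below only builds the command lines (a dry run). `esptool.py -p PORT erase_flash` and `pio run --target upload -e ENV --upload-port PORT` are real commands, but the serial port and IP are placeholders, and selecting OTA simply by pointing the upload port at the board's IP assumes a platformio.ini set up the way it is in this thread. The step that checks which build actually booted is left out, since that part is up to you.

```python
from itertools import product

# Placeholders (assumptions): use your own env name, serial port and device IP.
ENV = "mesmerizer"
SERIAL_PORT = "/dev/cu.usbserial-XXXX"
OTA_HOST = "192.168.x.y"


def upload_cmd(method):
    """One PlatformIO upload, over serial or (by pointing at an IP) over OTA."""
    port = SERIAL_PORT if method == "serial" else OTA_HOST
    return ["pio", "run", "--target", "upload", "-e", ENV, "--upload-port", port]


def repro_plans(max_steps):
    """Yield (sequence, commands) for every serial/OTA sequence of 1..max_steps uploads."""
    erase = ["esptool.py", "-p", SERIAL_PORT, "erase_flash"]
    for n in range(1, max_steps + 1):
        for seq in product(("serial", "ota"), repeat=n):
            if seq[0] == "ota":
                continue  # a freshly erased board has no OTA listener yet
            yield seq, [erase] + [upload_cmd(m) for m in seq]


for seq, cmds in repro_plans(3):
    print(" -> ".join(seq))  # dry run: inspect, then feed cmds to subprocess.run
```

With up to three uploads this gives 7 plans (1 + 2 + 4), and each extra step roughly doubles the count, so short sequences are the place to start. Each plan could be executed with `subprocess.run(cmd, check=True)` per command, followed by something like the FLASH_VERSION check mentioned above.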
What hardware are the two of you on?
In reply to David Woodward (Thu, Feb 8, 2024, 8:32 PM).
-
I'm running on a stock ESP-WROOM-32. Yeah, I wish I knew a way to reproduce this 100% of the time; it's very inconsistent. I've been uploading with serial and OTA intermixed for about 10 days now, and I only saw this issue on 2 of those days. Some notes:
RAM:   [==        ]  18.7% (used 61248 bytes from 327680 bytes)
-
I'll switch to serial for now, but I'm planning a permanent install where I won't have physical access to the devices, so OTA would be very handy.
-
Okay, I have a board right now that's in the 'stuck' state. It consistently won't take new firmware. I can ship it to you if you want to take a look, @robertlipe. Send me an email with your address: [email protected]
-
I'd rather have a formula for getting it stuck, but having the bug in
captivity is pretty important. I'll reach out to you. Thanx!
In reply to David Woodward (Fri, Feb 9, 2024, 4:49 PM).
-
I just spent about 90 minutes in the twilight zone. I was debugging a visual issue which included a crash that I knew I'd fixed earlier. I ran objdump --disassemble on my ELF (sooo slow) and found that $PC was before the start of my text section. Weird. I kind of shrugged that off, as I'm usually not desperate enough to need the registers from a crash.
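The "PC outside my text section" clue can be checked mechanically. This sketch parses the section table printed by `objdump -h` (columns Idx, Name, Size, VMA, LMA, File off, Algn) and reports which section, if any, contains a given address. If the PC isn't inside any section of the ELF you just built, either the PC is garbage or the board isn't running that ELF. For the ESP32 you'd feed it the output of the toolchain's own objdump (e.g. `xtensa-esp32-elf-objdump -h firmware.elf`); the sample table and addresses below are made up for illustration.

```python
import re

# Data rows of `objdump -h`: Idx Name Size VMA LMA File-off Algn
ROW = re.compile(
    r"^\s*\d+\s+(\S+)\s+([0-9a-fA-F]+)\s+([0-9a-fA-F]+)"
    r"\s+[0-9a-fA-F]+\s+[0-9a-fA-F]+\s+2\*\*\d+"
)


def sections(objdump_h):
    """Return [(name, start, end)] for each section row; end is exclusive."""
    out = []
    for line in objdump_h.splitlines():
        m = ROW.match(line)  # header and CONTENTS/flags lines don't match
        if m:
            size, vma = int(m.group(2), 16), int(m.group(3), 16)
            out.append((m.group(1), vma, vma + size))
    return out


def section_for(pc, objdump_h):
    """Name of the section containing pc, or None if it's outside all of them."""
    for name, start, end in sections(objdump_h):
        if start <= pc < end:
            return name
    return None


SAMPLE = """\
Idx Name          Size      VMA       LMA       File off  Algn
  0 .iram0.text   00001000  40080000  40080000  00001000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .flash.text   00020000  400d0020  400d0020  00003000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
"""

print(section_for(0x400d1234, SAMPLE))  # .flash.text
print(section_for(0x400c0000, SAMPLE))  # None
```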
It finally hit me that I was staring at exactly the visuals from where I was an hour+ ago, even though I'd made visual changes since then. My added debug prints weren't printing. I finally put a return as the first line of Draw() and still got the display I had from long ago (including the bug I'd fixed). Cleaning .pio changed nothing.
I don't normally do OTAs on mesmerizer because it's WAY slower than a serial line. Earlier, I thought I'd give it a try. Nope. Still slow. So I just reset PLATFORMIO_UPLOAD_PORT back to $(ioreg -l | awk -F'"' '/IOCalloutDevice.*usbserial/ {P=$4}; END{print P}') and moved on.
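For readers not fluent in awk: the one-liner splits each line of `ioreg -l` output on double quotes and keeps the fourth field of the last line matching `IOCalloutDevice.*usbserial`, i.e. the last `/dev/cu.usbserial-*` path listed. The same logic in Python, run against a made-up snippet of ioreg output (real output is far longer and device names will differ):

```python
import re

PATTERN = re.compile(r"IOCalloutDevice.*usbserial")


def last_usbserial_port(ioreg_text):
    # Python equivalent of: awk -F'"' '/IOCalloutDevice.*usbserial/ {P=$4}; END{print P}'
    port = None
    for line in ioreg_text.splitlines():
        if PATTERN.search(line):
            fields = line.split('"')
            # awk's $4 is the 4th field, i.e. index 3 after splitting on '"'
            port = fields[3] if len(fields) > 3 else ""
    return port  # None where awk would print an empty line


SAMPLE = '''\
    | |   "IOCalloutDevice" = "/dev/cu.usbserial-0001"
    | |   "IODialinDevice" = "/dev/tty.usbserial-0001"
    | |   "IOCalloutDevice" = "/dev/cu.usbserial-1420"
'''
print(last_usbserial_port(SAMPLE))  # /dev/cu.usbserial-1420
```

Note that with several USB serial adapters plugged in, both versions silently pick whichever one ioreg lists last.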
It was much later that I realized that this might have been the point where I was trapped.
My working theory is that my device was executing from one partition and my ~/.platformio/penv/bin/pio run --target upload -e mesmerizer was dutifully uploading to the OTHER partition.
I did an esptool.py -p $PLATFORMIO_UPLOAD_PORT erase_flash, followed by an upload and an uploadfs, and suddenly all my debug prints, my crazy "are you alive, color the screen red" tests, and my empty Draw() call took effect. I reverted all of that, clicked my heels, and poof, I was back in Kansas.
I don't KNOW that setting PLATFORMIO_UPLOAD_PORT to 192.168.2.165 and then uploading was my downfall, but the timeframe would have been about right, and that was really the only thing I hadn't done a hundred other times while working on this during the edit/load/run cycle.
Have any of you had a case where the code you build and upload isn't the code you're actually running? It's a serious sanity check when that happens.
I've not pencil-whipped the OTA code. Is it possible that it's not setting the active partition to the partition that was most recently uploaded?
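One way to answer that on a stuck board, before erasing it, is to dump the otadata partition and see which OTA slot it points at, e.g. `esptool.py -p PORT read_flash 0xe000 0x2000 otadata.bin` (PORT is a placeholder; the 0xe000 offset and 0x2000 size are assumptions matching the common Arduino-ESP32 default partition tables, so check your own partitions.csv). The sketch assumes ESP-IDF's scheme: two copies of the selection record, one per 4 KiB sector, each beginning with a little-endian 32-bit `ota_seq`, and the bootloader boots OTA slot `(ota_seq - 1) % number_of_ota_slots` from the newest valid copy. For brevity it treats only erased (0xFFFFFFFF) entries as invalid and skips the CRC and rollback-state checks the real bootloader performs, so read its answer as a hint, not a verdict. `boot_slot` and `make_dump` are hypothetical helpers.

```python
import struct

SECTOR = 0x1000
ERASED = 0xFFFFFFFF


def boot_slot(otadata, num_ota_slots=2):
    """OTA slot index (0 = ota_0/app0, ...) the dump points at, or None if
    neither copy is in use (the bootloader then falls back to the factory
    app if present, else ota_0)."""
    seqs = []
    for copy in (0, 1):
        # First 4 bytes of each copy: little-endian uint32 ota_seq.
        (seq,) = struct.unpack_from("<I", otadata, copy * SECTOR)
        if seq != ERASED:  # simplification: no CRC / ota_state check
            seqs.append(seq)
    if not seqs:
        return None
    return (max(seqs) - 1) % num_ota_slots


def make_dump(seq0, seq1):
    """Synthetic 0x2000-byte dump for testing (everything else left erased)."""
    buf = bytearray(b"\xff" * (2 * SECTOR))
    struct.pack_into("<I", buf, 0, seq0)
    struct.pack_into("<I", buf, SECTOR, seq1)
    return bytes(buf)


print(boot_slot(make_dump(ERASED, ERASED)))  # None
print(boot_slot(make_dump(1, ERASED)))       # 0
print(boot_slot(make_dump(1, 2)))            # 1
```

On a real dump: `with open("otadata.bin", "rb") as f: print(boot_slot(f.read()))`. If it reports slot 1 while serial uploads are going to the first app partition's offset, that would fit the theory here.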
The good news is that my environment is now working sanely again.
The bad news is that I nuked the flash contents that probably would have let me debug this. It's bound to be some goofy case: one partition (A or B) is active and you get an OTA, so the OTA marks the other one active; the next serial upload doesn't update that and keeps flashing into the hardcoded offset from the partition table without consulting which partition is active; and then there's nobody left to set the active partition back, because the new code never actually runs.
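Playing that theory out in a toy Python model makes the trap visible. The model simply assumes what the theory assumes: a serial upload always writes slot A at a fixed offset and never changes the boot selection, while an OTA writes the inactive slot and marks it active. `Board` and its methods are hypothetical names; nothing here describes what esptool or the ESP-IDF bootloader actually do.

```python
class Board:
    """Toy A/B flash model of the theory above; not esptool or bootloader code."""

    def __init__(self):
        self.erase_flash()

    def erase_flash(self):
        # Everything blank and the boot selection cleared, so slot A boots.
        self.slots = {"A": None, "B": None}
        self.active = "A"

    def serial_upload(self, build):
        # Assumption under test: always the fixed app offset (slot A),
        # and the boot selection is never touched.
        self.slots["A"] = build

    def ota_upload(self, build):
        # OTA writes whichever slot is NOT running, then marks it active.
        target = "B" if self.active == "A" else "A"
        self.slots[target] = build
        self.active = target

    def running(self):
        return self.slots[self.active]


b = Board()
b.serial_upload("build1")
b.ota_upload("build2")     # B is now the active slot
b.serial_upload("build3")  # lands in A, which nobody boots
print(b.running())         # build2
b.erase_flash()
b.serial_upload("build4")
print(b.running())         # build4
```

In this model the stuck state ends only when something resets the boot selection (here `erase_flash`) or another OTA flips it back, which is consistent with erase_flash being what got things working again.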
That's my theory, at least.