PoC: config write guard #2360

schlimmchen · 2024-10-17T20:13:29Z

I was trying to tackle application of changed Dynamic Power Limiter settings in my feature branch implementing support for multiple inverters managed by the DPL. Then I noticed, that a bunch of nasty problem would go away if I knew that I could trust the config being consistent, i.e., if the config was only written synchronously with the main loop.

The latter ist currently not true: The config is usually manipulated in the context of the async webserver task, while it is read in the context of the main loop.

I wanted to keep the overhead when reading the config to a minimum, so locking a mutex each time the config shall be read is out of the picture, IMHO.

Writing the config happens rather seldom. So it should be okay if that takes a couple of milliseconds longer -- in the worst case.

My approach: Writing the config is only possible when acquiring a WriteGuard, which locks the "config write mutex" and waits for the signal (from the main loop) to start writing. The signal is needed to make writing the config synchronous with the main loop, and the mutex is needed to use the condition_variable but also to lock multiple writers from each other (which is less of a concern, to be honest).

I like this approach, because I like RAII. However, I am not sure if this is the best approach. Another idea would be to enqueue a callback which is then executed within the context of the main loop. We would need to copy and bind to that callback a significant amount of data, which seems off-putting. Also, access to the queue of callbacks then also need to be locked by a mutex.

@tbnobody I would like to know your opinion on this. Do you like this approach?

@LennartF22 We talked about this briefly. What are your thoughts on this approach?

LennartF22 · 2024-10-17T21:03:48Z

@schlimmchen I have not looked at the changes in detail yet, but it's definitely an improvement.

There are more synchronization issue between the WebApi and the main loop though, like here (if I'm not mistaken):

OpenDTU/src/WebApi_inverter.cpp

Line 144 in dc5eb96

auto inv = Hoymiles.addInverter(inverter->Name, inverter->Serial);

tbnobody · 2024-10-18T06:33:01Z

Looks great! I am just curious if we would ever have to write the config in the context of the main loop (as it will generate a dead lock). Currently I cannot imagine any use case.

schlimmchen · 2024-10-18T19:42:42Z

Thanks for your feedback!

I am just curious if we would ever have to write the config in the context of the main loop (as it will generate a dead lock).

Yes. I saw that, too. If one would inadvertently try to do this, the deadlock would happend when testing the change for the first time, and the config would not be saved at all, so it should be noticed early. And if we need to do that, we can adjust the WriteGuard c'tor to see whether the current task it the main loop task and then only lock the mutex, not waiting for the signal.

Let me polish and double-check this, then I will un-draft this PR (next week).

schlimmchen · 2024-10-18T19:43:57Z

@tbnobody are you passionate about

auto const& config = Configuration.get();

versus

const auto& config = Configuration.get();

or whether it's consistent across the code base?

the configuration write guard is now required when the configuration struct shall be mutated. the write guards locks multiple writers against each other and also, more importantly, makes the writes synchronous to the main loop. all code running in the main loop can now be sure that (1) reads from the configuration struct are non-preemtive and (2) the configuration struct as a whole is in a consistent state when reading from it. NOTE that acquiring a write guard from within the main loop's task will immediately cause a deadlock and the watchdog will trigger a reset. if writing from inside the main loop should ever become necessary, the write guard must be updated to only lock the mutex but not wait for a signal.

schlimmchen · 2024-10-22T19:29:05Z

I am still happy with this. I double checked that generating a DTU serial works as expected using an erased ESP32:

[2024-10-22 21:27:07.684] rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
[2024-10-22 21:27:07.690] configsip: 0, SPIWP:0xee
[2024-10-22 21:27:07.690] clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
[2024-10-22 21:27:07.695] mode:DIO, clock div:2
[2024-10-22 21:27:07.701] load:0x3fff0030,len:1184
[2024-10-22 21:27:07.701] load:0x40078000,len:13232
[2024-10-22 21:27:07.707] load:0x40080400,len:3028
[2024-10-22 21:27:07.707] entry 0x400805e4
[2024-10-22 21:27:08.379] E (694) esp_core_dump_flash: No core dum���ѥѥ���found!
[2024-10-22 21:27:08.379] E (694) esp_core_dump_flash: No core dump partition found!
[2024-10-22 21:27:08.398] 
[2024-10-22 21:27:08.398] Starting OpenDTU
[2024-10-22 21:27:08.398] Initialize FS... E (13) esp_littlefs: ./components/esp_littlefs/src/littlefs/lfs.c:1367:error: Corrupted dir pair at {0x0, 0x1}
[2024-10-22 21:27:08.409] 
[2024-10-22 21:27:08.409] E (20) esp_littlefs: mount failed,  (-84)
[2024-10-22 21:27:08.415] E (23) esp_littlefs: Failed to initialize LittleFS
[2024-10-22 21:27:08.415] [    37][E][LittleFS.cpp:98] begin(): Mounting LittleFS failed! Error: -1
[2024-10-22 21:27:08.426] failed... trying to format...E (46) esp_littlefs: ./components/esp_littlefs/src/littlefs/lfs.c:1367:error: Corrupted dir pair at {0x0, 0x1}
[2024-10-22 21:27:08.437] 
[2024-10-22 21:27:08.437] E (47) esp_littlefs: mount failed,  (-84)
[2024-10-22 21:27:08.437] E (51) esp_littlefs: Failed to initialize LittleFS
[2024-10-22 21:27:08.475] failedReading configuration... [   105][E][vfs_api.cpp:105] open(): /littlefs/config.json does not exist, no permits for creation
[2024-10-22 21:27:08.486] Failed to read file, using default configuration
[2024-10-22 21:27:08.486] Check for default DTU serial... generate serial based on ESP chip id: 199980103760... done
[2024-10-22 21:27:08.754] done
[2024-10-22 21:27:08.754] Reading PinMapping... [   384][E][vfs_api.cpp:105] open(): /littlefs/pin_mapping.json does not exist, no permits for creation
[2024-10-22 21:27:08.767] using default config done
[2024-10-22 21:27:08.767] Initialize Network... done
[2024-10-22 21:27:09.039] Setting Hostname... Initialize NTP... done
[2024-10-22 21:27:09.039] Initialize SunPosition... done
[2024-10-22 21:27:09.044] Initialize MqTT... done
[2024-10-22 21:27:09.044] Initialize WebApi... done
[2024-10-22 21:27:09.054] Initialize Display... done
[2024-10-22 21:27:09.054] Initialize LEDs... done
[2024-10-22 21:27:09.054] Initialize Hoymiles interface... Invalid pin config
[2024-10-22 21:27:09.060] Switch to WiFi mode
[2024-10-22 21:27:09.064] Setting Hostname... done
[2024-10-22 21:27:39.403] Disable search for AP... done

mathieucarbou · 2024-10-23T10:10:39Z

include/Configuration.h

    bool read();
    bool write();
    void migrate();
-    CONFIG_T& get();
+    CONFIG_T const& get();


this getter could become problematic: how it ensures that reads going through it will be protected by the mutex ?
What if another task / core modifies the config while another one calls this method to access a value ?

yes, there is no assurance. I did not want to create overhead when reading the config structure. we assume the config structure is only read in context of the main loop. since writing is synchronized with the main loop using the changes in this PR, there is no issue when reading the structure in context of the main loop. however, other tasks also do read the structure, and indeed, those reads are still not protected through the changes in this PR. IMHO, that's a next step. it's going to be a pain to address this...

No that's not complicated: the config class just has to expose the mutex (or 2 methods) and where you read the config in the code, you lock the mutex before then unlock it.

The ESP-IDF and Arduino Core are full of designs like that, especially in the LWIP layer which now require to lock/unlock all operations since Matter was introduced.

I did not write it was complicated, just a pain 😉 What I meant was: large changeset. If we want to use a lock guard as well, we also need an additional scope, meaning indenting code, meaning even larger changeset. And things do indeed get complicated, or at least nasty, if the mutex must be released before a function is called, as to avoid recursive locking. I assume recursive locks are to be avoided, especially since we also use other mutexes and deadlocks are looming once we need to juggle multiple locks simultaneously.

The ESP-IDF and Arduino Core are full of designs like that

You are suggesting that we should not be afraid of the overhead of locking the same mutex dozens of times in the main loop in every round of the loop? Usually, I am very liberal when it comes to trading some performance for new and useful features, but locking mutexes? Am I paranoid?

I agree!
All depends on who reads, when and at which frequency.
I don't know enough the app internally.
Maybe otherwise returning copies or partial copies of the config when reading would be an option also
I was just highlighting the fact that there is no protection theorically.
But if in practice it never happens then it's good.

stefan123t mentioned this pull request Oct 18, 2024

Memory corruption when saving network settings #2313

Open

4 tasks

schlimmchen marked this pull request as draft October 18, 2024 19:42

schlimmchen force-pushed the config-write-guard branch from 9f4f4fa to b1edb13 Compare October 22, 2024 19:15

schlimmchen marked this pull request as ready for review October 22, 2024 19:29

mathieucarbou reviewed Oct 23, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PoC: config write guard #2360

PoC: config write guard #2360

schlimmchen commented Oct 17, 2024

LennartF22 commented Oct 17, 2024

tbnobody commented Oct 18, 2024

schlimmchen commented Oct 18, 2024

schlimmchen commented Oct 18, 2024

schlimmchen commented Oct 22, 2024

mathieucarbou Oct 23, 2024 •

edited

Loading

schlimmchen Oct 27, 2024

mathieucarbou Oct 27, 2024

schlimmchen Oct 27, 2024

mathieucarbou Oct 27, 2024

PoC: config write guard #2360

Are you sure you want to change the base?

PoC: config write guard #2360

Conversation

schlimmchen commented Oct 17, 2024

LennartF22 commented Oct 17, 2024

tbnobody commented Oct 18, 2024

schlimmchen commented Oct 18, 2024

schlimmchen commented Oct 18, 2024

schlimmchen commented Oct 22, 2024

mathieucarbou Oct 23, 2024 • edited Loading

Choose a reason for hiding this comment

schlimmchen Oct 27, 2024

Choose a reason for hiding this comment

mathieucarbou Oct 27, 2024

Choose a reason for hiding this comment

schlimmchen Oct 27, 2024

Choose a reason for hiding this comment

mathieucarbou Oct 27, 2024

Choose a reason for hiding this comment

mathieucarbou Oct 23, 2024 •

edited

Loading