-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Tizen] Modify dnssd resolve operation process #37003
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: hyunuk.tak <[email protected]>
PR #37003: Size comparison from f25f635 to 12bcab0 Full report (70 builds for bl602, bl702, bl702l, cc13x4_26x4, cc32xx, cyw30739, efr32, esp32, linux, nrfconnect, nxp, psoc6, qpg, stm32, telink, tizen)
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hyunuktak the Tested locally with TestDnssd test on Tizen.
is too brief for manual testing. We are actively trying to get manual testing to be detailed both because it is not as obvious unless familiar with the platform and to make it somewhat longer to write to encourage writing automated tests if possible.
Please describe how you tested: what commands did you run, what environment and what was observed. Also explain why adding automated tests for this is not possible (probably because platform specific ... however I believe tizen also has some qemu setup ... is that usable?).
Samsung's SmartThings and similar apps and services utilize 'Matter' to establish a hub environment. In a situation where various Things are connected, when the Hub is restarted or rebooted, it executes the dnssd resolve process to rediscover the existing Things. During this process, we confirmed that timing issues caused conflicts between multi-threading and lock/unlock operations due to asynchronous execution of the dnssd resolve process for many Things. The sequence of this process could be as follows: |
@hyunuktak Due to this PR, there is a segfault in Tizen integration test, please look at it: https://github.com/project-chip/connectedhomeip/actions/runs/12738786647/job/35501600912?pr=37003#step:5:1996 |
chip::DeviceLayer::StackLock lock; | ||
rCtx->Finalize(CHIP_NO_ERROR); | ||
} | ||
rCtx->Finalize(CHIP_NO_ERROR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we get a description in the summary? You updated the testing section, however the description is still vague: In multi-thread environment, Tizen dnssd's resolve operation process may cause deadlock.
.
it seems the deadlock is probably because the stack lock already being aquired. Why is that? Why is the comment that was here before (saying this must be locked) not applicable? Should we have a assertChipStackLockedByTheCurrentThread
in here instead since we assume already locked? There should be a comment why we know this is already called in the chip stack context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the reason for calling stack lock was for safety in callback function. However, if the app and service (e.g., smartthings) already make a stack lock and call, the possibility of a deadlock due to a double stack lock call was found during the asynchronous process.
Problem
In multi-thread environment, Tizen dnssd's resolve operation process may cause deadlock.
Changes
Modify the resolve operation process so that it can operate on one thread.
Testing
Tested locally with TestDnssd test on Tizen:
Samsung's SmartThings and similar apps and services utilize 'Matter' to establish a hub environment. In a situation where various Things are connected, when the Hub is restarted or rebooted, it executes the dnssd resolve process to rediscover the existing Things. During this process, we confirmed that timing issues caused conflicts between multi-threading and lock/unlock operations due to asynchronous execution of the dnssd resolve process for many Things. The sequence of this process could be as follows:
Connect two or more Matter Things.
Restart the Matter Hub.
Duplicate calls to the dnssd resolve process.
Discovery of timing issues with multi-thread lock/unlock for asynchronous resolve processes.
It needs to be verified whether this can be automated.