Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread safety bug in ValidationSupportChain #6675

Open
Boereck opened this issue Feb 3, 2025 · 0 comments
Open

Thread safety bug in ValidationSupportChain #6675

Boereck opened this issue Feb 3, 2025 · 0 comments

Comments

@Boereck
Copy link
Contributor

Boereck commented Feb 3, 2025

Describe the bug
Recently, I had a look at the ValidationSupportChain class and found that the logic for caching of StructureDefinitions is distributed around the class and feels a bit brittle. I thought a bit about the code and found two execution sequences that can be problematic:

fetchAllStructureDefinitions() returns a copy of myStructureDefinitionsAsList, but does not use a synchronized block. If this is called concurrently to invalidateCaches() all bets are off (according to the API guarantees of ArrayList) how myStructureDefinitionsAsList looks like when being read. Since instructions may be re-orderd, until the end of the synchronized block in invalidateCaches(), the resulting list could e.g. contain an array partially or completely filled with null values (at least in OpenJDK implementation). This seems very unlikely to happen (the critical re-ordering has to occur and may depend on the processor architecture), but is still a possibility.

There is another interaction between fetchAllStructureDefinitions() and invalidateCaches() that is problematic: If invalidateCaches() is called, the flag myHaveFetchedAllStructureDefinitions is set to false. If, after this instruction, fetchAllStructureDefinitions() is called concurrently, all StructureDefinitions are cached and myHaveFetchedAllStructureDefinitions is set to true. When invalidateCaches() continues, myStructureDefinitionsByUrl and myStructureDefinitionsAsList are cleared. This leaves myHaveFetchedAllStructureDefinitions set to true, but the collections are empty. When fetchAllStructureDefinitions() is called again, an empty list is returned.

To Reproduce
I created a simple test case, that tries to trigger the condition (by running in the loop until the race condition is met) and the failure occurs on every run on my machine.

Expected behavior
A thread safe ValidationSupportChain.

Environment (please complete the following information):

  • HAPI FHIR master

Additional context
I find it much easier to wrap my head around thread safety of multiple values (that have to be kept in a consistent state) by using immutable state that is completely swapped out leveraging compare-and-swap instructions (using AtomicReference). This makes it impossible to read inconsistent state.

In an branch of mine, I refactored the code of ValidationSupportChain to use these principles and extracted the logic for management of the cached StructureDefinitions into a separate class.
You can have a look at the commit here, to see how this may look like:
master...Boereck:hapi-fhir:feature/cas_structdef_cache

In theory the lockfree copy-on-write approach can be more expensive on writing and more performant when reading, compared to the current appoach using synchronized blocks. However, in a few local tests I did with JMH, performing some operations onValidationSupportChain
showed no measurable difference (differences are in the margin of error).

Another advantage of factoring the caching logic into a seperate class is that the caching logic can be tested in isolation, as done on the branch.

The proposed API has method names aligned to naming from java collections, so it feels familiar. The API can, of course, be changed if alignment with the collections API is not valued as much. E.g. methods names starting with compute could have a fetch prefix instead. The computeIfAbsent fetching function takes the URL as parameter, which is alligned to the compute methods from java.util.Map and is implemented this way to potentially avoid lambdas capturing the key variable from the context (since creation of capturing lambdas is more expensive). However, the calls in ValidationSupportChain do capture the this reference anyway, so we could turn the Function into a Supplier and just capture the URL parameter.

Let me know if you consider this a good way to pursue, if I should open a PR, or if the APIs need some change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant