-
Notifications
You must be signed in to change notification settings - Fork 18
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Exi encoding is not thread safe and mix data between threads #15
Comments
Hi, I was aware that the EXIFactory (at least the Grammars) need to be unique across several threads. The reason is that for finding the right EXI event code for a value "12.34" it needs to be checked first whether the value is compliant with the according datatype (e.g., float). This is a 2 step process.
Anyhow, it looks like that a "local" EXI Factory (with schema-informed EXI Grammars) for each thread causes the same issue. Let me look into that. Thanks for reporting this finding! |
One more feedback for the time being. It look like the datatype encoding causes these issues. Setting
prevents any issue on my side. Can you confirm this situation on your side also? |
yes it makes the tests pass. |
Thanks for getting back to us. Unfortunately I don't see right away what in the datatype handling causes this issue. Will try to get back to it later this week.... |
ok, thanks for the solution |
Any help/advice with possible threading issues in the code are is useful! |
I looked at your updates in ParallelTest (and also fixed on minor issue in exificient-grammars that might have causes threading issues). I think the SingleFactory tests will fail always as expected. The ThreadLocalFactory and PerTaskFactory might encounter another issue. I think the XML schema parser library Xerces2-J which is used to load & create EXI grammars is not thread-safe either (nor SAXParserFactory, DocumentBuilderFactory, ....). Hence, my assumption is that one needs to make sure that first all factories are created in a single-threaded manner. Afterwards all those factories (according to the numbers of threads) should be able to work in parallel. Does this make sense? |
I understand that a Document object for example is not thread safe, I made at one point the test use a single Document concurrently but it used to throw exceptions and I changed the code to only start from the file path. But two different Document used concurrently, each one created from a file path, shouldn't give any problem. The method getExiFactory is static and don't use any static member, it only use the xsd file path so I don't understand how the non thread safety of Xerces objects could cause any problem. I don't think there is a static cache of something in Xerces, or it would mean that nobody could use Xerces using a webserver. |
I added some more tests to ParallelTest w.r.t. loading XSDs or EXI grammars in parallel. The results seem to be fine. Anyhow, somewhere in the chain instances of
works. |
I think I found the problematic datatype ... DEFAULT_DATATYPE in BuiltIn.java, if you just replace it with a static getter that return a new Datatype each time the error for testParallelEncodeWithThreadLocalFactory and testParallelEncodeWithPerTaskFactory vanish, but not for testParallelEncodeWithSingleFactory. |
It makes a lot of test failing because there is a lot of part of the code using reference equality instead of the overrided equals method. |
Thank you for finding this issue! I updated the code as suggested and I do see now only failures for the 2 SingleFactory test-cases which I expected. The reason is that with a single factory the grammars and also its datatypes are shared across multiple threads. The Datatype class has 3 methods that should not be handled as part of the class but rather external to it.
I think this should resolve all issues. |
Yes I agree, also Datatype classes, that belong to grammars, shouldn't keep values in them as fields. |
The writeValue method should take a value and not use the last stored value. |
The reason is to avoid parsing a string-based value a second time on encoder side. Having said this, I think this part of the code needs to be moved to |
Maybe there could be a method writeValueIfValid that returns a boolean if it is valid and do the write, or else return false and don't do it, but maybe there are some intermediate work between the two sometimes ? Anyway moving this code to TypeEncoder that are owned by the Exi Encoder is the right idea. |
I moved I believe this should resolve the issue and EXIGrammars can be shared across parallel threads. The test-cases don't show further issues either. May I ask you to check and provide feedback. Thanks! |
Yes it seems to fix all the issues I had. Thanks a lot. |
Hi,
it seems the encoding in Exi is not thread safe, here is a simple test that shows it :
I added a pull request for the is at #14.
This test use a xsd file : schema-for-xslt20.xsd to construct the grammar, and a list of xsl files in a directory. This is executed with 8 concurrent threads. The generated Exi is different than it should for 8 thread, but not for a single thread (newFixedThreadPool parameter), it is growing with more threads. It also doesn't happen without a grammar. It also doesn't happen with the same file being tested 100 times in parallel. I've also tested with a thread-local ExiFactory it also gives errors.
The text was updated successfully, but these errors were encountered: