Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexing takes over 100x longer on big file #322

Closed
reinhapa opened this issue Aug 24, 2023 · 15 comments · Fixed by #347
Closed

Indexing takes over 100x longer on big file #322

reinhapa opened this issue Aug 24, 2023 · 15 comments · Fixed by #347
Assignees
Labels
serialization-format-change Changes that affect index serialization format
Milestone

Comments

@reinhapa
Copy link

When indexing a big file that contains around 16000 generated classes, the indexing process takes around 100 more time.

Jandex 3.1.0 ~ 5 seconds
Jandex 3.1.1 ~ 490 seconds

@Ladicek
Copy link
Contributor

Ladicek commented Aug 24, 2023

OK, that is weird. The only change in 3.1.1 that could realistically cause this is #308. Could it be that you have a lot of recursive type parameters (T extends Something<T>) that have the same name? That would lead to a lot of hash table collisions, which could explain the issue.

reinhapa added a commit to reinhapa/jandex that referenced this issue Aug 24, 2023
@reinhapa
Copy link
Author

In our test case reverting the internHashCode did fix the performance loss when indexing our JAR file

@reinhapa
Copy link
Author

OK, that is weird. The only change in 3.1.1 that could realistically cause this is #308. Could it be that you have a lot of recursive type parameters (T extends Something<T>) that have the same name? That would lead to a lot of hash table collisions, which could explain the issue.

Actually #308 has caused this issue. In our case we need to revert the version delivered in WildFly 29 back to 3.1.0 due to fact, that the deployment will take 15 instead of around 1 minute.

I wonder, that no benchmark test seem to have caught this problem unfortunately...

@Ladicek Ladicek self-assigned this Aug 24, 2023
@Ladicek
Copy link
Contributor

Ladicek commented Aug 24, 2023

This seems to be a reasonable approach: https://github.com/Ladicek/jandex/commits/type-variable-reference-intern-hash-collisions Could you please build Jandex from that branch and test with it? Thank you!

@Ladicek
Copy link
Contributor

Ladicek commented Aug 24, 2023

Note that the branch actually changes the persistent index format, so it will require releasing Jandex 3.2. Also older versions of Jandex won't be able to read an index produced by Jandex built from that branch, so take care.

@reinhapa
Copy link
Author

This seems to be a reasonable approach: https://github.com/Ladicek/jandex/commits/type-variable-reference-intern-hash-collisions Could you please build Jandex from that branch and test with it? Thank you!

@Ladicek I will run a check tomorrow and give you feedback

@reinhapa
Copy link
Author

@Ladicek The proposed fix seems to have more ore less the same performance as the version 3.1.0 taking slightly longer:

Version 3.1.0
JUnit_3 1 0

Proposed fix
JUnit_3 x-SNAPSHOT

@Ladicek
Copy link
Contributor

Ladicek commented Aug 25, 2023

Yeah, it does take a little more time -- the JAR from your repo requires cca 8 seconds to index with Jandex 3.1.0 on my machine, and roughly 10 seconds with the suggested patch. I would think that's acceptable, but I can try to reduce it further.

I was more interested in actual usage. If you replace Jandex in WildFly with the version from my branch, does that keep working for you? (I don't see why it shouldn't, all the existing Jandex tests pass, but real world feedback is always best.)

@reinhapa
Copy link
Author

Hi @Ladicek I did some measurements for comparison as follows:

Jandex 3.1.0 without index file in jar (17s)
Rolled back library version on WildFly 29.

2023-08-25 10:00:57,318 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 71) [Usr#] WFLYCLINF0002: Started http-remoting-connector cache from ejb container
2023-08-25 10:01:14,771 WARN  [org.jboss.as.server.deployment] (MSC service thread 1-4) [Usr#] WFLYSRV0059: Class Path entry mchange-commons-java-0.2.15.jar in /C:/bison/base/20230824_131254/wildfly-29.0.0.Final/standalone/deployments/BisonProcess.ear/lib/c3p0.jar  does not point to a valid jar for a Class-Path reference.

Jandex 3.1.2 without index file in jar (12min 17s)
Current state using WildFly 29 without any changes.

2023-08-21 10:29:27,960 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 71) [Usr#] WFLYCLINF0002: Started http-remoting-connector cache from ejb container
2023-08-21 10:41:44,325 WARN  [org.jboss.as.server.deployment] (MSC service thread 1-2) [Usr#] WFLYSRV0059: Class Path entry mchange-commons-java-0.2.15.jar in /C:/bison/base/20230821_093101/wildfly-29.0.0.Final/standalone/deployments/BisonProcess.ear/lib/c3p0.jar  does not point to a valid jar for a Class-Path reference.

Jandex 3.1.3-SNAPSHOT with index file in (22s)
Proposed library version on WildFly 29 accessing the big file with an index generated using Jandex 3.0.5.

2023-08-25 09:33:29,978 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 71) [Usr#] WFLYCLINF0002: Started http-remoting-connector cache from ejb container
2023-08-25 09:33:34,381 WARN  [org.jboss.as.server.deployment] (MSC service thread 1-2) [Usr#] WFLYSRV0002: Loading failed for the annotation index "/C:/bison/base/20230824_131254/wildfly-29.0.0.Final/standalone/deployments/BisonProcess.ear/lib/CH.obj.Application.generated.jar/META-INF/jandex.idx" with the following exception: java.lang.IllegalStateException: Invalid tag: 14
2023-08-25 09:33:51,020 WARN  [org.jboss.as.server.deployment] (MSC service thread 1-2) [Usr#] WFLYSRV0059: Class Path entry mchange-commons-java-0.2.15.jar in /C:/bison/base/20230824_131254/wildfly-29.0.0.Final/standalone/deployments/BisonProcess.ear/lib/c3p0.jar  does not point to a valid jar for a Class-Path reference.

Jandex 3.1.3-SNAPSHOT without index file in jar (20s)
Proposed library version on WildFly 29.

2023-08-25 09:37:44,489 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 71) [Usr#] WFLYCLINF0002: Started http-remoting-connector cache from ejb container
2023-08-25 09:38:04,063 WARN  [org.jboss.as.server.deployment] (MSC service thread 1-3) [Usr#] WFLYSRV0059: Class Path entry mchange-commons-java-0.2.15.jar in /C:/bison/base/20230824_131254/wildfly-29.0.0.Final/standalone/deployments/BisonProcess.ear/lib/c3p0.jar  does not point to a valid jar for a Class-Path reference.

@Ladicek
Copy link
Contributor

Ladicek commented Aug 25, 2023

OK, thanks. I was more interested in hearing about any possible issues when running the application, but I realized that there's most likely nothing in WildFly bothering about type variable references, so we should be good there.

The 3rd experiment is interesting, Jandex 3.1.3-SNAPSHOT should very much be able to read an index generated by 3.0.5. There must be a bug somewhere, likely similar to #320, which I need to investigate.

In any case, as I mentioned, this will need Jandex 3.2, because of the change of persistent index format.

@Ladicek
Copy link
Contributor

Ladicek commented Aug 25, 2023

OK, I fixed the case when Jandex 3.1.3-SNAPSHOT couldn't read an index generated by 3.0.5, that was some poor handling of versioning on my side. Thanks for highlighting that.

@reinhapa
Copy link
Author

In any case, as I mentioned, this will need Jandex 3.2, because of the change of persistent index format.

@Ladicek thanks for your fast feedback. For providing an index within our big libraries can mitigate the issue and we would patch our WildFly instance if needed. I guess that as long as we keep generating the index using a Jandex version that aligns the one for WildFly we should be save. Having a fix in the future will be perfectly ok for us.

@Ladicek
Copy link
Contributor

Ladicek commented Aug 25, 2023

If indexing takes longer time, pregenerating the index (using the Maven plugin for example) is a good choice indeed.

@Ladicek Ladicek added this to the 3.2.0 milestone Aug 28, 2023
@Ladicek Ladicek added the serialization-format-change Changes that affect index serialization format label Dec 11, 2023
@reinhapa
Copy link
Author

@Ladicek does this mean, that you create a version 3.2.0 now?

@Ladicek
Copy link
Contributor

Ladicek commented May 14, 2024

There's one more PR that needs some more work (#361), so it will take a bit more time, but 3.2.0 is near indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
serialization-format-change Changes that affect index serialization format
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants