[hibernate-search] Introduce Hibernate Search framework and implement indexing page #6218
base: hibernate-search
@matthias-ronge: a hopefully short general question: is it possible to use different indices with Hibernate Search? Currently this is possible through different values with the
The index names for the individual objects are contained in the annotations as strings. I cannot tell whether it is even possible to use variables here or whether these have to be strings hard-coded at compile time, but I suspect the latter. Index access is controlled via properties such as the port. You could install several index services on different ports and set the port at runtime before the program starts, or change the index data directory (e.g. via a symbolic link). Such a feature is currently not within the scope of our development.
Thank you @matthias-ronge for the explanation. I know, and did not expect, that using different Hibernate Search indices is part of the current development. Edit: Maybe a custom index layout strategy (indexlayout-strategy-custom) is a way to achieve this, but that is not for now.
hibernate.search.enabled=true
hibernate.search.backend.hosts=localhost:9200
hibernate.search.backend.protocol=http
Question: must this content not be added to the existing hibernate.cfg.xml file, or do we need two configuration files?
As I understand it, this is the configuration file for the Hibernate Search framework. I would be surprised if we could mix the two configurations.
Does this mean that everyone must run Elasticsearch/OpenSearch on localhost at port 9200? If so, I cannot do this on my development system or in a production environment.
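One way to avoid requiring localhost:9200 everywhere would be to treat the values in hibernate.properties as defaults and let each installation override them. The sketch below assumes a hypothetical helper class SearchBackendSettings and hypothetical environment variables KITODO_SEARCH_HOSTS and KITODO_SEARCH_PROTOCOL; none of these are part of this PR. The resulting Properties would still have to be handed to Hibernate when the SessionFactory is built (for instance with Configuration#addProperties); how Kitodo wires this up is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper: values from hibernate.properties are defaults, environment variables override them. */
public class SearchBackendSettings {

    static final String HOSTS_KEY = "hibernate.search.backend.hosts";
    static final String PROTOCOL_KEY = "hibernate.search.backend.protocol";

    // Assumed environment variable names, chosen for illustration only.
    static final Map<String, String> ENV_OVERRIDES = Map.of(
            HOSTS_KEY, "KITODO_SEARCH_HOSTS",
            PROTOCOL_KEY, "KITODO_SEARCH_PROTOCOL");

    /** Returns a copy of the defaults in which every setting with a non-blank override is replaced. */
    static Properties resolve(Properties defaults, Map<String, String> env) {
        Properties resolved = new Properties();
        resolved.putAll(defaults);
        for (Map.Entry<String, String> override : ENV_OVERRIDES.entrySet()) {
            String value = env.get(override.getValue());
            if (value != null && !value.isBlank()) {
                resolved.setProperty(override.getKey(), value.trim());
            }
        }
        return resolved;
    }

    /** Parses text in .properties format, e.g. the content of hibernate.properties. */
    static Properties parse(String text) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    public static void main(String[] args) {
        Properties defaults = parse("hibernate.search.enabled=true\n"
                + "hibernate.search.backend.hosts=localhost:9200\n"
                + "hibernate.search.backend.protocol=http\n");
        Properties effective = resolve(defaults, System.getenv());
        System.out.println(HOSTS_KEY + "=" + effective.getProperty(HOSTS_KEY));
    }
}
```

Started with KITODO_SEARCH_HOSTS=search.example.org:9205 in the environment, the program prints that host; without it, the localhost:9200 default from the file is kept.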
@@ -31,9 +32,11 @@
@Table(name = "property")
Isn't an @Indexed(index = "kitodo-property") annotation missing here, as in the other bean files?
My understanding is that the annotation only applies to objects that are indexed themselves. Properties are not indexed as standalone objects.
I understand.
Isn't an @Indexed(index = "kitodo-folder") annotation missing here, as in the other bean files?
@@ -26,19 +26,23 @@
import javax.persistence.OneToMany;
import javax.persistence.Table;

import org.hibernate.search.mapper.pojo.mapping.definition.annotation.GenericField;
import org.kitodo.data.database.persistence.UserDAO;

@Entity
@Table(name = "user")
Isn't an @Indexed(index = "kitodo-user") annotation missing here, as in the other bean files?
Users are not indexed because otherwise you could not log in before the index is created, and then you could not create the index because you cannot log in. This was not planned at the very beginning, but it is the case now.
I see. A chicken-and-egg problem, and also a matter of privacy restrictions.
hibernate.search.enabled=true
hibernate.search.backend.hosts=localhost:9205
hibernate.search.backend.protocol=http
See my comment above on the first hibernate.properties file.
*/
class ServerConnectionChecker implements Runnable {
    private static final Logger logger = LogManager.getLogger(ServerConnectionChecker.class);
    private static final Pattern PATTERN_SERVER = Pattern.compile("cluster_name\\W+([^\"]*).*?number\\W+([^\"]*)",
Does this pattern work with both OpenSearch and Elasticsearch servers?
I don't know yet; right now I'm focused on getting the existing code to work. I assume both behave the same, but I haven't tried it yet. If not, the pattern could be extended. The first goal is to finish the minimal development.
OK, fine with me; hopefully it works without changes.
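A quick way to check the PATTERN_SERVER question without a running server is to apply the expression to sample root-endpoint responses. The sketch reuses the regular expression from the diff; the flags argument is cut off there, so Pattern.DOTALL is an assumption (without it, ".*?" cannot cross the line breaks of pretty-printed JSON). The class name ServerInfoProbe and the abridged response bodies are illustrative assumptions, and this does not replace testing against real Elasticsearch and OpenSearch servers.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Applies the server-detection pattern from ServerConnectionChecker to a root-endpoint response. */
public class ServerInfoProbe {

    // Same expression as in the diff; DOTALL is assumed because the original flags are not shown.
    static final Pattern PATTERN_SERVER = Pattern.compile(
            "cluster_name\\W+([^\"]*).*?number\\W+([^\"]*)", Pattern.DOTALL);

    /** Returns {clusterName, versionNumber}, or empty if the response does not match. */
    static Optional<String[]> probe(String responseBody) {
        Matcher matcher = PATTERN_SERVER.matcher(responseBody);
        if (!matcher.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {matcher.group(1), matcher.group(2)});
    }

    public static void main(String[] args) {
        // Abridged, assumed shapes of "GET /" responses: pretty-printed and compact.
        String elasticsearch = "{\n  \"name\" : \"node-1\",\n  \"cluster_name\" : \"elasticsearch\",\n"
                + "  \"version\" : {\n    \"number\" : \"7.17.9\",\n    \"build_flavor\" : \"default\"\n  }\n}";
        String openSearch = "{\"name\":\"node-1\",\"cluster_name\":\"opensearch\","
                + "\"version\":{\"distribution\":\"opensearch\",\"number\":\"2.11.0\"}}";
        for (String body : new String[] {elasticsearch, openSearch}) {
            probe(body).ifPresentOrElse(
                    info -> System.out.println(info[0] + " " + info[1]),
                    () -> System.out.println("no match"));
        }
    }
}
```

The extra "distribution" field in the OpenSearch sample does not disturb the match, because the lazy ".*?" simply skips ahead to the first "number" key after "cluster_name".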
@@ -37,6 +37,9 @@
<property name="hibernate.connection.verifyServerCertificate">false</property>
<property name="hibernate.connection.useSSL">false</property>

<!-- Hibernate search -->
Are the other Hibernate Search parameters (URI, port, ...) that were added in the hibernate.properties file missing here?
I also notice the similarity. This needs testing.
It would be nice if the Hibernate Search properties were stored in one place (one file). If that is not possible, it would be a problem, at least for me.
@@ -158,7 +158,7 @@ public void runNotExistingScriptAsync() throws InterruptedException {
  String commandString = scriptPath + "not_existing_script" + scriptExtension;
  CommandService service = new CommandService();
  service.runCommandAsync(commandString);
- Thread.sleep(1000); // wait for async thread to finish;
+ Thread.sleep(2000); // wait for async thread to finish;
Why are this line and the lines below changed from 1 to 2 seconds? Is this really needed?
Yes, my development machine is sometimes quite slow, and then the build aborts with an error here. Maybe this should be handled completely differently than by waiting an arbitrary amount of time, but that would be something for a separate branch.
I understand, but this increases the build and test time for everyone. Fine for now, though.
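An alternative to the fixed Thread.sleep(2000) in runNotExistingScriptAsync is to poll for the expected outcome with an upper time limit: fast machines continue as soon as the condition holds, and slow machines get more headroom without slowing down everyone else. This is a minimal sketch with a hypothetical Await helper; which condition the test should poll depends on what CommandService exposes about the finished command, which is not shown here, so the AtomicBoolean below is only a stand-in.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: poll a condition instead of sleeping for a fixed time. */
public final class Await {

    private Await() {
    }

    /**
     * Checks the condition every pollInterval until it returns true or the timeout has elapsed.
     * Returns true as soon as the condition holds, false on timeout or interruption.
     */
    public static boolean until(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Never sleep past the deadline, but always sleep at least 1 ms.
            long sleepMillis = Math.max(1, Math.min(pollInterval.toMillis(), remainingNanos / 1_000_000));
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        AtomicBoolean finished = new AtomicBoolean(false);
        new Thread(() -> finished.set(true)).start(); // stands in for the asynchronous command
        boolean done = until(finished::get, Duration.ofSeconds(5), Duration.ofMillis(20));
        System.out.println("finished: " + done);
    }
}
```

A generous timeout (here 5 seconds) only costs time when the condition is never met, i.e. when the test is failing anyway.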
hibernate.search.enabled=true
hibernate.search.backend.hosts=localhost:9205
hibernate.search.backend.protocol=http
See my comment on the first hibernate.properties file.
hibernate.search.enabled=true
hibernate.search.backend.hosts=localhost:9205
hibernate.search.backend.protocol=http
See my comment on the first hibernate.properties file.
@matthias-ronge I checked out your branch and took notes on my testing experience.
Unfortunately, at this stage it is not possible to do further testing. @matthias-ronge In case you have not done this yet, please test your branch with a large amount of test data. Otherwise, let me know, and I will try to figure out why pages are loading so slowly on my machine.
I tried to start the indexing. Some entities were indexed within a few seconds. The remaining entities (processes, projects, tasks, templates) stayed at 0% for at least 5 minutes. After about 10 minutes, all entities except processes and tasks were indexed at 100%. Processes and tasks have only 60 indexed entities (of 80,000 and 4,000 respectively).
Thank you for this testing and your insights. However, this is not what I expected. I have not tested with such large amounts of data yet; I will have to inspect it myself first. My general assumption is that the framework works reasonably well, so it could be due to some small detail. If I can confirm that it works with large data, I will let you know. The code does not create an index manually at startup; I also saw an initial delay, but only of a few seconds. It is clear that I have to check this.
After checking out the branch, I logged the SQL statements. Just scrolling through the list of processes (10 per page) floods my database with queries. I have around 1,000 processes in my database, and it takes a very long time to jump to the next 10 entries. Hundreds of queries are issued for a single page:
from time to time (in between many smaller queries) really complex queries are fired.
Issue #5760 2a) and 2b)
Follow-up pull request to #6209 (immediate diff)
The three numbers before the slash in “Indexed entries” represent the number of objects that Hibernate has already loaded from the database, the number of objects that have been prepared as indexable documents (JSONs), and finally the number of indexed documents.
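A minimal sketch of how the three counters could be kept, assuming a hypothetical IndexingProgress class and a space-separated "loaded prepared indexed / total" layout (the actual formatting on the page may differ). Atomic counters are used because loading, document building and indexing presumably run on different threads. Since every object passes through the stages in order, each number can be at most the one before it, so a large gap between two neighbouring numbers shows which stage is lagging.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical counters behind an "Indexed entries" display: loaded, prepared, indexed / total. */
public class IndexingProgress {

    private final long total;
    private final AtomicLong loaded = new AtomicLong();   // objects loaded from the database by Hibernate
    private final AtomicLong prepared = new AtomicLong(); // objects prepared as indexable documents (JSON)
    private final AtomicLong indexed = new AtomicLong();  // documents indexed

    public IndexingProgress(long total) {
        this.total = total;
    }

    public void objectLoaded() {
        loaded.incrementAndGet();
    }

    public void documentPrepared() {
        prepared.incrementAndGet();
    }

    public void documentIndexed() {
        indexed.incrementAndGet();
    }

    /** Assumed layout: the three stage counters, a slash, then the total. */
    @Override
    public String toString() {
        return loaded.get() + " " + prepared.get() + " " + indexed.get() + " / " + total;
    }

    public static void main(String[] args) {
        IndexingProgress progress = new IndexingProgress(4);
        progress.objectLoaded();
        progress.objectLoaded();
        progress.documentPrepared();
        System.out.println(progress); // 2 1 0 / 4
    }
}
```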
Basic experience: Hibernate Search and lazy loading don't mix, and it looks like we have to accept that. As a result, I have deactivated lazy loading wherever the number of members of a set is typically small (< 25). This affects most sets, e.g. the projects of a template, or the tasks, users or properties of a template or a process. If a set can typically be large (> 1000), its elements are not indexed; example: the processes of a batch. The reasoning: if an object has a very large number of indexed subelements, its findability grows without bound (it becomes increasingly likely that it will be found by any search query). Such indexing also makes the index enormously large. Therefore, it can be considered justifiable not to index these fields.
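The sizing rule above can be written down as a small decision function, which also makes visible that no rule is stated for sets between the two thresholds. This is only an illustration: the class, the enum and the example sizes are hypothetical. In the PR itself the decision is presumably expressed through JPA fetch settings and Hibernate Search annotations on the entity fields, which are not shown here.

```java
/** Hypothetical encoding of the sizing rule for collection-valued entity fields. */
public class CollectionMappingPolicy {

    public enum Decision {
        EAGER_AND_INDEXED, // typically small set: lazy loading deactivated, elements indexed
        NOT_INDEXED,       // typically large set: elements left out of the index
        UNSPECIFIED        // the rule gives no guidance between the two limits
    }

    static final int SMALL_LIMIT = 25;   // "typically small (< 25)"
    static final int LARGE_LIMIT = 1000; // "typically large (> 1000)"

    static Decision decide(int typicalSize) {
        if (typicalSize < SMALL_LIMIT) {
            return Decision.EAGER_AND_INDEXED;
        }
        if (typicalSize > LARGE_LIMIT) {
            return Decision.NOT_INDEXED;
        }
        return Decision.UNSPECIFIED;
    }

    public static void main(String[] args) {
        // Illustrative sizes only.
        System.out.println("properties of a process: " + decide(10));
        System.out.println("processes of a batch: " + decide(5000));
    }
}
```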