[hibernate-search] Introduce Hibernate Search framework and implement indexing page #6218

matthias-ronge · 2024-09-04T14:58:37Z

Issue #5760 2a) and 2b)

Follow-up pull request to #6209 (immediate diff)

The three numbers before the slash in “Indexed entries” represent the number of objects that Hibernate has already loaded from the database, the number of objects that have been prepared as indexable documents (JSONs), and finally the number of indexed documents.

Key lesson: Hibernate Search and lazy loading don't mix, and it looks like we have to accept that. As a result, I have deactivated lazy loading wherever a set typically has few members (< 25). This affects most sets, e.g. projects of a template; tasks, users or properties of a template or a process; etc. If a set can typically be large (> 1000), its elements are not indexed. Example: the processes of a batch. The reasoning: if an object has a very large number of indexed subelements, it becomes increasingly likely to match any search query, so a hit says little about the object itself. Such indexing also makes the index enormously large. It therefore seems justifiable not to index these fields.

Kitodo/src/main/resources/log4j2.xml

henning-gerhardt · 2024-09-09T11:48:47Z

@matthias-ronge: a hopefully short general question: is it possible to use different indices with Hibernate Search? Currently this is possible by setting different values for the elasticsearch.index configuration. Is this, or something similar, still possible? I'm asking because I work with different Kitodo.Production versions, which have separate metadata directories on my local file system, different databases in a MariaDB server, and different search prefixes in an Elasticsearch instance. This does not have to work in the current state of the changes, nor is it a current goal, but maybe it is something for later?

matthias-ronge · 2024-09-09T13:02:06Z

is it possible to use different indices with Hibernate-Search?

The index names for the individual objects are contained in the annotations as a string. I cannot estimate whether it is even possible to use variables here, or whether these have to be hard-coded strings at compile time; but I suspect the latter. Index access is controlled via properties such as port. You could install several index services on different ports and set the port at runtime before the program starts, or change the index data directory (as a symbolic link).
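The idea of setting the host and port before the program starts could look roughly like the sketch below. It is not code from this branch: the class SearchBackendConfig, its load method, and the use of a JVM system property as the override are assumptions made for illustration. Only the key hibernate.search.backend.hosts is taken from the hibernate.properties files in this pull request, and how Kitodo actually loads that file is not shown here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper: lets an override replace the configured search host. */
class SearchBackendConfig {

    /** Key used in the hibernate.properties files of this pull request. */
    static final String HOSTS_KEY = "hibernate.search.backend.hosts";

    /**
     * Parses the given file content, then applies the value stored under
     * the same key in the override properties, if there is one.
     */
    static Properties load(String fileContent, Properties overrides) {
        Properties result = new Properties();
        try {
            result.load(new StringReader(fileContent));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String override = overrides.getProperty(HOSTS_KEY);
        if (override != null && !override.isBlank()) {
            result.setProperty(HOSTS_KEY, override.trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String fileContent = "hibernate.search.enabled=true\n"
                + "hibernate.search.backend.hosts=localhost:9200\n"
                + "hibernate.search.backend.protocol=http\n";
        // Example start: java -Dhibernate.search.backend.hosts=localhost:9201 ...
        Properties effective = load(fileContent, System.getProperties());
        System.out.println(effective.getProperty(HOSTS_KEY));
    }
}
```

With such an override, several installations could share one set of configuration files and differ only in the start command. Whether Hibernate Search needs this kind of helper at all, or already honours system properties on its own, would have to be checked.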

Such a feature is currently not in the scope of our development.

henning-gerhardt · 2024-09-09T13:22:28Z

Thank you @matthias-ronge for the explanation. I understand, and I did not expect support for different Hibernate Search indices to be part of the current development.

Edit: Maybe indexlayout-strategy-custom is a way to achieve this. But that is not needed for now.

henning-gerhardt · 2024-09-13T10:40:15Z

Kitodo-DataManagement/hibernate.properties

+hibernate.search.enabled=true
+hibernate.search.backend.hosts=localhost:9200
+hibernate.search.backend.protocol=http


Question: shouldn't this content be added to the existing hibernate.cfg.xml file, or do we need two configuration files?

As I understand it, this is the configuration file for the Hibernate Search framework. I would be surprised if we could mix the two configurations.

Does this mean that everyone must run Elasticsearch / OpenSearch on localhost, port 9200? If so, I cannot do that, neither on my development system nor in a production environment.

henning-gerhardt · 2024-09-13T10:44:14Z

Kitodo-DataManagement/src/main/java/org/kitodo/data/database/beans/Property.java

@@ -31,9 +32,11 @@
 @Table(name = "property")


Is an @Indexed(index = "kitodo-property") annotation missing here? The other bean files have one.

My understanding is that the annotation only applies to objects to be indexed. Standalone properties are not indexed as separate objects.

I understand.

henning-gerhardt · 2024-09-13T10:44:48Z

Kitodo-DataManagement/src/main/java/org/kitodo/data/database/beans/Folder.java

Is an @Indexed(index = "kitodo-folder") annotation missing here? The other bean files have one.

henning-gerhardt · 2024-09-13T10:45:15Z

Kitodo-DataManagement/src/main/java/org/kitodo/data/database/beans/User.java

@@ -26,19 +26,23 @@
 import javax.persistence.OneToMany;
 import javax.persistence.Table;

+import org.hibernate.search.mapper.pojo.mapping.definition.annotation.GenericField;
 import org.kitodo.data.database.persistence.UserDAO;

 @Entity
 @Table(name = "user")


Is an @Indexed(index = "kitodo-user") annotation missing here? The other bean files have one.

Users are not indexed because otherwise you cannot log in before the index is created. Then you cannot create the index because you cannot log in. This was not planned at the very beginning but is now the case.

I see. A chicken-and-egg problem, and a privacy concern as well.

henning-gerhardt · 2024-09-13T10:46:44Z

Kitodo-DataManagement/src/test/resources/hibernate.properties

+hibernate.search.enabled=true
+hibernate.search.backend.hosts=localhost:9205
+hibernate.search.backend.protocol=http


See my comment above on the first hibernate.properties file.

henning-gerhardt · 2024-09-13T10:54:32Z

Kitodo/src/main/java/org/kitodo/production/services/index/ServerConnectionChecker.java

+ */
+class ServerConnectionChecker implements Runnable {
+    private static final Logger logger = LogManager.getLogger(ServerConnectionChecker.class);
+    private static final Pattern PATTERN_SERVER = Pattern.compile("cluster_name\\W+([^\"]*).*?number\\W+([^\"]*)",


Does this pattern work with both OpenSearch and Elasticsearch servers?

I don't know yet; right now I'm focused on getting the existing code to work. I assume both behave the same, but I haven't tried it. If not, the pattern could be extended. The first priority is finishing the minimal development.

OK, fine with me; hopefully it works without a change.
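One quick way to probe this question is to run the visible part of the pattern against sample responses from both products. The sketch below does that under two assumptions: the Pattern.DOTALL flag (the flags argument is cut off in the diff excerpt) and the two JSON samples, which are hand-written in the shape of a root endpoint response and are not captured server output. The class ServerPatternDemo and its parse method are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: applies the visible part of PATTERN_SERVER to two sample responses. */
class ServerPatternDemo {

    // Regex copied from the diff. DOTALL is an assumption, because the
    // flags argument is not visible in the excerpt.
    static final Pattern PATTERN_SERVER = Pattern.compile(
            "cluster_name\\W+([^\"]*).*?number\\W+([^\"]*)", Pattern.DOTALL);

    /** Returns "clusterName/versionNumber", or null if the pattern does not match. */
    static String parse(String response) {
        Matcher matcher = PATTERN_SERVER.matcher(response);
        return matcher.find() ? matcher.group(1) + "/" + matcher.group(2) : null;
    }

    // Hand-written samples, not captured from a real server.
    static final String ELASTICSEARCH_SAMPLE = "{\n"
            + "  \"name\" : \"node-1\",\n"
            + "  \"cluster_name\" : \"elasticsearch\",\n"
            + "  \"version\" : {\n"
            + "    \"number\" : \"7.17.3\"\n"
            + "  }\n"
            + "}";
    static final String OPENSEARCH_SAMPLE = "{\n"
            + "  \"name\" : \"node-1\",\n"
            + "  \"cluster_name\" : \"opensearch\",\n"
            + "  \"version\" : {\n"
            + "    \"distribution\" : \"opensearch\",\n"
            + "    \"number\" : \"2.11.0\"\n"
            + "  }\n"
            + "}";

    public static void main(String[] args) {
        System.out.println(parse(ELASTICSEARCH_SAMPLE)); // elasticsearch/7.17.3
        System.out.println(parse(OPENSEARCH_SAMPLE));    // opensearch/2.11.0
    }
}
```

Both samples match because the pattern only relies on the cluster_name and number keys. This says nothing about real servers, whose responses should still be checked, and the pattern picks up the first occurrence of "number" after the cluster name, wherever that appears.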

henning-gerhardt · 2024-09-13T10:56:18Z

Kitodo/src/main/resources/hibernate.cfg.xml

@@ -37,6 +37,9 @@
        <property name="hibernate.connection.verifyServerCertificate">false</property>
        <property name="hibernate.connection.useSSL">false</property>

+        <!-- Hibernate search -->


Are the other Hibernate Search parameters missing here, such as the URI, port, ... that were added in the hibernate.properties file?

I can confirm that I also notice the similarity. Needs testing.

It would be nice if the Hibernate Search properties were stored in one place / file. If that is not possible, it would be a problem, at least for me.

henning-gerhardt · 2024-09-13T10:57:23Z

Kitodo/src/test/java/org/kitodo/production/services/command/CommandServiceTest.java

@@ -158,7 +158,7 @@ public void runNotExistingScriptAsync() throws InterruptedException {
        String commandString = scriptPath + "not_existing_script" + scriptExtension;
        CommandService service = new CommandService();
        service.runCommandAsync(commandString);
-        Thread.sleep(1000); // wait for async thread to finish;
+        Thread.sleep(2000); // wait for async thread to finish;


Why were this and the lines below changed from 1 to 2 seconds? Is this really needed?

Yes, my development machine is sometimes pretty slow and then the build aborts because of an error here. Maybe this should be handled completely differently than just waiting an arbitrary amount of time, but that would be something for a separate branch.

I understand. But this will increase the build and test time for everyone. But fine for now.
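As a possible starting point for that separate branch, a test could poll for the expected condition with an upper time limit instead of sleeping a fixed time. The following is a minimal sketch, not code from Kitodo: WaitUtil and waitUntil are hypothetical names, and the AtomicBoolean stands in for whatever CommandServiceTest actually has to wait for, which is not visible in the diff.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: polls a condition instead of sleeping a fixed time. */
class WaitUtil {

    /**
     * Checks the condition every pollMillis until it holds or timeoutMillis
     * has elapsed. Returns true if the condition became true in time, and
     * false on timeout or if the waiting thread is interrupted.
     */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        AtomicBoolean finished = new AtomicBoolean(false);
        // Stand-in for the asynchronous command started by the test.
        Thread worker = new Thread(() -> finished.set(true));
        worker.start();
        boolean inTime = waitUntil(finished::get, 5000, 50);
        System.out.println("finished in time: " + inTime);
    }
}
```

On a fast machine this returns as soon as the condition holds, so a generous timeout for slow machines would not lengthen the build for everyone else.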

henning-gerhardt · 2024-09-13T10:57:48Z

Kitodo/src/test/resources/hibernate.properties

+hibernate.search.enabled=true
+hibernate.search.backend.hosts=localhost:9205
+hibernate.search.backend.protocol=http


See my comment on the first hibernate.properties file.

henning-gerhardt · 2024-09-13T10:58:12Z

Kitodo/src/test/resources/selenium/resources/hibernate.properties

+hibernate.search.enabled=true
+hibernate.search.backend.hosts=localhost:9205
+hibernate.search.backend.protocol=http


See my comment on the first hibernate.properties file.

thomaslow · 2024-10-21T10:22:57Z

@matthias-ronge I checked out your branch, and took notes of my testing experience.

1. I built a new war file and deployed it to Tomcat. At first, there was an error message in the log stating that hibernate-search could not connect to my Elasticsearch instance (which is fine, because I do not run it on localhost).

Unable to detect the Elasticsearch version running on the cluster: HSEARCH400007: Elasticsearch request failed: Connection refused

2. Then I copied the hibernate.properties into my config-local directory and changed the host name. Now the application starts without any error messages.
3. I tried to log in to kitodo-production with my admin account. Nothing happens. The page keeps loading forever. There are no errors, but lots of CPU activity for mariadb and tomcat. Maybe indexes are being created in the background? But there is no user interface or message indicating this.
4. After ~15 minutes, the kitodo dashboard is shown, but my CPU is still busy. The System - Indexing page does not show any progress, only 0% everywhere. After ~30 minutes without any page loading, the CPU load is back to normal. Maybe disabling lazy-loading triggers thousands of database queries when loading the dashboard? My test database contains ~80,000 processes.

Unfortunately, at this state, it is not possible to do further testing.

@matthias-ronge In case you have not done this yet, please test your branch with a large amount of test data. Otherwise, let me know, and I will try to figure out why pages are loading so slowly on my machine.

thomaslow · 2024-10-21T10:50:54Z

I tried to start the indexing. Some entities were indexed within a few seconds. The remaining entities (processes, projects, tasks, templates) have stayed at 0% for at least the last 5 minutes.

After ~10 minutes, all entities except processes and tasks were indexed at 100%. Processes and tasks have only 60 indexed entities (of 80,000 and 4,000 respectively).

matthias-ronge · 2024-10-28T12:40:32Z

Thank you for testing and for your insights. This is not what I expected. I have not tested with such a large data set yet, so I will have to investigate it myself first. My general assumption is that the framework works reasonably well and that the cause could be some small thing. If I can confirm that it works for large data, I will let you know.

The code does not manually create an index at startup, but I have also seen a delay at first, though only of a few seconds. Clearly, I have to check this.

BartChris · 2024-11-04T09:05:53Z

4. Maybe disabling lazy-loading triggers thousands of database queries when loading the dashboard? My test database contains ~80,000 processes.

After checking out the branch, I logged the SQL statements. Just scrolling through the list of processes (10 per page) floods my database with queries. I have around 1000 processes in my database. It takes very long to jump to the next 10 entries.

Hundreds of requests are made for one page:

2024-11-04T09:01:04.540144Z	   23 Query	rollback
2024-11-04T09:01:04.540190Z	   23 Query	SET autocommit=1
2024-11-04T09:01:04.540239Z	   22 Query	SET autocommit=0
2024-11-04T09:01:04.540308Z	   22 Query	select batches0_.process_id as process_2_2_0_, batches0_.batch_id as batch_id1_2_0_, batch1_.id as id1_1_1_, batch1_.title as title2_1_1_, batch1_.type as type3_1_1_ from batch_x_process batches0_ inner join batch batch1_ on batches0_.batch_id=batch1_.id where batches0_.process_id=2310
2024-11-04T09:01:04.540415Z	   22 Query	rollback
2024-11-04T09:01:04.540450Z	   22 Query	SET autocommit=1
2024-11-04T09:01:04.540493Z	   23 Query	SET autocommit=0
2024-11-04T09:01:04.540573Z	   23 Query	select workpieces0_.process_id as process_1_36_0_, workpieces0_.property_id as property2_36_0_, property1_.id as id1_22_1_, property1_.choice as choice2_22_1_, property1_.creationDate as creation3_22_1_, property1_.dataType as datatype4_22_1_, property1_.obligatory as obligato5_22_1_, property1_.title as title6_22_1_, property1_.value as value7_22_1_ from workpiece_x_property workpieces0_ inner join property property1_ on workpieces0_.property_id=property1_.id where workpieces0_.process_id=2309
2024-11-04T09:01:04.540918Z	   23 Query	rollback
2024-11-04T09:01:04.540958Z	   23 Query	SET autocommit=1
2024-11-04T09:01:04.541003Z	   22 Query	SET autocommit=0
2024-11-04T09:01:04.541085Z	   22 Query	select templates0_.process_id as process_1_30_0_, templates0_.property_id as property2_30_0_, property1_.id as id1_22_1_, property1_.choice as choice2_22_1_, property1_.creationDate as creation3_22_1_, property1_.dataType as datatype4_22_1_, property1_.obligatory as obligato5_22_1_, property1_.title as title6_22_1_, property1_.value as value7_22_1_ from template_x_property templates0_ inner join property property1_ on templates0_.property_id=property1_.id where templates0_.process_id=2309
2024-11-04T09:01:04.541188Z	   22 Query	rollback
2024-11-04T09:01:04.541221Z	   22 Query	SET autocommit=1
2024-11-04T09:01:04.541262Z	   23 Query	SET autocommit=0

From time to time, in between the many smaller queries, really complex queries are fired as well.

matthias-ronge changed the base branch from master to hibernate-search September 4, 2024 14:59

matthias-ronge force-pushed the 5760_2a+b branch from babe208 to a2724c0 Compare September 6, 2024 08:20

henning-gerhardt reviewed Sep 6, 2024

Kitodo/src/main/resources/log4j2.xml (outdated, resolved)

matthias-ronge force-pushed the 5760_2a+b branch 2 times, most recently from ecbe5bb to 103a456 Compare September 6, 2024 12:33

matthias-ronge force-pushed the 5760_2a+b branch 2 times, most recently from 3d33e50 to 6f8657d Compare September 10, 2024 07:57

matthias-ronge mentioned this pull request Sep 11, 2024

[hibernate-search] Part 1: Remove custom ElasticSearch integration #6209

Merged

matthias-ronge force-pushed the 5760_2a+b branch 2 times, most recently from ec00505 to 945cc24 Compare September 12, 2024 07:29

matthias-ronge marked this pull request as ready for review September 12, 2024 07:57

solth and others added 16 commits September 13, 2024 11:37

Add HibernateSearch dependencies to DataManagement module

f94697a

Declare Hibernate Search version in root POM, and bump to 6.2.4.Final

43dbc44

Add HibernateSearch annotations to base indexed classes

10ebf2f

Add index names, add 'Indexed' annotation

adcd4a1

Add annotations for complex fields

d1a5c97

Add annotations for Docket, Filter, Ruleset and Workflow

e8c841a

Show the indexing page if the search server is available

6c28a24

Remove Create Mapping and Delete Index buttons (henceforth implied)

40e5249

Remove button to index remaining - not supported by Hibernate Search

e97eba0

Index all objects of given 'objectType' with massIndexer

26280cf

Re-implement indexing page

faeed43

Fix checkstyle

40ef313

Improve wording, add Javadoc

11bcf85

Don't show a total of 0 objects when starting indexing

caf4bae

Returns result processing to the calling class

5b2e0ef

Remove test for mapping - created transparently

d6ae013

matthias-ronge added 14 commits September 13, 2024 11:37

Set number of database objects

24a2005

Add template count

5fa7f85

Bring OpenSearch background instance for tests

732997c

Add MockDatabase index to Kitodo - DataManagement

ea5f0e3

Fix test

6c0eecc

Increase timeout (slow laptop)

5fa2d88

Fix problems

60bcacf

Fix search for ID

d97ac6a

Log all queries

9c20c89

Add missing file for tests

0c8374c

Fix tests

afd9ae8

Add Hibernate Search config file to Selenium resources

3369fa3

Remove unused imports

963e1c7

Add tasks to processes to enable sorting by sortHelperStatus

c1bbea7

matthias-ronge force-pushed the 5760_2a+b branch from 945cc24 to c1bbea7 Compare September 13, 2024 09:38

henning-gerhardt reviewed Sep 13, 2024

solth requested review from thomaslow and oliver-stoehr October 8, 2024 12:57

matthias-ronge mentioned this pull request Oct 29, 2024

[hibernate-search] Implement indexing and search #6283

Open
