Implement search index that works with multiple API server instances #655
Few complications:
Idea from @VithikaS: If tables like |
Leaning onto Full Text Search capabilities of the database we're already using might be preferable for multiple reasons:
What needs to be checked is how well FTS is supported in RDBMSes other than PostgreSQL, as we will eventually have to tackle #642 and can't rely on Postgres-exclusive features.
For a short-term solution, we decided to drop Lucene entirely: #661
One of the challenges with making the API server horizontally scalable (#375) is the question of what to do with the local Lucene indexes.
Lucene indexes use write locks, so concurrent write operations by multiple processes are not possible. As a consequence, it is not possible to share an index across multiple application instances.
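To illustrate the exclusive-writer behavior described above, here is a minimal sketch that uses plain `java.nio` file locks rather than Lucene itself. The class name `WriteLockSketch`, the method `secondWriterCanLock`, and the temporary directory are hypothetical, and the lock file name is only illustrative. The sketch only shows that a second would-be writer cannot obtain a lock that is already held; it makes no claim about how Lucene implements its locking internally.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WriteLockSketch {

    /** Returns true if a second writer can lock the file while the first still holds its lock. */
    static boolean secondWriterCanLock(Path lockFile) throws IOException {
        try (FileChannel first = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileChannel second = FileChannel.open(lockFile,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock held = first.tryLock();
            if (held == null) {
                throw new IOException("Could not acquire the initial lock");
            }
            try {
                // A competing process would get null here; within the same JVM
                // an OverlappingFileLockException is thrown instead.
                FileLock contended = second.tryLock();
                if (contended != null) {
                    contended.release();
                    return true;
                }
                return false;
            } catch (OverlappingFileLockException e) {
                return false;
            } finally {
                held.release();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("index-sketch");
        System.out.println("Second writer got lock: "
                + secondWriterCanLock(dir.resolve("write.lock")));
    }
}
```

With several API server instances pointing at one index directory, each additional instance would be in the position of the second writer here.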
Additionally, index modifications are "requested" through DT's internal event system. For example, this is roughly what happens when a new component is created via REST API call:
The procedure is similar for when components are updated or deleted. The usage of internal events means that an API server instance can only ever update indexes with changes it itself has made.
If we were to refactor index access such that only one instance could perform writes, and all others only reads, components created or modified by the read-only instances would never be reflected in the index.
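The limitation described above can be sketched with a toy in-process event bus. Everything here (`LocalEventBus`, `ApiServerInstance`, `createComponent`) is a hypothetical stand-in and not DT's actual event system or class names. The sketch only demonstrates that an event published inside one instance never reaches the index owned by another instance.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LocalEventSketch {

    /** Hypothetical in-process event bus: events never leave the JVM that published them. */
    static final class LocalEventBus {
        private final List<Consumer<String>> subscribers = new ArrayList<>();

        void subscribe(Consumer<String> subscriber) {
            subscribers.add(subscriber);
        }

        void publish(String event) {
            subscribers.forEach(s -> s.accept(event));
        }
    }

    /** Hypothetical API server instance with its own event bus and its own local index. */
    static final class ApiServerInstance {
        final LocalEventBus bus = new LocalEventBus();
        final List<String> index = new ArrayList<>();

        ApiServerInstance() {
            // The indexer only listens to events from this instance's bus.
            bus.subscribe(index::add);
        }

        void createComponent(String name) {
            // Persisting to the shared database is omitted; the index update
            // is requested through the local event bus only.
            bus.publish(name);
        }
    }

    public static void main(String[] args) {
        ApiServerInstance a = new ApiServerInstance();
        ApiServerInstance b = new ApiServerInstance();

        a.createComponent("acme-lib");

        System.out.println("Instance A index: " + a.index); // [acme-lib]
        System.out.println("Instance B index: " + b.index); // []
    }
}
```

Even though both instances would see the component in the shared database, only the instance that handled the request learns that its index needs updating.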
There are a few options I see for dealing with this:
Option (2) would look roughly like this:
Because the order of write operations on the index matters (`CREATE` should be processed before `COMMIT`), the Kafka consumer must be single-threaded. This also means that the only reason to have more than one partition for the Kafka topic would be availability, not parallelism.
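The ordering requirement can be sketched without a Kafka client: below, a `BlockingQueue` stands in for a single topic partition, and one consumer thread applies commands strictly in arrival order. The class and method names and the command strings are hypothetical, chosen only to mirror the `CREATE` and `COMMIT` operations mentioned above. This is an illustration of the single-threaded constraint, not a proposed implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OrderedIndexConsumerSketch {

    /** Drains the stand-in partition with exactly one consumer thread and returns the applied order. */
    static List<String> consume(BlockingQueue<String> partition) throws InterruptedException {
        List<String> applied = new ArrayList<>();
        Thread consumer = new Thread(() -> {
            String command;
            // A single thread polls and applies commands one at a time,
            // so the order in the queue is the order seen by the index.
            while ((command = partition.poll()) != null) {
                applied.add(command);
            }
        });
        consumer.start();
        consumer.join(); // join() makes the consumer's writes visible to the caller
        return applied;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> partition = new LinkedBlockingQueue<>();
        partition.add("CREATE component-1");
        partition.add("COMMIT");

        System.out.println(consume(partition)); // [CREATE component-1, COMMIT]
    }
}
```

If two threads polled the same queue, `COMMIT` could be applied before the preceding `CREATE` had finished, which is the situation the single-threaded consumer avoids.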