---
Solution No.4 is the best IMO.
It's unclear whether this fiber should be run on all replicas or only on the master; what happens if the master is switched while the build is in progress; whether we should persist the build progress somehow, and maybe even replicate the changes; and what happens if the build fails (e.g. if the unique constraint is violated). I assume you'll describe the procedure in more detail in the next RFC.
This would be unacceptable for Vinyl. I think that in the case of Vinyl, "lazy" (I'd rather name them "disabled") indexes should remain disabled after restart and continue building after `box.cfg{}` returns.
---
Definitely not solution 1. In some cases, with a single large space, the cloning process would take up 2x the memory. Besides, solution 1 basically makes the user write all the code we already have for index build (which is rather complicated).
Solution 2 may be better memory-wise, but it still has a problem: we would have to take care of master changes, which seems rather complicated: finding the right instance to continue the process (there might be multiple writable instances), taking care of the "original" writes which should be duplicated to the new space versus the ones coming from an existing master, and so on. Solutions 3 and 4 look good to me.
If the index is already built and turned global on all instances of the replicaset, new replicas will simply receive it during the join process, like they always do. OTOH, we don't even have to make the index global if we say that all the schema is defined in the centralized configuration. In this case each instance will have the same set of indexes built locally, and everything will work as expected, no?
---
**Solution 1.** We should not go this way, IMHO.

> This won't work for vinyl, insert may yield and abort the original transaction. The only reliable option is to modify the app's logic and insert into several spaces at once, which seems like a really bad way in terms of user experience.

Agree with Sergey here: the user will have to deal with inserts of the data not yet moved to the other space, which is not trivial.

**Solution 2.** As you said, it indeed requires user intervention in the case of a master change, which I don't really like.

**Solution 3.** This solution doesn't fix the bug we're discussing here - the replication hang. It proposes an alternative way of creating indexes, which users won't use and probably won't even know about. If the locally built index doesn't exist, we fall back to the old behavior, with the same problem. Moreover, this will require introducing local rows into the global (and, moreover, synchronous) `_index` space.

**Solution 4.** I like this one, since no user intervention is required. Even if the user doesn't change their code, it'll work the new way, without blocking the replication process. I propose to consider this solution more precisely.
---
Reviewers
Tickets
Summary
When a space is large enough, building a new index on it can take a long time - minutes or hours, depending on the space size. The same applies to index alter - it might require an index rebuild, a space fullscan. That isn't a big deal locally on the instance, because the build is asynchronous - transactions can still be processed, even on the space being changed.
But it gets complicated in a cluster, for the following reasons.
Replication gets stuck in a replicated cluster. Yes, the index build is async fiber-wise, but it blocks the current fiber. The blockage happens on-replace into the `_index` space, not on-commit. Because of that, the applier's feature of committing txns asynchronously doesn't help: the longest part happens before the commit. The replica's lag will grow, and it won't receive any new data until the build is finished. But the replication is still alive, and at least it doesn't block transaction processing on the master when the replication is asynchronous. Unlike the next problem.
Master transaction processing gets stuck in a synchronously replicated cluster, because the index build transaction on the master blocks the limbo until the appliers also apply it and write it to their WALs. And that will last until a quorum of replicas has finished the index build.
Essentially, in a synchro cluster with large spaces it becomes impossible to create new indexes. It requires hacks, like creating a new space with all the needed indexes and the same format, slowly copying the data from the old space in multiple small transactions, then deleting the old space. It doesn't sound complex, really, but it requires the user to change their code to maintain this "migration" process by writing into both the old and the new spaces while the copying is in progress.
This document suggests solutions for how people could create large indexes in a replicaset without blocking the replication.
⭐️⭐️ Solution 1: do nothing
The issue in the ticket isn't really a bug. It is an inconvenience, which has a workaround explained above.
The only problem is that the user would have to support that in their code.
Let's repeat the solution here for clarity. When a user wants a new index, or to alter an existing one in a non-trivial way, they do this (see the sketch below):

1. Create a new space with the same format and all the needed indexes.
2. Set an `on_replace` trigger on the old space, which does the same work on the new space.
3. Copy the data from the old space in multiple small transactions.
4. Drop the old space and rename the new one.

Pros: don't need to do anything, already works.
Cons: the user has to change their code to run and maintain this migration while the copying is in progress.
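A minimal sketch of the procedure, assuming a memtx space `old` with an unsigned primary key in field 1 and a new secondary index over a string field 2; all the names and the batch size are illustrative:

```lua
local fiber = require('fiber')

local old = box.space.old

-- 1. A new space with the same format and the wanted extra index.
local new = box.schema.space.create('new', {format = old:format()})
new:create_index('pk', {parts = {{1, 'unsigned'}}})
new:create_index('by_name', {parts = {{2, 'string'}}})

-- 2. Duplicate all ongoing writes into the new space.
old:on_replace(function(old_tuple, new_tuple)
    if new_tuple ~= nil then
        box.space.new:replace(new_tuple)
    else
        box.space.new:delete(old_tuple[1])
    end
end)

-- 3. Copy the existing data in small batches, yielding between them.
local last_key = {}
while true do
    local batch = old:select(last_key, {iterator = 'GT', limit = 1000})
    if #batch == 0 then break end
    box.begin()
    for _, tuple in ipairs(batch) do
        box.space.new:replace(tuple)
    end
    box.commit()
    last_key = {batch[#batch][1]}
    fiber.yield()
end

-- 4. The final swap. A real migration must also handle races between
-- the trigger and the copy, e.g. a delete landing between two batches.
old:drop()
new:rename('old')
```

The small transactions are the whole point: each of them replicates quickly, so neither the appliers nor the limbo ever wait on a single huge txn.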
⭐️ Solution 2: space alter-clone
Not a bug, as said above. But the inconvenience is quite unhandy. Tarantool could wrap the solution described above into a nice API available out of the box.
That is, Tarantool would allow cloning a space with any of its indexes and metadata altered. Once the cloning is done, the user could do the final "drop + rename" themselves.
If designed carefully, this could be an interesting tool to do more than just new index creation.
Note that the problem in the ticket also concerns index alterations which can't be completed instantly and require an index scan (for duplicates, or for values having an incompatible type).
If the solution looks interesting enough, a proper API and behaviour design could be proposed. It could be something like `old_space:copy(new_space)`. An example:
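The sketch below is hypothetical: `space:copy()` does not exist, and its name and behaviour here are assumptions for illustration.

```lua
local old = box.space.users

-- Clone the metadata, altering the index set along the way.
local new = box.schema.space.create('users_new', {format = old:format()})
new:create_index('pk', {parts = {{1, 'unsigned'}}})
new:create_index('by_email', {parts = {{2, 'string'}}, unique = true})

-- Tarantool would copy the data in small internal transactions and keep
-- duplicating the ongoing writes until the copy catches up.
old:copy(new)

-- The final "drop + rename" stays with the user.
old:drop()
new:rename('users')
```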
It could enable more interesting outcomes as well.
Pros: the whole migration becomes one built-in API call - no custom migration code to write and maintain.
Cons: the space data is stored twice while the cloning is in progress, and a master change in the middle of the process still requires user intervention.
⭐️⭐️⭐️ Solution 3: replica-local index
The problem of index creation/alter is hitting the replication hard. One approach then could be to attack the replication shortcomings. That is, drop the replication from the process.
Let's imagine that the replicas and the master could build the same indexes independently, fully locally. And when finished, the master would "enable" this index in a single small DDL transaction.
The index creation would then be a 2-step process: 1 - create a local index on all replicas; 2 - turn the local index into a global one on the master.
This needs 2 features which aren't available yet, but aren't hard to add: replica-local indexes which can be created on any instance, even a read-only one, and an alter operation which turns such a local index into a global one.
Replica-local DDL is not unusual for Tarantool. There is right now a space type `temporary` (not to be confused with `data-temporary`). It can be created on read-only replicas, can have its own indexes, is visible in `_space` and its indexes in `_index`, but it is not replicated, and its data isn't stored in the WAL.

Replica-local persistent data also is not a new thing. Tarantool does have "local" spaces. They have replicaset-global meta (`_space` and `_index` rows) and their data is persisted, but not replicated. They can only be created by the master, but can take DML on any instance, and it is not replicated.

The proposal is to introduce replica-local indexes. They can be created by any replica, even a read-only one, on absolutely any space. Such an index is persisted in `_index` and is not replicated.

Creation of the index will not affect replication at all, and won't block the limbo, because replica-local transactions are not synchronous by definition.
To create a new global index, the user would then go and create a replica-local index on each instance.

Then, to make it global, the user would run `index:alter{is_global = true}` on the master instance. Locally it works instantly. When this txn comes to a replica, it tries to find a replica-local index in this space with all the same meta besides the index ID. If found, it also works instantly, by changing the index ID to the global one (the ID is the primary key, so this means moving the local index's data to the new global index with the global ID, and dropping the now-empty local index). If not found, a new index is built as usual. A sketch of the flow follows below.

The solution not only allows creating/altering indexes in the cluster bypassing the replication, but also allows the user to purposefully create replica-local indexes without ever making them global. It could be handy to reduce memory usage on the master and speed up the master's DML: the master would only store the unique indexes and handle DML, and the replicas would store the other indexes and serve DQL.
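A sketch of the proposed 2-step flow; the `is_global` index option is the one proposed above and does not exist today:

```lua
-- Step 1: on every instance, including the read-only replicas.
box.space.users:create_index('by_email', {
    parts = {{2, 'string'}},
    is_global = false, -- replica-local: persisted in _index, not replicated
})

-- Step 2: on the master only, once every instance has built its copy.
box.space.users.index.by_email:alter({is_global = true})
-- On each replica this txn finds the matching local index and promotes
-- it instantly; where none is found, the index is built as usual.
```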
The downside is that the user has to visit each replica to create the replica-local indexes on the first step.
Pros: introduces a new feature - replica-local indexes, which can be used not only for replicaset-wide index building.
Cons: needs 2 steps, one of them to be done on each instance in the replicaset, including new instances, where this index won't appear automatically.
⭐️⭐️⭐️⭐️ Solution 4: lazy index
Consider another angle: a long index build blocks replication because the transaction can't be committed until the index is built. Then let's just allow committing it before the build is done.
Lazy index creation is when the index's entry is added to `_index` instantly, launching a background building process, which would run in a special fiber - a global one, or one per lazy index.

Such an index would be visible and droppable, but couldn't be used or altered until the building is complete. Any usage attempt would return an error.
When the building is complete, the index is usable like any other. If the building fails, the index would report that in its status.
On restart it would behave like a normal index, i.e. block `box.cfg{}` until the build is finished. Except that if the build has failed, `box.cfg{}` still finishes OK, and the index status reports it as broken.

The user can later drop the `lazy` flag from the index options to turn it into a regular index.

An example:
Pros: no user intervention is required, the replication is never blocked, and even unchanged user code keeps working, just the new way.
Cons: implementation can be tricky.
Proposal
Solution 4 (lazy index) looks the most promising. It solves the problem, requires minimal action from the user, and can even be considered a feature.
If there are no other suggestions and everybody agrees, I will then describe solution 4 in more detail.
Alternatives
Solution 5: index build on replicas doesn't block the applier fiber
The idea was to not block the replica's side on the index build: apply the `_index` transactions in separate fibers. This won't work, because the replication is still stuck. The limbo on the master would be blocked anyway - it would still be waiting until the applier commits the index build transaction.