Streaming setting #3759
Replies: 3 comments 1 reply
-
As you say, deleting the oldest entries and then compacting the segments is costly. Maybe you can use partitions instead: for example, set each partition's tag by date and then query only the latest partitions, and of course you can also drop old partitions.
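A minimal sketch of this partition-per-date idea, assuming the pymilvus 1.x client; the collection name, dimension, and the 30-day retention window are placeholders:

```python
# Sketch only: date-tagged partitions with the pymilvus 1.x client.
# Names and sizes are illustrative; check the client version you actually run.
import datetime
import random
from milvus import Milvus, MetricType

client = Milvus(host='localhost', port='19530')
collection = 'demo_collection'
dim = 128

client.create_collection({
    'collection_name': collection,
    'dimension': dim,
    'index_file_size': 1024,      # MB
    'metric_type': MetricType.L2,
})

# One partition per day, tagged with the date.
today = datetime.date.today().isoformat()
client.create_partition(collection, partition_tag=today)

vectors = [[random.random() for _ in range(dim)] for _ in range(1024)]
client.insert(collection, records=vectors, partition_tag=today)
client.flush([collection])

# Query only the most recent partition(s) ...
status, results = client.search(
    collection_name=collection,
    query_records=vectors[:1],
    top_k=5,
    partition_tags=[today],
    params={'nprobe': 16},
)

# ... and drop partitions older than the retention window in one call.
expired_tag = (datetime.date.today() - datetime.timedelta(days=30)).isoformat()
client.drop_partition(collection, partition_tag=expired_tag)
```

Dropping a whole partition removes its data in a single operation, which is why partitions are suggested above as an alternative to deleting individual entries and compacting afterwards.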
-
Ok, thank you very much for your feedback! It's quite helpful. I have one question left regarding streaming insertion. The docs state that segments are accumulated until index_file_size is reached, and once it is reached a merge operation is triggered and an index is created. But the docs also state that merge operations are triggered after each flush. So when my index_file_size is 4096 and I insert and flush 4 batches of 1024 consecutively, how many merge operations are triggered: one or four?
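To make the question concrete, here is a rough pymilvus 1.x style sketch of the scenario; the batch sizes are illustrative (index_file_size is in MB, so real batches would need to be far larger than 1024 vectors to fill a 4096 MB segment), and it assumes the client exposes get_collection_stats for inspecting segments after each flush:

```python
# Sketch only: insert and flush four batches into a collection configured with
# index_file_size=4096, then inspect segment stats after each flush to observe
# when segments actually get merged.
import random
from milvus import Milvus, MetricType

client = Milvus(host='localhost', port='19530')
collection = 'streaming_demo'
dim = 128

client.create_collection({
    'collection_name': collection,
    'dimension': dim,
    'index_file_size': 4096,      # MB, as in the question
    'metric_type': MetricType.L2,
})

for batch_no in range(4):
    vectors = [[random.random() for _ in range(dim)] for _ in range(1024)]
    client.insert(collection, records=vectors)
    client.flush([collection])

    # The stats list the segments per partition, so you can see whether the
    # segments were merged after this flush or only once the size threshold
    # was reached.
    status, stats = client.get_collection_stats(collection)
    print(batch_no, stats)
```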
-
If someone has similar issues, this article was pretty helpful.
-
This library offers the major building blocks for streaming systems in which single entries or mini-batches are added to the index iteratively (within a reasonable amount of time). But in a streaming system the index can grow infinitely, so one needs to implement some pruning mechanism. One can delete the oldest entries and then compact the segments at a given interval, but these operations seem costly. I'm wondering what the runtime performance of these operations is (it might depend on the index), whether there is a certain index type that is well suited for a streaming setting, or whether it is possible to implement a capped (FIFO) index structure that is automatically pruned on insertion.
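For illustration, a sketch of the delete-and-compact pruning loop described above, assuming the pymilvus 1.x client; the application-side FIFO of IDs and the MAX_ENTRIES cap are hypothetical bookkeeping, not a built-in capped index:

```python
# Sketch only: keep an insertion-ordered FIFO of IDs, periodically delete the
# oldest entries beyond a cap, then compact so the freed space is reclaimed.
import collections
import random
from milvus import Milvus, MetricType

client = Milvus(host='localhost', port='19530')
collection = 'streaming_fifo'
dim = 128
MAX_ENTRIES = 100_000           # illustrative cap, not a Milvus setting

client.create_collection({
    'collection_name': collection,
    'dimension': dim,
    'index_file_size': 1024,
    'metric_type': MetricType.L2,
})

fifo = collections.deque()      # insertion-ordered IDs, oldest on the left

def insert_batch(vectors):
    status, ids = client.insert(collection, records=vectors)
    fifo.extend(ids)
    client.flush([collection])

def prune():
    # Delete the oldest entries beyond the cap, then compact the collection.
    # These are the potentially costly operations mentioned above, so in
    # practice this would run on an interval rather than per insert.
    stale = []
    while len(fifo) > MAX_ENTRIES:
        stale.append(fifo.popleft())
    if stale:
        client.delete_entity_by_id(collection, stale)
        client.flush([collection])
        client.compact(collection)

insert_batch([[random.random() for _ in range(dim)] for _ in range(1024)])
prune()
```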