Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Misc Missing Functionality for Roaring64 bitmaps #549

Open
6 of 9 tasks
Dr-Emann opened this issue Jan 13, 2024 · 9 comments
Open
6 of 9 tasks

Misc Missing Functionality for Roaring64 bitmaps #549

Dr-Emann opened this issue Jan 13, 2024 · 9 comments

Comments

@Dr-Emann
Copy link
Member

Dr-Emann commented Jan 13, 2024

This is just a list of functionality which I noticed was missing in comparison to roaring_bitmap_t as I was adding bindings to croaring-rs. I may add more if I find anything else, rather than making more issues.

  • roaring64_bitmap_clear, to remove all data, and start from an empty bitmap
    (can this be done more efficiently than roaring64_bitmap_remove_range_closed(0, UINT64_MAX)). It seems like roaring64_bitmap_remove_range_closed could be optimized, it appears to loop through all high 48 bit combos, rather than skipping around using the ART. roaring64_bitmap_remove_range_closed(0, UINT64_MAX) is now efficient.
  • The ability to do external iteration (an iterator type that the caller controls arbitrarily)
  • roaring64_bitmap_remove_run_compression. This is probably only really useful after Provide serialization/deserialization functionality to the ART-based 64-bit roaring #542, to avoid run containers in the serialization when the reader won't understand them
  • roaring64_bitmap_internal_validate, mostly useful for tests, to ensure our assumptions are never broken (max art depth, things which should be sorted are, etc).
  • roaring64_bitmap_flip
  • roaring64_bitmap_range_cardinality is exclusive only, it's unfortunate that roaring64_bitmap_cardinality(r) can't be emulated by roaring64_bitmap_range_cardinality(r, 0, UINT64_MAX)
  • roaring64_bitmap_steal_roaring32(roaring_bitmap_t *) which steals the containers from the passed 32 bit bitmap. (naming?) Add roaring64_bitmap_move_from_roaring32 function #649
  • roaring64_bitmap_add_offset
  • roaring64_bitmap_overwrite
@Ezibenroc
Copy link
Member

Ezibenroc commented Jan 14, 2024

If I may add a few to the list:

  • A function to convert a roaring64_bitmap_t to an array of uint64_t (similar to roaring_bitmap_to_uint32_array)
  • A function to convert a roaring_bitmap_t to a roaring64_bitmap_t

Dr-Emann added a commit that referenced this issue May 23, 2024
The implementation was actually already written using an inclusive range
already.

See #549
Dr-Emann added a commit that referenced this issue May 23, 2024
The implementation was actually already written using an inclusive range
already.

See #549
Dr-Emann added a commit that referenced this issue May 23, 2024
The implementation was actually already written using an inclusive range
already.

See #549
Dr-Emann added a commit that referenced this issue May 23, 2024
The implementation was actually already written using an inclusive range
already.

See #549
Dr-Emann added a commit that referenced this issue May 23, 2024
The implementation was actually already written using an inclusive range
already.

See #549
@mux
Copy link

mux commented Jul 23, 2024

I noticed that the 64-bit implementation is also missing _init_with_capacity() and _init_cleared() APIs for use cases where the caller prefers to handle storing the bitmap structure. Maybe these can be added to the list as well?

@Dr-Emann
Copy link
Member Author

Dr-Emann commented Aug 4, 2024

I believe the lack of _init_* methods is intentional, we don't expose the definition of the roaring64_bitmap_t.

with_capacity also doesn't fully make sense either: the 32 bit version of the bitmap stores its containers in a sorted dynamic array: _init_with_capacity pre-sizes the array of container pointers. But the 64 bit version instead uses a tree structure to store the containers, to avoid having to shift things around so much when e.g. inserting a container before all others. So there's not really such a thing to pre-size.

@SLieve
Copy link
Contributor

SLieve commented Aug 7, 2024

@Dr-Emann is correct, we did not expose these init methods as they don't make sense for the 64-bit implementation and also because we don't expose the definition of roaring64_bitmap_t.

@mux
Copy link

mux commented Aug 13, 2024

Thank you for the explanation on why _with_capacity() makes no sense for the 64-bit implementation.

I realize that roaring64_bitmap_t is not exposed, and I definitely understand why that might be preferrable. However, in some specific cases, it might be necessary to expose the definition to allow consumers of the code to avoid an allocation. In systems where you tend to use thousands of such bitmaps, it can make quite a difference. Now I can of course live with things the way they are, but in this particular scenario it strikes me as a little strange, since the 32-bit implementation does expose the internals. What is the reasoning behind the inconsistency?

To add some context, I am working on a codebase where it is common to have data structures wrapped in other structs, to use our implementation of software transactional memory. The additional allocation can be significant for performance, but maybe even more importantly, this means that accessing such a structure leads to an additional pointer dereference and therefore can incur additional cache misses. These are costly in a system where we are mostly bottlenecked on memory.

@SLieve
Copy link
Contributor

SLieve commented Aug 14, 2024

@mux roaring64_bitmap_t internally does pointer traversal, and this could be up to a fair amount (6 pointers per access) for sparse or high cardinality data. If you are worried about allocations, could you use a custom allocator for the bitmaps?

Hiding the implementation allows us to easily make changes that change the structure of the implementation. I am not sure why this was not done for the 32-bit implementation, but it was written much earlier than the 64-bit version and it's hard to change now.

@mux
Copy link

mux commented Aug 14, 2024

@SLieve I understand - like I was saying I was curious about how things came to be like this but I totally get why you'd want to keep things this way. Nearly every single time I create a new API acting on data, I have a forward struct declaration, and provide pointer-based functions. Thanks for your answer.

@nolaforensix
Copy link

_overwrite() is also missing, correct?

@lemire
Copy link
Member

lemire commented Nov 11, 2024

@nolaforensix It is now on the list.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants