Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exposing the roaring64_iterator_t struct in the roaring64 header #682

Open
nardyy01 opened this issue Nov 11, 2024 · 12 comments
Open

Exposing the roaring64_iterator_t struct in the roaring64 header #682

nardyy01 opened this issue Nov 11, 2024 · 12 comments
Assignees

Comments

@nardyy01
Copy link

nardyy01 commented Nov 11, 2024

I wanted to see if there's any chance of moving the roaring64_iterator_s definition from roaring64.c to the header. My reasoning for this is that the original roaring.h had everything exposed which allowed for referencing/dereferencing certain values. I was trying to update some code to allow for similar functionality with the option of using either 32 or 64bits and everything was available except this.

@nardyy01 nardyy01 changed the title Exposing the roaring64_iterator_t struct in the header Exposing the roaring64_iterator_t struct in the roaring64 header Nov 11, 2024
@SLieve
Copy link
Contributor

SLieve commented Nov 11, 2024

@nardyy01 could you expand on your use-case? Keeping the definition hidden allows for future changes to the struct without breaking users, so this change would need to have appropriate motivation.

@nardyy01
Copy link
Author

nardyy01 commented Nov 11, 2024

Would having it in the header here be much different from roaring.h (with warning comments of only accessing value)?
I am modifying company code, so I need to use the submodule for proper access to the bitmaps.

As far as use-case, we handle large datasets that are often compressed and stored in multi-dimensional containers. In most areas we are passing around references while processing the data in chunks to keep up the performance -- maintaining access to the reference allows for quick access and reduced calls overall.

@lemire
Copy link
Member

lemire commented Nov 11, 2024

Would having it in the header here be much different from roaring.h (with warning comments of only accessing value)?

Can you be specific? There is no roaring_iterator_t or roaring_iterator_s, right?

As far as use-case, we handle large datasets that are often compressed and stored in multi-dimensional containers. In most areas we are passing around references while processing the data in chunks to keep up the performance -- maintaining access to the reference allows for quick access and reduced calls overall.

So what you want to do it capture the iterator by value rather than as a pointer?

@nardyy01
Copy link
Author

nardyy01 commented Nov 11, 2024

This is what it looks like in roaring.h currently -- but for 64 bits the definition is in roaring64.c

Obviously, if you modify the underlying bitmap, the iterator
becomes invalid. So don't.
*/

/**
 * A struct used to keep iterator state. Users should only access
 * `current_value` and `has_value`, the rest of the type should be treated as
 * opaque.
 */
typedef struct roaring_uint32_iterator_s {
    const roaring_bitmap_t *parent;        // Owner
    const ROARING_CONTAINER_T *container;  // Current container
    uint8_t typecode;                      // Typecode of current container
    int32_t container_index;               // Current container index
    uint32_t highbits;                     // High 16 bits of the current value
    roaring_container_iterator_t container_it;

    uint32_t current_value;
    bool has_value;
} roaring_uint32_iterator_t;

@nardyy01
Copy link
Author

nardyy01 commented Nov 12, 2024

Its not really capturing the value I suppose, since it could be modified if not called with const-- its more just getting direct access to it by returning a reference or pointer to the address ( uint_32& or uint32_t*) of an instance of current_value which could be stored and accessed freely

@lemire
Copy link
Member

lemire commented Nov 12, 2024

@nardyy01 Thanks. This is much clearer.

@SLieve What do you think? I submit to you that we are probably unlikely to change this struct in the future.

@josh08287
Copy link

FYI Issue #669 will almost certainly need this, as creating c++ iterators will need to be able to provide operator*() and operator->() conferencing operators to be able to complete the iterators. We have these iterators in our c++ wrapper for 32bit and are also trying to make them for 64bit.

@lemire
Copy link
Member

lemire commented Nov 12, 2024

It is just a matter of copying and pasting code, but I'd really like to get @SLieve's ok before doing so.

@madscientist
Copy link
Contributor

I have gotten very distracted and haven't pushed my implementation for #669 but the way it works is that I create a local struct that contained enough info, and the proper operators, to allow it to appear as an lvalue and update the bitmap. Then the iterator worked with that. I don't remember needing to expose any internals. Although I'm not sure I got the 64bit version working completely. Of course, it's not a requirement to make internals fully public, in C++, to allow an iterator special access. It's very common to use friend or similar.

@lemire
Copy link
Member

lemire commented Nov 12, 2024

@madscientist

Of course, it's not a requirement to make internals fully public, in C++, to allow an iterator special access. It's very common to use friend or similar.

In C, we can document that only some fields should be accessed. I don't think that it is a critical issue.

@SLieve's point seems different: he does not want to expose the definition. This is sensible, but we can revisit this choice.

@SLieve
Copy link
Contributor

SLieve commented Nov 13, 2024

I'm still not quite getting the use-case here. You'd like to get a reference or pointer to the current iterator value? If that's the case, I have two thoughts:

  1. The current iterator value does not point to the value in the bitmap, so modifying the iterator value does not modify the bitmap.
  2. Passing an integer by reference or pointer has no advantage in terms of efficiency over passing it by value.

@nardyy01
Copy link
Author

@SLieve Hmm since the struct is using a copy, I suppose these are valid points.
I have modified the code to pass by value and it seems to be handling similarly. I will update if any performance differences are noticed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants