-
-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make the ByteRangeRequest
more literate
#2437
Comments
This would be a good opportunity to think if we should also consider a type that allows doing things like "last N bytes" (without knowing the size beforehand). Most (all?) object stores, for example, allow that type of query. |
Today the elements of |
@paraseba Suffix requests are already used (#1661 (comment)), we just want to ensure the type is well defined so that semantics are clear.
All except Azure.
From here it's zarr-python/src/zarr/codecs/sharding.py Lines 700 to 702 in bc588a7
|
Yes, it is |
That is great, then making this type more clear is even more important. |
So, the dataclass would be like this? @dataclass
class ByteRangeRequest:
start: int # can be negative for indexing from the back
length: int | None # None = read to the end |
No it would be more like a union: class RangeRequest(TypedDict):
"""A request with a start and end in the file"
start: int
"""The first byte of the range"""
end: int
"""The end of the byte range (exclusive)"""
class SuffixRequest(TypedDict):
"""A range that fetches the last N bytes from a file"""
size: int
"""The number of suffix bytes to fetch"""
class OffsetRequest(TypedDict):
"""A request for the first N bytes from a file"""
size: int
"""The number of bytes to fetch"""
ByteRangeRequest = Union[RangeRequest, SuffixRequest, OffsetRequest] I think here the semantics are very clear, even if the union makes it slightly verbose. And here I used typed dicts but that wouldn't fully work as shown because there's no way to decipher between the suffix and offset request (if they both were to use the "size" dict key), but that's easily fixed. |
In case you're curious, in developmentseed/obstore#67 I updated |
Dataclasses are heavier than tuples. Is there any significant performance consequence (memory and / or runtime) of making this change? What if we are fetching 100_000 chunks? |
In obstore I'm still using a tuple of two numbers for the most common case of a range with a known, positive start and end. I think the semantics there are pretty clear (same as
If you made 100,000 suffix requests, it would be 2.7ms slower from this. I figure that suffix requests make up a small part of normal usage anyways, and that this makes the API a lot clearer. But if people make issues in |
Continuing a conversation from #1661
We have IO operations that can fetch ranges of bytes. That byte range parameter is parameterized by an (optional) 2-tuple of nullable ints, i.e.
tuple[int | None, int | None] | None
. The tuple part is ambiguous between two similar specifications of a range of integers --(start, step)
and(start, stop)
; this ambiguity is bad.We should rework this type so that it's harder to be confused about these two possibilities. One option suggested by @kylebarron would be a dataclass, e.g. :
It would also be nice if there was an instance of this class (e.g., the default) that conveyed "fetch the whole thing", and then
byte_range
wouldn't need to be optional in all the store methods. I'm also not sure we get much fromNone
in this type, since there are integers that can express "the end" of a range, e.g.-1
. Curious to hear other ideas here.The text was updated successfully, but these errors were encountered: