-
-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Methods for growing and shrinking an array #11485
base: master
Are you sure you want to change the base?
Conversation
I don't think |
Perhaps |
I agree that |
FWIW doing the API exactly as in C++ would solve all of the issues named so far. It also prevents having to think what happens when you specify a smaller capacity than now. Also I think it's very pragmatic or goal-oriented. I don't often see the need to set a specific smaller buffer size. |
If we go with two different methods, I like Java's C# also uses |
Array#resize
method
spec/std/array_spec.cr
Outdated
expect_raises(ArgumentError, "Insufficient capacity: 2 cannot hold 3 elements") do | ||
ary = [1, 2, 3] | ||
ary.ensure_capacity(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again (#11485 (comment)) this seems very unexpected: Why would this method raise? We ask to ensure the array has at least a capacity of 2, and the actual capacity is 3. So everything should be good, nothing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agh, sorry I missed this in the original comment. Will fix.
src/array.cr
Outdated
|
||
# Reduces the internal buffer to exactly fit the number of elements in the | ||
# array. If `rewind` is truthy it moves the elements in the internal array to the front. | ||
def trim_to_size(rewind = true) : self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced about the usefulness of the rewind
parameter. #trim_to_size
should always rewind. I can't see a relevant usecase where you would not want that, especially since you have no external insight on or control of the offset.
For #ensure_capacity
it's not that clear, but I don't know why it should bother about that either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be useful to have an "extra capacity" argument, that is resize_to_capacity(size + extra_capacity)
? I think we would want that, so that if an element was added immediately after the trimming, it wouldn't needed to increase capacity so close again. on the other hand i guess ensure_capacity
can be used for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think the use case for #trim_to_size
would be to say I'm done with adding or removing things, let's have it trimmed to save space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it is implemented here, ensure_capacity
won't shrink, so it won't help you in that case. This is why my original proposal had just one method to set the capacity, either expanding or shrinking it. I wonder what's best, if to go back to the original proposal or to add an extra-capacity argument to leave some room.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if fixing #11445 would help here in your use case @carlhoerberg .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, it's a very good addition, but our deques/arrays also shrinks. well, we can continue to do what we do, create a new array/deque and copy all elements to it, but not having to copy would be much more efficent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that we indeed should never expose the rewind
parameter and always do it.
Later someone can separately research whether that's a significant slowdown, and then possibly add this parameter. Also whether it's perhaps possible to do a 2-in-1 equivalent of realloc
+ rewind
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, then again, potentially deprecating the rewind
parameter to be ignored (always true) would also not be a breaking change. So maybe it's fine as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please let's not add the rewind parameter. Nothing in Array's API says anything about the internal implementation. This will prevent changing the internal implementation later on. Also users reading this will have no idea what this means.
src/array.cr
Outdated
resize_to_capacity capacity | ||
else | ||
err = "Insufficient capacity: #{capacity} cannot hold #{@size} elements" | ||
err += " with an offset of #{@offset_to_buffer}" if @offset_to_buffer > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the discussion in https://github.com/crystal-lang/crystal/pull/11485/files#r756340774, this offset part of the error message is an internal implementation detail that users will likely not understand.
Where did the requirement for introducing these methods come from? I don't doubt their usefulness, I just wonder if there are real programs where you tried this and it proved to be a benefit. I'm only asking this because Ruby doesn't have such methods. |
We have Deques/Arrays that can grow to >100M elements, and then back to 0 or any value in between, and for our long running server application we want to minimize memory usage, hence what we do today when we notice that the size of the Deque/Array is significatly smaller than the capacity is to create a new Deque/Array and copies all the remaining elements to it, so that the unused memory can be GCed, and returned to the OS. |
Wow! 100M elements in memory? Without knowing anything about what the application is... Can't that data be kept in a database? Something like sqlite3, for example. |
@carlhoerberg So you occasionally want to trim down the capacity, but not as tight as the current size, in order to leave some room for growing again without resizing. Sounds legit. |
I removed the |
src/array.cr
Outdated
raise ArgumentError.new("Negative extra capacity: #{extra}") if extra < 0 | ||
|
||
rewind | ||
resize_to_capacity (@size + extra) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm there's an extra space here; surprised crystal tool format
didn't complain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. I suppose that's because it's just interpreted as method call plus a separate parenthesized expression, not as a method call with args in parenthesis.
src/array.cr
Outdated
shift_buffer_by -@offset_to_buffer | ||
end | ||
|
||
# Enusures that the internal buffer has at least `capacity` elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo "ensures"
src/array.cr
Outdated
|
||
# Enusures that the internal buffer has at least `capacity` elements. | ||
def ensure_capacity(capacity : Int32) : self | ||
rewind |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will do a realloc even if the given capacity
is smaller than @size
, is that intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't realloc, but memcpy. But that's the same problem.
I'm not sure #ensure_capacity
should ever rewind.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's never a need to rewind.
src/array.cr
Outdated
|
||
# Reduces the internal buffer to exactly fit the number of elements in the | ||
# array, plus `extra` elements. | ||
def trim_to_size(extra : Int32 = 0) : self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extra argument is pretty confusing. Trim to size 2 looks like it will trim it to size 2. I would rename extra to "plus", and make it a named argument. Then it will read like "trim to size plus 2".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I'm not sure I agree on the word. extra
is fully correct English, I slightly prefer that one. "with 2 extra elements".
But yes, definitely would need to remove the ability to pass this as a positional argument 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's mainly that as a positional argument a reader might be confused. The named argument could still be extra, but then it doesn't read like an English sentence. "trim to size extra 2" vs "trim to size plus 2". Maybe there's another word or set of words we could use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Vote:
❤️ a.trim_to_size(plus: 2)
😄 a.trim_to_size(extra: 2)
🎉 a.trim_to_size(with_extra: 2)
I can't think of another relevant name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike the idea of naming a parameter with_anything
. My impression was that the "with" is implied by the opening of the list of arguments -- of course you're calling the method with something. I was going to make an argument that by that logic you could add with_
to any parameter whatsoever, but actually I didn't find any examples, so it could just be that I'm wrong here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe bonus
? :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or more technical additive
.
Any reason that extra space can be reserved only on the right (for |
@HertzDevil yes: delay a proper treatment of |
Solves #8209 and #11443 by creating two methods, one for growing the array and one for shrinking: