Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Methods for growing and shrinking an array #11485

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

beta-ziliani
Copy link
Member

@beta-ziliani beta-ziliani commented Nov 23, 2021

Solves #8209 and #11443 by creating two methods, one for growing the array and one for shrinking:

ary = ['a', 's', 't', 'e', 'r', 'i', 't', 'e']
p ary.remaining_capacity  # => 8
ary.ensure_capacity(20)
p ary.remaining_capacity  # => 20
ary.trim_to_size(8)
p ary.remaining_capacity  # => 8

spec/std/array_spec.cr Outdated Show resolved Hide resolved
@oprypin
Copy link
Member

oprypin commented Nov 23, 2021

I don't think resize is a good name, because at least in C++, C#, Python+NumPy that means actually dropping elements or filling the array with actual new elements.
C++ has reserve (which, by the way, actually can't release memory) and shrink_to_fit for this purpose.
And perhaps this pull request's title should mention what the method does.

@Fryguy
Copy link
Contributor

Fryguy commented Nov 23, 2021

Perhaps .capacity= (maybe with a getter .capicity)?

@straight-shoota
Copy link
Member

I agree that rescale might be confusing about the semantics. #capacity= looks better. But it's a setter, not an action method, so that's a bit of a down side as well. It doesn't immediately convey that it might actually move things around.
Maybe #rescale could be an option? Or #resize_capacity?

@oprypin
Copy link
Member

oprypin commented Nov 23, 2021

FWIW doing the API exactly as in C++ would solve all of the issues named so far. It also prevents having to think what happens when you specify a smaller capacity than now. Also I think it's very pragmatic or goal-oriented. I don't often see the need to set a specific smaller buffer size.
So, reserve(size) (never shrinks) and shrink_to_fit()

@beta-ziliani
Copy link
Member Author

beta-ziliani commented Nov 24, 2021

If we go with two different methods, I like Java's ArrayList ensureCapacity and trimToSize better. Not only the name is different: it tells you that it's ok to give a small value, just it won't do anything.

C# also uses trimToSize, but for the capacity it has a setter as proposed by @Fryguy (that also shrinks). But I don't think the setter is a good idea, since as mentioned in the comment we need to do something about the offset, requiring an extra (defaultable) arg.

@beta-ziliani beta-ziliani changed the title New Array#resize method Methods for growing and shrinking an array Nov 24, 2021
Comment on lines 2075 to 2077
expect_raises(ArgumentError, "Insufficient capacity: 2 cannot hold 3 elements") do
ary = [1, 2, 3]
ary.ensure_capacity(2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again (#11485 (comment)) this seems very unexpected: Why would this method raise? We ask to ensure the array has at least a capacity of 2, and the actual capacity is 3. So everything should be good, nothing to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agh, sorry I missed this in the original comment. Will fix.

src/array.cr Outdated

# Reduces the internal buffer to exactly fit the number of elements in the
# array. If `rewind` is truthy it moves the elements in the internal array to the front.
def trim_to_size(rewind = true) : self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced about the usefulness of the rewind parameter. #trim_to_size should always rewind. I can't see a relevant usecase where you would not want that, especially since you have no external insight on or control of the offset.
For #ensure_capacity it's not that clear, but I don't know why it should bother about that either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be useful to have an "extra capacity" argument, that is resize_to_capacity(size + extra_capacity)? I think we would want that, so that if an element was added immediately after the trimming, it wouldn't needed to increase capacity so close again. on the other hand i guess ensure_capacity can be used for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think the use case for #trim_to_size would be to say I'm done with adding or removing things, let's have it trimmed to save space.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it is implemented here, ensure_capacity won't shrink, so it won't help you in that case. This is why my original proposal had just one method to set the capacity, either expanding or shrinking it. I wonder what's best, if to go back to the original proposal or to add an extra-capacity argument to leave some room.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if fixing #11445 would help here in your use case @carlhoerberg .

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, it's a very good addition, but our deques/arrays also shrinks. well, we can continue to do what we do, create a new array/deque and copy all elements to it, but not having to copy would be much more efficent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that we indeed should never expose the rewind parameter and always do it.
Later someone can separately research whether that's a significant slowdown, and then possibly add this parameter. Also whether it's perhaps possible to do a 2-in-1 equivalent of realloc + rewind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, then again, potentially deprecating the rewind parameter to be ignored (always true) would also not be a breaking change. So maybe it's fine as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please let's not add the rewind parameter. Nothing in Array's API says anything about the internal implementation. This will prevent changing the internal implementation later on. Also users reading this will have no idea what this means.

src/array.cr Outdated
resize_to_capacity capacity
else
err = "Insufficient capacity: #{capacity} cannot hold #{@size} elements"
err += " with an offset of #{@offset_to_buffer}" if @offset_to_buffer > 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the discussion in https://github.com/crystal-lang/crystal/pull/11485/files#r756340774, this offset part of the error message is an internal implementation detail that users will likely not understand.

@asterite
Copy link
Member

Where did the requirement for introducing these methods come from? I don't doubt their usefulness, I just wonder if there are real programs where you tried this and it proved to be a benefit.

I'm only asking this because Ruby doesn't have such methods.

@carlhoerberg
Copy link
Contributor

carlhoerberg commented Nov 25, 2021

We have Deques/Arrays that can grow to >100M elements, and then back to 0 or any value in between, and for our long running server application we want to minimize memory usage, hence what we do today when we notice that the size of the Deque/Array is significatly smaller than the capacity is to create a new Deque/Array and copies all the remaining elements to it, so that the unused memory can be GCed, and returned to the OS.

@asterite
Copy link
Member

Wow! 100M elements in memory? Without knowing anything about what the application is... Can't that data be kept in a database? Something like sqlite3, for example.

@straight-shoota
Copy link
Member

@carlhoerberg So you occasionally want to trim down the capacity, but not as tight as the current size, in order to leave some room for growing again without resizing. Sounds legit.
Note however, that trimming does realloc, which may do a copy as well. So it's is not necessarily significantly faster than allocating a new array. It has an opportunity to be faster, depending on what realloc does.

@oprypin oprypin self-requested a review November 26, 2021 20:59
@beta-ziliani
Copy link
Member Author

I removed the rewind parameters (it always rewind first), and I added an extra capacity to trim_to_size to have an option of leaving some spare room.

src/array.cr Outdated
raise ArgumentError.new("Negative extra capacity: #{extra}") if extra < 0

rewind
resize_to_capacity (@size + extra)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm there's an extra space here; surprised crystal tool format didn't complain

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I suppose that's because it's just interpreted as method call plus a separate parenthesized expression, not as a method call with args in parenthesis.

src/array.cr Outdated
shift_buffer_by -@offset_to_buffer
end

# Enusures that the internal buffer has at least `capacity` elements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo "ensures"

src/array.cr Outdated

# Enusures that the internal buffer has at least `capacity` elements.
def ensure_capacity(capacity : Int32) : self
rewind
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will do a realloc even if the given capacity is smaller than @size, is that intended?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't realloc, but memcpy. But that's the same problem.
I'm not sure #ensure_capacity should ever rewind.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's never a need to rewind.

src/array.cr Outdated

# Reduces the internal buffer to exactly fit the number of elements in the
# array, plus `extra` elements.
def trim_to_size(extra : Int32 = 0) : self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This extra argument is pretty confusing. Trim to size 2 looks like it will trim it to size 2. I would rename extra to "plus", and make it a named argument. Then it will read like "trim to size plus 2".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm not sure I agree on the word. extra is fully correct English, I slightly prefer that one. "with 2 extra elements".
But yes, definitely would need to remove the ability to pass this as a positional argument 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's mainly that as a positional argument a reader might be confused. The named argument could still be extra, but then it doesn't read like an English sentence. "trim to size extra 2" vs "trim to size plus 2". Maybe there's another word or set of words we could use.

Copy link
Member Author

@beta-ziliani beta-ziliani Dec 1, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vote:

❤️ a.trim_to_size(plus: 2)
😄 a.trim_to_size(extra: 2)
🎉 a.trim_to_size(with_extra: 2)

I can't think of another relevant name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike the idea of naming a parameter with_anything. My impression was that the "with" is implied by the opening of the list of arguments -- of course you're calling the method with something. I was going to make an argument that by that logic you could add with_ to any parameter whatsoever, but actually I didn't find any examples, so it could just be that I'm wrong here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe bonus? :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or more technical additive.

@HertzDevil
Copy link
Contributor

Any reason that extra space can be reserved only on the right (for #push) but not on the left (for #unshift)?

@beta-ziliani
Copy link
Member Author

@HertzDevil yes: delay a proper treatment of @offset_to_buffer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants