Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-35740: Add documentation for list arrays' values property #35865

Merged
merged 5 commits into from
Aug 23, 2023

Conversation

spenczar
Copy link
Contributor

@spenczar spenczar commented Jun 1, 2023

Just docs.

I'm a little shaky on my understanding of exactly what's going on with FixedSizeListArray.values, and its behavior with nulls, so that wording might deserve a careful read.

@spenczar spenczar requested a review from AlenkaF as a code owner June 1, 2023 04:34
@github-actions
Copy link

github-actions bot commented Jun 1, 2023

⚠️ GitHub issue #35740 has been automatically assigned in GitHub to PR creator.

@AlenkaF
Copy link
Member

AlenkaF commented Jun 1, 2023

Thank you for the contribution @spenczar !

I think the main difference between values and flatten is when the array is a slice into another array (see description in https://arrow.apache.org/docs/dev/cpp/api/array.html#_CPPv4N5arrow18FixedSizeListArrayE). You could play around with looking at the code for array = pa.array([[1, 2], None, [3, None]],type=pa.large_list(pa.int32()))and array.slice(0, 1).

You are correct in the understanding that the behaviour of FixedSizeListArray.values is a bit different. I am now aware of the reason for this, but the difference can be seen in the link to the docs from before (https://arrow.apache.org/docs/dev/cpp/api/array.html#_CPPv4N5arrow9ListArrayE).

Some nit for finish: there are doctest errors that need to be fixed with adding ...:

=================================== FAILURES ===================================
___________ [doctest] pyarrow.lib.FixedSizeListArray.values.__get__ ____________
2378         values : Array
2379 
2380         Examples
2381         --------
2382         >>> import pyarrow as pa
2383         >>> array = pa.array(
2384         ...     [[1, 2], None, [3, None]],
2385         ...     type=pa.list_(pa.int32(), 2)
2386         ... )
2387         >>> array.values
Differences (unified diff with -expected +actual):
    @@ -1,3 +1,3 @@
    -<pyarrow.lib.Int32Array object at 0x12d1743a0>
    +<pyarrow.lib.Int32Array object at 0x7f3828fa5440>
     [
       1,

/opt/conda/envs/arrow/lib/python3.9/site-packages/pyarrow/lib.cpython-39-x86_64-linux-gnu.so:2387: DocTestFailure
_____________ [doctest] pyarrow.lib.LargeListArray.values.__get__ ______________
2189         values : Array
2190 
2191         Examples
2192         --------
2193         >>> import pyarrow as pa
2194         >>> array = pa.array(
2195         ...     [[1, 2], None, [3, 4, None, 6]],
2196         ...     type=pa.large_list(pa.int32()),
2197         ... )
2198         >>> array.values
Differences (unified diff with -expected +actual):
    @@ -1,3 +1,3 @@
    -<pyarrow.lib.Int32Array object at 0x12d174100>
    +<pyarrow.lib.Int32Array object at 0x7f3828faeb40>
     [
       1,

@spenczar
Copy link
Contributor Author

spenczar commented Jun 1, 2023

Thanks for the explanation @AlenkaF, that makes a lot of sense, and I learned a bunch about Arrow internals from playing with slice. Do you think the docs I wrote need any edits?

I think I've fixed the doctest issue, too, we'll see if it passes CI now.

Copy link
Member

@AlenkaF AlenkaF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the corrections!
Some minor nits ...

python/pyarrow/array.pxi Outdated Show resolved Hide resolved
python/pyarrow/array.pxi Outdated Show resolved Hide resolved
python/pyarrow/array.pxi Outdated Show resolved Hide resolved
python/pyarrow/array.pxi Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Jun 6, 2023
Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @spenczar!
I added a few more comments. I think if we expand it even more (eg additional example), we can further clarify the difference between values and flatten(). But note that what you have here is already a clear improvement, so it's certainly fine as well to just focus on finalizing the edits you have now in this PR.

python/pyarrow/array.pxi Outdated Show resolved Hide resolved
python/pyarrow/array.pxi Show resolved Hide resolved
python/pyarrow/array.pxi Show resolved Hide resolved
python/pyarrow/array.pxi Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Jun 6, 2023
Co-authored-by: Alenka Frim <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 6, 2023
@spenczar
Copy link
Contributor Author

Hello! Anything else to be done here? This is affecting some intersphinx stuff I'm doing in a mild way.

@spenczar
Copy link
Contributor Author

Is there anything I can do to get this merged?

Copy link
Member

@AlenkaF AlenkaF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay @spenczar!

I would like us to address the unresolved comment before merging. Otherwise the PR looks good 👍

python/pyarrow/array.pxi Outdated Show resolved Hide resolved
@AlenkaF AlenkaF merged commit 6837d05 into apache:main Aug 23, 2023
11 checks passed
@AlenkaF AlenkaF removed the awaiting change review Awaiting change review label Aug 23, 2023
@AlenkaF AlenkaF added this to the 14.0.0 milestone Aug 23, 2023
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 6837d05.

There were 108 benchmark results with an error:

There was 1 benchmark result indicating a performance regression:

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…pache#35865)

Just docs.

I'm a little shaky on my understanding of exactly what's going on with FixedSizeListArray.values, and its behavior with nulls, so that wording might deserve a careful read.
* Closes: apache#35740

Lead-authored-by: Spencer Nelson <[email protected]>
Co-authored-by: Joris Van den Bossche <[email protected]>
Co-authored-by: Alenka Frim <[email protected]>
Signed-off-by: AlenkaF <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Python][Docs] 'values' attribute is missing documentation for ListArray and its friends
3 participants