MINOR: [Python][Docs] Add examples for MapArray.from_arrays #37656

Merged
73 changes: 73 additions & 0 deletions python/pyarrow/array.pxi
@@ -2363,6 +2363,79 @@ cdef class MapArray(ListArray):
Returns
-------
map_array : MapArray

Examples
--------
First, let's look at the dataset in a rectangular data model. A total of 5
respondents answered the question "How much did you like the movie x?" for
three movies, with one row per respondent and one column per movie. The value
-1 in the integer array means that the value is missing. The boolean array is
the mask for the integer array: True marks a missing value.

>>> import numpy as np
>>> import pyarrow as pa
>>> movies_rectangular = np.ma.masked_array([
... [10, -1, -1],
... [8, 4, 5],
... [-1, 10, 3],
... [-1, -1, -1],
... [-1, -1, -1]
... ],
... [
... [False, True, True],
... [False, False, False],
... [True, False, False],
... [True, True, True],
... [True, True, True],
... ])

To represent the same data with MapArray.from_arrays, flatten the non-missing
values into keys and items, and use offsets to mark where each row starts:

>>> offsets = [
... 0, # -- row 1 start
... 1, # -- row 2 start
... 4, # -- row 3 start
... 6, # -- row 4 start
... 6, # -- row 5 start
... 6, # -- row 5 end
... ]
>>> movies = [
... "Dark Knight", # ---------------------------------- row 1
... "Dark Knight", "Meet the Parents", "Superman", # -- row 2
... "Meet the Parents", "Superman", # ----------------- row 3
... ]
>>> likings = [
... 10, # -------- row 1
... 8, 4, 5, # --- row 2
... 10, 3 # ------ row 3
... ]
>>> pa.MapArray.from_arrays(offsets, movies, likings).to_pandas()
0 [(Dark Knight, 10)]
1 [(Dark Knight, 8), (Meet the Parents, 4), (Sup...
2 [(Meet the Parents, 10), (Superman, 3)]
3 []
4 []
dtype: object

To mark the empty rows as missing instead, pass `None` in offsets as the
start position of each row that should be missing. The last offset, which
ends the final row, must still refer to an existing position in keys
(and items):

>>> offsets = [
... 0, # ----- row 1 start
... 1, # ----- row 2 start
... 4, # ----- row 3 start
... None, # -- row 4 start
... None, # -- row 5 start
... 6, # ----- row 5 end
... ]
>>> pa.MapArray.from_arrays(offsets, movies, likings).to_pandas()
0 [(Dark Knight, 10)]
1 [(Dark Knight, 8), (Meet the Parents, 4), (Sup...
2 [(Meet the Parents, 10), (Superman, 3)]
3 None
4 None
dtype: object
"""
cdef:
Array _offsets, _keys, _items
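
As a companion to the docstring examples above, here is a minimal pure-Python sketch of how the `offsets`, keys and items lists relate to the rectangular data. The helpers `to_map_components` and `read_rows` are hypothetical names used only for illustration and are not part of pyarrow. The treatment of `None` offsets in `read_rows` is an assumption based on the output shown in the docstring, not a description of Arrow's internals.

```python
def to_map_components(rows, mask, column_names):
    """Flatten a masked rectangular table into (offsets, keys, items).

    Hypothetical helper for illustration, not part of pyarrow.
    """
    offsets, keys, items = [0], [], []
    for row, row_mask in zip(rows, mask):
        for name, value, missing in zip(column_names, row, row_mask):
            if not missing:
                keys.append(name)
                items.append(value)
        # A row ends where the next one starts.
        offsets.append(len(keys))
    return offsets, keys, items


def read_rows(offsets, keys, items):
    """Slice keys/items back into rows; a None start marks a missing row.

    Assumption: a row ends at the next non-None offset.
    """
    rows = []
    end = offsets[-1]
    # Walk backwards so every row knows the next valid offset.
    for start in reversed(offsets[:-1]):
        if start is None:
            rows.append(None)
        else:
            rows.append(list(zip(keys[start:end], items[start:end])))
            end = start
    return rows[::-1]


columns = ["Dark Knight", "Meet the Parents", "Superman"]
ratings = [[10, -1, -1], [8, 4, 5], [-1, 10, 3], [-1, -1, -1], [-1, -1, -1]]
missing = [
    [False, True, True],
    [False, False, False],
    [True, False, False],
    [True, True, True],
    [True, True, True],
]

offsets, movies, likings = to_map_components(ratings, missing, columns)
empty_rows = read_rows(offsets, movies, likings)
null_rows = read_rows([0, 1, 4, None, None, 6], movies, likings)
```

The three lists produced here match the ones written by hand in the docstring, so they can be passed to `pa.MapArray.from_arrays(offsets, movies, likings)` in the same way.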