Commit

Merge branch 'main' into na_rep-bug
rsm-23 authored Oct 16, 2023
2 parents 2304fb7 + e0d6051 commit 44ba528
Showing 179 changed files with 2,873 additions and 2,457 deletions.
4 changes: 2 additions & 2 deletions .github/actions/build_pandas/action.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,8 @@ runs:
- name: Build Pandas
run: |
if [[ ${{ inputs.editable }} == "true" ]]; then
pip install -e . --no-build-isolation -v
pip install -e . --no-build-isolation -v --no-deps
else
pip install . --no-build-isolation -v
pip install . --no-build-isolation -v --no-deps
fi
shell: bash -el {0}
2 changes: 1 addition & 1 deletion .github/workflows/unit-tests.yml
Expand Up @@ -348,7 +348,7 @@ jobs:
python -m pip install --pre --extra-index-url https://pypi.anaconda.org/scientific-python-nightly-wheels/simple numpy
python -m pip install versioneer[toml]
python -m pip install python-dateutil pytz tzdata "cython<3.0.3" hypothesis>=6.46.1 pytest>=7.3.2 pytest-xdist>=2.2.0 pytest-cov pytest-asyncio>=0.17
python -m pip install -ve . --no-build-isolation --no-index
python -m pip install -ve . --no-build-isolation --no-index --no-deps
python -m pip list
- name: Run Tests
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-310.yaml
Expand Up @@ -20,7 +20,7 @@ dependencies:

# required dependencies
- python-dateutil
- numpy
- numpy<2
- pytz

# optional dependencies
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-311-downstream_compat.yaml
Expand Up @@ -21,7 +21,7 @@ dependencies:

# required dependencies
- python-dateutil
- numpy
- numpy<2
- pytz

# optional dependencies
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-311-pyarrownightly.yaml
Expand Up @@ -19,7 +19,7 @@ dependencies:

# required dependencies
- python-dateutil
- numpy
- numpy<2
- pytz
- pip

Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-311.yaml
Expand Up @@ -20,7 +20,7 @@ dependencies:

# required dependencies
- python-dateutil
- numpy
- numpy<2
- pytz

# optional dependencies
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-39-minimum_versions.yaml
Expand Up @@ -22,7 +22,7 @@ dependencies:

# required dependencies
- python-dateutil=2.8.2
- numpy=1.22.4
- numpy=1.22.4, <2
- pytz=2020.1

# optional dependencies
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-39.yaml
Expand Up @@ -20,7 +20,7 @@ dependencies:

# required dependencies
- python-dateutil
- numpy
- numpy<2
- pytz

# optional dependencies
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/actions-pypy-39.yaml
Expand Up @@ -21,7 +21,7 @@ dependencies:
- hypothesis>=6.46.1

# required
- numpy
- numpy<2
- python-dateutil
- pytz
- pip:
Expand Down
2 changes: 1 addition & 1 deletion ci/deps/circle-310-arm64.yaml
Expand Up @@ -20,7 +20,7 @@ dependencies:

# required dependencies
- python-dateutil
- numpy
- numpy<2
- pytz

# optional dependencies
Expand Down
2 changes: 1 addition & 1 deletion doc/source/development/contributing_codebase.rst
Expand Up @@ -540,7 +540,7 @@ xfail during the testing phase. To do so, use the ``request`` fixture:
def test_xfail(request):
mark = pytest.mark.xfail(raises=TypeError, reason="Indicate why here")
request.node.add_marker(mark)
request.applymarker(mark)
xfail is not to be used for tests involving failure due to invalid user arguments.
For these tests, we need to verify the correct exception type and error message
Expand Down
130 changes: 75 additions & 55 deletions doc/source/user_guide/copy_on_write.rst
Expand Up @@ -7,8 +7,8 @@ Copy-on-Write (CoW)
*******************

Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the
optimizations that become possible through CoW are implemented and supported. A complete list
can be found at :ref:`Copy-on-Write optimizations <copy_on_write.optimizations>`.
optimizations that become possible through CoW are implemented and supported. All possible
optimizations are supported starting from pandas 2.1.

We expect that CoW will be enabled by default in version 3.0.

Expand Down Expand Up @@ -154,66 +154,86 @@ With copy on write this can be done by using ``loc``.
df.loc[df["bar"] > 5, "foo"] = 100
Read-only NumPy arrays
----------------------

Accessing the underlying NumPy array of a DataFrame will return a read-only array if the array
shares data with the initial DataFrame:

The array is a copy if the initial DataFrame consists of more than one array:


.. ipython:: python
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})
df.to_numpy()
The array shares data with the DataFrame if the DataFrame consists of only one NumPy array:

.. ipython:: python
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
df.to_numpy()
This array is read-only, which means that it can't be modified inplace:

.. ipython:: python
:okexcept:
arr = df.to_numpy()
arr[0, 0] = 100
The same holds true for a Series, since a Series always consists of a single array.

There are two potential solutions to this:

- Trigger a copy manually if you want to avoid updating DataFrames that share memory with your array.
- Make the array writeable. This is a more performant solution but circumvents Copy-on-Write rules, so
it should be used with caution.

.. ipython:: python
arr = df.to_numpy()
arr.flags.writeable = True
arr[0, 0] = 100
arr
Patterns to avoid
-----------------

No defensive copy will be performed if two objects share the same data while
you are modifying one object inplace.

.. ipython:: python
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df2 = df.reset_index()
df2.iloc[0, 0] = 100
This creates two objects that share data and thus the setitem operation will trigger a
copy. This is not necessary if the initial object ``df`` isn't needed anymore.
Simply reassigning to the same variable will invalidate the reference that is
held by the object.

.. ipython:: python
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
df = df.reset_index()
df.iloc[0, 0] = 100
No copy is necessary in this example.
Creating multiple references keeps unnecessary references alive
and thus will hurt performance with Copy-on-Write.

.. _copy_on_write.optimizations:

Copy-on-Write optimizations
---------------------------

A new lazy copy mechanism that defers the copy until the object in question is modified
and only if this object shares data with another object. This mechanism was added to
following methods:

- :meth:`DataFrame.reset_index` / :meth:`Series.reset_index`
- :meth:`DataFrame.set_index`
- :meth:`DataFrame.set_axis` / :meth:`Series.set_axis`
- :meth:`DataFrame.set_flags` / :meth:`Series.set_flags`
- :meth:`DataFrame.rename_axis` / :meth:`Series.rename_axis`
- :meth:`DataFrame.reindex` / :meth:`Series.reindex`
- :meth:`DataFrame.reindex_like` / :meth:`Series.reindex_like`
- :meth:`DataFrame.assign`
- :meth:`DataFrame.drop`
- :meth:`DataFrame.dropna` / :meth:`Series.dropna`
- :meth:`DataFrame.select_dtypes`
- :meth:`DataFrame.align` / :meth:`Series.align`
- :meth:`Series.to_frame`
- :meth:`DataFrame.rename` / :meth:`Series.rename`
- :meth:`DataFrame.add_prefix` / :meth:`Series.add_prefix`
- :meth:`DataFrame.add_suffix` / :meth:`Series.add_suffix`
- :meth:`DataFrame.drop_duplicates` / :meth:`Series.drop_duplicates`
- :meth:`DataFrame.droplevel` / :meth:`Series.droplevel`
- :meth:`DataFrame.reorder_levels` / :meth:`Series.reorder_levels`
- :meth:`DataFrame.between_time` / :meth:`Series.between_time`
- :meth:`DataFrame.filter` / :meth:`Series.filter`
- :meth:`DataFrame.head` / :meth:`Series.head`
- :meth:`DataFrame.tail` / :meth:`Series.tail`
- :meth:`DataFrame.isetitem`
- :meth:`DataFrame.pipe` / :meth:`Series.pipe`
- :meth:`DataFrame.pop` / :meth:`Series.pop`
- :meth:`DataFrame.replace` / :meth:`Series.replace`
- :meth:`DataFrame.shift` / :meth:`Series.shift`
- :meth:`DataFrame.sort_index` / :meth:`Series.sort_index`
- :meth:`DataFrame.sort_values` / :meth:`Series.sort_values`
- :meth:`DataFrame.squeeze` / :meth:`Series.squeeze`
- :meth:`DataFrame.swapaxes`
- :meth:`DataFrame.swaplevel` / :meth:`Series.swaplevel`
- :meth:`DataFrame.take` / :meth:`Series.take`
- :meth:`DataFrame.to_timestamp` / :meth:`Series.to_timestamp`
- :meth:`DataFrame.to_period` / :meth:`Series.to_period`
- :meth:`DataFrame.truncate`
- :meth:`DataFrame.iterrows`
- :meth:`DataFrame.tz_convert` / :meth:`Series.tz_localize`
- :meth:`DataFrame.fillna` / :meth:`Series.fillna`
- :meth:`DataFrame.interpolate` / :meth:`Series.interpolate`
- :meth:`DataFrame.ffill` / :meth:`Series.ffill`
- :meth:`DataFrame.bfill` / :meth:`Series.bfill`
- :meth:`DataFrame.where` / :meth:`Series.where`
- :meth:`DataFrame.infer_objects` / :meth:`Series.infer_objects`
- :meth:`DataFrame.astype` / :meth:`Series.astype`
- :meth:`DataFrame.convert_dtypes` / :meth:`Series.convert_dtypes`
- :meth:`DataFrame.join`
- :meth:`DataFrame.eval`
- :func:`concat`
- :func:`merge`
methods that don't require a copy of the underlying data. Popular examples are :meth:`DataFrame.drop` for ``axis=1``
and :meth:`DataFrame.rename`.

These methods return views when Copy-on-Write is enabled, which provides a significant
performance improvement compared to the regular execution.
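
The lazy copy described in this hunk can be observed directly. The following is only a minimal sketch, assuming pandas 2.x with Copy-on-Write switched on through the ``mode.copy_on_write`` option; the variable names ``renamed``, ``shares_before`` and ``shares_after`` are illustrative and do not come from the documentation being changed.

```python
import numpy as np
import pandas as pd

# Assumption: pandas 2.x, where Copy-on-Write is opt-in via this option.
pd.options.mode.copy_on_write = True

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# rename() does not need to touch the data, so the result shares memory with df.
renamed = df.rename(columns={"a": "x"})
shares_before = np.shares_memory(df["a"].to_numpy(), renamed["x"].to_numpy())

# Writing to the new object triggers the deferred copy; df is left untouched.
renamed.iloc[0, 0] = 100
shares_after = np.shares_memory(df["a"].to_numpy(), renamed["x"].to_numpy())

print(shares_before, shares_after, df.loc[0, "a"])
```

The copy is paid only at the first write, and only because ``df`` still holds a reference to the same data.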
Expand Down
28 changes: 14 additions & 14 deletions doc/source/user_guide/timeseries.rst
Expand Up @@ -461,7 +461,7 @@ of those specified will not be generated:

.. ipython:: python
pd.date_range(start, end, freq="BM")
pd.date_range(start, end, freq="BME")
pd.date_range(start, end, freq="W")
Expand Down Expand Up @@ -557,7 +557,7 @@ intelligent functionality like selection, slicing, etc.

.. ipython:: python
rng = pd.date_range(start, end, freq="BM")
rng = pd.date_range(start, end, freq="BME")
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts.index
ts[:5].index
Expand Down Expand Up @@ -884,9 +884,9 @@ into ``freq`` keyword arguments. The available date offsets and associated frequ
:class:`~pandas.tseries.offsets.LastWeekOfMonth`, ``'LWOM'``, "the x-th day of the last week of each month"
:class:`~pandas.tseries.offsets.MonthEnd`, ``'ME'``, "calendar month end"
:class:`~pandas.tseries.offsets.MonthBegin`, ``'MS'``, "calendar month begin"
:class:`~pandas.tseries.offsets.BMonthEnd` or :class:`~pandas.tseries.offsets.BusinessMonthEnd`, ``'BM'``, "business month end"
:class:`~pandas.tseries.offsets.BMonthEnd` or :class:`~pandas.tseries.offsets.BusinessMonthEnd`, ``'BME'``, "business month end"
:class:`~pandas.tseries.offsets.BMonthBegin` or :class:`~pandas.tseries.offsets.BusinessMonthBegin`, ``'BMS'``, "business month begin"
:class:`~pandas.tseries.offsets.CBMonthEnd` or :class:`~pandas.tseries.offsets.CustomBusinessMonthEnd`, ``'CBM'``, "custom business month end"
:class:`~pandas.tseries.offsets.CBMonthEnd` or :class:`~pandas.tseries.offsets.CustomBusinessMonthEnd`, ``'CBME'``, "custom business month end"
:class:`~pandas.tseries.offsets.CBMonthBegin` or :class:`~pandas.tseries.offsets.CustomBusinessMonthBegin`, ``'CBMS'``, "custom business month begin"
:class:`~pandas.tseries.offsets.SemiMonthEnd`, ``'SM'``, "15th (or other day_of_month) and calendar month end"
:class:`~pandas.tseries.offsets.SemiMonthBegin`, ``'SMS'``, "15th (or other day_of_month) and calendar month begin"
Expand All @@ -896,9 +896,9 @@ into ``freq`` keyword arguments. The available date offsets and associated frequ
:class:`~pandas.tseries.offsets.BQuarterBegin`, ``'BQS'``, "business quarter begin"
:class:`~pandas.tseries.offsets.FY5253Quarter`, ``'REQ'``, "retail (aka 52-53 week) quarter"
:class:`~pandas.tseries.offsets.YearEnd`, ``'Y'``, "calendar year end"
:class:`~pandas.tseries.offsets.YearBegin`, ``'AS'`` or ``'BYS'``,"calendar year begin"
:class:`~pandas.tseries.offsets.BYearEnd`, ``'BA'``, "business year end"
:class:`~pandas.tseries.offsets.BYearBegin`, ``'BAS'``, "business year begin"
:class:`~pandas.tseries.offsets.YearBegin`, ``'YS'`` or ``'BYS'``,"calendar year begin"
:class:`~pandas.tseries.offsets.BYearEnd`, ``'BY'``, "business year end"
:class:`~pandas.tseries.offsets.BYearBegin`, ``'BYS'``, "business year begin"
:class:`~pandas.tseries.offsets.FY5253`, ``'RE'``, "retail (aka 52-53 week) year"
:class:`~pandas.tseries.offsets.Easter`, None, "Easter holiday"
:class:`~pandas.tseries.offsets.BusinessHour`, ``'bh'``, "business hour"
Expand Down Expand Up @@ -1248,8 +1248,8 @@ frequencies. We will refer to these aliases as *offset aliases*.
"W", "weekly frequency"
"ME", "month end frequency"
"SM", "semi-month end frequency (15th and end of month)"
"BM", "business month end frequency"
"CBM", "custom business month end frequency"
"BME", "business month end frequency"
"CBME", "custom business month end frequency"
"MS", "month start frequency"
"SMS", "semi-month start frequency (1st and 15th)"
"BMS", "business month start frequency"
Expand All @@ -1259,9 +1259,9 @@ frequencies. We will refer to these aliases as *offset aliases*.
"QS", "quarter start frequency"
"BQS", "business quarter start frequency"
"Y", "year end frequency"
"BA, BY", "business year end frequency"
"AS, YS", "year start frequency"
"BAS, BYS", "business year start frequency"
"BY", "business year end frequency"
"YS", "year start frequency"
"BYS", "business year start frequency"
"h", "hourly frequency"
"bh", "business hour frequency"
"cbh", "custom business hour frequency"
Expand Down Expand Up @@ -1586,7 +1586,7 @@ rather than changing the alignment of the data and the index:
ts.shift(5, freq="D")
ts.shift(5, freq=pd.offsets.BDay())
ts.shift(5, freq="BM")
ts.shift(5, freq="BME")
Note that when ``freq`` is specified, the leading entry is no longer NaN
because the data is not being realigned.
Expand Down Expand Up @@ -1692,7 +1692,7 @@ the end of the interval.
.. warning::

The default values for ``label`` and ``closed`` are '**left**' for all
frequency offsets except for 'ME', 'Y', 'Q', 'BM', 'BA', 'BQ', and 'W'
frequency offsets except for 'ME', 'Y', 'Q', 'BME', 'BY', 'BQ', and 'W'
which all have a default of 'right'.

This might unintentionally lead to looking ahead, where the value for a later
Expand Down
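
The alias renames visible in the ``timeseries.rst`` hunks above (``BM`` to ``BME``, ``CBM`` to ``CBME``, ``BA`` to ``BY``, ``AS`` to ``YS``, ``BAS`` to ``BYS``) can be summarized as a lookup table. The helper below is a hypothetical sketch for updating frequency strings in user code; ``RENAMED_ALIASES`` and ``modernize_freq`` are illustrative names and are not part of pandas.

```python
import re

# Old alias -> new alias, as read from the timeseries.rst hunks in this diff.
RENAMED_ALIASES = {
    "BM": "BME",
    "CBM": "CBME",
    "BA": "BY",
    "AS": "YS",
    "BAS": "BYS",
}

# Optional integer multiple, the alias letters, then any suffix such as "-DEC".
_FREQ = re.compile(r"^(?P<mult>-?\d*)(?P<alias>[A-Za-z]+)(?P<rest>.*)$")


def modernize_freq(freq: str) -> str:
    """Return freq with a renamed alias replaced; other strings pass through."""
    match = _FREQ.match(freq)
    if match is None:
        return freq
    alias = match.group("alias")
    new_alias = RENAMED_ALIASES.get(alias, alias)
    return match.group("mult") + new_alias + match.group("rest")


print(modernize_freq("BM"), modernize_freq("2BM"), modernize_freq("W"))
```

The lookup is case-sensitive on purpose, since the same diff lists lowercase aliases such as ``'bh'`` with a separate meaning.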
20 changes: 16 additions & 4 deletions doc/source/whatsnew/v0.20.0.rst
Expand Up @@ -886,11 +886,23 @@ This would happen with a ``lexsorted``, but non-monotonic levels. (:issue:`15622

This is *unchanged* from prior versions, but shown for illustration purposes:

.. ipython:: python
.. code-block:: python
df = pd.DataFrame(np.arange(6), columns=['value'],
index=pd.MultiIndex.from_product([list('BA'), range(3)]))
df
In [81]: df = pd.DataFrame(np.arange(6), columns=['value'],
....: index=pd.MultiIndex.from_product([list('BA'), range(3)]))
....:
In [82]: df
Out[82]:
value
B 0 0
1 1
2 2
A 0 3
1 4
2 5
[6 rows x 1 columns]
.. code-block:: python
Expand Down