docs
rwijtvliet committed May 10, 2024
1 parent d26f3e9 commit 4108a2d
Showing 2 changed files with 85 additions and 57 deletions.
108 changes: 68 additions & 40 deletions docs/core/interoperability.rst
@@ -38,9 +38,7 @@ To pass a single value, the following objects can be used:
print(repr(pf.Q_(50.0, "Eur/MWh")))
# --- hide: stop ---
pf.Q_(50.0, "Eur/MWh")

The unit is converted to the default unit for its dimension once it is used in any of the ``portfolyo`` objects, see also :ref:`this section<nameunitcompatibility>` further below.


See `pint's website <https://pint.readthedocs.io>`_ for more information about ``pint``.

.. hint:: Using a ``pint.Quantity`` expresses a more deliberate intent, and therefore allows us to catch dimensionality errors more easily. For dimensionless values, such as fractions, we could even use a dimensionless ``Quantity`` (though this quickly becomes cumbersome).
@@ -51,17 +49,17 @@ One or more values

If we have to specify several individual values, we can use:

* A dictionary with the one or more of the dimension abbrevations (``"w"``, ``"q"``, ``"p"``, ``"r"``, ``"nodim"``) as the keys, and ``float``, ``int`` or ``pint.Quantity`` instances as the values. E.g.:
* A dictionary with one or more of the dimension abbreviations (``"w"``, ``"q"``, ``"p"``, ``"r"``, ``"nodim"``) as the keys, and ``pint.Quantity`` instances (or ``float`` or ``int`` if dimensionless) as the values. E.g.:

.. exec_code::

# --- hide: start ---
import portfolyo as pf
import pandas as pd
# --- hide: stop ---
{"p": 50.0, "w": pf.Q_(120, 'MW')}
{"p": pf.Q_(50.0, 'Eur/MWh'), "w": pf.Q_(120, 'MW')}
# --- hide: start ---
print(repr({"p": 50.0, "w": pf.Q_(120.0, 'MW')}))
print(repr({"p": pf.Q_(50.0, 'Eur/MWh'), "w": pf.Q_(120.0, 'MW')}))

* Or we can use any other ``Mapping`` from string values to ``float``, ``int``, or ``pint.Quantity`` objects, e.g., a ``pandas.Series`` with a string index. It is recommended, however, to use ``Series`` only for timeseries information.

@@ -75,7 +73,7 @@ One timeseries

.. warning:: To avoid unexpected behavior, timeseries (``pandas.Series`` and ``pandas.DataFrame`` objects) should be of a certain form. See :doc:`../specialized_topics/dataprep`.

For timeseries, ``pandas.Series`` are used. These can be "unit-agnostic" (i.e., of datatype ``float`` or ``int``), or unit-aware as in the example below. [#ts]_
For timeseries, ``pandas.Series`` are used. Unless dimensionless, these should be "unit-aware" as in the example below. [#ts]_

.. exec_code::

@@ -84,9 +82,9 @@ For timeseries, ``pandas.Series`` are used. These can be "unit-agnostic" (i.e.,
import pandas as pd
idx = pd.date_range("2023", freq="AS", periods=2)
# --- hide: stop ---
pd.Series([50, 56.0], idx, dtype="pint[Eur/MWh]") # unit-aware
tseries = pd.Series([50, 56.0], idx, dtype="pint[Eur/MWh]") # unit-aware
# --- hide: start ---
print(repr(pd.Series([50, 56.0], idx, dtype="pint[Eur/MWh]")))
print(repr(tseries))

.. warning:: The ``name`` attribute of a ``pandas.Series`` is always ignored.

@@ -105,11 +103,12 @@ To pass several timeseries, we can use:
import pandas as pd
idx = pd.date_range("2023", freq="AS", periods=2)
# --- hide: stop ---
{"p": pd.Series([50, 56], idx), "w": pd.Series([120, 125], idx, dtype="pint[MW]")}
dict_of_tseries = {"p": pd.Series([50, 56], idx, dtype="pint[Eur/MWh]"), "w": pd.Series([120, 125], idx, dtype="pint[MW]")}
dict_of_tseries
# --- hide: start ---
print(repr({"p": pd.Series([50, 56.0], idx), "w": pd.Series([120, 125.0], idx, dtype="pint[MW]")}))
print(repr(dict_of_tseries))

Each of the timeseries can have a unit or be unit-agnostic.
Each of the timeseries must have a unit unless it is dimensionless.

* Or we can use any other ``Mapping`` from string values to timeseries, e.g., a ``pandas.DataFrame`` with a datetime-index. In this case:

@@ -119,14 +118,17 @@ To pass several timeseries, we can use:
import portfolyo as pf
import pandas as pd
idx = pd.date_range("2023", freq="AS", periods=2)
dict_of_tseries = {"p": pd.Series([50, 56], idx, dtype="pint[Eur/MWh]"), "w": pd.Series([120, 125], idx, dtype="pint[MW]")}
# --- hide: stop ---
pd.DataFrame({"p": [50, 56], "w": [120, 125]}, idx)
df = pd.DataFrame(dict_of_tseries)
df
# --- hide: start ---
print(repr(pd.DataFrame({"p": [50, 56.0], "w": [120, 125.0]}, idx)))
print(repr(df))

Dataframes can also be made unit-aware. [#df]_
Note that it is not immediately obvious that the dataframe is unit-aware. [#df]_

.. note:: The same applied here: because we have to explicitly state the dimension abbreviation, these objects help us avoid dimensionality errors. For this reason, we may want to use them, even for *single* timeseries.

.. note:: The same applies here: because we have to explicitly state the dimension abbreviation when creating a dataframe, dataframes help us avoid dimensionality errors. For this reason, we may want to use them even for *single* timeseries.

------------
Combinations
@@ -141,11 +143,11 @@ Dictionaries are the most versatile of these objects. They can be used to pass a
import pandas as pd
idx = pd.date_range("2023", freq="AS", periods=2)
# --- hide: stop ---
d1 = {"p": 50}
d2 = {"p": 50, "w": 120}
d3 = {"p": pd.Series([50, 56], idx)}
d4 = {"p": pd.Series([50, 56], idx), "w": pd.Series([120, 125], idx)}
d5 = {"p": pd.Series([50, 56], idx), "w": 120}
d1 = {"p": pf.Q_(50, "Eur/MWh")}
d2 = {"p": pf.Q_(50, "Eur/MWh"), "w": pf.Q_(120, "MW")}
d3 = {"p": pd.Series([50, 56], idx, dtype="pint[Eur/MWh]")}
d4 = {"p": pd.Series([50, 56], idx, dtype="pint[Eur/MWh]"), "w": pd.Series([120, 125], idx, dtype="pint[MW]")}
d5 = {"p": pd.Series([50, 56], idx, dtype="pint[Eur/MWh]"), "w": pf.Q_(120, "MW")}


.. _ducktyping:
@@ -164,23 +166,21 @@ Compatibility of abbreviation and unit

Information can have a key (one of the dimension abbreviations: ``"w"``, ``"q"``, ``"p"``, ``"r"``, ``"nodim"``) and/or a unit. In a DataFrame, a timeseries' key is the corresponding column name. A timeseries 'by itself' has no key; its name is ignored.

There is a one-to-one relationship between dimension abbrevation and unit; see :doc:`../specialized_topics/dimensions`.

* In some of the objects discussed above, we specify both a key *and* a unit. In that case, ``portfolyo`` checks if the unit has the correct dimensionality. If so, but it is not the default unit, a conversion to the default unit is done.
* In some of the objects discussed above, we specify both a key *and* a unit. In that case, ``portfolyo`` checks if it knows the unit, and if it has the correct dimensionality.

E.g., the key ``"p"`` and unit ``ctEur/kWh`` of ``{"p": pd.Series([5.0, 5.6], idx, dtype="pint[ctEur/kWh]")}`` are consistent. The values will be changed to the default unit (=Eur/MWh) upon further processing. Using ``"q"`` instead of ``"p"`` results in a dimensionality error, and using ``"x"`` results in a KeyError.
E.g., the key ``"p"`` and unit ``ctEur/kWh`` of ``{"p": pd.Series([5.0, 5.6], idx, dtype="pint[ctEur/kWh]")}`` are consistent. Using ``"q"`` instead of ``"p"`` results in a dimensionality error, and using ``"x"`` results in a KeyError.

* In some objects, only the unit is specified. Here, the dimension is deduced from the unit, and the unit is converted into the default (if necessary).
* In some objects, only the unit is specified. Here, the dimension is deduced from the unit.

E.g., the timeseries ``pd.Series([5.0, 5.6], idx, dtype="pint[ctEur/kWh]")`` (NB: without the dictionary key) is such an object.

* In other objects, only the key is specified. In that case, the unit is deduced from the key - the default unit is assumed.
* In other objects, only the key is specified. This is bad practice, as the unit is unclear, and an error is raised when using it in calculations or ``PfLine`` initialisation.

E.g., the key ``"p"`` of ``{"p": pd.Series([50, 56], idx)}`` indicates that we are dealing with prices, and the default unit of Eur/MWh is assumed.
E.g., the key ``"p"`` of ``{"p": pd.Series([50, 56], idx)}`` indicates that we are dealing with prices, but the unit is unclear.

* If both are not provided, the dimension must be inferrable from the context, and the unit is assumed to be the default for that dimension.
* If neither is provided, an error is also raised.

E.g. when adding a ``float`` value to a ``PfLine`` containing prices, the value is assumed to also be a price, in the default unit (= Eur/MWh).
E.g., adding a ``float`` value to a ``PfLine`` containing prices raises an error.
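
The four cases in the list above, as they read after this change, can be sketched in plain Python. This is only an illustration under stated assumptions: the lookup tables and the function ``resolve_dimension`` are hypothetical and not part of ``portfolyo``, which relies on ``pint`` for unit handling. The use of ``ValueError`` is likewise an assumption; the text only says that a dimensionality error or "an error" is raised.

```python
# Hypothetical lookup tables, for illustration only.
# portfolyo itself delegates unit handling to pint.
KEY_TO_DIMENSION = {
    "w": "power",
    "q": "energy",
    "p": "price",
    "r": "revenue",
    "nodim": "dimensionless",
}
UNIT_TO_DIMENSION = {
    "MW": "power",
    "MWh": "energy",
    "Eur/MWh": "price",
    "ctEur/kWh": "price",
    "Eur": "revenue",
}


def resolve_dimension(key=None, unit=None):
    """Return the dimension implied by a key and/or a unit, or raise."""
    if key is not None and key not in KEY_TO_DIMENSION:
        raise KeyError(key)  # unknown abbreviation, e.g. "x"
    if unit is None:
        if key == "nodim":
            return "dimensionless"  # dimensionless values need no unit
        # Key only, or neither key nor unit: the unit is unclear.
        raise ValueError("a unit must be specified")
    if unit not in UNIT_TO_DIMENSION:
        raise ValueError(f"unknown unit: {unit!r}")
    dimension = UNIT_TO_DIMENSION[unit]
    if key is not None and KEY_TO_DIMENSION[key] != dimension:
        # Key and unit disagree, e.g. key "q" with a price unit.
        raise ValueError(f"unit {unit!r} is not compatible with key {key!r}")
    return dimension


print(resolve_dimension("p", "ctEur/kWh"))   # key and unit are consistent
print(resolve_dimension(unit="ctEur/kWh"))   # dimension deduced from the unit
```

Calling ``resolve_dimension("q", "ctEur/kWh")``, ``resolve_dimension("p")`` or ``resolve_dimension()`` raises ``ValueError`` in this sketch, and ``resolve_dimension("x", "Eur/MWh")`` raises ``KeyError``.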


---------
@@ -211,16 +211,44 @@ Footnotes
.. code-block:: python
:emphasize-lines: 4, 7
>>> import pandas as pd
>>> idx = pd.date_range("2023", freq="AS", periods=2)
>>> s_price = pd.Series([50, 56], idx, dtype="pint[Eur/MWh]")
>>> s_volume = pd.Series([120, 125], idx, dtype="pint[MW]")
>>> df1 = pd.DataFrame({"p": s_price, "w": s_volume})
>>> idx = pandas.date_range("2023", freq="AS", periods=2)
>>> s_price = pandas.Series([50, 56], idx, dtype="pint[Eur/MWh]")
>>> s_volume = pandas.Series([120, 125], idx, dtype="pint[MW]")
>>> df1 = pandas.DataFrame({"p": s_price, "w": s_volume})
>>> df_agn = pandas.DataFrame({"p": [50, 56], 'w': [120, 125]}, idx) # unit-agnostic
>>> df_agn = pd.DataFrame({"p": [50, 56], 'w': [120, 125]}, idx) # unit-agnostic
>>> df2 = df_agn.astype({'p': 'pint[Eur/MWh]', 'w': 'pint[MW]'}) # same as df1
>>> df1.dtypes
p pint[Eur/MWh]
w pint[MW]
dtype: object
We can inspect the units with ``.dtypes``...
.. exec_code::
# --- hide: start ---
import portfolyo as pf
import pandas as pd
idx = pd.date_range("2023", freq="AS", periods=2)
s_price = pd.Series([50, 56], idx, dtype="pint[Eur/MWh]")
s_volume = pd.Series([120, 125], idx, dtype="pint[MW]")
df1 = pd.DataFrame({"p": s_price, "w": s_volume})
# --- hide: stop ---
df1.dtypes
# --- hide: start ---
print(repr(df1.dtypes))
...or by removing the units from the values and adding them as a column level instead:
.. exec_code::
# --- hide: start ---
import portfolyo as pf
import pandas as pd
idx = pd.date_range("2023", freq="AS", periods=2)
s_price = pd.Series([50, 56], idx, dtype="pint[Eur/MWh]")
s_volume = pd.Series([120, 125], idx, dtype="pint[MW]")
df1 = pd.DataFrame({"p": s_price, "w": s_volume})
# --- hide: stop ---
df1.pint.dequantify()
# --- hide: start ---
print(repr(df1.pint.dequantify()))
34 changes: 17 additions & 17 deletions docs/specialized_topics/dimensions.rst
@@ -12,19 +12,19 @@ We are usually dealing with data that has a certain physical dimension, which de
Abbreviations
-------------

In ``portfolyo``, a convention is used for the abbreviation the most relevant dimensions, as well as their default units.

============ ============================================== ============
Abbreviation Dimension                                      Default unit
============ ============================================== ============
``w``        Power (energy per duration)                    Megawatt [MW]
``q``        Energy (power times duration)                  Megawatt-hour [MWh]
``p``        Price (revenue per energy)                     Euro per Megawatt-hour [Eur/MWh]
``r``        Revenue (or any other                          Euro [Eur]
             absolute monetary value)
``duration`` Duration (of a delivery period)                hours [h]
``nodim``    dimensionless                                  none
============ ============================================== ============
In ``portfolyo``, a convention is used for the abbreviations of the most relevant dimensions.

============ ==============================================
Abbreviation Dimension
============ ==============================================
``w``        Power (energy per duration)
``q``        Energy (power times duration)
``p``        Price (revenue per energy)
``r``        Revenue (or any other
             absolute monetary value)
``duration`` Duration (of a delivery period)
``nodim``    dimensionless
============ ==============================================

Remarks:

@@ -40,17 +40,17 @@ Using ``pint``, a user can specify not only the magnitude of a value (or Series

In short:

* If both an abbreviation and a unit are specified, they should be compatible.

* If a unit is specified, the values are automatically converted to the default unit.
* A unit must be specified.

* If no abbreviation is specified, the unit is used to deduce it (using the table above).

* If both an abbreviation and a unit are specified, they should be compatible.

------------------------
Other naming conventions
------------------------

"volume" and "quantity" are often used to indicate either power or energy without having to specify, which one. In most situations, it is clear from context which delivery period is meant, so knowing one, we can calculate the other. E.g. 100 MW in January means 74 400 MWh.
"volume" and "quantity" are often used to indicate either power or energy without having to specify which one. In most situations, it is clear from context which delivery period is meant, so knowing one, we can calculate the other. E.g., 100 MW for the duration of January equals 74 400 MWh.
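
The arithmetic in the example above can be checked with a few lines of standard-library Python. This is a minimal sketch: the helper name ``energy_mwh`` is hypothetical, and it counts calendar hours while ignoring daylight-saving-time shifts, which do not affect January but would matter for months containing a clock change.

```python
import calendar


def energy_mwh(power_mw: float, year: int, month: int) -> float:
    """Energy (MWh) delivered by a constant power (MW) over one calendar month."""
    days_in_month = calendar.monthrange(year, month)[1]
    hours = days_in_month * 24  # assumption: no DST shift within the month
    return power_mw * hours


# January has 31 days, i.e. 744 hours, so 100 MW corresponds to 74 400 MWh.
print(energy_mwh(100, 2024, 1))
```

The reverse conversion, from energy to average power, divides by the same number of hours.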


