Using .loc to add values does not retain pint dtype when working with Series. #126

Open
rwijtvliet opened this issue May 13, 2022 · 2 comments

@rwijtvliet

rwijtvliet commented May 13, 2022

Using .loc to add values to a Series does not retain its pint dtype. For DataFrame, the dtypes are retained.

Here is a minimal working example:

import pandas as pd
import pint_pandas

# Create unit-aware Series and DataFrame
s = pd.Series([70, 60, 50],  dtype="pint[W]")
df = pd.DataFrame(
    {
        "a": pd.Series([70, 60, 50], dtype="pint[W]"),
        "b": pd.Series([0.5, 0.4, 0.2], dtype="pint[s]"),
    }
)

# Append None
s.loc[8] = None
df.loc[8, :] = None

# Issue: s gets the object dtype and the units are included in the individual values
s
# 0    70.0 W
# 1    60.0 W
# 2    50.0 W
# 8      None
# dtype: object

# But df is good, with the columns retaining their pint dtype
df
#       a    b
# 0  70.0  0.5
# 1  60.0  0.4
# 2  50.0  0.2
# 8   nan  nan
df.pint.dequantify()
#          a    b
# unit     W    s
# 0     70.0  0.5
# 1     60.0  0.4
# 2     50.0  0.2
# 8      NaN  NaN

Here are my versions:
{'numpy': '1.22.3', 'pandas': '1.4.1', 'pint': '0.18', 'pint_pandas': '0.2'}
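
One possible way to add a row without going through .loc enlargement is to build a one-element Series with the same dtype and concatenate it. The sketch below is an assumption, not something confirmed in this thread: the helper name append_keeping_dtype is made up, and the demo uses pandas' built-in Int64 extension dtype as a stand-in so that it runs without pint_pandas. Whether a pint[W] Series accepts None in its constructor in the same way has not been checked here.

```python
import pandas as pd


def append_keeping_dtype(series, label, value):
    """Return a new Series with `value` added at `label`, reusing series.dtype.

    Hypothetical helper: it avoids `.loc` enlargement by concatenating a
    one-element Series that is constructed with the same dtype.
    """
    extra = pd.Series([value], index=[label], dtype=series.dtype)
    return pd.concat([series, extra])


# Stand-in demo with a built-in extension dtype (not a pint dtype).
s = pd.Series([70, 60, 50], dtype="Int64")
out = append_keeping_dtype(s, 8, None)
print(out)
print(out.dtype)
```

The helper returns a new Series rather than modifying the original in place, so the result has to be assigned back (s = append_keeping_dtype(s, 8, None)).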

@andrewgsavage
Collaborator

I had a look with the latest versions. It looks like this is only an issue when appending to a Series, not when overwriting existing values. Might be worth raising an issue in pandas-dev?

import pandas as pd
import pint_pandas
import pint
import numpy as np

ureg = pint.get_application_registry()
Q_ = ureg.Quantity
pint_pandas.show_versions()

{'numpy': '1.23.3',
 'pandas': '1.5.2',
 'pint': '0.20.2.dev16+g01411c7',
 'pint_pandas': '0.4.dev40+g2f39497.d20221212'}

# Appending a single value to the series fails

s = pd.Series([70, 60, 50],  dtype="pint[W]")
# uncomment each line in turn
# s.loc[8] = None # ValueError: fill_value must be a scalar
# s.loc[8] = np.nan # ValueError: fill_value must be a scalar
# s.loc[8] = Q_(np.nan,"W") # converts series to object dtype
# s.loc[8] = 1 # works, maintains pint[watt] dtype
# s.loc[8] = Q_(1, "W") # converts series to object dtype
s

# Appending a list of values raises a KeyError:
# s.loc[[8,9]] = Q_(1, "W") # KeyError: "None of [Int64Index([8, 9], dtype='int64')] are in the [index]"

# Setting a list of values works:
s = pd.Series([70, 60, 50],  dtype="pint[W]")
# uncomment each line in turn
# s.loc[[0,1]] = None # Works, maintains pint[watt] dtype
# s.loc[[0,1]] = np.nan # Works, maintains pint[watt] dtype
# s.loc[[0,1]] = Q_(np.nan,"W") # Works, maintains pint[watt] dtype
# s.loc[[0,1]] = 1 # works, maintains pint[watt] dtype
s.loc[[0,1]] = Q_(1, "W") # works, maintains pint[watt] dtype
s

# DataFrame still works
df = pd.DataFrame(
    {
        "a": pd.Series([70, 60, 50], dtype="pint[W]"),
        "b": pd.Series([0.5, 0.4, 0.2], dtype="pint[s]"),
    }
)
# df.loc[8,:] = None # Works, maintains dtypes
# df.loc[8,:] = np.nan # Works, maintains dtypes
# df.loc[8,:] = Q_(np.nan,"W") # Dimensionality error, as expected: column b is pint[s]
# df.loc[8,:] = 1 # works, maintains dtypes
# df.loc[8,:] = Q_(1, "W") # Dimensionality error, as expected: column b is pint[s]
print(df.dtypes)
df
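
The "uncomment each line in turn" checks above can also be run in one loop with a small helper. This is only a sketch: try_loc_assign is a hypothetical name, and the demo uses a plain float Series so that it runs without pint_pandas. To reproduce the cases listed above, pass a Series created with dtype="pint[W]" and the Q_ values instead.

```python
import pandas as pd


def try_loc_assign(series, key, value):
    """Assign `value` at `key` on a copy of `series` and report the outcome.

    Returns the resulting dtype as text, or the exception type and message
    if the assignment fails. The input Series is left untouched.
    """
    trial = series.copy()
    try:
        trial.loc[key] = value
    except Exception as exc:  # record any failure instead of stopping the loop
        return f"{type(exc).__name__}: {exc}"
    return f"dtype: {trial.dtype}"


# Stand-in demo: a plain float Series instead of a pint-typed one.
s = pd.Series([70.0, 60.0, 50.0])
for value in (None, float("nan"), 1.0):
    print(repr(value), "->", try_loc_assign(s, 8, value))
```

Because each trial works on a copy, the cases do not influence one another, which is what re-creating s before each block achieves in the snippet above.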

@MichaelTiemannOSC
Collaborator

This pandas issue seems related: pandas-dev/pandas#24246

The _maybe_promote logic currently tripping things up looks like this:

def _maybe_promote(dtype: np.dtype, fill_value=np.nan):
    # The actual implementation of the function, use `maybe_promote` above for
    # a cached version.
    if not is_scalar(fill_value):
        # with object dtype there is nothing to promote, and the user can
        #  pass pretty much any weird fill_value they like
        if not is_object_dtype(dtype):
            # with object dtype there is nothing to promote, and the user can
            #  pass pretty much any weird fill_value they like
            raise ValueError("fill_value must be a scalar")
        dtype = _dtype_obj
        return dtype, fill_value

What's missing is an is_extension_array_dtype clause between the two that can do something sane when we need to promote an NA value to a Quantity.
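
To make the idea concrete, here is a rough standalone sketch of where such a clause could sit. It is not pandas' actual code and not a proposed patch: promote_sketch and FakeQuantity are hypothetical names, returning the extension dtype unchanged is just one assumed notion of "something sane", and the scalar promotion rules are left out entirely. The demo uses pandas' Int64 dtype as a stand-in for a pint dtype.

```python
import numpy as np
import pandas as pd
from pandas.api.types import (
    is_extension_array_dtype,
    is_object_dtype,
    is_scalar,
)


def promote_sketch(dtype, fill_value=np.nan):
    """Toy version of the non-scalar branch with an extension-dtype clause."""
    if not is_scalar(fill_value):
        # Assumed extra clause: an extension dtype keeps itself and is left
        # to interpret the fill value (for example a Quantity) on its own.
        if is_extension_array_dtype(dtype):
            return dtype, fill_value
        if not is_object_dtype(dtype):
            raise ValueError("fill_value must be a scalar")
        return np.dtype(object), fill_value
    # Scalar promotion is omitted in this sketch; the dtype is returned as is.
    return dtype, fill_value


class FakeQuantity:
    """Stand-in for a pint Quantity; pandas does not treat it as a scalar."""


na_like = FakeQuantity()

# Extension dtype: kept, no error.
print(promote_sketch(pd.Int64Dtype(), na_like)[0])

# Plain numpy dtype: still raises, as in the quoted code.
try:
    promote_sketch(np.dtype("float64"), na_like)
except ValueError as exc:
    print("ValueError:", exc)
```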

3 participants