Update mean and sum functions #643

Open
wants to merge 7 commits into
base: develop

Conversation

@aleexarias aleexarias commented Jan 13, 2025

Update mean and sum functions for FData, FDataGrid, FDataIrregular and FDataBasis to correctly handle NaN values in coefficients.

Fixes #642

Describe the proposed changes

Edit the mean function in FData so that it only performs the parameter checks, leaving the existing checks as they are.
Add an auxiliary function in FDataGrid that serves mean, sum and var: it calls np.sum/np.nansum, np.mean/np.nanmean or np.var/np.nanvar depending on the skipna parameter. The mean and sum functions now use this auxiliary function.
Add a mean function in FDataBasis that computes the mean of the coefficients over the functions that have no NaN values in their coefficients; functions with NaN coefficients are left out of the calculation.
Add a mean function in FDataIrregular that computes the mean based on the min_count parameter and on whether skipna is set.
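As an illustration of the dispatch described for FDataGrid, here is a minimal sketch that works on a plain NumPy array rather than on the real class. The names `_AGGREGATES` and `compute_aggregate` are hypothetical and are not taken from the pull request; only the pairing of each NumPy function with its NaN-aware counterpart comes from the description above.

```python
import numpy as np

# Hypothetical lookup table: (operation, skipna) -> NumPy reduction.
_AGGREGATES = {
    ("sum", False): np.sum,
    ("sum", True): np.nansum,
    ("mean", False): np.mean,
    ("mean", True): np.nanmean,
    ("var", False): np.var,
    ("var", True): np.nanvar,
}


def compute_aggregate(data_matrix, operation, skipna=False):
    """Reduce over the sample axis (axis 0), optionally ignoring NaNs."""
    agg_func = _AGGREGATES[(operation, skipna)]
    return agg_func(data_matrix, axis=0, keepdims=True)


# Two samples observed at three grid points; one value is missing.
data = np.array([[1.0, 2.0, 3.0], [3.0, np.nan, 5.0]])
mean_all = compute_aggregate(data, "mean")                # NaN propagates
mean_skip = compute_aggregate(data, "mean", skipna=True)  # NaN ignored
```

With `skipna=False` the missing value propagates to the result at that grid point, while `skipna=True` averages only the observed values.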

  • I have performed a self-review of my code
  • The code conforms to the style used in this package
  • The code is fully documented and typed (type-checked with Mypy)
  • I have added thorough tests for the new/changed functionality

@vnmabus vnmabus changed the title Update mean an sum functions Update mean and sum functions Feb 14, 2025
A FDataBasis object with just one sample representing
the mean of all the samples in the original object.
"""
super().mean(axis=axis, dtype=dtype, out=out, keepdims=keepdims,
Member

I am no longer sure that we want to do any validation in the abstract class. It is confusing. I would rather move the validation to the subclasses, or, if we do not want to repeat code, to a function in _utils or in a (maybe private for now) function in misc.validation.
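One way to read this suggestion is a small standalone helper that each subclass calls explicitly. The sketch below is only an assumption about what such a helper could look like: the name `validate_reduction_params`, the set of rejected arguments, and the `GridLike` stand-in class are all hypothetical and do not come from the package.

```python
from typing import Optional


def validate_reduction_params(
    *,
    axis: Optional[int] = None,
    out: None = None,
    keepdims: bool = False,
) -> None:
    """Hypothetical shared check; the accepted values are assumptions."""
    if axis not in (None, 0):
        raise NotImplementedError("Only axis=None or axis=0 is supported.")
    if out is not None:
        raise NotImplementedError("The 'out' parameter is not supported.")
    if keepdims:
        raise NotImplementedError("keepdims=True is not supported.")


class GridLike:
    """Stand-in for a concrete subclass; not the real FDataGrid."""

    def mean(self, *, axis=None, out=None, keepdims=False):
        # The subclass validates explicitly instead of relying on super().
        validate_reduction_params(axis=axis, out=out, keepdims=keepdims)
        return "mean computed"
```

Keeping the check in a plain function avoids calling `super().mean(...)` only for its side effect of raising on unsupported arguments.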

if min_count > 0:
    valid = ~np.isnan(self.data_matrix)
    n_valid = np.sum(valid, axis=0)
    data[n_valid < min_count] = np.nan
Member

Wouldn't a conditional be more clear?

Author

I do not understand where and how you are suggesting to use a conditional. The code seems clear to me (though, as the author, I might be biased).

Comment on lines 640 to 641
return self._compute_aggregate(operation='sum', skipna=skipna,
                               min_count=min_count)
Member

For multiline expressions, our style guide is to put each parameter on a line of its own, and the closing delimiter on its own line (at the same indentation level as the line on which it was opened):

Suggested change
- return self._compute_aggregate(operation='sum', skipna=skipna,
-                                min_count=min_count)
+ return self._compute_aggregate(
+     operation='sum',
+     skipna=skipna,
+     min_count=min_count,
+ )

Please, do the same in the other cases you edited.

if skipna:
    count_values = np.sum(~np.isnan(common_values), axis=0)
else:
    count_values = np.full(sum_values.shape, self.n_samples)
Member

Isn't this just self.n_samples?

Author

It needs to be an array with the same shape as sum_values so that it follows the same code path as the case where skipna is set.
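To make the two positions in this exchange concrete, here is a small NumPy sketch with made-up values; none of the numbers or variable names other than `sum_values`, `n_samples` and `min_count` come from the pull request. It shows that division broadcasts a scalar count exactly like a filled array, while a comparison against `min_count` yields a per-point mask only in the array form.

```python
import numpy as np

n_samples = 3
sum_values = np.array([6.0, 9.0, 12.0])

# Array form, as in the snippet under review.
count_array = np.full(sum_values.shape, n_samples)
# Scalar form, as the reviewer suggests; NumPy broadcasts it.
count_scalar = n_samples

mean_from_array = sum_values / count_array
mean_from_scalar = sum_values / count_scalar

# A threshold comparison gives a per-point mask for the array form,
# but a single plain bool for the scalar form.
min_count = 2
mask_from_array = count_array < min_count    # boolean array, shape (3,)
mask_from_scalar = count_scalar < min_count  # plain Python bool
```

Whether the array form is needed therefore depends on how the count is used afterwards: for the division alone the scalar is enough, while code that indexes with the mask may rely on the array shape.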

out: None = None,
keepdims: bool = False,
skipna: bool = False,
min_count: int = 0,
Member

It seems to me that min_count is not being used here. Why is that?

Author

It is kept for compatibility with the mean functions of FDataIrregular and FDataGrid, but using it here would not make sense: there are no individual measurements for each observation, only the observations approximated by functions.
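The basis case discussed here can be sketched on a bare coefficient matrix, where each row holds the coefficients of one function. This is a hedged illustration, not the code in the pull request: the function name `mean_coefficients` is hypothetical, and tying the row exclusion to `skipna=True` is an assumption. Because a function is either kept or dropped as a whole, a per-point `min_count` has nothing to count.

```python
import numpy as np


def mean_coefficients(coefficients, skipna=False):
    """Mean over samples (rows) of a coefficient matrix.

    With skipna=True, rows containing any NaN are dropped first.
    """
    if skipna:
        valid_rows = ~np.isnan(coefficients).any(axis=1)
        coefficients = coefficients[valid_rows]
    return coefficients.mean(axis=0, keepdims=True)


# Three functions with two basis coefficients each; the second
# function has a NaN coefficient.
coefs = np.array([
    [1.0, 2.0],
    [np.nan, 4.0],
    [3.0, 6.0],
])
result = mean_coefficients(coefs, skipna=True)
result_no_skip = mean_coefficients(coefs)
```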

@@ -882,6 +882,7 @@ def mean(
out: None = None,
keepdims: bool = False,
skipna: bool = False,
min_count: int = 0,
Member

Why is min_count removed?


data = agg_func(self.data_matrix, axis=0, keepdims=True)

if min_count > 0:
Member

This should only be done if skipna == True.
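A minimal sketch of the guard requested here, written against a plain NumPy array instead of the real class. The function name `nan_aware_sum` is hypothetical; the masking logic mirrors the fragment quoted earlier in the review, with the extra `skipna` condition added and `keepdims=True` used on the count so that its shape matches the aggregated data.

```python
import numpy as np


def nan_aware_sum(data_matrix, skipna=False, min_count=0):
    """Sum over samples; apply min_count only when NaNs are skipped."""
    agg_func = np.nansum if skipna else np.sum
    data = agg_func(data_matrix, axis=0, keepdims=True)

    # Only mask when NaNs were skipped, as the reviewer requests.
    if skipna and min_count > 0:
        n_valid = np.sum(~np.isnan(data_matrix), axis=0, keepdims=True)
        data[n_valid < min_count] = np.nan

    return data


data = np.array([[1.0, np.nan], [2.0, np.nan], [3.0, 4.0]])
skipped = nan_aware_sum(data, skipna=True, min_count=2)
```

In this example the second column has a single valid value, so with `min_count=2` it is masked even though `np.nansum` alone would have returned 4.0.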

else:
    count_values = np.full(sum_values.shape, self.n_samples)

if min_count > 0:
Member

This should only be done if skipna == True.

Development

Successfully merging this pull request may close these issues.

Error in calculating the mean values in the FData object
2 participants