Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: Fix errors in pandas.Series.argmin #32286

Merged
merged 18 commits into from
Mar 6, 2020

Conversation

farhanreynaldo
Copy link
Contributor

@farhanreynaldo farhanreynaldo commented Feb 27, 2020

output of python scripts/validate_docstrings.py pandas.Series.argmin:

################################################################################
################################## Validation ##################################
################################################################################

@simonjayhawkins
Copy link
Member

Thanks @farhanreynaldo can you merge master. failing CI should be fixed.

@simonjayhawkins
Copy link
Member

@datapythonista should we be sharing the bulk of the docstring with Series.argmax, xref #32019

@datapythonista
Copy link
Member

@datapythonista should we be sharing the bulk of the docstring with Series.argmax, xref #32019

Yes, I think that makes sense. @farhanreynaldo the idea here is instead of repeating the same in both docstrings, to reuse from one to the other. We have a @doc decorator to help with that. The general idea if I'm not wrong of the exact syntax is:

@doc(operation='maximum')
def argmax():
    """
    This is computing the {operation}.
    """
    pass

@doc(argmax, operation='minimum')
def argmin():
    pass

The @doc decorator was recently introduced in #31060, the diff of that PR can help you understand. If you have any question, or you have any idea for improvement given this use case, feel free to ping @HH-MWB

@farhanreynaldo
Copy link
Contributor Author

@datapythonista should we be sharing the bulk of the docstring with Series.argmax, xref #32019

Yes, I think that makes sense. @farhanreynaldo the idea here is instead of repeating the same in both docstrings, to reuse from one to the other. We have a @doc decorator to help with that. The general idea if I'm not wrong of the exact syntax is:

@doc(operation='maximum')
def argmax():
    """
    This is computing the {operation}.
    """
    pass

@doc(argmax, operation='minimum')
def argmin():
    pass

The @doc decorator was recently introduced in #31060, the diff of that PR can help you understand. If you have any question, or you have any idea for improvement given this use case, feel free to ping @HH-MWB

Sure, let me figure it out first.

@farhanreynaldo
Copy link
Contributor Author

@datapythonista should I do another pull request regarding the changes for pandas.Series.argmax doc?

@datapythonista
Copy link
Member

@datapythonista should I do another pull request regarding the changes for pandas.Series.argmax doc?

I think a single PR with the changes to reuse the docstring is what makes more sense. Whether you reuse this PR, or you close this an open a new one doesn't make a difference for us.

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @farhanreynaldo for the updates.

a couple of ci failues.. you'll need to run black, https://pandas.io/docs/development/contributing.html#python-pep8-black
the also the doc decorator parameters should be strings.

@farhanreynaldo
Copy link
Contributor Author

Thanks @farhanreynaldo for the updates.

a couple of ci failues.. you'll need to run black, https://pandas.io/docs/development/contributing.html#python-pep8-black
the also the doc decorator parameters should be strings.

Sure, I will run black pandas

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @farhanreynaldo , could perhaps reduce the parameter count for clarity.

@simonjayhawkins
Copy link
Member

@farhanreynaldo can you merge master to resolve conflicts

@simonjayhawkins simonjayhawkins added this to the 1.1 milestone Mar 3, 2020
Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @farhanreynaldo

For the examples, we would usually have both the minimum and the maximum in both examples, which may be easier and it makes sense. And even for the see also section we often have both (even if one is self-referencing the current page).

But looks good like this too.

@simonjayhawkins
Copy link
Member

@farhanreynaldo can you run black to fix ci. ping on green.

@farhanreynaldo
Copy link
Contributor Author

Great job @farhanreynaldo

For the examples, we would usually have both the minimum and the maximum in both examples, which may be easier and it makes sense. And even for the see also section we often have both (even if one is self-referencing the current page).

But looks good like this too.

Yeah, I think it would be nice to put both examples in the documentation, but then we should add another parameter for example values in doc arguments. Is that okay?

@farhanreynaldo
Copy link
Contributor Author

@farhanreynaldo can you run black to fix ci. ping on green.

@simonjayhawkins all green now.

@datapythonista
Copy link
Member

The idea is not to add another parameter but to remove the ones you have now for the examples. If you have the max first and the min later you don't need any variable.

@farhanreynaldo
Copy link
Contributor Author

The idea is not to add another parameter but to remove the ones you have now for the examples. If you have the max first and the min later you don't need any variable.

Oh I see, we could reduce the parameter. Is this what you are looking for?

See Also
--------
Series.arg{op} : Return position of the {op}imum value.
numpy.ndarray.arg{op} : Equivalent method for numpy arrays.
Series.arg{oppose} : Similar method, but returning the {oppose}imum.
Series.idxmax : Return index label of the maximum values.
Series.idxmin : Return index label of the minimum values.

and for the examples

Examples
--------
Consider dataset containing cereal calories

>>> s = pd.Series({{'Corn Flakes': 100.0, 'Almond Delight': 110.0,
...                'Cinnamon Toast Crunch': 120.0, 'Cocoa Puff': 110.0}})
>>> s
Corn Flakes              100.0
Almond Delight           110.0
Cinnamon Toast Crunch    120.0
Cocoa Puff               110.0
dtype: float64

>>> s.argmax()
2

>>> s.argmin()
0

The maximum cereal calories is the third element and 
the minimum cereal calories is the first element,
since series is zero-indexed. 

@datapythonista
Copy link
Member

Exactly, that's what I meant.

@datapythonista
Copy link
Member

And for see also you could also repeat argmin and argmax and avoid the variables.

@farhanreynaldo
Copy link
Contributor Author

And for see also you could also repeat argmin and argmax and avoid the variables.

Could we keep the variables in the See Also section so the order of the reference consistent with its function?
e.g pandas.Series.argmin documentation would have its function appear first in the See also section

@datapythonista
Copy link
Member

I'm ok with that. There is a trade off between keeping things simple in the code, and having each docstring with its own links, examples... In general I think it's better to keep things simpler for us if the docstrings are useful, but not really important in this case, I think any option is good.

@farhanreynaldo
Copy link
Contributor Author

I'm ok with that. There is a trade off between keeping things simple in the code, and having each docstring with its own links, examples... In general I think it's better to keep things simpler for us if the docstrings are useful, but not really important in this case, I think any option is good.

I think I'm just gonna follow your lead.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @farhanreynaldo, looks great. Just couple of minor things that I think can still be improved, but nice work.

@@ -927,16 +927,19 @@ def max(self, axis=None, skipna=True, *args, **kwargs):
nv.validate_max(args, kwargs)
return nanops.nanmax(self._values, skipna=skipna)

@doc(
op="max", oppose="min", value="largest",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is from black, but I guess you can leave the decorator in a single line, and black will respect it.

numpy.ndarray.argmax : Equivalent method for numpy arrays.
Series.argmin : Similar method, but returning the minimum.
Series.argmax : Return position of the maximum value.
Series.argmin : Return position of the minimum value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not finally using oppose. I think you can replace those two lines by this, and we don't need to repeat the same page in this case. Or you can simply remove the oppose parameter of doc otherwise.

Suggested change
Series.argmin : Return position of the minimum value.
Series.arg{oppose} : Return position of the {oppose}imum value.

@@ -1019,25 +1026,10 @@ def min(self, axis=None, skipna=True, *args, **kwargs):
nv.validate_min(args, kwargs)
return nanops.nanmin(self._values, skipna=skipna)

@doc(
argmax, op="min", oppose="max", value="smallest",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before about being a one-liner.

@datapythonista
Copy link
Member

Thanks for the fixes. The error in the CI is caused by a new version of Matplotlib being released, and will be fixed after #32444 is merged.

@farhanreynaldo
Copy link
Contributor Author

Thanks for the fixes. The error in the CI is caused by a new version of Matplotlib being released, and will be fixed after #32444 is merged.

Thank you for the input throughout this pull request, @datapythonista and @simonjayhawkins!

@datapythonista
Copy link
Member

If you can merge master, CI should be green now.

@datapythonista
Copy link
Member

@simonjayhawkins do you want to have another look? this should be ready now

Copy link
Member

@simonjayhawkins simonjayhawkins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @farhanreynaldo lgtm

@datapythonista
Copy link
Member

I think we should merge this. @farhanreynaldo looks like GitHub deployed some changes last night, were they consider (in the git history) that the person merging the PR is the author, and not you :(

See #32457 for the full discussion.

@farhanreynaldo
Copy link
Contributor Author

Actually I'm okay with that

@datapythonista datapythonista merged commit f6b3e82 into pandas-dev:master Mar 6, 2020
@datapythonista
Copy link
Member

Thanks for working on this @farhanreynaldo

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
Co-authored-by: Farhan Reynaldo Hutabarat <[email protected]>
Co-authored-by: Simon Hawkins <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants