Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dealing with non-pure functions #17

Open
edublancas opened this issue Nov 23, 2021 · 3 comments
Open

Dealing with non-pure functions #17

edublancas opened this issue Nov 23, 2021 · 3 comments
Labels
enhancement New feature or request low priority

Comments

@edublancas
Copy link
Contributor

edublancas commented Nov 23, 2021

soorgeon detects outputs by looking at the left side of an assignment expression, for example, let's say the section of a notebook looks like this:

x = 1
y, z = compute_stuff()

In the snippet above, x, y, and z will be detected as outputs, since they are all on the left side of the assignment (=) operator.

However, non-pure functions mutate their inputs: Say we have the following:

import pandas as pd

def fn(df):
    df['new_column'] = 1

df = pd.DataFrame({'column': [0]})

# problematic line!
df2 = fn(df)

Before the problematic line df, only has one column (named column), but after the problematic line, df has two columns (column, and new_column); however we currently have no way of knowing that since df does not appear to the left of the assignment operator. This might cause the output of soorgeon refactor to be incorrect.

Unfortunately, there is an infinite way to mutate variables, in fact, any method might mutate them. example:

# this may or may not mutate the input
df.do_something()

I think for now we should detect instances of:

df[something] = stuff

and display a warning

@e1ha
Copy link
Contributor

e1ha commented Jun 28, 2022

Just to clarify but is the issue that since the change for df occurs within a function when the code calls the function, soorgeon cannot see that df has changed which means that df will not be marked as an output when it should? I'm assuming this only applies to function parameters, since for local variables, they wouldn't matter and global variables aren't supported. Also when it says "mutating its inputs" does that mean it applies to any input parameters that are mutable objects or just data frames? Thanks.

@edublancas
Copy link
Contributor Author

edublancas commented Jun 28, 2022

(I edited my original comment to make it clearer)

Just to clarify but is the issue that since the change for df occurs within a function when the code calls the function, soorgeon cannot see that df has changed which means that df will not be marked as an output when it should?

correct

Also when it says "mutating its inputs" does that mean it applies to any input parameters that are mutable objects or just data frames? Thanks.

it applies to any input parameters

let me sync with Ido, this is low priority so there might be other stuff that will be better use of your time :)

@edublancas edublancas added enhancement New feature or request low priority labels Jun 28, 2022
@idomic
Copy link
Contributor

idomic commented Jun 29, 2022

@e1ha let's focus on the Algolia story for now and more to be coming soon!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request low priority
Projects
None yet
Development

No branches or pull requests

3 participants