Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider using run(standalone_mode=True) to fix packaged Kedro project getting a sys.exit #2682

Closed
Tracked by #3237
noklam opened this issue Jun 14, 2023 · 7 comments

Comments

@noklam
Copy link
Contributor

noklam commented Jun 14, 2023

The problem with the main() method is that it currently returns an exit code, so downstream processes can't do anything with it.

Clarify this one a bit. I think these are 2 problems

  1. main should return the session.run() - so downstream processing is possible
  2. click generate a sys.exit() by default, and since main wrap around the CLI so it creates some issue on Databricks or even just IPython - this link explains more.

Originally posted by @noklam in #1423 (comment)

Background

#1807 (comment)

Consider using run(standalone_mode=True to fix the issue

Why is it important?

Any packaged project is run as a click program, which generate a sys.exit that exit the program and causing integration with kedro pipeline impossible.

For example, you may want to have some normal Python code which consume the output of a Kedro pipeline. We shouldn't force users to use Kedro for everything and it should be easy to integrate with Kedro.

result = my_pipeline()
arbitary_python_code(result)

Workaround

The current workaround is mentioned in Packaged kedro pipeline does not work well on Databricks, which is the recommend way to run packaged project on Databricks now. This is no difference from running Kedro in project mode. The issue is quite similar but Databricks is a specific case and this is more general. We should document it clearly what are the 2 modes for Add documentation about what the bootstrap_project and configure_project methods do and what they're supposed to be used for

from kedro.framework.session import KedroSession
from kedro.framework.project import configure_project
package_name = <your_package_name>
configure_project(package_name)

with KedroSession.create(package_name) as session:
    session.run()

There is also a ticket discuss whether we should get rid of cli.py. In #1423, noted that the workaround above does not work if the project have a custom cli.py.

@noklam noklam changed the title Consider using run(standalone_mode=True to fix packaged Kedro project getting a sys.exit Consider using run(standalone_mode=True) to fix packaged Kedro project getting a sys.exit Jun 14, 2023
@astrojuanlu
Copy link
Member

I'm wondering if, rather than using standalone=True, we could make our Click functions thin wrappers over non-Click functions that can be used separately. Something like

@click.command("run")
def run_cli(args):
    run(args)

def run(args):
    ...

And we only declare the non-Click version as part of the public API so that people don't import the Click one:

__all__ = ["run"]

That way we decouple ourselves from Click a bit. I'm personally not very happy about how conversations get shut down over there pallets/click#2249 (comment)

@merelcht
Copy link
Member

This should be discussed in Tech Design through Packaged kedro pipeline does not work well on Databricks#1807 first before it's implemented.

@noklam
Copy link
Contributor Author

noklam commented Oct 25, 2023

I'm wondering if, rather than using standalone=True, we could make our Click functions thin wrappers over non-Click functions that can be used separately. Something like

It would be something very similar to KedroSession, in that case we need to also think about should we encourage user using KedroSession or just this new run function. Noted that user can define (or override) the cli run command. Using the CLI guarantee this will be identical (Not many people are doing it, but it's arguably breaking), but the alternative isn't.

@idanov
Copy link
Member

idanov commented Oct 26, 2023

The entrypoint should be a click one, because that allows the execution of your packaged app as run arguments, i.e. project-package --pipeline hello ~= kedro run --pipeline hello. The solution for this problem is rather simple, I've opened a PR here #3191 and would love to hear from @noklam if this will solve the problem.

@noklam
Copy link
Contributor Author

noklam commented Mar 4, 2024

https://www.reddit.com/r/Python/comments/7xdi5r/is_sysexit_bad_practice/ and https://click.palletsprojects.com/en/7.x/exceptions/

using standalone_modedoesn't seem to be a bad idea. If we decide to move forward with this, we should make sure kedro run is consistent to package.__main__.

For the record we say standalone=True everywhere, I think we should do the opposite. True is the default, and thus emitting sys.exit as a standalone application.

@astrojuanlu
Copy link
Member

astrojuanlu commented Mar 7, 2024

I've changed my mind on this issue - it always felt like a hack... but the problem is on click questionable decision of raise an exception for control flow, not our use of standalone_mode.

@merelcht
Copy link
Member

merelcht commented Aug 9, 2024

Closing this because it's addressed in #4026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

4 participants