KedroSession not working on packaged kedro project #1869

Closed
ajinkyachandulwar opened this issue Sep 23, 2022 · 10 comments
Labels: Issue: Bug Report 🐞 (Bug that needs to be fixed), Stage: Technical Design 🎨 (Ticket needs to undergo technical design before implementation)

@ajinkyachandulwar

Description

KedroSession raises ModuleNotFoundError: No module named 'None' on Kedro 0.18.2.
After calling bootstrap_project(), it cannot find the pipelines defined in another project.

I followed these steps to use KedroSession:
https://kedro.readthedocs.io/en/0.18.2/kedro_project_setup/session.html

Context

I am trying to use the Kedro pipelines defined in one of my Kedro projects (ProjectX). I package ProjectX using kedro package and then install it using pip install.

Steps to Reproduce

  1. Create a new Kedro project (ProjectX) using kedro new
  2. Add one pipeline (pipeline1) to it and register it in pipeline_registry.py
  3. Package this project using kedro package
  4. Install the generated wheel file
  5. Create another Kedro project (ProjectY) and try to invoke the pipeline defined in ProjectX using KedroSession:
from pathlib import Path

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project

# Read the project metadata from the current working directory (ProjectY)
metadata = bootstrap_project(Path.cwd())

extra_params = dict(
    param1="value1",
    param2="value2",
)

# Try to run pipeline1 from the installed projectx package
with KedroSession.create("projectx", extra_params=extra_params) as s:
    outputs = s.run(pipeline_name="pipeline1")

Expected Result

Should execute pipeline1 and return the output dataframes.

Actual Result

I get the following error:

354 │ │ try: │
│ 355 │ │ │ pipeline = pipelines[name] │
│ 356 │ │ except KeyError as exc: │
│ ❱ 357 │ │ │ raise ValueError( │
│ 358 │ │ │ │ f"Failed to find the pipeline named '{name}'. " │
│ 359 │ │ │ │ f"It needs to be generated and returned " │
│ 360 │ │ │ │ f"by the 'register_pipelines' function." │
│ │
│ ╭──────────────────────────────────────── locals ─────────────────────────────────────────╮ │
│ │ context = <kedro.framework.context.context.KedroContext object at 0x7f8f6b1d5d90> │ │
│ │ extra_params = {'param1': 'value1', 'param2': 'value2'} │ │
│ │ from_inputs = None │ │
│ │ from_nodes = None │ │
│ │ load_versions = None │ │
│ │ name = 'pipeline1' │ │
│ │ node_names = None │ │
│ │ pipeline_name = 'pipeline1' │ │
│ │ runner = None │ │
│ │ save_version = '2022-09-23T12.19.58.331Z' │ │
│ │ self = <kedro.framework.session.session.KedroSession object at 0x7f8f6b114b50> │ │
│ │ session_id = '2022-09-23T12.19.58.331Z' │ │
│ │ tags = None │ │
│ │ to_nodes = None │ │
│ │ to_outputs = None │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Failed to find the pipeline named 'pipeline1'. It needs to be generated and returned by
the 'register_pipelines' function.
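
The lookup that fails in the traceback above can be reproduced without Kedro. The following is a minimal sketch, not Kedro's implementation: it assumes the pipeline registry behaves like a plain dict, and `find_pipeline` and `project_y_pipelines` are hypothetical names used only for illustration. It shows how a registry that does not contain pipeline1 (presumably because the registry that was loaded is not the one from projectx) turns a KeyError into the ValueError reported here.

```python
def find_pipeline(pipelines: dict, name: str):
    """Mimic the lookup shown in the traceback: a KeyError becomes a ValueError."""
    try:
        return pipelines[name]
    except KeyError as exc:
        raise ValueError(
            f"Failed to find the pipeline named '{name}'. "
            f"It needs to be generated and returned "
            f"by the 'register_pipelines' function."
        ) from exc


# Hypothetical registry that lacks pipeline1 (assumption: pipeline1 is only
# registered in projectx, and that registry was not the one loaded).
project_y_pipelines = {"__default__": "placeholder pipeline"}

try:
    find_pipeline(project_y_pipelines, "pipeline1")
except ValueError as err:
    message = str(err)
    print(message)
```

If this reading is right, the error depends on which package's registry is configured at the time of the run, not on whether projectx is installed.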

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V):
    Name: kedro
    Version: 0.18.2
  • Python version used (python -V): Python 3.8.13
  • Operating system and version: MacOS Monterey (12.0.1)
@ajinkyachandulwar
Author

ajinkyachandulwar commented Sep 23, 2022

I also followed the steps mentioned in #1583.
The issue is still the same.

@noklam
Contributor

noklam commented Sep 23, 2022

also followed the steps mentioned at #1583
Still the issue is same

@ajinkyachandulwar #1583 (comment)

Have you tried this already? If so, what error did you get?

@ajinkyachandulwar
Author

@noklam I followed the same steps mentioned in
https://kedro.readthedocs.io/en/stable/tutorial/package_a_project.html#package-your-project
for packaging my Kedro project, but while using KedroSession I am getting the error mentioned above.
Please note this was working fine on Kedro v0.17.7.

@noklam
Contributor

noklam commented Sep 24, 2022

@ajinkyachandulwar

In packaged mode, you simply call the package's main function:

from kedro_spaceflights.__main__ import main

main(
    ["--pipeline", "__default__"]
)  # or simply main() if you don't want to provide any arguments

@ajinkyachandulwar
Author

But with this option I will not get the output of my pipeline as dataframes (as I am not using any catalog.yml).

Also, KedroSession was working fine for me in Kedro 0.17.7, so why is it not working in 0.18.2?
Is KedroSession not meant to be used outside a Kedro project?

@noklam
Contributor

noklam commented Sep 24, 2022

The workaround to use KedroSession with package mode can be found here. #1807 (comment)

In short, there is a "package mode" and a "project mode". KedroSession is not meant to be used outside of a Kedro project as of this moment. See #1583 (comment) for more information.

However, I like the idea that KedroSession can be used in package mode; this is exactly what I proposed here. We haven't discussed this within the team yet, so it's just my opinion. Thanks for raising this, I think it supports my idea.

I have made this issue a "Technical Design" ticket, and hopefully we will have some time to discuss it next week.
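
For contrast with package mode, "project mode" can be sketched as below. This is an illustrative sketch under assumptions, not an official recipe: `run_in_project_mode` is a hypothetical helper name, and it assumes `project_path` points at the root of a Kedro project source tree (where bootstrap_project can read the project metadata) running a Kedro 0.18.x release.

```python
from pathlib import Path
from typing import Any, Dict


def run_in_project_mode(
    project_path: Path, pipeline_name: str = "__default__"
) -> Dict[str, Any]:
    """Run a pipeline from a Kedro project directory (hypothetical helper)."""
    # Imported lazily so the helper can be defined even where Kedro is absent.
    from kedro.framework.session import KedroSession
    from kedro.framework.startup import bootstrap_project

    # bootstrap_project reads the project metadata and configures the
    # project's own package, so the session sees that project's pipelines.
    metadata = bootstrap_project(project_path)
    with KedroSession.create(
        metadata.package_name, project_path=project_path
    ) as session:
        return session.run(pipeline_name=pipeline_name)
```

The point of the sketch is that the package name comes from the project directory itself, which is why this route does not help when the pipelines live in a separately installed package.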

@noklam noklam added Issue: Bug Report 🐞 Bug that needs to be fixed Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation labels Sep 24, 2022
@noklam
Contributor

noklam commented Sep 24, 2022

Background for discussion (full background in #1423).
The main related points are:

  • kedro run has two modes: package mode and project mode
    • There are many entry points. Are they all necessary? (Improve kedro run as a package #1423)
      • main, kedro run (CLI), KedroSession.run. Why do we need another Python API if we have session.run? Can we have just one interface, KedroSession, with the CLI as a very thin wrapper around it?

My questions are: are they really different? Do we actually need two modes? So I proposed here: why not just let KedroSession work with package mode?

  1. Did we intend to support running the package with KedroSession prior to 0.18.x? Is this a regression? This behavior change is briefly explained in #1583 (Investigate KedroSession.create: ModuleNotFoundError: No module named 'None'). We used to call configure_project when a KedroSession was created, but that was removed afterward (see #1583 (comment)).
  2. KedroSession currently does nothing with the package_name other than saving it. Why do we even need it if it's not doing anything? In my view it should be used, but maybe there is a historical reason that I don't fully understand yet. The full details are in #1423 (Improve kedro run as a package), where @AntonyMilneQB also has some opinions about this.
  • One thing that I like about KedroSession is that we can immediately do kedro run --package_name=xxx

There is more to discuss, but these are the main issues directly related to this.

  • Currently we don't have a good way for "package mode" to return any output. Possible solutions are:
    • KedroSession.run could be used to run the package as well. @AntonyMilneQB thinks that python -m package is better, and it actually supports all kedro run arguments too

I think it's better to clear this up from the design perspective, as we don't have a good way to solve this, and KedroSession can work now with a slightly less elegant solution together with configure_project. I don't want to advise too many people to use this workaround if we later decide to change it. Currently, we also use KedroSession as a workaround for packaged Kedro projects in "Kedro pipeline does not work well on Databricks".

This one is slightly off-topic, but why do params go into the session when it is created as a context manager, rather than being part of the session.run arguments?
The inconsistency is also a direct cause of questions such as "How do I update parameters from the kedro notebook?".

Edited on 28th Sep

@noklam noklam self-assigned this Sep 24, 2022
@ajinkyachandulwar
Author

Thanks @noklam for creating a bug ticket for this requirement. As I mentioned, my project pipelines produce outputs as dataframes, which are required for further processing. I can see that #1423, which covers this change, is in review. Any idea when this feature will be available to end users?

@noklam
Contributor

noklam commented Sep 27, 2022

@ajinkyachandulwar Would this work for you for now? I think it will still take some time before #1423 is merged.

The workaround to use KedroSession with package mode can be found here. #1807 (comment)

Can you try this snippet? configure_project should be used instead of bootstrap_project.

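from kedro.framework.project import configure_project
from kedro.framework.session import KedroSession

# Replace the placeholder with the name of your installed package
package_name = "<your_package_name>"

# Configure the installed package directly instead of bootstrapping a project
configure_project(package_name)

with KedroSession.create(package_name) as session:
    session.run()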
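
Since the original question was about getting the pipeline outputs back as dataframes, the workaround above can be wrapped so that the return value of session.run is kept. This is a hedged sketch, assuming Kedro 0.18.x, where session.run returns a dictionary of the node outputs that are not handled by the data catalog. `run_packaged_pipeline` is a hypothetical helper name, and the sketch assumes the working directory provides whatever configuration the session needs.

```python
from typing import Any, Dict, Optional


def run_packaged_pipeline(
    package_name: str,
    pipeline_name: str = "__default__",
    extra_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Run a pipeline from an installed Kedro package and return its outputs."""
    # Imported lazily so the helper can be defined even where Kedro is absent.
    from kedro.framework.project import configure_project
    from kedro.framework.session import KedroSession

    # Point Kedro at the installed package so its pipeline registry is used.
    configure_project(package_name)
    with KedroSession.create(package_name, extra_params=extra_params) as session:
        # Outputs that are not registered in the catalog are returned here.
        return session.run(pipeline_name=pipeline_name)
```

With the names from this issue, a call might look like `outputs = run_packaged_pipeline("projectx", "pipeline1", {"param1": "value1"})`, after which the dataframes would be looked up in `outputs` by their node output names.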

@merelcht
Member

merelcht commented Nov 8, 2022

Hi @ajinkyachandulwar , I'll close this issue for now, but feel free to re-open if you still need help solving it!
