Investigate KedroSession.create: ModuleNotFoundError: No module named 'None' #1583

Closed
PetervanHeck opened this issue Jun 1, 2022 · 18 comments · Fixed by #1713

@PetervanHeck

PetervanHeck commented Jun 1, 2022

Description

Following the documentation on creating and running a session causes an error:
https://kedro.readthedocs.io/en/0.18.1/kedro.framework.session.session.KedroSession.html

When creating a KedroSession, validate_settings() does not appear to get the right PACKAGE_NAME, and so it fails.

However, following this page does work:
https://kedro.readthedocs.io/en/stable/kedro_project_setup/session.html

Context

Starting a Kedro session from within Python.

Steps to Reproduce

from kedro.framework.session import KedroSession
with KedroSession.create('<package_name>') as session: session.run()

Expected Result

Running the kedro project similarly to 'kedro run'

Actual Result

>>> from kedro.framework.session import KedroSession
>>> session = KedroSession.create('my_package_name')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "C:\Users\my_user\Environments\my_env\lib\site-packages\kedro\framework\session\session.py", line 138, in create
    validate_settings()
  File "C:\Users\my_user\Environments\my_env\lib\site-packages\kedro\framework\project\__init__.py", line 223, in validate_settings
    importlib.import_module(f"{PACKAGE_NAME}.settings")
  File "C:\Users\my_user\AppData\Local\Programs\Python\Python37\lib\importlib\__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'None'

Your Environment

Include as many relevant details about the environment in which you experienced the bug:

  • Kedro version used (pip show kedro or kedro -V): 0.18.0 and 0.18.1
  • Python version used (python -V): 3.7.9
  • Operating system and version: Microsoft Windows 10 Enterprise, Version 10.0.19042 Build 19042
@datajoely
Contributor

Can you post your settings.py? It looks like it fails on the validate settings part.

@noklam
Contributor

noklam commented Jun 1, 2022

Do you actually have your "package" installed? Does import package_name also give you an error?

from kedro.framework.session import KedroSession
from kedro.framework.startup import bootstrap_project
from pathlib import Path

metadata = bootstrap_project(Path.cwd()) # Notice this is not used at all
with KedroSession.create('<package_name>') as session: session.run()

@PetervanHeck
Author

with KedroSession.create('<package_name>')

This runs just fine, with metadata.package_name being the same as '<package_name>'.

@PetervanHeck
Author

Can you post your settings.py? It looks like it fails on the validate settings part.

Given that the posted alternative works, I doubt settings.py is of interest. Besides, the file is unchanged from what 'kedro new' generated.

@noklam
Contributor

noklam commented Jun 1, 2022

@PetervanHeck As you said, the package name is exactly the same. I think the problem here is that you don't have your package installed, so when you call KedroSession.create('<package_name>'), it fails to find the package.

The second approach works because bootstrap_project adds your project path to the Python path, so the package can be imported.

Approach 1 should work as long as you ran pip install -e . beforehand.
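
The import behaviour described here can be illustrated without Kedro. The following is a minimal sketch, assuming a hypothetical package named testpackage_demo under a src directory. It only shows that such a package cannot be imported until src is on sys.path, which is the effect this thread attributes to bootstrap_project. It is not Kedro's actual implementation.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(module_name: str) -> bool:
    """Return True if module_name can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError:
        return False
    return True


# Build a throwaway layout: <tmp>/src/testpackage_demo/__init__.py
# (the package name is hypothetical and used only for this illustration).
project_root = Path(tempfile.mkdtemp())
package_dir = project_root / "src" / "testpackage_demo"
package_dir.mkdir(parents=True)
(package_dir / "__init__.py").write_text("")

# The src directory is not on sys.path yet, so the import fails.
before = can_import("testpackage_demo")

# Roughly the effect described in this thread: make src importable.
sys.path.insert(0, str(project_root / "src"))
after = can_import("testpackage_demo")

print(before, after)
```

Installing the package (for example in editable mode) achieves the same importability through the environment instead of through a sys.path change at runtime.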

@PetervanHeck
Author

Why would I need to 'install' my Kedro package in order to run it as a session? In previous versions this worked, so I used it, for example, in my tests to validate that input and output still matched. Is there another, better method to run the Kedro pipeline?

@noklam
Contributor

noklam commented Jun 1, 2022

@PetervanHeck Ah, so you are suggesting it works in 0.17.x and only breaks in 0.18.x?

If that's the case I can look into it. It would be great if you could tell me which version of Kedro works, and whether you are running this from the project directory.

@PetervanHeck
Author

That's indeed the case. I've got it running in, for example, 0.17.4, 0.17.5 and 0.17.7.
And it is indeed a file in the project root directory containing:
with KedroSession.create('src.<package_name>') as session: session.run()
(I've tried this with 'src.' in front of the package name, and it doesn't solve the problem.)

@noklam
Contributor

noklam commented Jun 6, 2022

@PetervanHeck Thanks. In my testing, both 0.17.7 and 0.18.1 throw errors, but I don't yet fully understand why the errors differ. It would be great if you could give me an example that runs successfully; a fake project is fine.

(Note that my working directory is called ms here.)

0.17.7

from kedro.framework.session import KedroSession
with KedroSession.create('ms') as session: session.run()

ModuleNotFoundError: No module named 'ms'

0.18.1

from kedro.framework.session import KedroSession
with KedroSession.create('ms') as session: session.run()

ModuleNotFoundError: No module named 'None'

I'll update the issue description a little bit.

@noklam
Contributor

noklam commented Jun 6, 2022

This is just a note for internal use, please ignore this comment

For backlog grooming:

  1. Confirm if this is a regression or not
  2. If not, do we understand why it behaves differently?
  3. Update the doc so we always start with bootstrap_project if this is preferred.

@PetervanHeck
Author

Dear Noklam,

Sorry for the late reply; I completely missed the request for examples. I think the first error is due to something I forgot to mention: adding 'src' to the path. Enclosed are two projects, one for 0.17.7 and one for 0.18.1, which replicate my problem.

For both projects I did the same:

  • create a new virtual environment
  • pip install kedro==0.17.7 (or 0.18.)
  • kedro new, naming the three items 'testrepo', 'testpackage' and 'testproject'
  • change pipeline_registry.py to: return {"default": pipeline([node(lambda x: print(x), "params:val", None)])}
  • add 'val: 5' to conf/base/parameters.yml
  • add main.py (in the folder testrepo)
  • run main.py

kedrotest.zip

@antonymilne antonymilne moved this to To Do in Kedro Framework Jun 27, 2022
@antonymilne
Contributor

To do:

  • see if we're missing calls to bootstrap_project from wherever it's required in the docs
  • see if the error message No module named 'None' is masking some other more important regression (why is it None now when it wasn't before?)

@merelcht merelcht changed the title KedroSession.create: ModuleNotFoundError: No module named 'None' Investigate KedroSession.create: ModuleNotFoundError: No module named 'None' Jul 18, 2022
@noklam noklam moved this from To Do to In Progress in Kedro Framework Jul 18, 2022
@noklam noklam self-assigned this Jul 18, 2022
@noklam
Contributor

noklam commented Jul 18, 2022

Development Note:
The error message changed to No module named 'None' because the following code block was removed in the change titled "Remove code for backwards compatibility":

if package_name is not None:
    configure_project(package_name)

The global PACKAGE_NAME is only set via configure_project, which is why we get None in the error message.
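
To make the mechanism concrete, here is a minimal sketch with simplified stand-ins for PACKAGE_NAME, configure_project and validate_settings. The function bodies are assumptions for illustration and not Kedro's actual code; the helper error_without_configure is hypothetical. The sketch shows how formatting an unset global into the module name produces the message from the traceback.

```python
import importlib

# Module-level global that stays None until configure_project is called.
PACKAGE_NAME = None


def configure_project(package_name: str) -> None:
    """Simplified stand-in: record the package name in the global."""
    global PACKAGE_NAME
    PACKAGE_NAME = package_name


def validate_settings() -> None:
    """Simplified stand-in: import '<PACKAGE_NAME>.settings'."""
    # With PACKAGE_NAME unset, the f-string renders as 'None.settings'.
    importlib.import_module(f"{PACKAGE_NAME}.settings")


def error_without_configure() -> str:
    """Hypothetical helper: return the error text raised by validate_settings."""
    try:
        validate_settings()
    except ModuleNotFoundError as exc:
        return str(exc)
    return ""


message = error_without_configure()
print(message)
```

Because configure_project is never called in this sketch, the import targets a top-level module literally named None, matching the error reported in this issue.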

During the investigation, I looked into how configure_project was used. I found that we should also remove the explicit call to it in the snippet below, since bootstrap_project should already call configure_project.

CC @AntonyMilneQB Does bootstrap_project behave pretty much like configure_project, except that it also adds the src path to sys.path?

metadata = bootstrap_project(default_project_path)
_remove_cached_modules(metadata.package_name)
configure_project(metadata.package_name)
session = KedroSession.create(
    metadata.package_name, default_project_path, env=env, extra_params=extra_params
)

see if the error message No module named 'None' is masking some other more important regression (why is it None now when it wasn't before?)

  • The None in the message doesn't look like a regression to me. We could probably give a better error message when PACKAGE_NAME is None.
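
A clearer error could look something like the sketch below. The function require_package_name and its message wording are hypothetical, written only to illustrate the idea; this is not necessarily what the eventual fix implemented.

```python
from typing import Optional


def require_package_name(package_name: Optional[str]) -> str:
    """Hypothetical guard: fail early with an actionable message."""
    if package_name is None:
        raise RuntimeError(
            "The project package name is not set. Call bootstrap_project "
            "(project mode) or configure_project (packaged mode) before "
            "creating a session."
        )
    return package_name


# A configured name passes through unchanged.
print(require_package_name("ms"))
```

Checking for None before building the module path avoids handing the import machinery a misleading name.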

see if we're missing calls to bootstrap_project from wherever it's required in the docs

  • I think the right thing to do is to make sure our docs use bootstrap_project whenever a user creates a session object in a script instead of using kedro run.
    We may add a note to the API docs saying that bootstrap_project should be called when not running in CLI mode. I checked the other documentation pages and they all mention bootstrap_project correctly.

@antonymilne
Contributor

antonymilne commented Jul 21, 2022

@noklam Nice investigating!

Great question about the difference between bootstrap_project and configure_project. This isn't at all obvious and something I always forget...

  • bootstrap_project is used when running in "project mode", i.e. you are in your project root directory and running kedro ... commands. pyproject.toml is in the same directory, and hence metadata is available.
  • configure_project is used when running in "packaged mode", i.e. you have done kedro package and are running your project as a Python package. The crucial difference is that pyproject.toml is not present in the Python package, as it just gives build-time information used during packaging, and hence there is no metadata available.
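
The distinction between the two modes can be sketched as a check for pyproject.toml. The helper guess_run_mode is hypothetical and is not part of Kedro; it only encodes the rule of thumb from the two bullets above.

```python
import tempfile
from pathlib import Path


def guess_run_mode(directory: Path) -> str:
    """Hypothetical helper: 'project' if pyproject.toml is present, else 'packaged'."""
    return "project" if (directory / "pyproject.toml").is_file() else "packaged"


# A directory with a pyproject.toml stands in for a project root.
# The file contents are a placeholder and irrelevant to this sketch.
project_dir = Path(tempfile.mkdtemp())
(project_dir / "pyproject.toml").write_text("# placeholder\n")

# A directory without it stands in for an installed package location.
empty_dir = Path(tempfile.mkdtemp())

print(guess_run_mode(project_dir), guess_run_mode(empty_dir))
```

Under this rule of thumb, a script would call bootstrap_project in the first case and configure_project with the package name in the second.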

This should become a bit clearer when I finally get around to continuing work on #1423.

@datajoely
Contributor

Suggestion - change these to be more explicit?

bootstrap_project -> bootstrap_project this already works
configure_project -> bootstrap_packaged_project

wdyt?

@NeroOkwa NeroOkwa moved this from In Progress to In Review in Kedro Framework Jul 21, 2022
@antonymilne
Contributor

Would definitely be good to have clearer names in principle, but the configure nomenclature is used a lot in the project __init__.py file, and this would be a big breaking change. I wouldn't be surprised if some of this stuff changes a bit anyway before 0.19 is released, so I think there's not much point renaming now.

@noklam noklam linked a pull request Jul 21, 2022 that will close this issue
Repository owner moved this from In Review to Done in Kedro Framework Jul 22, 2022
@ajinkyachandulwar

ajinkyachandulwar commented Sep 22, 2022

@noklam @AntonyMilneQB
The solution you have given works fine if I am using Kedro pipelines from inside a Kedro project. But what if I package my project using kedro package and install the wheel file?

After installing the wheel file, I don't get a pyproject.toml file in the installed distribution, and I get the error below:
RuntimeError: Could not find the project configuration file ‘pyproject.toml’ in cwd

Also, my cwd (current working directory) will be the project where I am importing the installed wheel (built using kedro package).

@noklam
Contributor

noklam commented Sep 22, 2022

@ajinkyachandulwar

Is this link helpful? If you still have an issue, feel free to ask in our Discord channel or open a new issue.
https://kedro.readthedocs.io/en/stable/tutorial/package_a_project.html#package-your-project
