Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add initial support for plugins #12985

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

facutuesca
Copy link

@facutuesca facutuesca commented Oct 1, 2024

This is a PR with an initial implementation of the plugin support discussed in #12766.

High-level description

Plugins will be detected and loaded via registered entrypoints. For example, a Python package that contains the following in pyproject.toml:

[project.entry-points."pip.plugins"]
example-distinspector = "pip_plugin_example"

will register the example-distinspector plugin by providing the pip_plugin_example module object. This module object will be loaded by pip and its methods will be called at different places while pip runs.

The methods that will be called (and the places where they will be called from) depend on the plugin type. For now, this implementation only supports a single plugin type: dist-inspector, which should provide three functions:

def plugin_type() -> str:
    return "dist-inspector"

def pre_download(url: str, filename: str, digest: str) -> None:
    # contract: `pre_download` raises `ValueError` to terminate
    # the operation that intends to download `filename` from `url`
    # with hash `digest`
    pass

def pre_extract(dist: Path) -> None:
    # contract: `pre_extract` raises `ValueError` to terminate
    # the operation that intends to unarchive `dist`
    pass

These functions (provided by the plugin in module pip_plugin_example) will be called by pip right before downloading or extracting a distribution file.

A repository with an example plugin package is here.

Implementation details

  • I focused on putting most of the plugin logic in new, separate files (models/plugin.py and utils/plugins.py). The only change to existing files is adding a function call in the specific places the plugin should run (before download and before extraction).
  • Plugin loading is conservative: if any of the expected functions is missing from the loaded module, we log a warning and skip using the plugin.
  • Since plugins should only raise ValueError, in case of other exception types I put a defensive check that logs a warning and converts the exception to a ValueError. This means that plugins that don't follow the contract will still work (with a warning). If we want to avoid that and just abort when a plugin misbehaves, this check needs to be changed.

Open questions

  • Are the places where we call the hooks (pre-download and pre-extract) correct?
  • Would loading the plugins on-demand (during the first call to a plugin hook) be better than loading them (always) during startup?

TODO

  • Once discussion is done, add tests and docs

cc @woodruffw

from pathlib import Path
from typing import Iterator, List

from pip._vendor.pygments.plugin import iter_entry_points
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't be using an undocumented helper function from pygments here. It could disappear without warning. If we need the functionality we should implement our own copy. Although isn't this available in importlib.metadata anyway?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point! I copied the logic to this file. The reason why we need it is because the API for EntryPoints changed in Python 3.10, so the way to get entry points matching a specific group name depends on the Python version:

groups = entry_points()
if hasattr(groups, "select"):
    # New interface in Python 3.10 and newer versions of the
    # importlib_metadata backport.
    return groups.select(group=group_name)
else:
    assert hasattr(groups, "get")
    # Older interface, deprecated in Python 3.10 and recent
    # importlib_metadata, but we need it in Python 3.8 and 3.9.
    return groups.get(group_name, [])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm so sick of the instability of the importlib.{metadata, resources} APIs, but we have to work with what we have, I guess :-(

Thanks for doing this.

continue
plugin = plugin_from_module(entrypoint.name, module)
if plugin is not None:
_loaded_plugins.append(plugin)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a great fan of running complex code at import time. Couldn't we have a load_plugins() function that gets called at a suitable point in pip's initialisation process? Or better still, load plugins on demand, because we don't want to impact performance loading plugins we don't need.

Copy link
Author

@facutuesca facutuesca Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I moved it into a load_plugins and called it right before command.main(cmd_args).

For loading plugins on demand, maybe we can change utils.plugins.plugin_pre_download_hook and utils.pugins.plugin_pre_extract_hook to check if plugins have been loaded, and load them if not.
Then the plugins should only be loaded when the first hook is called (and never if the code path does not include any hooks).
WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'll do for now, but let's keep on-demand loading as an option, and do some proper testing at some point. I'm very aware that uv has demonstrated that performance is an important characteristic in an installer, and I want to make sure we avoid making pip slower and slower as we add features.

@pfmoore
Copy link
Member

pfmoore commented Oct 1, 2024

I'm still not clear on the status of the dist-inspector plugin type. Is it intended as an actual, supported plugin type for pip, and more to the point, is it intended to be the only supported plugin type (until someone submits PRs providing other plugin types)? What evidence do we have that this is plugin type is important? And where are the use cases (issues, feature requests) providing evidence that people need this plugin type and will use it?

I'm still trying to make the point that we shouldn't be adding a plugin mechanism in isolation. Rather, we should have a real world use case that justifies being implemented as a plugin, and we should create a plugin mechanism in support of implementing a solution for that use case. Specifically, with a concrete use case, we can decide whether it warrants something as general as a plugin approach, or if a builtin feature specific to that use case would be a better solution.

At the moment, this still feels like a solution looking for a problem.

@facutuesca
Copy link
Author

facutuesca commented Oct 1, 2024

I'm still not clear on the status of the dist-inspector plugin type. Is it intended as an actual, supported plugin type for pip?
....
Rather, we should have a real world use case that justifies being implemented as a plugin

Yes, it's intended as an actual plugin type for pip. One of the motivations behind the dist-inspector plugin type (that I'm aware of) is enabling verification of PEP 740 attestations during package installation. Since verifying these attestations requires sigstore-python, doing it inside pip is not possible (sigstore-python depends on pyca/cryptography, which AFAIK can't be vendored in).

Instead, we could use a plugin (of type dist-inspector) with a pre_download hook that downloads the provenance file for the artifact about to be downloaded, and verifies that provenance against the artifact's digest. If verification fails, we can interrupt the installation process before the user even downloads the artifact.

(edit: Support for downloading the PEP-740 attestations from PyPI is almost complete. Once it's finished, I'll create a repo with a plugin that does the above)

is it intended to be the only supported plugin type (until someone submits PRs providing other plugin types)?

Unless there is a strong need for other types, I think starting with just one is a good idea to keep things simple

@pfmoore
Copy link
Member

pfmoore commented Oct 1, 2024

One of the motivations behind the dist-inspector plugin type (that I'm aware of) is enabling verification of PEP 740 attestations during package installation.

Thank you. That had been mentioned before, but I'd forgotten. The sigstore dependency is a good example of why a plugin is needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants