colocate __main__.py with the central directory record of the zipapp #2209

Open: cosmicexplorer wants to merge 5 commits into main from move-main-to-end-of-sources

Conversation

cosmicexplorer
Contributor

@cosmicexplorer cosmicexplorer commented Aug 4, 2023

When the Python interpreter executes a zipapp, it imports __main__.py; __pex__/__init__.py is loaded instead when the pex file is on the PYTHONPATH and the user imports __pex__. Either file is loaded from the zip file handle with zipimport, and the PEX bootstrap script at that location then imports some of the code in .bootstrap/ before the zipapp is unpacked and compiled. Because a zip file's central directory records are located at the end of the file, placing these entries last in the zip, next to the central directory, reduces the amount of seeking the Python interpreter needs to perform in order to execute the bootstrap script. This was previously discussed in #2175 (comment) and #2175 (comment).
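To make the layout argument concrete, here is a minimal sketch that builds a small archive in memory and reports where each entry's local header sits relative to the central directory. The entry names and the `zip_layout` helper are hypothetical and not part of PEX; the sketch assumes a small archive with no zip comment and no zip64 records, so the end-of-central-directory record is exactly the last 22 bytes.

```python
import io
import struct
import zipfile

# End-of-central-directory record: signature, four 16-bit fields,
# central directory size and offset (32-bit each), comment length.
EOCD_FORMAT = "<4s4H2LH"
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)


def zip_layout(names):
    """Write one 1000-byte entry per name, in order, and describe the layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"x" * 1000)
    data = buf.getvalue()

    # Assumption: no archive comment, so the record is the trailing 22 bytes.
    fields = struct.unpack(EOCD_FORMAT, data[-EOCD_SIZE:])
    if fields[0] != b"PK\x05\x06":
        raise ValueError("end-of-central-directory record not found")
    cd_size, cd_offset = fields[5], fields[6]

    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        offsets = {info.filename: info.header_offset for info in zf.infolist()}
    return offsets, cd_offset, cd_size, len(data)


# Hypothetical entry names; __main__.py is written last.
offsets, cd_offset, cd_size, total = zip_layout(
    [".bootstrap/pex/demo.py", "src/app.py", "__main__.py"]
)
gap_first = cd_offset - offsets[".bootstrap/pex/demo.py"]
gap_main = cd_offset - offsets["__main__.py"]
print("distance from first entry to central directory:", gap_first)
print("distance from __main__.py to central directory:", gap_main)
```

The entry written last ends up immediately before the central directory, which is the property this change relies on. Whether that translates into a measurable startup improvement depends on the filesystem and archive size, which this sketch does not measure.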

Changes involved:

  1. Make Chroot#zip() respect the relative ordering of labels when collecting from self.filesets.
  2. Put "main" and "importhook" labels at the end of the zipapp.
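The two changes above can be sketched roughly as follows. This is not the PEX implementation: `ordered_paths` is a hypothetical helper, the mapping of labels to path sets only approximates what Chroot tracks, and the "bootstrap" and "source" labels and the relative order of "importhook" and "main" are assumptions made for illustration.

```python
def ordered_paths(filesets, trailing_labels=("importhook", "main")):
    """Return paths grouped by label, with trailing_labels emitted last.

    filesets maps a label to a set of archive paths. Labels that are not
    in trailing_labels keep their insertion order; paths within a label
    are sorted so the output is deterministic.
    """
    leading = [label for label in filesets if label not in trailing_labels]
    trailing = [label for label in trailing_labels if label in filesets]
    paths = []
    for label in leading + trailing:
        paths.extend(sorted(filesets[label]))
    return paths


# Hypothetical filesets; only "main" and "importhook" are named in this PR.
filesets = {
    "main": {"__main__.py"},
    "bootstrap": {".bootstrap/pex/b.py", ".bootstrap/pex/a.py"},
    "importhook": {"__pex__/__init__.py"},
    "source": {"app.py"},
}
for path in ordered_paths(filesets):
    print(path)
```

Writing entries to the zip in the returned order places __pex__/__init__.py and __main__.py at the end of the archive, regardless of the order in which the labels were registered.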

@cosmicexplorer cosmicexplorer force-pushed the move-main-to-end-of-sources branch 2 times, most recently from 5273031 to 9f19ad7 Compare August 5, 2023 00:44
@cosmicexplorer cosmicexplorer force-pushed the move-main-to-end-of-sources branch 3 times, most recently from 5c617ff to a89cc18 Compare August 5, 2023 02:24
dest = os.path.join(dirname, f)
safe_mkdir(os.path.dirname(dest))
safe_copy(os.path.realpath(os.path.join(self._chroot.chroot, f)), dest)
with TRACER.timed("copying over uncached sources", V=9):
Member

Ok, this seems fine - but boy oh boy does it seem incredibly complicated vs just ordering the labels correctly in the existing tuple on the LHS and writing a good block comment just above.

Contributor Author

You're right, and this change could be reduced quite a bit by providing the labels literally (for example, we could avoid changing this method at all). I thought it was useful to document the meanings of each label, since they're not really documented elsewhere unless you grep for the literal strings "executable", "importhook", etc. I was also thinking that naming each section with a classmethod would make it easier for someone to subclass PEXBuilder and insert their own files into the appropriate section while retaining the performance benefit for zipapps. If that's not a supported use case, we could very well reduce the size of this change by a lot.

Member

jsirois commented Aug 8, 2023

@cosmicexplorer I'm a huge non-fan of Problem / Solution formulas for writing commit messages. We're adults and I think you can just write a good message. Quoting people - not helpful. Just explaining the change and its reason should stand on its own.

Contributor Author

cosmicexplorer commented Aug 9, 2023

> I'm a huge non-fan of Problem / Solution formulas for writing commit messages.

Ok, I will keep that in mind for the future.

> Quoting people - not helpful. Just explaining the change and its reason should stand on its own.

In this case it's not actually documented anywhere else that the Python interpreter accesses __main__.py or __pex__/__init__.py from the same zip file handle (which I confirmed by looking through the CPython source), so I thought the quote was precisely the rationale needed for this change. However, given this feedback, I have rewritten the OP to be much more concise, linking to the prior discussions instead of quoting them directly. Is that closer to what you're looking for?

Member

jsirois commented Aug 9, 2023

Thanks @cosmicexplorer, yes. The facts remain the same regardless of who pointed them out. Much more sane to read in the git log.

FWIW:

> When the Python interpreter executes a zipapp, it will import __main__.py or __pex__/__init__.py

Is not true. When the Python interpreter executes a zipapp (or a directory) it only looks for __main__.py. The __pex__ package must be imported by the user, Python knows nothing about that at all.
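The distinction can be checked with a small self-contained sketch. The archive contents here are stand-ins written for this example, not the real PEX bootstrap, and `build_demo_zip` is a hypothetical helper. `runpy.run_path` is used as an in-process approximation of running `python demo.zip`.

```python
import importlib
import os
import runpy
import sys
import tempfile
import zipfile


def build_demo_zip(path):
    """Create a tiny archive with a stand-in __pex__ package and a __main__.py."""
    with zipfile.ZipFile(path, "w") as zf:
        zf.writestr("__pex__/__init__.py", "LOADED_VIA = 'import'\n")
        zf.writestr("__main__.py", "RESULT = 'ran __main__.py'\n")


tmpdir = tempfile.mkdtemp()
zip_path = os.path.join(tmpdir, "demo.zip")
build_demo_zip(zip_path)

# Executing the archive looks up and runs __main__.py only.
main_globals = runpy.run_path(zip_path)
pex_loaded_by_execution = "__pex__" in sys.modules

# The __pex__ package is only loaded when the archive is on sys.path
# (for example via PYTHONPATH) and user code imports it explicitly.
sys.path.insert(0, zip_path)
hook = importlib.import_module("__pex__")

print(main_globals["RESULT"])
print("__pex__ loaded by execution:", pex_loaded_by_execution)
print("__pex__ loaded from:", hook.__file__)
```

Both entry points are read out of the same archive, which is why placing both near the central directory is relevant even though only __main__.py is known to the interpreter itself.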

Contributor Author

You're right; I have fixed the wording to clarify that __pex__/__init__.py is loaded if the pex file is on the PYTHONPATH.
