Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

extension installer using threadpoolexecutor #16548

Draft
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

wkpark
Copy link
Contributor

@wkpark wkpark commented Oct 12, 2024

Description

  • parallelize run_extension_installer() using ThreadPoolExecutor()

Notes:

  • webui have args.skip_prepare_environment option. so, install.py should have independent behavior of its execution order.
  • max_workers=1 to use normal ordered install.

Checklist:

@w-e-w
Copy link
Collaborator

w-e-w commented Oct 19, 2024

feels like a bad idea
I believe issues would occur if two extensions are trying to install or modify the same files

@wkpark
Copy link
Contributor Author

wkpark commented Oct 19, 2024

thank you for your comment

there could be such race conditions e.g.)

  • download the same package at the same time
  • install the same package at the same time.

so I made a simple test program:

from concurrent.futures import ThreadPoolExecutor, as_completed
import subprocess


def run_extension_installer(r):
    try:
        ret = subprocess.run(["pip", "install", "-r", r], shell=False,)
        if ret.returncode != 0:
            if ret.stdout:
                print(ret.stdout)
            if ret.stderr:
                print(ret.stderr)
    except Exception as e:
        print("Error:", e)

with ThreadPoolExecutor(max_workers=2) as executor:
    futures = [executor.submit(run_extension_installer, r) for r in ["requirements.txt", "requirements2.txt"]]

even If requrirements.txt and requirements2.txt have same package list, pip install.. started at the same time and

  • download same packcage at the same time - it is no harm. (but it is wasteful)
  • try to install package at the same time - at this step, one pip install started norally but other pip install exit with OSError and it means pip have locking mechanism to prevent overwrite the same file.

so, there could be some pip install... fails.

but, if requirements.txt and requirements2.txt have different packages it will run normally.

so, I choose max_workers=2 to minimize possible race conditions.

Copy link
Collaborator

@w-e-w w-e-w left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just woke up so my writing here is a bit scatterbrained

still think it's mostly asking for trouble with little return

from my understanding pretty much the only reason that one would wish to parallelize install.py is to speed up load time
base on my system one install.py typically takes around 0.2 seconds if it's just checking if the requirements are met
so at best with 2 works you cut the time in half

note if an installer of extension is taking longer than this it's likely means that it's not optimized

while install.py is mostly used for pip, (and assuming that can handle this situation perfectly without issues which I still have my doubts), some extension installers does perform other actions, it is not possible to anticipate what they might do as there's no restrictions, it is totally possible that two extensions might decide to read or write to the same file using other methods that pip

also consider another case
I have two extensions specify two different version of a package, the later installer would basically override what has been done by the earlier installers
when this happened you would have specific packages to be re-install on every launch,
this is not ideal but webui might still work
and more importantly the behavior is consistent as the install the execution order is consistent

but if you parallelize them then there's no guarantee on which will be executed first, and so and so which package gets installed is basically leave to random chance

when there is a package conflict the most likely is it conflict of the packaged "tree" (not just one package), my guess is
if two installers is executed at the same time, the best case is one completely overrides the other, but which one has priority is up to a random chance, the worst case I can think of is that, they partially override each other, and this result in a higher chance of things breaking

@w-e-w w-e-w marked this pull request as draft October 20, 2024 00:23
@wkpark
Copy link
Contributor Author

wkpark commented Oct 20, 2024

I just woke up so my writing here is a bit scatterbrained

still think it's mostly asking for trouble with little return

from my understanding pretty much the only reason that one would wish to parallelize install.py is to speed up load time base on my system one install.py typically takes around 0.2 seconds if it's just checking if the requirements are met so at best with 2 works you cut the time in half

note if an installer of extension is taking longer than this it's likely means that it's not optimized

while install.py is mostly used for pip, (and assuming that can handle this situation perfectly without issues which I still have my doubts), some extension installers does perform other actions, it is not possible to anticipate what they might do as there's no restrictions, it is totally possible that two extensions might decide to read or write to the same file using other methods that pip

also consider another case I have two extensions specify two different version of a package, the later installer would basically override what has been done by the earlier installers when this happened you would have specific packages to be re-install on every launch, this is not ideal but webui might still work and more importantly the behavior is consistent as the install the execution order is consistent

yes. right but that is extension's fault with or without this PR I think.
as you mentioned install.py and scripts/*.py have same issue. to reduce/minimize these problem we have to minimize loading time executed code.

but if you parallelize them then there's no guarantee on which will be executed first, and so and so which package gets installed is basically leave to random chance

when there is a package conflict the most likely is it conflict of the packaged "tree" (not just one package), my guess is if two installers is executed at the same time, the best case is one completely overrides the other, but which one has priority is up to a random chance, the worst case I can think of is that, they partially override each other, and this result in a higher chance of things breaking

package dependency problem frequently occured in normal use case, anyway.


Let's be honest, I was surprised to see multiple pip instances working in the same venv environment at the same time. Normally yum/rpm/apt/etc don't work at the same time.
especially pip takes quite a long of time to download packages, and I think it would speed up the installation a lot if we could scan all the extensions' requirements.txt and run pip download separately in parallel. (see also pypa/pip#12923)

Thanks for the long answer. It gives me a lot of things to think about even if PR not accepted.

@w-e-w
Copy link
Collaborator

w-e-w commented Oct 20, 2024

I recall someone mentioning in Discord that we should revamp how extension installs dependencies in the first place

basically allowing extension to state their requirements and let web UI installs it dependencies as opposed to each extension installing them individually

but for obvious reasons that would require lots of work and as far as I am aware no one has put in any effort on making that happen

and even if someone could manage to create such a system what about old extensions all those have to be updated
so it's work plus more work plus more work for pretty much everyone involved


this and the other PR I don't see this being a thing
it's just too much complication for too less of a reward

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants