Multiprocess slow start for ops #7338
-
-
We have found that solids don't seem to have this issue; could it be a subprocess thing?
-
We do have a semi-complicated setup where we build a list of jobs dynamically from YAML, so if I had to guess, that would be the reason for the delay. Does Dagster bootstrap the repository list each time?
-
One tool to avoid repeating the cost is to use the `start_method` `forkserver`, which is available as config on the multiprocess executor. The first op will still pay the init cost to create the template process, but each subsequent op should be faster, as it forks that template process instead of starting from scratch. You may need to explicitly set `preload_modules` in the config if the default behavior doesn't load the necessary modules.

Context: https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
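For anyone unfamiliar with the mechanism, here is a minimal stdlib-only sketch of the same idea (the `"json"` preload is just a stand-in for whatever heavy module you want baked into the template process):

```python
import multiprocessing as mp


def square(x):
    return x * x


def run_pool():
    # Preload modules into the forkserver template process so every
    # forked worker inherits them instead of re-importing from scratch.
    # Must be called before the forkserver process is started.
    mp.set_forkserver_preload(["json"])
    # With the forkserver start method, a template process is created
    # once; each worker is then forked from it, which is typically much
    # cheaper than spawning a fresh interpreter per worker.
    ctx = mp.get_context("forkserver")
    with ctx.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    print(run_pool())  # → [0, 1, 4, 9, 16]
```

Note that `forkserver` is only available on Unix platforms; `multiprocessing.get_all_start_methods()` tells you what your platform supports.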
-
Tried this myself and I think I maybe see a small improvement using forkserver and a list of preloaded modules, but it's still taking approx 8 seconds to start any op.
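For reference, a rough sketch of what that run config might look like in YAML (field names and the module path here are assumptions; check the multiprocess executor docs for your Dagster version):

```yaml
execution:
  config:
    multiprocess:
      start_method:
        forkserver:
          preload_modules:
            - my_package.heavy_deps  # hypothetical module to preload
```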