batching dep inference #17477
8 comments · 36 replies
-
Dep inference is more nuanced - dependencies can be requested all over the rule graph, and the existing data structures and rules around them are intrinsically single-target-oriented. In fact, dep inference is iterative: we get deps, then those deps' deps, and so on. So there isn't, generally speaking, one place where you can say up front "here are all the files that will need dependency parsing".

However, in practice there is one place where most heavyweight dep inference is triggered, and that is when iteratively computing TransitiveTargets. So we could treat each iteration on the queued direct deps as a batch. However, those batches may be a lot smaller than we'd like. An alternative is, during transitive target computation, to pre-parse everything in the cmd-line specs (since we know we'll need those parses anyway).

But however we do this, we may want to cache the results for each file separately, i.e., split the batch results, so that small edits don't trigger large re-parses of many files. Maybe at first only in memory, but at some point, likely, also in the lmdb_store, i.e., as synthetic "split" processes. Unlike other cases, we completely control the process, so we can be highly confident that it is safe to split the output and cache as if we ran each file in a separate process. This would require some engine support, though.

Anyway, thoughts and ideas welcome.
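Below is a minimal, runnable sketch of the idea, assuming a hypothetical `parse_batch` process and a toy in-memory dep graph standing in for the real parser; none of these names are actual Pants APIs. Each iteration of the transitive walk parses its queued, uncached deps as one batch, and the batch result is split back into per-file cache entries:

```python
# Toy stand-in for the dep parser process: maps each file to the files it imports.
_TOY_DEP_GRAPH = {
    "app/main.py": {"app/util.py", "app/models.py"},
    "app/util.py": {"app/models.py"},
    "app/models.py": set(),
}

_parse_cache: dict[str, set[str]] = {}  # per-file cache of parse results


def parse_batch(files: list[str]) -> dict[str, set[str]]:
    """One parser invocation covering many files, returning per-file results."""
    return {f: _TOY_DEP_GRAPH.get(f, set()) for f in files}


def transitive_deps(roots: set[str]) -> set[str]:
    visited: set[str] = set()
    frontier = set(roots)
    while frontier:
        # Batch: everything queued this iteration that isn't already cached.
        uncached = [f for f in sorted(frontier) if f not in _parse_cache]
        if uncached:
            # Split the single batch result back into per-file cache entries.
            _parse_cache.update(parse_batch(uncached))
        visited |= frontier
        frontier = {d for f in frontier for d in _parse_cache[f]} - visited
    return visited


print(sorted(transitive_deps({"app/main.py"})))
```

The split step is what keeps invalidation fine-grained: editing one file only re-parses that file, even though it was originally parsed as part of a batch.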
-
PS: Running the process on a single file takes ~100ms on my M1.
-
Somewhat unrelated, but from a dependency-rules perspective it would be interesting if we could build a generic core API for dep inference that the various backends adopt, so that there is one central place for applying dependency rules. It's no small feat to pull off, but I think it may be on a similar order of complexity as implementing batching, and could perhaps be worth considering while doing so.
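Purely as an illustration of what such a shared surface might look like (these names are hypothetical, not existing Pants plugin APIs), each backend could implement a small extraction protocol, with dependency rules applied in one central function:

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class RawDependency:
    """A dependency as extracted by a backend parser, before rules are applied."""
    source_file: str
    target: str


class DependencyExtractor(Protocol):
    """Implemented by each backend (Python, JVM, ...)."""
    def extract(self, files: list[str]) -> list[RawDependency]:
        ...


def infer_dependencies(
    extractor: DependencyExtractor,
    files: list[str],
    is_allowed: Callable[[RawDependency], bool],
) -> list[RawDependency]:
    """Central place to apply dependency rules, regardless of backend."""
    raw = extractor.extract(files)
    return [dep for dep in raw if is_allowed(dep)]
```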
-
I have this 95% complete in a stale branch.
-
(As a fun aside) the parser is token/AST-based, and there are Rust-implemented Python parsers. It would be a fun little speedup to port the dep parser to Rust 😌 And by porting to Rust, I mean standalone (so not embedded in the engine).
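For context, here's a rough illustration of what AST-based import extraction looks like in Python; this is not the actual Pants parser, just the general shape of the logic a standalone Rust port would need to replicate:

```python
import ast
import sys


def imports_in(path: str) -> set[str]:
    """Extract top-level module names imported by a Python source file."""
    with open(path, "rb") as f:
        tree = ast.parse(f.read(), filename=path)
    found: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


if __name__ == "__main__":
    # Accepts any number of files, so it can be run in batches.
    for path in sys.argv[1:]:
        print(path, sorted(imports_in(path)))
```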
-
That's a great start, but it may not be going far enough. We could profitably parse hundreds of files in a single process. What I'm thinking of is - when generating transitive deps - to preemptively seed the cache with the parse results for all files mentioned in the cmd-line specs. We know we will definitely need them, and it is likely that they will cover a substantial portion of the transitive deps as well (in the common case of …).
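A tiny sketch of that pre-seeding step, with `parse_batch` and the per-file cache passed in as hypothetical placeholders rather than real Pants APIs:

```python
def seed_cache_from_specs(
    spec_files: list[str],
    parse_batch,                   # hypothetical batch-parser callable
    cache: dict[str, set[str]],    # per-file parse cache
) -> None:
    """Pre-parse all files matched by the cmd-line specs in one large batch."""
    uncached = [f for f in spec_files if f not in cache]
    if uncached:
        # One big batch up front; results are still stored file-by-file.
        cache.update(parse_batch(uncached))
```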
-
I think that it's important to differentiate the use cases a little bit.

For completely cold builds, with no cache: yes, batching would help. But we would always recommend against completely cold builds: users should be using caches (either local or remote).

For completely warm builds, the time to access the cache per file ends up dominating the runtime (i.e.: doing 1557 cache lookups). Assuming that you actually implemented split cache lookups for your batch processes, that would still be the dominating factor. If you didn't implement split cache lookups, you would make fewer cache lookups for larger batches, but you would be much more likely to miss the cache and actually need to run inference.

While microbenchmarks are useful, I think that to motivate this change it would be good to see profiles of the use cases that would be affected. Because we have a few different known bottlenecks, and I suspect that this is not the longest pole.
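As a rough illustration of that tradeoff, here's a back-of-envelope calculation; the 1557 figure comes from the example above, but the per-lookup latency and batch size are made-up assumptions, not measurements:

```python
n_files = 1557        # number of parsed files, from the example above
lookup_ms = 5.0       # assumed per-lookup cache latency (made up)
batch_size = 100      # assumed batch size (made up)

split_lookups_ms = n_files * lookup_ms                   # one cache entry per file
batched_lookups_ms = (n_files / batch_size) * lookup_ms  # one cache entry per batch

print(f"split caching: {split_lookups_ms / 1000:.1f}s of lookups")
print(f"batch caching: {batched_lookups_ms / 1000:.2f}s of lookups")
# The catch: with one cache entry per batch, editing any single file misses the
# cache for its whole batch, so ~batch_size files get re-parsed per edit.
```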
-
OK @benjyw, https://github.com/thejcannon/pants/tree/batchinfer now has the PoC changes:
-
I did a little bit of manual benchmarking of the Python dependency parser process. I took that script, modified it to accept multiple input files, and ran it on all 1227 .py files under src/python/pants, with various batch sizes.
As expected, almost the entire runtime consists of process overhead (all numbers measured on an M1 laptop):

Note that I ran this experiment outside of Pants, using `find` and `xargs`, so this doesn't include sandbox setup time. Also, the batches ran sequentially, so the wall time of the current one-file-per-process strategy in Pants is faster than that 99 seconds, thanks to parallel execution. But it's still much slower and more CPU-expensive than it needs to be. E.g., we know we have users with larger repos for whom full-repo dep inference (e.g., in a call to `./pants peek`) takes several minutes.

So it seems reasonably clear that batching the dependency parsing is a big perf win.
(I'm referring to Python here; it seems likely that similar benefits would obtain for the JVM, at least.)
This discussion is to, er, discuss some options for doing so.
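For anyone wanting to reproduce something similar, here's a rough Python harness for comparing batch sizes. It is not the exact find + xargs methodology described above, and `parse_imports.py` is a hypothetical stand-in for the modified dependency parser script:

```python
import subprocess
import sys
import time
from pathlib import Path

PARSER = "parse_imports.py"  # hypothetical stand-in for the batch-capable parser


def run_in_batches(files: list[str], batch_size: int) -> float:
    """Run the parser over all files, batch_size files per process, sequentially."""
    start = time.time()
    for i in range(0, len(files), batch_size):
        subprocess.run(
            [sys.executable, PARSER, *files[i : i + batch_size]],
            check=True,
            stdout=subprocess.DEVNULL,
        )
    return time.time() - start


if __name__ == "__main__":
    files = sorted(str(p) for p in Path("src/python/pants").rglob("*.py"))
    for batch_size in (1, 32, 256, len(files)):
        print(f"batch size {batch_size}: {run_in_batches(files, batch_size):.1f}s")
```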