batching dep inference #17477
8 comments · 36 replies
-
Dep inference is more nuanced - dependencies can be requested all over the rule graph, and the existing data structures and rules around them are intrinsically single-target-oriented. In fact, dep inference is iterative: we get deps, then those deps' deps, and so on. So there isn't, generally speaking, one place where you can say up front "here are all the files that will need dependency parsing".

However, in practice there is one place where most heavyweight dep inference is triggered, and that is when iteratively computing TransitiveTargets. So we could treat each iteration on the queued direct deps as a batch. However, those batches may be a lot smaller than we'd like. An alternative is, during transitive target computation, to pre-parse everything in the cmd-line specs (since we know we'll need those parses anyway).

But however we do this, we may want to cache the results for each file separately, i.e., split the batch results, so that small edits don't trigger large re-parses of many files. Maybe at first only in memory, but at some point, likely, also in the lmdb_store, i.e., as synthetic "split" processes. Unlike other cases, we completely control the process, so we can be highly confident that it is safe to split the output and cache as if we ran each file in a separate process. This would require some engine support, though.

Anyway, thoughts and ideas welcome.
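Below is a minimal, runnable sketch of the idea, assuming a hypothetical `parse_batch` process and a toy in-memory dep graph standing in for the real parser; none of these names are actual Pants APIs. Each iteration of the transitive walk parses its queued, uncached deps as one batch, and the batch result is split back into per-file cache entries:

```python
# Toy stand-in for the dep parser process: maps each file to the files it imports.
_TOY_DEP_GRAPH = {
    "app/main.py": {"app/util.py", "app/models.py"},
    "app/util.py": {"app/models.py"},
    "app/models.py": set(),
}

_parse_cache: dict[str, set[str]] = {}  # per-file cache of parse results


def parse_batch(files: list[str]) -> dict[str, set[str]]:
    """One parser invocation covering many files, returning per-file results."""
    return {f: _TOY_DEP_GRAPH.get(f, set()) for f in files}


def transitive_deps(roots: set[str]) -> set[str]:
    visited: set[str] = set()
    frontier = set(roots)
    while frontier:
        # Batch: everything queued this iteration that isn't already cached.
        uncached = [f for f in sorted(frontier) if f not in _parse_cache]
        if uncached:
            # Split the single batch result back into per-file cache entries.
            _parse_cache.update(parse_batch(uncached))
        visited |= frontier
        frontier = {d for f in frontier for d in _parse_cache[f]} - visited
    return visited


print(sorted(transitive_deps({"app/main.py"})))
```

The split step is what keeps invalidation fine-grained: editing one file only re-parses that file, even though it was originally parsed as part of a batch.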
-
PS: Running the process on a single file takes ~100ms on my M1.
-
Somewhat unrelated, but from a dependency-rules perspective it would be interesting if we could build a generic core API for dep inference that the various backends adopt, so that there is one central place for applying dependency rules. It's no small feat to pull off, but I think it may be on a similar order of complexity as implementing batching, and could perhaps be worth considering while doing so.
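Purely as an illustration of what such a shared surface might look like (these names are hypothetical, not existing Pants plugin APIs), each backend could implement a small extraction protocol, with dependency rules applied in one central function:

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class RawDependency:
    """A dependency as extracted by a backend parser, before rules are applied."""
    source_file: str
    target: str


class DependencyExtractor(Protocol):
    """Implemented by each backend (Python, JVM, ...)."""
    def extract(self, files: list[str]) -> list[RawDependency]:
        ...


def infer_dependencies(
    extractor: DependencyExtractor,
    files: list[str],
    is_allowed: Callable[[RawDependency], bool],
) -> list[RawDependency]:
    """Central place to apply dependency rules, regardless of backend."""
    raw = extractor.extract(files)
    return [dep for dep in raw if is_allowed(dep)]
```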
-
I have this 95% complete in a stale branch.
-
(As a fun aside) the parser is token/AST-based, and there are Rust-implemented Python parsers. It would be a fun little speedup to port the dep parser to Rust 😌 And by porting to Rust, I mean standalone (so not embedded in the engine).
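For context, here's a rough illustration of what AST-based import extraction looks like in Python; this is not the actual Pants parser, just the general shape of the logic a standalone Rust port would need to replicate:

```python
import ast
import sys


def imports_in(path: str) -> set[str]:
    """Extract top-level module names imported by a Python source file."""
    with open(path, "rb") as f:
        tree = ast.parse(f.read(), filename=path)
    found: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


if __name__ == "__main__":
    # Accepts any number of files, so it can be run in batches.
    for path in sys.argv[1:]:
        print(path, sorted(imports_in(path)))
```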
-
That's a great start, but it may not be going far enough. We could profitably parse hundreds of files in a single process. What I'm thinking of is - when generating transitive deps - to preemptively seed the cache with the parse results for all files mentioned in the cmd-line specs. We know we will definitely need them, and it is likely that they will cover a substantial portion of the transitive deps as well (in the common case of …).
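A tiny sketch of that pre-seeding step, with `parse_batch` and the per-file cache passed in as hypothetical placeholders rather than real Pants APIs:

```python
def seed_cache_from_specs(
    spec_files: list[str],
    parse_batch,                   # hypothetical batch-parser callable
    cache: dict[str, set[str]],    # per-file parse cache
) -> None:
    """Pre-parse all files matched by the cmd-line specs in one large batch."""
    uncached = [f for f in spec_files if f not in cache]
    if uncached:
        # One big batch up front; results are still stored file-by-file.
        cache.update(parse_batch(uncached))
```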
-
I think that it's important to differentiate the use cases a little bit.

For completely cold builds, with no cache: yes, batching would help. But we would always recommend against completely cold builds: users should be using caches (either local or remote).

For completely warm builds, the time to access the cache per file ends up dominating the runtime (i.e.: doing 1557 cache lookups). Assuming that you actually implemented split cache lookups for your batch processes, that would still be the dominating factor. If you didn't implement split cache lookups, you would make fewer cache lookups for larger batches, but you would be much more likely to miss the cache and actually need to run inference.

While microbenchmarks are useful, I think that to motivate this change it would be good to see profiles of the use cases that would be affected. Because we have a few different known bottlenecks, and I suspect that this is not the longest pole.
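As a rough illustration of that tradeoff, here's a back-of-envelope calculation; the 1557 figure comes from the example above, but the per-lookup latency and batch size are made-up assumptions, not measurements:

```python
n_files = 1557        # number of parsed files, from the example above
lookup_ms = 5.0       # assumed per-lookup cache latency (made up)
batch_size = 100      # assumed batch size (made up)

split_lookups_ms = n_files * lookup_ms                   # one cache entry per file
batched_lookups_ms = (n_files / batch_size) * lookup_ms  # one cache entry per batch

print(f"split caching: {split_lookups_ms / 1000:.1f}s of lookups")
print(f"batch caching: {batched_lookups_ms / 1000:.2f}s of lookups")
# The catch: with one cache entry per batch, editing any single file misses the
# cache for its whole batch, so ~batch_size files get re-parsed per edit.
```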
-
OK @benjyw, https://github.com/thejcannon/pants/tree/batchinfer now has the PoC changes:
-
I did a little bit of manual benchmarking of the Python dependency parser process. I took that script, modified it to accept multiple input files, and ran it on all 1227 .py files under src/python/pants, with various batch sizes.
As expected, almost the entire runtime consists of process overhead (all numbers measured on an M1 laptop):

Note that I ran this experiment outside of Pants, using `find` and `xargs`, so this doesn't include sandbox setup time. Also, the batches ran sequentially, so the wall time of the current one-file-per-process strategy in Pants is faster than that 99 seconds, thanks to parallel execution. But it's still much slower and more CPU-expensive than it needs to be. E.g., we know we have users with larger repos for whom full-repo dep inference (e.g., in a call to `./pants peek`) takes several minutes.

So it seems reasonably clear that batching the dependency parsing is a big perf win.
(I'm referring to Python here; it seems likely that similar benefits would obtain for the JVM, at least.)
This discussion is to, er, discuss some options for doing so.
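For anyone wanting to reproduce something similar, here's a rough Python harness for comparing batch sizes. It is not the exact find + xargs methodology described above, and `parse_imports.py` is a hypothetical stand-in for the modified dependency parser script:

```python
import subprocess
import sys
import time
from pathlib import Path

PARSER = "parse_imports.py"  # hypothetical stand-in for the batch-capable parser


def run_in_batches(files: list[str], batch_size: int) -> float:
    """Run the parser over all files, batch_size files per process, sequentially."""
    start = time.time()
    for i in range(0, len(files), batch_size):
        subprocess.run(
            [sys.executable, PARSER, *files[i : i + batch_size]],
            check=True,
            stdout=subprocess.DEVNULL,
        )
    return time.time() - start


if __name__ == "__main__":
    files = sorted(str(p) for p in Path("src/python/pants").rglob("*.py"))
    for batch_size in (1, 32, 256, len(files)):
        print(f"batch size {batch_size}: {run_in_batches(files, batch_size):.1f}s")
```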