[red-knot] Token system to avoid cross-module query dependencies #16275

Open
wants to merge 3 commits into
base: main

Conversation

MichaReiser
Member

@MichaReiser MichaReiser commented Feb 20, 2025

Summary

This PR introduces a token-based system to reduce accidental cross-module query dependencies.

The basic idea is to give tracked struct fields that return an AstNodeRef a custom accessor that requires an extra query_file: File argument.
The query_file is the file of the enclosing query; the accessor compares it against the file of the tracked struct's scope to ensure they are the same.
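
To make the accessor idea concrete, here is a minimal, self-contained sketch. It is not the code from this PR: `File`, `NodeRef`, `Definition`, and both accessor names are hypothetical stand-ins, there is no Salsa database involved, and whether the real check panics, debug-asserts, or does something else is an assumption.

```rust
// Hypothetical stand-ins only; these are NOT the real red-knot or Salsa types.

/// Identifies a source file (the role `File` plays in the description above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct File(u32);

/// Stand-in for an `AstNodeRef`: a handle to some AST node.
#[derive(Debug, PartialEq, Eq)]
struct NodeRef(&'static str);

/// Stand-in for a tracked struct such as a definition.
struct Definition {
    /// File of the scope this definition belongs to.
    file: File,
    /// Private on purpose: callers are meant to go through the checked accessor.
    node: NodeRef,
}

impl Definition {
    fn new(file: File, node: &'static str) -> Self {
        Definition { file, node: NodeRef(node) }
    }

    /// Checked accessor: `query_file` is the token. Panicking on a mismatch
    /// is an assumption made for this sketch.
    fn node(&self, query_file: File) -> &NodeRef {
        assert_eq!(
            self.file, query_file,
            "AST node accessed from a query running for a different file"
        );
        &self.node
    }

    /// Non-panicking variant, included only so the check is easy to observe.
    fn try_node(&self, query_file: File) -> Option<&NodeRef> {
        (self.file == query_file).then_some(&self.node)
    }
}
```

With these definitions, `Definition::new(File(1), "x = 1").try_node(File(2))` yields `None`, while passing `File(1)` yields the node. The point of the extra argument is that a caller has to state which file its query runs for before it can reach the node.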

The underlying rule is: if you don't know in which context your query runs, you should probably either not access the node or wrap your code in an extra query.

Getting the file should generally be easy. E.g. TypeInferenceBuilder has a file method.

This system doesn't prevent abuse: it's usually easy to obtain a file that passes the check. E.g. you can call definition.scope(db).file(db) to get the file,
but let's not do that ;).

The main downside of this approach is that it's sort of annoying. But I'll take that in exchange for the extra peace of mind.

This should mitigate/fix #15949

Alternative approach

I considered changing AstNodeRef::node to take a File instead. But that quickly became annoying because we'd lose the Deref and Debug would also need a workaround. It also isn't AstNodeRef::node that's the problem, it's accessing the field on the tracked struct. That's why I opted for the custom-accessor on the tracked struct approach.
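
The Deref point can be illustrated with a small sketch. `Deref::deref` takes only `&self`, so a node accessor that needs a `File` argument cannot be exposed through it. The types below (`Expr`, `DerefNodeRef`, `CheckedNodeRef`) are hypothetical and only contrast the two shapes; they are not the real `AstNodeRef`.

```rust
use std::ops::Deref;

// Hypothetical stand-ins; not the real red-knot types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct File(u32);

#[derive(Debug)]
struct Expr {
    name: &'static str,
}

/// Shape A: the node is reachable through `Deref`, so `node_ref.name` just works.
#[derive(Debug)]
struct DerefNodeRef {
    node: Expr,
}

impl Deref for DerefNodeRef {
    type Target = Expr;

    // `deref` has a fixed signature: there is no room for a `File` token here.
    fn deref(&self) -> &Expr {
        &self.node
    }
}

/// Shape B: the node is only reachable through a method that takes the token,
/// which means giving up `Deref` (every use site must call `node(file)`).
struct CheckedNodeRef {
    file: File,
    node: Expr,
}

impl CheckedNodeRef {
    fn node(&self, query_file: File) -> Option<&Expr> {
        (self.file == query_file).then_some(&self.node)
    }
}
```

In shape A, field access such as `a.name` goes through auto-deref. In shape B, the same access becomes `b.node(file).map(|e| e.name)`, which shows the extra friction at every use site.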

The ideal solution is to restructure our modules and enforce the boundary with visibility constraints, but that's a bit more work.

@MichaReiser MichaReiser added internal An internal refactor or improvement red-knot Multi-file analysis & type inference labels Feb 20, 2025
@MichaReiser MichaReiser force-pushed the micha/prove-system-to-avoid-cross-module-query-dependencies branch from b054d43 to 31380ea Compare February 20, 2025 13:53
@MichaReiser MichaReiser marked this pull request as ready for review February 20, 2025 13:53
@MichaReiser MichaReiser force-pushed the micha/prove-system-to-avoid-cross-module-query-dependencies branch from 31380ea to 978af92 Compare February 20, 2025 14:00
Comment on lines +57 to +58
/// It acts as a token of prove that we aren't accessing an AST node from a different file
/// than in which the current enclosing Salsa query (which would lead to cross-file dependencies).

Suggested change
/// It acts as a token of prove that we aren't accessing an AST node from a different file
/// than in which the current enclosing Salsa query (which would lead to cross-file dependencies).
/// It acts as a token of proof that we aren't accessing an AST node from a different file
/// to the one from which the current enclosing Salsa query is being called.
/// Doing so would lead to cross-file dependencies, hurting incremental computation.

Comment on lines +208 to +210
/// `query_file` is the file for which the current query performs type inference.
/// It acts as a token of prove that we aren't accessing an AST node from a different file
/// than in which the current enclosing Salsa query (which would lead to cross-file dependencies).

Suggested change
/// `query_file` is the file for which the current query performs type inference.
/// It acts as a token of prove that we aren't accessing an AST node from a different file
/// than in which the current enclosing Salsa query (which would lead to cross-file dependencies).
/// `query_file` is the file for which the current query performs type inference.
/// It acts as a token of proof that we aren't accessing an AST node from a different file
/// to the one from which the current enclosing Salsa query is being called.
/// Doing so would lead to cross-file dependencies, hurting incremental computation.

Comment on lines +68 to +69
/// It acts as a token of prove that we aren't accessing an AST node from a different file
/// than in which the current enclosing Salsa query (which would lead to cross-file dependencies).

Suggested change
/// It acts as a token of prove that we aren't accessing an AST node from a different file
/// than in which the current enclosing Salsa query (which would lead to cross-file dependencies).
/// It acts as a token of proof that we aren't accessing an AST node from a different file
/// to the one from which the current enclosing Salsa query is being called.
/// Doing so would lead to cross-file dependencies, hurting incremental computation.

@MichaReiser MichaReiser force-pushed the micha/prove-system-to-avoid-cross-module-query-dependencies branch 2 times, most recently from 49e9d81 to d888cf0 Compare February 20, 2025 17:11
@MichaReiser
Member Author

MichaReiser commented Feb 20, 2025

One caveat. This doesn't protect us fully from cross-module dependencies. For example, a function can still call semantic_index (or ast_ids) and introduce a cross-module dependency that way. But I think it helps a little?
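
A toy model of this caveat, under loud assumptions: `Db` below is a hypothetical recorder of which files a query has read, not Salsa, and `node_text` and `index_symbol_count` are invented stand-ins for a token-guarded node accessor and for a `semantic_index`-style query that accepts any file.

```rust
use std::cell::RefCell;
use std::collections::BTreeSet;

// Hypothetical toy model; not Salsa and not the real red-knot queries.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct File(u32);

/// Records which files the current "query" has read.
#[derive(Default)]
struct Db {
    reads: RefCell<BTreeSet<File>>,
}

impl Db {
    fn record_read(&self, file: File) {
        self.reads.borrow_mut().insert(file);
    }

    fn dependencies(&self) -> Vec<File> {
        self.reads.borrow().iter().copied().collect()
    }
}

/// Token-guarded access: refuses to read a node owned by another file.
fn node_text(db: &Db, owner: File, query_file: File) -> Option<&'static str> {
    if owner != query_file {
        return None; // the token check blocks the cross-file read
    }
    db.record_read(owner);
    Some("node")
}

/// Stand-in for a `semantic_index`-style query: takes any file, no token,
/// so nothing stops a caller from passing a file other than its own.
/// The returned count is a dummy value for this sketch.
fn index_symbol_count(db: &Db, file: File) -> usize {
    db.record_read(file);
    0
}
```

In this model, a query running for `File(1)` cannot get through `node_text` for `File(2)`, but calling `index_symbol_count(&db, File(2))` still records `File(2)` as a dependency, which is the gap described above.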

Base automatically changed from micha/untrack-symbol-by-id to main February 20, 2025 17:46
@MichaReiser MichaReiser changed the title [red-knot] Token system to avoid cross-module query depndencies [red-knot] Token system to avoid cross-module query dependencies Feb 20, 2025
@MichaReiser MichaReiser force-pushed the micha/prove-system-to-avoid-cross-module-query-dependencies branch from d888cf0 to 82812e0 Compare February 20, 2025 17:50
@carljm
Contributor

carljm commented Feb 20, 2025

I have mixed feelings about this. It feels like it's not protecting the boundary at the layer where we need to protect it in order to get the results we actually want (as mentioned above about semantic_index and ast_ids), and it introduces a fair amount of "noise" in order to provide protection that isn't quite at the right layer (which means I would still want to explore other solutions to #15949).

Exactly where the "right" layer is is still an open question: it could be set at "anything below Type", which would be I think pretty conceptually clean but might result in too many queries needed? But it seems clear that it ought to be set "above" semantic index at the very least.

If you feel this alone would have avoided a significant number of the errors we've made in this area in the past, I'm not opposed to merging it for now and considering other approaches later.

@MichaReiser
Member Author

> If you feel this alone would have avoided a significant number of the errors we've made in this area in the past, I'm not opposed to merging it for now and considering other approaches later.

I do think it would have caught the cross-module use in Class::implicit_instance_attribute and in visibility constraint's evaluate.

But I agree that it's not as good as I first thought it would be.

Labels
internal An internal refactor or improvement red-knot Multi-file analysis & type inference
Development

Successfully merging this pull request may close these issues.

[red-knot] make it easier to avoid accidental cross-module direct AST usage
3 participants