Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement ForVariableUsedAfterBlock (based on wemake_python_styleguide ControlVarUsedAfterBlockViolation WPS441) #11769

Open
wants to merge 38 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
c1091a1
Rough start
MadLittleMods May 28, 2024
261fbc3
Rough run-able
MadLittleMods May 28, 2024
8bed302
Lookup by reference
MadLittleMods May 28, 2024
e85fe87
Add todo
MadLittleMods May 28, 2024
eecc2af
Checkpoint working with Olivier
MadLittleMods May 29, 2024
3661066
Work in all scopes
MadLittleMods Jun 5, 2024
59a2e3b
Clean up code
MadLittleMods Jun 5, 2024
0e491b3
Adjust some wording
MadLittleMods Jun 5, 2024
9ca72a6
Move to wemake_python_styleguide ruleset
MadLittleMods Jun 5, 2024
0ad0df0
Commit test snapshot
MadLittleMods Jun 5, 2024
eaf2a35
Fix shadowed cases
MadLittleMods Jun 6, 2024
710d68c
Clean up debug
MadLittleMods Jun 6, 2024
a7de1cb
Fix references before
MadLittleMods Jun 6, 2024
f2a62bb
Adjust wording
MadLittleMods Jun 6, 2024
97b165c
Update snapshot
MadLittleMods Jun 6, 2024
7ceb306
Move to Ruff (RUF) ruleset
MadLittleMods Jul 30, 2024
acc5b2a
Merge branch 'main' into madlittlemods/control-var-used-after-block
MadLittleMods Jul 30, 2024
f5aa633
Merge branch 'main' into madlittlemods/control-var-used-after-block
MadLittleMods Jul 30, 2024
026a48f
Use preview group
MadLittleMods Aug 2, 2024
544bbf5
Separate out `in` expression
MadLittleMods Aug 2, 2024
2d13a62
Better comment to explain why source-order
MadLittleMods Aug 2, 2024
1d18c66
No need to `collect()`
MadLittleMods Aug 2, 2024
7edd539
Rename to `ForVariableUsedAfterBlock`
MadLittleMods Aug 2, 2024
5351952
Remove whitespace change
MadLittleMods Aug 2, 2024
19a0ab9
Nit: Use labeled loops
MadLittleMods Aug 2, 2024
9e2ece4
No need for `BlockKind` anymore
MadLittleMods Aug 2, 2024
d675a03
Use `statement_id(...)` inside `statement(...)`
MadLittleMods Aug 2, 2024
71ca5e1
Fix lint warnings about unused imports
MadLittleMods Aug 2, 2024
8a11073
Add nested function/class example
MadLittleMods Aug 2, 2024
e59efa2
Remove `known_good_reference_node_ids` and iterating bindings in reverse
MadLittleMods Aug 9, 2024
5a76166
Get `binding_statement` in smarter way that avoids `unwrap()`
MadLittleMods Aug 9, 2024
106e78a
Better document why when `binding.statement(...)` can be `None`
MadLittleMods Aug 9, 2024
8b3e4de
Test augmented assignment
MadLittleMods Aug 9, 2024
59427f1
Reword
MadLittleMods Aug 9, 2024
63ecbfc
Explain more `unwrap()`
MadLittleMods Aug 9, 2024
27dad4f
Update snapshot
MadLittleMods Aug 9, 2024
e7653df
Merge branch 'main' into madlittlemods/control-var-used-after-block
MadLittleMods Aug 9, 2024
9bcc4fe
Fix snapshot after bumping rule number
MadLittleMods Aug 9, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
for global_var in []:
_ = global_var
pass

# ❌
_ = global_var

def foo():
# For control var used outside block
for event in []:
_ = event
pass

# ❌
_ = event

for x in []:
_ = x
pass

# Using the same control variable name in a different loop is ok
for x in []:
_ = x
pass

for y in []:
# ❌ x is used outside of the loop it was defined in (meant to use y)
if x == 5:
pass

# Assign a variable before the loop
room_id = 3
_ = room_id
# Use the same variable name in a loop
for room_id in []:
_ = room_id
pass

# ❌ After the loop is not ok because the value is probably not what you expect
_ = room_id

# ❌ Augmented assignment is not allowed because the value is probably not what you expect
room_id += 1

# Assigning again after the loop is ok
room_id = 5
room_id += 1
_ = room_id
Comment on lines +31 to +48
Copy link
Author

@MadLittleMods MadLittleMods Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Variable declared outside the loop

But it may also be intentional that we catch this as part of this rule. What do you think?

I think it's intentional that we catch the indico example as part of the rule. We have an example in the test fixture file for this.

The other example from pandas could be addressed with some fancy control flow analysis (see other discussion)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I think there are two different reasons why you want to avoid using a loop variable after the loop has finished:

  1. The first reason is that the loop might be empty, which means that the variable is never assigned:
    iterable = []
    for x in iterable:
        pass
    print(x)  # NameError
    Catching these feels like a "correctness" issue, that we should hopefully be able to do without very many false positives.
  2. The second is cases where we can be sure that the variable is defined, but we still forbid this pattern, perhaps on grounds of readability/style (it might not be obvious to the reader that the variable is mutated; there might be clearer ways of writing the code), or because there's a chance that the author of the code might not be familiar with Python's scoping rules, and might not have intended to reassign the variable by using the loop:
    x = 42
    for x in range(5):
         pass
    print(x)  # oops, now it's 4?? The author of the code expected it to still be 42
    Sometimes code like this can indicate a bug, but it also might be working exactly as the author of the code intended; there's a lot of code out there that deliberately mutates a variable using a loop like this.

The conclusion I draw from this is that we in fact need two rules here:

  1. One rule to catch instances where a loop variable is used after the loop and was not defined before the loop. This is almost always bad practice, as the loop variable might not be defined.
  2. One more opinionated rule that catches instances where the loop variable is definitely defined, but the mutation of the variable via the loop might not be desired. This might be a bug and might not be, depending on how experienced the developer is with Python's semantics.

For now I think I'd implement the first rule; we can consider the second rule as a followup.


Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use in combination with try..except

It seems to be a somewhat common pattern to use the for loop variable outside the for-block when combined with try..except

The trio example really lends itself well to the convenience that the non-block scoping provides. I think this is probably best resolved by adding a # noqa: RUF032 comment for the lint rule to explicitly mark the case as fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple usages in the same statement

It would be great if we only flagged each variable once (at least per statement?)

Seems like a decent improvement. Do we do this sort of deduplication for other rules? For the violation Diagnostic, do we combine the TextRange to cover the whole area?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete statements

We may explicitly want to allow usages in delete statements (example)

Seems harmless to allow 👍

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usages in del statements are just as problematic as other usages, since they also result in NameErrors if the variable is undefined:

>>> iterable = []
>>> for item in iterable:
...     pass
... 
>>> del item
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'item' is not defined. Did you mean: 'iter'?

Copy link
Author

@MadLittleMods MadLittleMods Aug 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Control flow

This here seems like a clear false positive but detecting it would be hard because it requires some form of control flow analysis

The pandas example makes sense to allow if we can detect the context via control flow analysis. The other pandas example makes sense for this as well since it has a return statement to guard it every leaking below.

But I think it's good that we catch bokeh examples as sketchy. I don't see how we could use control flow analysis to see that they're fine?


I'm assuming we don't want to actually dive into implementing the control flow analysis in this first iteration?

# Tuple destructuring
for a, b, c in []:
_ = a
_ = b
_ = c
pass

# ❌
_ = a
_ = b
_ = c

# Array destructuring
for [d, e, f] in []:
_ = d
_ = e
_ = f
pass

# ❌
_ = d
_ = e
_ = f

# Nested function and class definitions are fine
for potential_power in []:
def action():
print(potential_power)

class Animal:
power_level = potential_power

Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@ pub(crate) fn deferred_scopes(checker: &mut Checker) {
Rule::AsyncioDanglingTask,
Rule::BadStaticmethodArgument,
Rule::BuiltinAttributeShadowing,
Rule::ForVariableUsedAfterBlock,
Rule::GlobalVariableNotAssigned,
Rule::ImportPrivateName,
Rule::ImportShadowedByLoopVar,
Expand Down Expand Up @@ -86,6 +87,10 @@ pub(crate) fn deferred_scopes(checker: &mut Checker) {
for scope_id in checker.analyze.scopes.iter().rev().copied() {
let scope = &checker.semantic.scopes[scope_id];

if checker.enabled(Rule::ForVariableUsedAfterBlock) {
ruff::rules::for_variable_used_after_block(checker, scope, &mut diagnostics);
}

if checker.enabled(Rule::UndefinedLocal) {
pyflakes::rules::undefined_local(checker, scope_id, scope, &mut diagnostics);
}
Expand Down
1 change: 1 addition & 0 deletions crates/ruff_linter/src/codes.rs
Original file line number Diff line number Diff line change
Expand Up @@ -958,6 +958,7 @@ pub fn code_to_rule(linter: Linter, code: &str) -> Option<(RuleGroup, Rule)> {
(Ruff, "029") => (RuleGroup::Preview, rules::ruff::rules::UnusedAsync),
(Ruff, "030") => (RuleGroup::Preview, rules::ruff::rules::AssertWithPrintMessage),
(Ruff, "031") => (RuleGroup::Preview, rules::ruff::rules::IncorrectlyParenthesizedTupleInSubscript),
(Ruff, "032") => (RuleGroup::Preview, rules::ruff::rules::ForVariableUsedAfterBlock),
(Ruff, "100") => (RuleGroup::Stable, rules::ruff::rules::UnusedNOQA),
(Ruff, "101") => (RuleGroup::Preview, rules::ruff::rules::RedirectedNOQA),

Expand Down
4 changes: 4 additions & 0 deletions crates/ruff_linter/src/rules/ruff/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,10 @@ mod tests {

#[test_case(Rule::AsyncioDanglingTask, Path::new("RUF006.py"))]
#[test_case(Rule::CollectionLiteralConcatenation, Path::new("RUF005.py"))]
#[test_case(
Rule::ForVariableUsedAfterBlock,
Path::new("for_variable_used_after_block.py")
)]
#[test_case(Rule::ExplicitFStringTypeConversion, Path::new("RUF010.py"))]
#[test_case(Rule::FunctionCallInDataclassDefaultArgument, Path::new("RUF009.py"))]
#[test_case(Rule::ImplicitOptional, Path::new("RUF013_0.py"))]
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
use ruff_diagnostics::{Diagnostic, Violation};
use ruff_macros::{derive_message_formats, violation};
use ruff_python_semantic::Scope;
use ruff_text_size::Ranged;

use crate::checkers::ast::Checker;

/// ## What it does
/// Checks for variables defined in `for` loops that are used outside of their
/// respective blocks.
///
/// ## Why is this bad?
/// Usage of of a control variable outside of the block they're defined in will probably
/// lead to flawed logic in a way that will likely cause bugs. The variable might not
/// contain what you expect.
///
/// In Python, unlike many other languages, `for` loops don't define their own scopes.
/// Therefore, usage of the control variables outside of the block will be the the value
/// from the last iteration until re-assigned.
///
/// While this mistake is easy to spot in small examples, it can be hidden in larger
/// blocks of code, where the the loop and downstream usage may not be visible at the
/// same time.
///
/// ## Example
/// ```python
/// for x in range(10):
/// pass
///
/// print(x) # prints 9
/// ```
#[violation]
pub struct ForVariableUsedAfterBlock {
control_var_name: String,
}

impl Violation for ForVariableUsedAfterBlock {
#[derive_message_formats]
fn message(&self) -> String {
let ForVariableUsedAfterBlock { control_var_name } = self;

format!("`for` loop variable {control_var_name} is used outside of block")
}
}

/// Based on wemake-python-styleguide (WPS441) to forbid control variables after the block body.
pub(crate) fn for_variable_used_after_block(
checker: &Checker,
scope: &Scope,
diagnostics: &mut Vec<Diagnostic>,
) {
// Find for-loop variable bindings
let loop_var_bindings = scope
.all_bindings()
.map(|(name, binding_id)| (name, checker.semantic().binding(binding_id)))
.filter_map(|(name, binding)| binding.kind.is_loop_var().then_some((name, binding)));

for (name, binding) in loop_var_bindings {
// This is safe to unwrap because we're only dealing with loop var bindings and
// this can only be `None` for built-ins.
let binding_source_node_id = binding.source.unwrap();
// The node_id of the for-loop that contains the binding
let binding_statement_id = checker.semantic().statement_id(binding_source_node_id);
let binding_statement = checker.semantic().statement(binding_source_node_id);

// Loop over the references of those bindings to see if they're in the same block-scope
'references: for reference in binding.references() {
let reference = checker.semantic().reference(reference);
// This is safe to unwrap because a loop var binding can't be a built-in so
// any reference to the loop var won't be a built-in where this could be `None`.
let reference_node_id = reference.expression_id().unwrap();
MadLittleMods marked this conversation as resolved.
Show resolved Hide resolved

// Skip any reference that come before the control var binding in the source
// order, skip it because people can assign and use the same variable name
// above the block.
if reference.range().end() < binding_statement.range().start() {
continue;
}

// Traverse the hierarchy and look for a block match
let statement_hierarchy = checker.semantic().parent_statement_ids(reference_node_id);
for ancestor_node_id in statement_hierarchy {
if binding_statement_id == ancestor_node_id {
continue 'references;
}
}

// The reference wasn't used in the same block, report a violation/diagnostic
diagnostics.push(Diagnostic::new(
ForVariableUsedAfterBlock {
control_var_name: name.to_owned(),
},
reference.range(),
));
}
}
}
2 changes: 2 additions & 0 deletions crates/ruff_linter/src/rules/ruff/rules/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@ pub(crate) use asyncio_dangling_task::*;
pub(crate) use collection_literal_concatenation::*;
pub(crate) use default_factory_kwarg::*;
pub(crate) use explicit_f_string_type_conversion::*;
pub(crate) use for_variable_used_after_block::*;
pub(crate) use function_call_in_dataclass_default::*;
pub(crate) use implicit_optional::*;
pub(crate) use incorrectly_parenthesized_tuple_in_subscript::*;
Expand Down Expand Up @@ -39,6 +40,7 @@ mod collection_literal_concatenation;
mod confusables;
mod default_factory_kwarg;
mod explicit_f_string_type_conversion;
mod for_variable_used_after_block;
mod function_call_in_dataclass_default;
mod helpers;
mod implicit_optional;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,103 @@
---
source: crates/ruff_linter/src/rules/ruff/mod.rs
---
for_variable_used_after_block.py:6:5: RUF032 `for` loop variable global_var is used outside of block
|
5 | # ❌
6 | _ = global_var
| ^^^^^^^^^^ RUF032
7 |
8 | def foo():
|

for_variable_used_after_block.py:15:9: RUF032 `for` loop variable event is used outside of block
|
14 | # ❌
15 | _ = event
| ^^^^^ RUF032
16 |
17 | for x in []:
|

for_variable_used_after_block.py:28:12: RUF032 `for` loop variable x is used outside of block
|
26 | for y in []:
27 | # ❌ x is used outside of the loop it was defined in (meant to use y)
28 | if x == 5:
| ^ RUF032
29 | pass
|

for_variable_used_after_block.py:40:9: RUF032 `for` loop variable room_id is used outside of block
|
39 | # ❌ After the loop is not ok because the value is probably not what you expect
40 | _ = room_id
| ^^^^^^^ RUF032
41 |
42 | # ❌ Augmented assignment is not allowed because the value is probably not what you expect
|

for_variable_used_after_block.py:43:5: RUF032 `for` loop variable room_id is used outside of block
|
42 | # ❌ Augmented assignment is not allowed because the value is probably not what you expect
43 | room_id += 1
| ^^^^^^^ RUF032
44 |
45 | # Assigning again after the loop is ok
|

for_variable_used_after_block.py:58:9: RUF032 `for` loop variable a is used outside of block
|
57 | # ❌
58 | _ = a
| ^ RUF032
59 | _ = b
60 | _ = c
|

for_variable_used_after_block.py:59:9: RUF032 `for` loop variable b is used outside of block
|
57 | # ❌
58 | _ = a
59 | _ = b
| ^ RUF032
60 | _ = c
|

for_variable_used_after_block.py:60:9: RUF032 `for` loop variable c is used outside of block
|
58 | _ = a
59 | _ = b
60 | _ = c
| ^ RUF032
61 |
62 | # Array destructuring
|

for_variable_used_after_block.py:70:9: RUF032 `for` loop variable d is used outside of block
|
69 | # ❌
70 | _ = d
| ^ RUF032
71 | _ = e
72 | _ = f
|

for_variable_used_after_block.py:71:9: RUF032 `for` loop variable e is used outside of block
|
69 | # ❌
70 | _ = d
71 | _ = e
| ^ RUF032
72 | _ = f
|

for_variable_used_after_block.py:72:9: RUF032 `for` loop variable f is used outside of block
|
70 | _ = d
71 | _ = e
72 | _ = f
| ^ RUF032
73 |
74 | # Nested function and class definitions are fine
|
6 changes: 4 additions & 2 deletions crates/ruff_python_semantic/src/binding.rs
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,8 @@ pub struct Binding<'a> {
pub scope: ScopeId,
/// The context in which the [`Binding`] was created.
pub context: ExecutionContext,
/// The statement in which the [`Binding`] was defined.
/// The statement in which the [`Binding`] was defined. `None` if the [`Binding`]
/// comes from a built-in.
Comment on lines +28 to +29
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please double-check the accuracy of these comments. In any case, I'd like to document the cases so we don't have to run through this uncertainty in the future.

pub source: Option<NodeId>,
/// The references to the [`Binding`].
pub references: Vec<ResolvedReferenceId>,
Expand Down Expand Up @@ -232,7 +233,8 @@ impl<'a> Binding<'a> {
locator.slice(self.range)
}

/// Returns the statement in which the binding was defined.
/// Returns the statement in which the binding was defined. `None` if the binding
/// comes from a built-in.
pub fn statement<'b>(&self, semantic: &SemanticModel<'b>) -> Option<&'b Stmt> {
self.source
.map(|statement_id| semantic.statement(statement_id))
Expand Down
17 changes: 16 additions & 1 deletion crates/ruff_python_semantic/src/model.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1201,9 +1201,17 @@ impl<'a> SemanticModel<'a> {
/// Return the [`Stmt`] corresponding to the given [`NodeId`].
#[inline]
pub fn statement(&self, node_id: NodeId) -> &'a Stmt {
self.nodes[self.statement_id(node_id)]
.as_statement()
.expect("No statement found")
}

/// Return the statement [`NodeId`] corresponding to the given [`NodeId`].
#[inline]
pub fn statement_id(&self, node_id: NodeId) -> NodeId {
self.nodes
.ancestor_ids(node_id)
.find_map(|id| self.nodes[id].as_statement())
.find(|id| self.nodes[*id].is_statement())
.expect("No statement found")
}

Expand Down Expand Up @@ -1232,6 +1240,13 @@ impl<'a> SemanticModel<'a> {
.nth(1)
}

/// Given a [`NodeId`], return the [`NodeId`] of the parent statement, if any.
pub fn parent_statement_ids(&self, node_id: NodeId) -> impl Iterator<Item = NodeId> + '_ {
self.nodes
.ancestor_ids(node_id)
.filter(|id| self.nodes[*id].is_statement())
}

/// Return the [`Expr`] corresponding to the given [`NodeId`].
#[inline]
pub fn expression(&self, node_id: NodeId) -> Option<&'a Expr> {
Expand Down
1 change: 1 addition & 0 deletions ruff.schema.json

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Loading