Forbid slow `in` inside comprehensions #864

orsinium · 2019-10-09T07:28:23Z

Rule request

Thesis

len with comprehensions

Iterators have no len, and sometimes I forget that.

Bad:

len(1 for el in a if el in b)

Better:

len([1 for el in a if el in b])

`in` inside comprehensions

Good:

sum(1 for el in a if el in b)

Twice as slow, but also OK:

sum(el in b for el in a)

Reasoning

Detect a runtime TypeError in advance. We could also detect len called on yield-based iterators, but resolving symbols in Python is always non-trivial, unfortunately.
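
For illustration, here is a minimal sketch of the runtime failure described above. The values of a and b are made up for the example; only the three counting expressions come from the request.

```python
# Hypothetical sample data, chosen only for this illustration.
a = [1, 2, 3, 4]
b = {2, 4, 6}


def count_with_len_on_generator():
    # A generator expression has no __len__, so len() raises TypeError.
    return len(1 for el in a if el in b)


try:
    count_with_len_on_generator()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Both working alternatives count the elements of a that are also in b.
list_count = len([1 for el in a if el in b])
sum_count = sum(1 for el in a if el in b)

print(error_message)
print(list_count, sum_count)
```

The error only appears when the line is executed, which is why catching it statically is attractive.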


sobolevn · 2019-10-09T08:31:14Z

Good idea!

But we don't treat type checking as part of best practices. mypy catches this case perfectly:

# ex.py
len(1 for el in a if el in b)

Output:

ex.py:1: error: Argument 1 to "len" has incompatible type "Generator[int, None, None]"; expected "Sized"

But sum is a whole different case. It's about performance and best practice.
That's something I want to have.

We can forbid using x in y in the value part of a comprehension, and make people write 1 or True as the value with if x in y as the condition.
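
To make the two forms concrete, a small sketch with made-up data (a and b are assumptions for the example). Both expressions produce the same count; the proposal is only about which spelling to allow.

```python
# Hypothetical sample data for the comparison.
a = [1, 2, 3, 4, 5]
b = {2, 4, 6}

# Form to forbid: the membership test is the value part,
# so sum() adds up one boolean per element of a.
count_from_booleans = sum(el in b for el in a)

# Form to require: a constant value part, with the test moved
# into the `if` clause so only matching elements are yielded.
count_from_filter = sum(1 for el in a if el in b)

print(count_from_booleans, count_from_filter)
```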

sobolevn · 2019-10-09T08:33:54Z

We need to measure the speed for different types with timeit and make a decision based on the results.

orsinium · 2019-10-09T09:09:00Z

From worst to best:

from random import randint 
elements = list(randint(-100000, 100000) for _ in range(1000000))  

%timeit sum(a > 0 for a in elements) 
# 130 ms ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit sum(True for a in elements if a > 0) 
# 92.1 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
s = 0
for a in elements:
  if a > 0:
    s += 1
# 73.1 ms ± 714 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit sum(1 for a in elements if a > 0)  
# 71.8 ms ± 720 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit len([1 for a in elements if a > 0])  
# 62.4 ms ± 9.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

orsinium · 2019-10-09T09:12:07Z

The last one has a big standard deviation, so I'm not sure whether it really is the best or whether some optimization for repeated list creation is kicking in.
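
For anyone who wants to rerun this outside IPython, a rough sketch using the standard timeit module. The smaller list size, the repeat counts, and the names candidates and results are assumptions made to keep the run short; absolute numbers will differ by machine and Python version.

```python
import timeit
from random import randint

# Smaller than the 1,000,000 elements used above, to keep the run short.
elements = [randint(-100000, 100000) for _ in range(10000)]

# The four one-line variants from the measurements above.
candidates = {
    'sum(a > 0 for a in elements)':
        lambda: sum(a > 0 for a in elements),
    'sum(True for a in elements if a > 0)':
        lambda: sum(True for a in elements if a > 0),
    'sum(1 for a in elements if a > 0)':
        lambda: sum(1 for a in elements if a > 0),
    'len([1 for a in elements if a > 0])':
        lambda: len([1 for a in elements if a > 0]),
}

number = 10  # calls per timing run
results = {}
for label, func in candidates.items():
    # timeit.repeat returns one total time per run; the minimum is the
    # least noisy estimate, so divide it by `number` for a per-call time.
    timings = timeit.repeat(func, number=number, repeat=5)
    results[label] = min(timings) / number

for label, seconds in sorted(results.items(), key=lambda item: item[1]):
    print(f'{seconds * 1000:.3f} ms  {label}')
```

Taking the minimum over several repeats is one way to reduce the influence of the noise mentioned above.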

ManishAradwad · 2019-10-09T09:25:40Z

Would like to take this! Can u plz assign this to me

ManishAradwad · 2019-10-13T08:25:46Z

From worst to best:

from random import randint 
elements = list(randint(-100000, 100000) for _ in range(1000000))  

%timeit sum(a > 0 for a in elements) 
# 130 ms ± 3.15 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit sum(True for a in elements if a > 0) 
# 92.1 ms ± 2.62 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%%timeit
s = 0
for a in elements:
  if a > 0:
    s += 1
# 73.1 ms ± 714 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit sum(1 for a in elements if a > 0)  
# 71.8 ms ± 720 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit len([1 for a in elements if a > 0])  
# 62.4 ms ± 9.78 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

So, which one should be considered as a best practice??

sobolevn · 2019-10-13T09:18:47Z

sum(1 for a in elements if a in b) and len([1 for a in elements if a > 0]) win!

It means that it is better to use sum(1 for a in elements if a in b) than sum(a in b for a in elements). And this rule is perfectly valid.

ManishAradwad · 2019-10-19T10:56:05Z

So, I'm done with the local setup and ready to implement the changes. But I'm quite confused about how I should approach the issue...
I suppose I should create a new visitor and violation, or is there a simpler way to do this??
Any help is appreciated, Thanks!

sobolevn · 2019-10-19T11:07:03Z

@ManishAradwad you need to create a new violation in best_practices, then we can create a new visitor here: https://github.com/wemake-services/wemake-python-styleguide/blob/master/wemake_python_styleguide/visitors/ast/loops.py

We need to visit:

ast.ListComp
ast.SetComp
ast.DictComp
ast.GeneratorExp

You can add a new method in the visitor: self._check_slow_in_expression(node)

Here's what our bad node ((a > 0 for a in elements)) looks like (just one example):

GeneratorExp(elt=Compare(left=Name(id='a', ctx=Load(), lineno=1, col_offset=4), ops=[Gt()], comparators=[Num(n=0, lineno=1, col_offset=8)], lineno=1, col_offset=4), generators=[comprehension(target=Name(id='a', ctx=Store(), lineno=1, col_offset=14), iter=Name(id='elements', ctx=Load(), lineno=1, col_offset=19), ifs=[], is_async=0)], lineno=1, col_offset=4)

Then you write the required logic, test it, and submit a PR. I am here to help.
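
A rough, standalone sketch of the check described above, built on the standard ast module only. The class SlowCompareSketchVisitor and the helper find_violations are hypothetical names for illustration; this is not the wemake-python-styleguide visitor and does not use its base classes or violation types. It flags any ast.Compare in the value position, following the a > 0 example; narrowing it to ast.In would be a further assumption.

```python
import ast


class SlowCompareSketchVisitor(ast.NodeVisitor):
    """Hypothetical stand-in for the real visitor, for illustration only."""

    def __init__(self):
        self.violations = []

    def _check_slow_in_expression(self, node):
        # DictComp has key/value; the other comprehension nodes have elt.
        if isinstance(node, ast.DictComp):
            value_parts = [node.key, node.value]
        else:
            value_parts = [node.elt]
        if any(isinstance(part, ast.Compare) for part in value_parts):
            self.violations.append(node.lineno)
        # Keep walking so nested comprehensions are checked as well.
        self.generic_visit(node)

    # Route all four comprehension node types to the same check.
    visit_ListComp = _check_slow_in_expression
    visit_SetComp = _check_slow_in_expression
    visit_DictComp = _check_slow_in_expression
    visit_GeneratorExp = _check_slow_in_expression


def find_violations(source):
    """Return the line numbers of flagged comprehensions in source."""
    visitor = SlowCompareSketchVisitor()
    visitor.visit(ast.parse(source))
    return visitor.violations


print(find_violations('sum(a > 0 for a in elements)'))
print(find_violations('sum(1 for a in elements if a > 0)'))
```

A comparison in the if clause lives in the comprehension's ifs list rather than in elt, so the second call reports nothing.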

sobolevn · 2019-10-19T14:53:31Z

@ManishAradwad sorry for misleading you. This is a refactoring violation.

ManishAradwad · 2019-10-20T01:46:03Z

sum(1 for a in elements if a in b) and len([1 for a in elements if a > 0]) win!

We are using both sum(1 for a in elements if a in b) and len([1 for a in elements if a > 0]) to find the length of the list, right??

Then what do u mean by

It means that it is better to use sum(1 for a in elements if a in b) then sum(a in b for a in elements). And this rule is perfectly valid.

ManishAradwad · 2019-10-20T02:24:25Z

Also, are you saying that I should add the function self._check_slow_in_expression(node) in the WrongComprehensionVisitor??

sobolevn · 2019-10-20T07:33:36Z

We don't check the sum or len functions. We check a comprehension inside them or inside any other Python code: 1 for a in elements if a in b. And yes, we check it inside WrongComprehensionVisitor.

sobolevn · 2019-11-04T10:30:26Z

Hi @ManishAradwad, how's it going? Do you need any help?

sobolevn · 2019-11-15T20:00:57Z

This should be joined with #1008

Dreamsorcerer · 2021-02-08T00:22:48Z

We don't check sum or len function.

I think some clarification on exactly what is forbidden is needed here. I've just read through this twice, and I'm still not clear.

Are we forbidding slow sum/len calls (in which case we do need to check the functions), or something in all comprehensions?

Because some of the bad examples don't make sense to forbid outside of a len/sum call. e.g. [a > 0 for a in elements] might be used as a sequence of True/False values, there's no reason to forbid this unless you know only the True values are actually used (e.g. in sum()).
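
To make the narrower alternative concrete, a hedged sketch of a check that only fires when the comprehension is the first argument of a call to something named sum. The function find_slow_sum_calls is a hypothetical name, and the check is purely name-based: it cannot tell whether sum has been rebound, which is the symbol-resolution problem mentioned at the top of the thread.

```python
import ast


def find_slow_sum_calls(source):
    """Return line numbers of sum(<comparison> for ...) style calls.

    Hypothetical illustration; matches only the bare name `sum`.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        is_sum = isinstance(node.func, ast.Name) and node.func.id == 'sum'
        if not is_sum or not node.args:
            continue
        first_arg = node.args[0]
        is_comprehension = isinstance(
            first_arg, (ast.GeneratorExp, ast.ListComp),
        )
        # Flag only when the value part itself is a comparison.
        if is_comprehension and isinstance(first_arg.elt, ast.Compare):
            hits.append(node.lineno)
    return hits


print(find_slow_sum_calls('flags = [a > 0 for a in elements]'))
print(find_slow_sum_calls('total = sum(a > 0 for a in elements)'))
```

Under this variant the list of True/False values is left alone, and only the sum call over comparisons is reported.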

orsinium added the rule request label Oct 9, 2019

sobolevn changed the title ~~Forbid len from lazy comprehension~~ Forbid slow in inside comprehensions Oct 9, 2019

sobolevn added this to the Version 0.13 milestone Oct 9, 2019

sobolevn added the Hacktoberfest, help wanted, and level:starter labels Oct 9, 2019

orsinium assigned ManishAradwad Oct 9, 2019

ManishAradwad mentioned this issue Oct 20, 2019

Issue#864: Fixes slow in comprehensions #928

Closed

4 tasks

sobolevn unassigned ManishAradwad Nov 12, 2019

sobolevn modified the milestones: Version 0.13, Version 0.15 Nov 15, 2019

ManishAradwad mentioned this issue Dec 16, 2019

Issue#864 #1074

Closed

4 tasks

sobolevn mentioned this issue Apr 3, 2020

Enforce using .setdefault() #1085

Open

sobolevn modified the milestones: Version 0.16, Version 0.15 aka New runtime Oct 20, 2020

sobolevn modified the milestones: Version 0.15 aka Python3.9, Version 0.16 Feb 6, 2021

sobolevn modified the milestones: Version 0.16, Version 0.17.0 aka Python3.10 Dec 13, 2021
