Codon slower than CPython in task with pattern matching #624
Comments
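Hi Rik, thanks for the report. Haven't tried your code yet but just checking if you used the -release flag when compiling (e.g. codon build -release report.py)? That flag is required to enable optimizations.
|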
No, I didn't know about using the -release flag. I just rebuilt my script with
codon build -release report.py
and the results are the same (near enough):
rik@nuke:~/Reports/GR$ time report > t2
real 0m6.599s
user 0m8.175s
sys 0m1.650s
My assumption is that the 50% increase in time has to do with pattern
matching, although your docs do mention possible issues with IO. The web
log file is a large one at 3.2 GB and seven million lines.
Rik
…On Wed, Feb 5, 2025 at 9:41 AM A. R. Shajii ***@***.***> wrote:
Hi Rik, thanks for the report. Haven't tried your code yet but just
checking if you used the -release flag when compiling (e.g. codon build
-release report.py)? That flag is required to enable optimizations.
|
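One way to test the assumption above (pattern matching versus IO) is to time the two stages separately. The following is a minimal sketch, not taken from the thread: the function name time_stages and its return shape are made up for illustration, and it uses only calls that already appear in the scripts here (open, re.compile, findall, time.time). It has only been reasoned about under CPython.

```python
import re
import time


def time_stages(path):
    """Return (read_seconds, match_seconds, matched_lines) for one log file.

    Hypothetical helper: pass 1 only iterates over the lines, pass 2 also
    runs the findall call used in the report script.
    """
    pattern = re.compile(r'/login.* H')

    # Pass 1: read every line without touching the regex engine.
    start = time.time()
    with open(path) as file:
        for line in file:
            pass
    read_seconds = time.time() - start

    # Pass 2: same iteration plus the regex search on each line.
    matched_lines = 0
    start = time.time()
    with open(path) as file:
        for line in file:
            if len(pattern.findall(line)) > 0:
                matched_lines += 1
    match_seconds = time.time() - start

    return read_seconds, match_seconds, matched_lines
```

If the first number is already close to the total run time, the regex is probably not the main cost. Note that the second pass may benefit from the file being in the OS cache, so running the script twice gives a fairer comparison.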
Here is a benchmark using a sample file:
import re
import time

# with open('e', 'w') as w:
#     for i in range(1000000):
#         w.write("""192.168.1.1 - - [05/Feb/2025:10:15:32] "/login user=admin H"
# 192.168.1.3 - - [05/Feb/2025:10:17:12] "/home page=dashboard H"
# 192.168.1.1 - - [05/Feb/2025:10:18:07] "/login user=guest H"
# 192.168.1.4 - - [05/Feb/2025:10:19:55] "/login user=admin destination=somewhere H"
# 192.168.1.2 - - [05/Feb/2025:10:20:22] "/login user=root H"
# """)

def main():
    with open('e') as file:
        d = {}
        # total number of hits; Codon [int]
        total = 0
        regexString = re.compile(r'/login.* H')

        for line in file:
            m = regexString.findall(line)
            if len(m) > 0:  # make sure there was a match
                if re.search(r'destination=', m[0]):
                    continue  # skip lines containing destination=

                y = re.split(r' ', m[0])
                # add 1 to each dict entry, or init as 1
                d[y[0]] = d.get(y[0], 0) + 1
                total = total + 1

        # sorted_count = sorted(d.items(), key=lambda kv: -kv[1])
        # for item in sorted_count:
        #     print(item[0], item[1])

        # print("Total number of hits: ", total)

if __name__ == '__main__':
    t = time.time()
    main()
    print(time.time() - t)
Result:
10.817 # Codon 0.18.0
12.429415702819824 # Python 3.10
|
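The commented-out generator in the benchmark above can be turned into a small function so the sample file 'e' is easy to recreate. This is a sketch only: the names SAMPLE_LINES and write_sample_log are made up for illustration, while the five log lines and the default repeat count of 1000000 are copied from the comment.

```python
# The five sample log lines from the benchmark comment, each ending in a newline.
SAMPLE_LINES = (
    '192.168.1.1 - - [05/Feb/2025:10:15:32] "/login user=admin H"\n'
    '192.168.1.3 - - [05/Feb/2025:10:17:12] "/home page=dashboard H"\n'
    '192.168.1.1 - - [05/Feb/2025:10:18:07] "/login user=guest H"\n'
    '192.168.1.4 - - [05/Feb/2025:10:19:55] "/login user=admin destination=somewhere H"\n'
    '192.168.1.2 - - [05/Feb/2025:10:20:22] "/login user=root H"\n'
)


def write_sample_log(path, repeats=1000000):
    """Write the five sample lines `repeats` times; return the line count."""
    with open(path, 'w') as w:
        for _ in range(repeats):
            w.write(SAMPLE_LINES)
    return repeats * 5
```

Calling write_sample_log('e') once before the benchmark should produce the same five-million-line file that the commented-out code would.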
Interesting. With the sort and print commented out, not a lot more time
spent. But still slower on my system:
$ python3 t.py # where t.py is the program included in the email
5.171091318130493
$ codon run -release t.py
6.33556
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 140
model name : 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz
stepping : 1
microcode : 0xb8
cpu MHz : 1247.248
cache size : 8192 KB
...
I'm running a fully patched Debian. Just weird that the one test I tried is
slower for me than it is for you. What I did like is that compilation just
worked. When I played with codon in 2023, I had lots of trouble getting my
little scripts to compile.
I'll try some of your synthetic examples from the blog post next.
Rik
…On Wed, Feb 5, 2025 at 3:14 PM Elis Byberi ***@***.***> wrote:
Here is a benchmark using a sample file:
Result:
10.817 # Codon 0.18.0
12.429415702819824 # Python 3.10
|
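To check whether the regex calls are what makes the Codon build slower, the same counting can be written with plain string methods and timed against the regex version. This is a rough sketch, not something proposed in the thread: count_logins_plain is a made-up name, and the logic is intended to match the '/login.* H' pattern on the sample lines but has not been checked against the real web log.

```python
def count_logins_plain(lines):
    """Count '/login ... H' hits without the re module.

    Hypothetical stand-in for the regex loop: find the first '/login',
    extend to the last ' H' on the line (what the greedy '.*' does),
    and skip anything containing 'destination='.
    """
    counts = {}
    total = 0
    for line in lines:
        start = line.find('/login')
        if start < 0:
            continue  # no '/login' on this line
        end = line.rfind(' H')
        if end < start + len('/login'):
            continue  # no ' H' after '/login' (also covers end == -1)
        matched = line[start:end + 2]
        if 'destination=' in matched:
            continue  # skip lines containing destination=
        key = matched.split(' ')[0]
        counts[key] = counts.get(key, 0) + 1
        total += 1
    return counts, total
```

Passing an open file object as lines works, since the function only iterates over it. If this version is much faster than the regex one under Codon but not under CPython, that would point at the regex implementation rather than at file IO.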
I tried the first two examples in the blog post, forgetting to include
-release. And they both run slower than python. I thought perhaps having
the very large (500_000_000) arrays might be a problem in the approximate
pi script, so I divided both x and y by ten and tried again. Now codon is
even slower than python.
I was about to give up, but decided to at least let you know what's
happening when I try the examples in your blog post. In one of your emails,
you mention needing to include -release when running or building using
codon. When I tried that, the codon version of calculating pi does run ten
times faster.
Your blog post never mentions using -release. And why is that not the
default? Imagine what happens when other people try what I just did and
discover crappy performance when using codon, compared to python3?
As an editor, I constantly have to point out to authors that they have left
out critical information because that information is so familiar to the
authors that they unconsciously assume everybody knows about it. I think
you are doing that with -release.
Rik
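For anyone repeating this comparison, a small harness run under python3 can time the same script with and without -release. This is only a sketch: time_command and compare are made-up names, and the command lines are the ones quoted in this thread (python3 t.py, codon run -release t.py) plus plain codon run. The codon timings presumably include compilation, since codon run compiles before executing.

```python
import subprocess
import time


def time_command(argv):
    """Run one command to completion and return its wall-clock seconds."""
    start = time.time()
    # Discard the program's own output so only the timing is reported.
    subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return time.time() - start


def compare(script='t.py'):
    """Print wall-clock times for CPython and Codon with and without -release."""
    commands = {
        'python3': ['python3', script],
        'codon run': ['codon', 'run', script],
        'codon run -release': ['codon', 'run', '-release', script],
    }
    for label, argv in commands.items():
        print(label, round(time_command(argv), 3))
```

Calling compare('t.py') prints one line per command, which makes the effect of the missing flag visible side by side.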
|
Codon is slower than CPython in a task with pattern matching
rik@nuke:~/Reports/GR$ time report > t2
real 0m6.443s
user 0m8.682s
sys 0m1.122s
rik@nuke:~/Reports/GR$ time report.py > t1
real 0m4.947s
user 0m4.419s
sys 0m0.510s
rik@nuke:~/Reports/GR$ wc -l weblogs
7282338 weblogs
report is the binary built by "codon build report.py".
rik@nuke:~/Reports/GR$ cat ~/bin/report.py
#!/usr/bin/python3
import re
with open('e') as file:
    d = {}
    # total number of hits; Codon [int]
    total = 0
    regexString = re.compile('/login.* H')
    for line in file:
        # Codon disliked: m = re.findall(regexString,line)
        m = regexString.findall(line)
        if len(m) > 0: # make sure there was a match
            if re.search('destination=', m[0]):
                continue # skip lines containing destination=
            y = re.split(' ',m[0])
            # add 1 to each dict entry, or init as 1
            d[y[0]] = d.get(y[0], 0) + 1
            # I took this from Karpathy on makemore, and the sort
            # lambda where making -kv means a reverse order sort
            # https://www.youtube.com/watch?v=PaCmpygFfXo 12:36
            total = total+1
    sorted_count=sorted(d.items(), key = lambda kv: -kv[1])
    for item in sorted_count:
        print(item[0], item[1])
    print("Total number of hits: ", total)